Skip to Main content Skip to Navigation

2D and 3D Geometric Attributes Estimation in Images via deep learning

Abstract : The visual perception of 2D and 3D geometric attributes (e.g. translation, rotation, spatial size and etc.) is important in robotic applications. It helps robotic system build knowledge about its surrounding environment and can serve as the input for down-stream tasks such as motion planning and physical intersection with objects.The main goal of this thesis is to automatically detect positions and poses of interested objects for robotic manipulation tasks. In particular, we are interested in the low-level task of estimating occlusion relationship to discriminate different objects and the high-level tasks of object visual tracking and object pose estimation.The first focus is to track the object of interest with correct locations and sizes in a given video. We first study systematically the tracking framework based on discriminative correlation filter (DCF) and propose to leverage semantics information in two tracking stages: the visual feature encoding stage and the target localization stage. Our experiments demonstrate that the involvement of semantics improves the performance of both localization and size estimation in our DCF-based tracking framework. We also make an analysis for failure cases.The second focus is using object shape information to improve the performance of object 6D pose estimation and do object pose refinement. We propose to estimate the 2D projections of object 3D surface points with deep models to recover object 6D poses. Our results show that the proposed method benefits from the large number of 3D-to-2D point correspondences and achieves better performance. As a second part, we study the constraints of existing object pose refinement methods and develop a pose refinement method for objects in the wild. Our experiments demonstrate that our models trained on either real data or generated synthetic data can refine pose estimates for objects in the wild, even though these objects are not seen during training.The third focus is studying geometric occlusion in single images to better discriminate objects in the scene. We first formalize geometric occlusion definition and propose a method to automatically generate high-quality occlusion annotations. Then we propose a new occlusion relationship formulation (i.e. abbnom) and the corresponding inference method. Experiments on occlusion reasoning benchmarks demonstrate the superiority of the proposed formulation and method. To recover accurate depth discontinuities, we also propose a depth map refinement method and a single-stage monocular depth estimation method.All the methods that we propose leverage on the versatility and power of deep learning. This should facilitate their integration in the visual perception module of modern robotic systems.Besides the above methodological advances, we also made available software (for occlusion and pose estimation) and datasets (of high-quality occlusion information) as a contribution to the scientific community
Document type :
Complete list of metadata
Contributor : Abes Star :  Contact
Submitted on : Tuesday, May 11, 2021 - 5:20:10 PM
Last modification on : Thursday, May 13, 2021 - 2:03:57 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03224484, version 1



Xuchong Qiu. 2D and 3D Geometric Attributes Estimation in Images via deep learning. Signal and Image Processing. École des Ponts ParisTech, 2021. English. ⟨NNT : 2021ENPC0005⟩. ⟨tel-03224484⟩



Record views


Files downloads