M. Aubry, D. Maturana, A. Efros, B. Russell, and J. Sivic, Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.3762-3769, 2014.
DOI : 10.1109/CVPR.2014.487

URL : https://hal.archives-ouvertes.fr/hal-01057240

W. Chen, H. Wang, Y. Li, H. Su, Z. Wang et al., Synthesizing Training Images for Boosting Human 3D Pose Estimation, 2016 Fourth International Conference on 3D Vision (3DV), pp.479-488, 2016.
DOI : 10.1109/3DV.2016.58

URL : http://arxiv.org/pdf/1604.02703

A. Collet and S. Srinivasa, Efficient multi-view object recognition and full pose estimation, 2010 IEEE International Conference on Robotics and Automation, pp.2050-2055, 2010.
DOI : 10.1109/ROBOT.2010.5509615

URL : http://www.ri.cmu.edu/pub_files/2010/5/Collet2010.pdf

A. Collet, M. Martinez, and S. Srinivasa, The MOPED framework: Object recognition and pose estimation for manipulation, The International Journal of Robotics Research, vol.30, issue.10, pp.1284-1306, 2011.
DOI : 10.1177/0278364911401765

URL : http://www.ri.cmu.edu/pub_files/2011/9/moped.pdf

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.886-893, 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas et al., FlowNet: Learning Optical Flow with Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), pp.2758-2766, 2015.
DOI : 10.1109/ICCV.2015.316

URL : http://arxiv.org/pdf/1504.06852

P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.
DOI : 10.1109/TPAMI.2009.167

URL : http://people.cs.uchicago.edu/~pff/papers/lsvm-pami.pdf

C. Feng, Y. Xiao, A. Willette, W. McGee, and V. Kamat, Towards Autonomous Robotic In-Situ Assembly on Unstructured Construction Sites Using Monocular Vision, Proceedings of the 31st International Symposium on Automation and Robotics in Construction and Mining (ISARC), 2014.
DOI : 10.22260/ISARC2014/0022

URL : http://www.iaarc.org/publications/fulltext/isarc2014_submission_169.pdf

S. Fidler, S. Dickinson, and R. Urtasun, 3D object detection and viewpoint estimation with a deformable 3D cuboid model, Advances in Neural Information Processing Systems (NIPS), pp.611-619, 2012.

S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas, and M. Marín-Jiménez, Automatic generation and detection of highly reliable fiducial markers under occlusion, Pattern Recognition, vol.47, issue.6, pp.2280-2292, 2014.
DOI : 10.1016/j.patcog.2014.01.005

S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas, and R. Medina-Carnicer, Generation of fiducial marker dictionaries using Mixed Integer Linear Programming, Pattern Recognition, vol.51, pp.481-491, 2016.
DOI : 10.1016/j.patcog.2015.09.023

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.580-587, 2014.
DOI : 10.1109/CVPR.2014.81

D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich, Viewpoint-aware object detection and pose estimation, International Conference on Computer Vision (ICCV), IEEE, pp.1275-1282, 2011.

K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask R-CNN, arXiv preprint arXiv:1703.06870, 2017.

M. Hejrati and D. Ramanan, Analyzing 3D objects in cluttered images, Advances in Neural Information Processing Systems (NIPS), pp.593-601, 2012.

T. Hodaň, J. Matas, and Š. Obdržálek, On evaluation of 6D object pose estimation, European Conference on Computer Vision Workshops (ECCVw), pp.606-619, 2016.

D. Huttenlocher and S. Ullman, Recognizing solid objects by alignment with an image, International Journal of Computer Vision, vol.5, issue.2, pp.195-212, 1990.
DOI : 10.1007/BF00054921

Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
DOI : 10.1162/neco.1989.1.4.541

S. Levine, C. Finn, T. Darrell, and P. Abbeel, End-to-end training of deep visuomotor policies, Journal of Machine Learning Research (JMLR), vol.17, issue.39, pp.1-40, 2016.

S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen, Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, The International Journal of Robotics Research, vol.37, issue.4-5, pp.421-436, 2018.
DOI : 10.1177/0278364917710318

D. Lowe, Three-dimensional object recognition from single two-dimensional images, Artificial Intelligence, vol.31, issue.3, pp.355-395, 1987.
DOI : 10.1016/0004-3702(87)90070-1

D. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision (ICCV), pp.1150-1157, 1999.

F. Massa, R. Marlet, and M. Aubry, Crafting a multi-task CNN for viewpoint estimation, Proceedings of the British Machine Vision Conference 2016, 2016.
DOI : 10.5244/C.30.91

URL : https://hal.archives-ouvertes.fr/hal-01743267

F. Massa, B. Russell, and M. Aubry, Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6024-6033, 2016.
DOI : 10.1109/CVPR.2016.648

URL : https://hal.archives-ouvertes.fr/hal-01801049

J. Mundy, Object recognition in the geometric era: A retrospective, in: Toward Category-Level Object Recognition, pp.3-28, 2006.

X. Peng and K. Saenko, Synthetic to real adaptation with deep generative correlation alignment networks, arXiv preprint arXiv:1701.05524, 2017.

X. Peng, B. Sun, K. Ali, and K. Saenko, Learning Deep Object Detectors from 3D Models, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1278-1286, 2015.
DOI : 10.1109/ICCV.2015.151

B. Pepik, M. Stark, P. Gehler, and B. Schiele, Teaching 3D geometry to deformable part models, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.3362-3369, 2012.
DOI : 10.1109/CVPR.2012.6248075

B. Pepik, R. Benenson, T. Ritschel, and B. Schiele, What Is Holding Back Convnets for Detection?, 37th German Conference on Pattern Recognition (GCPR), pp.517-528, 2015.

L. Pinto and A. Gupta, Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours, 2016 IEEE International Conference on Robotics and Automation (ICRA), pp.3406-3413, 2016.
DOI : 10.1109/ICRA.2016.7487517

S. Richter, V. Vineet, S. Roth, and V. Koltun, Playing for Data: Ground Truth from Computer Games, European Conference on Computer Vision (ECCV), pp.102-118, 2016.
DOI : 10.1007/978-3-319-46475-6_7

L. Roberts, Machine perception of three-dimensional solids, PhD thesis, Massachusetts Institute of Technology (MIT), 1963.

G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. Lopez, The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3234-3243, 2016.
DOI : 10.1109/CVPR.2016.352

F. Sadeghi and S. Levine, (CAD)2RL: Real single-image flight without a single real image, Robotics: Science and Systems (RSS) Conference, 2018.

J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, Trust region policy optimization, 32nd International Conference on Machine Learning (ICML), pp.1889-1897, 2015.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., OverFeat: Integrated recognition, localization and detection using convolutional networks, International Conference on Learning Representations (ICLR), 2014.

A. Shafaei, J. Little, and M. Schmidt, Play and learn: Using video games to train computer vision models, 27th British Machine Vision Conference (BMVC), 2016.

H. Su, C. Qi, Y. Li, and L. Guibas, Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views, 2015 IEEE International Conference on Computer Vision (ICCV), pp.2686-2694, 2015.
DOI : 10.1109/ICCV.2015.308

B. Sun and K. Saenko, From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.82

J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba et al., Domain randomization for transferring deep neural networks from simulation to the real world, 30th International Conference on Intelligent Robots and Systems (IROS), 2017.

S. Tulsiani and J. Malik, Viewpoints and keypoints, Conference on Computer Vision and Pattern Recognition (CVPR), pp.1510-1519, 2015.

D. Vazquez, A. Lopez, J. Marin, D. Ponsa, and D. Geronimo, Virtual and Real World Adaptation for Pedestrian Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.4, pp.797-809, 2014.
DOI : 10.1109/TPAMI.2013.163

J. Wu, T. Xue, J. Lim, Y. Tian, J. Tenenbaum et al., Single Image 3D Interpreter Network, European Conference on Computer Vision (ECCV), pp.365-382, 2016.

J. Xiao, B. Russell, and A. Torralba, Localizing 3D cuboids in single-view images, Advances in Neural Information Processing Systems (NIPS), pp.746-754, 2012.