
Student: Lai, Chia-Tse (賴家澤)
Thesis title: Category-Level 6D Pose Estimation for Robot Grasping (基於類別級6D姿態估測之機器人夾取)
Advisor: Hsu, Chen-Chien (許陳鑑)
Committee members: Hsu, Chen-Chien (許陳鑑); Lu, Cheng-Kai (呂成凱); Peng, Cheng-Wei (彭正偉)
Oral defense date: 2025/01/14
Degree: Master
Department: Department of Electrical Engineering (電機工程學系)
Year of publication: 2025
Graduating academic year: 113 (ROC era, i.e. 2024-2025)
Language: Chinese
Pages: 59
Chinese keywords: 6D物件姿態估測, 類別級姿態估測, 機器手臂夾取, 物件偵測
English keywords: 6D Object Pose Estimation, Category-Level Pose Estimation, Robot Grasping, Object Detection
Research method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202500214
Document type: Academic thesis
    6D object pose estimation plays a key role in robot grasping tasks. However, most previous deep-learning-based pose estimation methods are instance-level, which limits their general applicability to robot grasping in real-world scenes. In this thesis, we adopt category-level 6D object pose estimation, which estimates not only the pose of an object but also its size. In addition, this approach requires no precise and complete 3D object model during training and can estimate the poses of unseen objects. We therefore propose a Category-Level SegFormer for 6D Object Pose Estimation (CLSF-6DPE), which combines the YOLOv8 object detection model with a SegFormer that embeds a shared head to predict the Normalized Object Coordinate Space (NOCS) map; a similarity transformation algorithm then recovers the object's 6D pose and size. We have also integrated CLSF-6DPE into the Robot Operating System (ROS) and designed an intuitive graphical user interface, enabling users to easily perform robot grasping tasks. Experimental results show that the proposed method outperforms conventional CNN-based models in pose estimation, achieving higher accuracy and demonstrating its feasibility in real-world applications.
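The pipeline described above ends with a similarity transformation that aligns predicted NOCS coordinates with depth-back-projected camera-frame points to recover pose and size. As a minimal illustration (not the thesis code; the function names and pinhole-intrinsics layout are assumptions for this sketch), the Umeyama (1991) least-squares similarity transform cited in the references can be written in NumPy:

```python
import numpy as np

def backproject(depth, intrinsics, mask):
    """Back-project masked depth pixels to 3-D points in the camera frame."""
    fx, fy, cx, cy = intrinsics
    v, u = np.nonzero(mask)           # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def umeyama(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ≈ s * R @ src + t.

    Follows Umeyama (1991): SVD of the covariance between the centred
    point sets, with a sign correction to guarantee a proper rotation.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                  # reflection case: flip the last axis
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

In this setting the source points would be the per-pixel NOCS predictions inside a detected mask and the target points the corresponding back-projected depth pixels; the scale s then gives the object size and (R, t) its 6D pose, typically wrapped in an outlier-rejection scheme such as RANSAC.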

    Acknowledgments
    Chinese Abstract
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
    Chapter 2  Literature Review
      2.1 Instance-Level Object Pose Estimation
      2.2 Category-Level Object Pose Estimation
    Chapter 3  Category-Level SegFormer for 6D Object Pose Estimation
      3.1 Normalized Object Coordinate Space Map
      3.2 CLSF-6DPE
        3.2.1 Predicting NOCS Maps with SegFormer
        3.2.2 6D Object Pose Estimation
    Chapter 4  Robot Grasping Based on 6D Object Pose Estimation
      4.1 Grasp Pose
      4.2 Application to Robot-Arm Grasping Tasks
    Chapter 5  Experiments and Results
      5.1 Datasets
      5.2 Implementation Details
      5.3 Object Detection Models
        5.3.1 Mask R-CNN
        5.3.2 YOLOv8
        5.3.3 PointRend
      5.4 Evaluation Metrics
      5.5 Comparison of Results
    Chapter 6  Conclusion
    References

    Y. Hu, J. Hugonot, P. Fua and M. Salzmann, "Segmentation-Driven 6D Object Pose Estimation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 3380-3389.
    S. Peng, Y. Liu, Q. Huang, X. Zhou and H. Bao, "PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 4556-4565.
    H. Wang, S. Sridhar, J. Huang, J. Valentin, S. Song and L. J. Guibas, "Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 2637-2646.
    J. Tremblay, T. To, B. Sundaralingam, Y. Xiang, D. Fox and S. Birchfield, "Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects," in Proceedings of The 2nd Conference on Robot Learning, Zurich, Switzerland, 2018, pp. 306-316.
    K. He, G. Gkioxari, P. Dollár and R. Girshick, "Mask R-CNN," in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 2980-2988.
    J. Dai, K. He and J. Sun, "Instance-Aware Semantic Segmentation via Multi-task Network Cascades," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 3150-3158.
    P. O. Pinheiro, R. Collobert and P. Dollar, "Learning to segment object candidates," in Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'15), Montreal, Canada, 2015, pp. 1990-1998.
    Y. Xiang, T. Schmidt, V. Narayanan and D. Fox, "PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes," arXiv preprint arXiv:1711.00199, 2017.
    M. Tian, M. H. Ang and G. H. Lee, "Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation," in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 2020, pp. 530-546.
    S. Umeyama, "Least-squares estimation of transformation parameters between two point patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 376-380, April 1991.
    Y. Di, R. Zhang, Z. Lou, F. Manhardt, X. Ji, N. Navab and F. Tombari, "GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 6771-6781.
    R. Zhang, Y. Di, F. Manhardt, F. Tombari and X. Ji, "SSP-Pose: Symmetry-Aware Shape Prior Deformation for Direct Category-Level Object Pose Estimation," in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 7452-7459.
    A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, et al., "ShapeNet: An Information-Rich 3D Model Repository," arXiv preprint arXiv:1512.03012, 2015.
    S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige and N. Navab, "Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes," in Computer Vision–ACCV 2012: 11th Asian Conference on Computer Vision, Daejeon, Korea, 2013, pp. 548-562.
    Z. Li, G. Wang, X. Ji, "CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp. 7677-7686.
    E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez and P. Luo, "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 12077-12090, 2021.
    C. Xie, Y. Xiang, A. Mousavian and D. Fox, "Unseen Object Instance Segmentation for Robotic Environments," IEEE Transactions on Robotics, vol. 37, no. 5, pp. 1343-1359, Oct. 2021.
    F. Manhardt, D. M. Arroyo, C. Rupprecht, B. Busam, T. Birdal, N. Navab and F. Tombari, "Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp. 6840-6849.
    R. Tedrake, "Robotic Manipulation," Course Notes for MIT 6.421, 2024. [Online]. Available: http://manipulation.mit.edu.
    L. Yang, Q. Cao, M. Lin, H. Zhang and Z. Ma, "Robotic hand-eye calibration with depth camera: A sphere model approach," in 2018 4th International Conference on Control, Automation and Robotics (ICCAR), Auckland, New Zealand, 2018, pp. 104-110.
    D. Pavllo, D. J. Tan, M. J. Rakotosaona and F. Tombari, "Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion," in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 4391-4401.
    A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, et al., "PyTorch: An Imperative Style, High-Performance Deep Learning Library," Advances in Neural Information Processing Systems, vol. 32, 2019.
    A. Kirillov, Y. Wu, K. He and R. Girshick, "PointRend: Image Segmentation As Rendering," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 9796-9805.
    A. Gupta, P. Dollar and R. Girshick, "LVIS: A Dataset for Large Vocabulary Instance Segmentation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 5351-5359.
    T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 2014, pp. 740-755.
    Y. You, R. Shi, W. Wang and C. Lu, "CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 6856-6865.
    A. Ahmadyan, L. Zhang, A. Ablavatski, J. Wei and M. Grundmann, "Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 7818-7827.
    Wikipedia contributors, "Rotation matrix," Wikipedia, Oct. 11, 2024. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Rotation_matrix&oldid=1250574232. [Accessed Oct. 28, 2024].
