Graduate student: | 賴彥廷 Lai, Yen-Ting |
---|---|
Thesis title: | 具有自動點雲預處理的即時點雲動作辨識系統 A Real-Time Point Cloud Action Recognition System with Automated Point Cloud Preprocessing |
Advisor: | 林政宏 Lin, Cheng-Hung |
Oral defense committee: | 林政宏 Lin, Cheng-Hung; 劉一宇 Liu, Yi-Yu; 賴穎暉 Lai, Ying-Hui |
Defense date: | 2024/07/22 |
Degree: | Master's |
Department: | Department of Electrical Engineering |
Publication year: | 2024 |
Academic year of graduation: | 112 (ROC calendar, i.e., 2023-2024) |
Language: | Chinese |
Pages: | 39 |
Keywords: | action recognition, point cloud, dynamic point cloud |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202401352 |
Thesis type: | Academic thesis |
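This thesis discusses automated preprocessing for point cloud action recognition systems. Point cloud action recognition is less sensitive to changes in lighting and viewpoint because it relies on the three-dimensional positions of objects rather than on raw pixel values, so it can deliver robust recognition performance even in cluttered and dark environments. Point cloud action recognition also has broad applications in robotics, virtual reality, autonomous driving, human-computer interaction, and game development. For example, understanding human behavior is essential for better interaction and collaboration in robotics, while in virtual reality it allows user movements to be captured and reproduced to enhance realism and interactivity. To build a point cloud action recognition system that runs stably, background and irrelevant points usually need to be filtered out to obtain clean, aligned point cloud data. In most previous methods, point cloud filtering and action recognition were performed separately, and few systems ran the two together. In this thesis, we propose a method that lets users acquire point cloud data directly from a Microsoft Azure Kinect DK and perform comprehensive automated preprocessing, producing cleaner point cloud data without background points that is suitable for action recognition. Our method uses PSTNet for point cloud action recognition, and the model is trained on a dataset of 12 action categories obtained through the automated preprocessing. Finally, we develop a real-time point cloud action recognition system that incorporates automated point cloud preprocessing.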
This thesis discusses automated preprocessing for point cloud action recognition systems. To build a point cloud action recognition system that operates stably, it is usually necessary to filter out background and irrelevant points to obtain clean, aligned point cloud data. In most prior methods, point cloud filtering and action recognition were performed separately, and few systems ran the two parts together. In this thesis, we propose a method that enables users to obtain point cloud data directly from a Microsoft Azure Kinect DK and perform comprehensive automated point cloud preprocessing. This produces cleaner point cloud data without background points, suitable for action recognition. Our method uses PSTNet for point cloud action recognition and trains the model on a dataset of 12 action categories obtained through automated preprocessing. Finally, we develop a real-time point cloud action recognition system that incorporates automated point cloud preprocessing.
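The abstract states that preprocessing removes background and irrelevant points so that PSTNet receives clean, aligned point cloud sequences, but this record does not describe the actual steps. As a hedged illustration only, the NumPy sketch below assumes each frame is an (M, 3) array already converted to meters in camera coordinates, with z as the distance from the sensor, and uses a plain depth window as a stand-in for the thesis's background removal. The helpers `crop_frame`, `resample`, and `build_clip`, the 0.5 m to 2.5 m window, and the 1024-point frame size are hypothetical choices, not values taken from the thesis.

```python
import numpy as np


def crop_frame(points, depth_range=(0.5, 2.5)):
    """Keep finite points whose depth z lies inside depth_range (meters, assumed)."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    pts = pts[np.all(np.isfinite(pts), axis=1)]  # drop NaN/inf points
    near, far = depth_range
    return pts[(pts[:, 2] >= near) & (pts[:, 2] <= far)]


def resample(points, num_points, rng):
    """Draw exactly num_points points; sample with replacement only if the frame is too small."""
    if len(points) == 0:
        raise ValueError("frame has no points left after cropping")
    replace = len(points) < num_points
    idx = rng.choice(len(points), size=num_points, replace=replace)
    return points[idx]


def build_clip(frames, num_points=1024, depth_range=(0.5, 2.5), seed=0):
    """Turn T raw frames into a (T, num_points, 3) clip (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    clip = np.stack([resample(crop_frame(f, depth_range), num_points, rng) for f in frames])
    # One centroid and one scale for the whole clip, so motion between frames is kept.
    clip = clip - clip.reshape(-1, 3).mean(axis=0)
    radius = np.linalg.norm(clip, axis=-1).max()
    if radius > 0:
        clip = clip / radius
    return clip


if __name__ == "__main__":
    demo_rng = np.random.default_rng(42)
    # Synthetic stand-in for 8 sensor frames of 5000 points each.
    frames = [demo_rng.uniform([-1, -1, 0.0], [1, 1, 4.0], size=(5000, 3)) for _ in range(8)]
    clip = build_clip(frames, num_points=512)
    print(clip.shape)  # (8, 512, 3)
```

Normalizing with one centroid and one scale per clip, rather than per frame, preserves the subject's displacement between frames, which carries motion information that per-frame centering would erase. Resampling every frame to the same size is there because batching a sequence model generally requires an equal number of points per frame.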
T. Raj et al., "A survey on LiDAR scanning mechanisms," Electronics, vol. 9, no. 5, pp. 741, 2020.
R. Horaud, et al., "An overview of depth cameras and range scanners based on time-of-flight technologies," Machine Vision and Applications, vol. 27, no. 7, pp. 1005-1020, 2016.
A. Geiger, et al., "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231-1237, 2013.
P. Sun, et al., "Scalability in perception for autonomous driving: Waymo open dataset," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446-2454, 2020.
X. Huang, et al., "The ApolloScape dataset for autonomous driving," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 954-960, 2018.
C. R. Qi, et al., "PointNet: Deep learning on point sets for 3D classification and segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652-660, 2017.
C. R. Qi, et al., "PointNet++: Deep hierarchical feature learning on point sets in a metric space," Advances in Neural Information Processing Systems, vol. 30, 2017.
Y. Li, et al., "PointCNN: Convolution on X-transformed points," Advances in Neural Information Processing Systems, vol. 31, 2018.
Y. Wang, et al., "Dynamic graph CNN for learning on point clouds," ACM Transactions on Graphics (TOG), vol. 38, no. 5, pp. 1-12, 2019.
X. Liu, et al., "MeteorNet: Deep learning on dynamic 3D point cloud sequences," in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9246-9255, 2019.
H. Fan, et al., "PSTNet: Point spatio-temporal convolution on point cloud sequences," arXiv preprint arXiv:2205.13713, 2022.
Y.-W. Chang, et al., "Training and testing low-degree polynomial data mappings via linear SVM," Journal of Machine Learning Research, vol. 11, no. 4, 2010.
K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," Advances in Neural Information Processing Systems, vol. 27, 2014.
D. Tran, et al., "Learning spatiotemporal features with 3D convolutional networks," in Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497, 2015.
J. Carreira and A. Zisserman, "Quo vadis, action recognition? A new model and the Kinetics dataset," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299-6308, 2017.
X. Liu, et al., "FlowNet3D: Learning scene flow in 3D point clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 529-537, 2019.
I. Gorordo Fernandez, "pyKinectAzure," GitHub repository, Available from: https://github.com/ibaiGorordo/pyKinectAzure.
Q.-Y. Zhou, et al., "Open3D: A modern library for 3D data processing," arXiv preprint arXiv:1801.09847, 2018.
J. Redmon, et al., "You Only Look Once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.
G. Jocher, "YOLOv5: Open source implementation of YOLOv5 in PyTorch," GitHub repository, Available from: https://github.com/ultralytics/yolov5.
T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
L. Shi, et al., "Two-stream adaptive graph convolutional networks for skeleton-based action recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12026-12035, 2019.
P. J. M. Ali, et al., "Data normalization and standardization: A technical report," Machine Learning Technical Report, vol. 1, no. 1, pp. 1-6, 2014.
J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, vol. 1, pp. 281-297, 1967.
C. Feichtenhofer, A. Pinz, and A. Zisserman, "Convolutional two-stream network fusion for video action recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, "Convolutional learning of spatio-temporal features," in Proceedings of the European Conference on Computer Vision, Springer, pp. 140-153, 2010.
W. Luo, B. Yang, and R. Urtasun, "Fast and furious: Real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3569-3577, 2018.
C. Choy, J. Gwak, and S. Savarese, "4D spatio-temporal convnets: Minkowski convolutional neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3075-3084, 2019.
H. Fan and Y. Yang, "PointRNN: Point recurrent neural network for moving point cloud processing," arXiv preprint arXiv:1910.08287, 2019.
H. Fan, Y. Yang, and M. Kankanhalli, "Point 4D transformer networks for spatio-temporal modeling in point cloud videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14204-14213, 2021.
Y. Wang, et al., "3DV: 3D dynamic voxel for action recognition in depth video," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 511-520, 2020.
R. Achanta, et al., "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2282, 2012.
A. Kirillov, et al., "Segment anything," in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015-4026, 2023.
P. Skalski, "How to Use the Segment Anything Model (SAM)," Roboflow Blog, January 22, 2024, Available from: https://blog.roboflow.com/how-to-use-segment-anything-model-sam/.
T.-Y. Lin, et al., "Microsoft COCO: Common objects in context," in Proceedings of the European Conference on Computer Vision (ECCV), Part V, Springer, pp. 740-755, 2014.
A. Kirillov, et al., "Panoptic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9404-9413, 2019.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
Y. Zhou and O. Tuzel, "VoxelNet: End-to-end learning for point cloud based 3D object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490-4499, 2018.
C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464-7475, 2023.
GeeksforGeeks, "Example of camera calibration using OpenCV," GeeksforGeeks, last updated Jun. 8, 2023, Available from: https://www.geeksforgeeks.org/calibratecamera-opencv-in-python/.