Author: 羅郁鈞
Lo, Yu-Chun
Thesis Title: 基於非對稱U-Net實現微小且快速移動之物體檢測網路
TinySeeker: A Network for Seeking Tiny and Fast Moving Object Based on Asymmetric U-Net
Advisor: 林政宏
Lin, Cheng-Hung
Committee: 林政宏
Lin, Cheng-Hung
賴穎暉
Lai, Ying-Hui
劉一宇
Liu, Yi-Yu
Approval Date: 2024/07/22
Degree: 碩士
Master
Department: 電機工程學系
Department of Electrical Engineering
Thesis Publication Year: 2024
Academic Year: 112
Language: 中文
Chinese
Number of pages: 46
Keywords (in Chinese): 物件偵測、U-Net、高效率架構、熱力圖預測
Keywords (in English): Object Detection, U-Net, High-Efficiency Structure, Heatmap Prediction
Research Methods: 實驗設計法
Experimental design
DOI URL: http://doi.org/10.6345/NTNU202401422
Thesis Type: Academic thesis/dissertation
Reference times: Clicks: 83, Downloads: 1
    This thesis investigates object detection for tiny, fast-moving objects with indistinct features. To refine match tactics and improve their skills, professional athletes and amateur players often record their practice sessions and matches with phones or cameras. As this field has grown, more and more researchers have begun combining deep learning models with sports analysis to provide more comprehensive insights. Object detection is a key task here, because identifying an object's position yields valuable information such as strategic analysis. However, research on tracking fast-moving, blurred objects such as badminton shuttlecocks remains limited. TrackNetv2, based on VGG-16 and U-Net, detects the shuttlecock's position through a heatmap, but its architecture requires substantial computational resources and struggles to stay efficient in practical applications. To address this problem, we propose an asymmetric architecture named TinySeeker. This novel architecture not only detects the shuttlecock's position accurately but also improves computational efficiency, achieving an optimal balance between detection accuracy and computational cost that makes it both practical and efficient in real-world applications. Experimental results show that TinySeeker reduces computation by up to 26% while maintaining accuracy. This architecture marks a significant advance in the field, pushing the possibilities of object detection tasks and setting a new benchmark for similar future research.
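    The abstract above describes locating the shuttlecock through a predicted heatmap rather than a bounding box, the approach introduced by TrackNet. As a hedged illustration of that idea, the Python sketch below generates a Gaussian heatmap label and decodes a predicted position from a heatmap's peak; the image size, Gaussian width, and confidence threshold are assumptions, not values taken from the thesis.

        import numpy as np

        def make_heatmap(h, w, cx, cy, sigma=2.5):
            # Render a 2D Gaussian centered on the labeled shuttlecock
            # position (cx, cy); this serves as the training target.
            # sigma is an assumed width, not the thesis's setting.
            ys, xs = np.mgrid[0:h, 0:w]
            d2 = (xs - cx) ** 2 + (ys - cy) ** 2
            return np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)

        def decode_peak(heatmap, threshold=0.5):
            # Recover the predicted position as the heatmap's argmax;
            # below `threshold`, the shuttlecock is treated as absent.
            cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
            if heatmap[cy, cx] < threshold:
                return None
            return float(cx), float(cy)

        # Example: a 288x512 label with the shuttlecock at (300, 100).
        label = make_heatmap(288, 512, cx=300.0, cy=100.0)
        print(decode_peak(label))  # (300.0, 100.0)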

    To refine strategies and sharpen skills, both professional athletes and amateur players routinely use cameras to record their practice sessions and matches. As this field grows, an increasing number of researchers are combining deep learning models with sports analysis to offer more comprehensive insights. Object detection is a pivotal task within this field, since identifying object locations provides valuable information for purposes such as strategic analysis. However, only a limited number of studies have focused on tracking fast-moving, indistinct objects such as a badminton shuttlecock. The preceding method, TrackNetv2, combined VGG-16 and U-Net in a heatmap-based approach to shuttlecock detection, but the U-Net architecture demands substantial computational resources, which limits its efficiency in practice. To tackle this issue, we present an asymmetric architecture named TinySeeker, inspired by U-Net. This novel model not only detects the shuttlecock's location precisely but also improves computational efficiency; the reimagined structure strikes an optimal balance between detection accuracy and computational demands, making it a practical and effective solution for real-world applications. Experimental results show that TinySeeker reduces computation by up to 26% while maintaining precision. This architecture marks a significant advance in the field, pushing the boundaries of what is possible in object detection tasks and setting a new benchmark for similar studies in the future.
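    To make the asymmetry concrete, the following PyTorch sketch shows a U-Net-style network whose decoder stages are shallower and narrower than the encoder's, which cuts computation at high resolution while skip connections preserve the localization needed by the heatmap head. This is a hypothetical illustration of the general idea only; the layer counts and channel widths are assumptions, not the actual TinySeeker configuration.

        import torch
        import torch.nn as nn

        def block(cin, cout, n=2):
            # n 3x3 conv+BN+ReLU layers; n controls a stage's depth/cost.
            layers = []
            for i in range(n):
                layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                           nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)

        class AsymmetricUNet(nn.Module):
            # Hypothetical sketch: decoder stages use one conv and fewer
            # channels than the encoder, reducing FLOPs where feature
            # maps are largest. Not the exact TinySeeker design.
            def __init__(self, cin=3):
                super().__init__()
                self.enc1 = block(cin, 64)
                self.enc2 = block(64, 128)
                self.enc3 = block(128, 256)
                self.pool = nn.MaxPool2d(2)
                self.up2 = nn.ConvTranspose2d(256, 64, 2, stride=2)
                self.dec2 = block(64 + 128, 64, n=1)
                self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
                self.dec1 = block(32 + 64, 32, n=1)
                self.head = nn.Conv2d(32, 1, 1)  # one-channel heatmap logits

            def forward(self, x):
                e1 = self.enc1(x)
                e2 = self.enc2(self.pool(e1))
                e3 = self.enc3(self.pool(e2))
                d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
                d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
                return torch.sigmoid(self.head(d1))

        x = torch.randn(1, 3, 288, 512)
        print(AsymmetricUNet()(x).shape)  # torch.Size([1, 1, 288, 512])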

    Table of Contents:
    Acknowledgements
    Chinese Abstract
    English Abstract
    Table of Contents
    List of Tables
    List of Figures
    List of Symbols
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Overview of Research Methods
      1.4 Research Contributions
      1.5 Thesis Organization
    Chapter 2 Literature Review
      2.1 Image Classification and Object Detection
      2.2 Trajectory Pattern
      2.3 TrackNet
      2.4 U-Net
      2.5 Heatmap
      2.6 Intersection over Union (IoU)
      2.7 Euclidean Distance
    Chapter 3 Research Methods
      3.1 TinySeeker
        3.1.1 Input Features
        3.1.2 Asymmetric Architecture
        3.1.3 Heatmap Output
    Chapter 4 Experimental Results
      4.1 Experimental Setup
        4.1.1 Dataset
        4.1.2 Evaluation Method
        4.1.3 Experimental Parameters
      4.2 Results
        4.2.1 U-Net vs. TinySeeker
        4.2.2 Different Asymmetric Architecture Variants
        4.2.3 Sample Videos
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References
    Autobiography
    Academic Achievements
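    Outline items 2.7 (Euclidean distance) and 4.1.2 (evaluation method) point to a distance-based evaluation criterion, which is standard in shuttlecock tracking: a prediction counts as a true positive when it lies within a pixel tolerance of the labeled center. A minimal Python sketch, assuming a hypothetical tolerance that is not taken from the thesis:

        import math

        def within_tolerance(pred, gt, tol=4.0):
            # A prediction counts as a true positive when its Euclidean
            # distance to the labeled center is within `tol` pixels.
            # `tol` here is an assumed value, not the thesis's setting.
            if pred is None or gt is None:
                # Frames with no visible shuttlecock: correct only when
                # the detector also predicts absence.
                return pred is None and gt is None
            return math.hypot(pred[0] - gt[0], pred[1] - gt[1]) <= tol

        print(within_tolerance((302.0, 101.5), (300.0, 100.0)))  # True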

    References:
    HUANG, Yu-Chuan, et al. Tracknet: A deep learning network for tracking high-speed and tiny objects in sports applications. In: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2019. p. 1-8.
    RONNEBERGER, Olaf; FISCHER, Philipp; BROX, Thomas. U-net: Convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer International Publishing, 2015. p. 234-241.
    SUN, Nien-En, et al. Tracknetv2: Efficient shuttlecock tracking network. In: 2020 International Conference on Pervasive Artificial Intelligence (ICPAI). IEEE, 2020. p. 86-91.
    SIMONYAN, Karen; ZISSERMAN, Andrew. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
    HE, Kaiming, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 770-778.
    DENG, Jia, et al. Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009. p. 248-255.
    RUSSAKOVSKY, Olga, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 2015, 115: 211-252.
    REDMON, Joseph, et al. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 779-788.
    REDMON, Joseph; FARHADI, Ali. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
    BOCHKOVSKIY, Alexey; WANG, Chien-Yao; LIAO, Hong-Yuan Mark. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.
    LI, Chuyi, et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976, 2022.
    WANG, Chien-Yao; BOCHKOVSKIY, Alexey; LIAO, Hong-Yuan Mark. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023. p. 7464-7475.
    GIRSHICK, Ross, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. p. 580-587.
    GIRSHICK, Ross. Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision. 2015. p. 1440-1448.
    REN, Shaoqing, et al. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 2015, 28.
    ARCHANA, Maruthavanan; GEETHA, M. Kalaiselvi. Object detection and tracking based on trajectory in broadcast tennis video. Procedia Computer Science, 2015, 58: 225-232.
    YU, Xinguo, et al. A trajectory-based ball detection and tracking algorithm in broadcast tennis video. In: 2004 International Conference on Image Processing, 2004. ICIP'04. IEEE, 2004. p. 1049-1052.
    RENÒ, Vito, et al. Real-time tracking of a tennis ball by combining 3d data and domain knowledge. In: 2016 1st International Conference on Technology and Innovation in Sports, Health and Wellbeing (TISHW). IEEE, 2016. p. 1-7.
    YAN, Fei; CHRISTMAS, W.; KITTLER, Josef. A tennis ball tracking algorithm for automatic annotation of tennis match. In: British machine vision conference. 2005. p. 619-628.
    ZHOU, Xiangzeng, et al. Tennis ball tracking using a two-layered data association approach. IEEE Transactions on Multimedia, 2014, 17.2: 145-156.
    LONG, Jonathan; SHELHAMER, Evan; DARRELL, Trevor. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 3431-3440.
    GRAVES, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
    ZHANG, Liang, et al. Attention in convolutional LSTM for gesture recognition. Advances in neural information processing systems, 2018, 31.
    ZHOU, Bolei, et al. Learning deep features for discriminative localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 2921-2929.
    ZEILER, Matthew D.; FERGUS, Rob. Visualizing and understanding convolutional networks. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13. Springer International Publishing, 2014. p. 818-833.
    SELVARAJU, Ramprasaath R., et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision. 2017. p. 618-626.
    XU, Kelvin, et al. Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning. PMLR, 2015. p. 2048-2057.
    DEVLIN, Jacob, et al. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
    HE, Kaiming, et al. Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision. 2017. p. 2961-2969.
    PINHEIRO, Pedro O.; COLLOBERT, Ronan; DOLLÁR, Piotr. Learning to segment object candidates. Advances in neural information processing systems, 2015, 28.
    WANG, Xinlong, et al. Solo: Segmenting objects by locations. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16. Springer International Publishing, 2020. p. 649-665.
    STEINBACH, Michael; KARYPIS, George; KUMAR, Vipin. A comparison of document clustering techniques. 2000.
    RODRIGUEZ, Alex; LAIO, Alessandro. Clustering by fast search and find of density peaks. Science, 2014, 344.6191: 1492-1496.
    LECUN, Yann, et al. Handwritten digit recognition with a back-propagation network. Advances in neural information processing systems, 1989, 2.
    LIU, Ting, et al. An investigation of practical approximate nearest neighbor algorithms. Advances in neural information processing systems, 2004, 17.
    SHLENS, Jonathon. A tutorial on principal component analysis. arXiv preprint arXiv:1404.1100, 2014.
    DATTA, Ritendra, et al. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys (Csur), 2008, 40.2: 1-60.
    LINDEBERG, Tony. Scale invariant feature transform. 2012.
    SCHROFF, Florian; KALENICHENKO, Dmitry; PHILBIN, James. Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 815-823.
    OH SONG, Hyun, et al. Deep metric learning via lifted structured feature embedding. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 4004-4012.
    KOCH, Gregory, et al. Siamese neural networks for one-shot image recognition. In: ICML deep learning workshop. 2015.
