Graduate Student: 陳尚德 Chen, Shang-De
Thesis Title: 基於高速球種定位系統之深度集成與漸進訓練策略 (ADEPTS: An Advanced Deep Ensemble and Progressively Training Strategy for High-speed Ball Localization)
Advisor: 林政宏 Lin, Cheng-Hung
Oral Examination Committee: 賴穎暉 Lai, Ying-Hui; 陳勇志 Chen, Yung-Chih; 林政宏 Lin, Cheng-Hung
Oral Examination Date: 2024/01/17
Degree: Master (碩士)
Department: Department of Electrical Engineering (電機工程學系)
Year of Publication: 2024
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 46
Chinese Keywords: 球種定位 (ball localization), 生成式深度神經網路 (generative deep neural network), 漸進式訓練 (progressive training), 多尺度學習 (multi-scale learning)
English Keywords: ball localization, progressively training, multi-scale learning
DOI URL: http://doi.org/10.6345/NTNU202400138
Document Type: Academic thesis
Abstract (translated from Chinese): In recent years, combining sports events with deep learning architectures has drawn wide attention at the application level, with growing demand for neural-network-assisted tasks such as intelligent refereeing and tactical planning. In these applications, deep learning architectures usually play a supporting role, helping athletes or teams analyze the course of a match and gain a comprehensive view of the current game state; this is especially true of ball sports. For efficient tactical analysis in ball sports, detecting the positions of players and the ball is essential, and the precision of this detection strongly affects the overall tactical planning results. However, the speed, small size, and unpredictable motion of some balls make them hard to localize with conventional object detection architectures, which poses a challenging problem. To this end, this thesis adopts a generative network architecture to design a high-speed ball localization system and proposes the ADEPTS strategy to optimize the system's training. ADEPTS combines multi-scale feature fusion with a progressive learning method, enabling the network to capture the trajectory features of high-speed ball movement more accurately while improving training efficiency. The results show that the proposed high-speed ball localization system achieves high localization accuracy, and that adding ADEPTS further reduces the architecture's training time by about 26.14%, making it a practical and effective solution for real-world applications.
Abstract (English): In recent years, integrating sports events with deep learning architectures has attracted significant attention, resulting in increasing demand for applications in this field. In sports such as badminton or tennis, precise monitoring of player and ball positions is of great importance: it is indispensable for a thorough understanding of the game's current state, serving both coaches and players. However, the ball's fast and unpredictable behavior makes extracting representative features a challenging and still open problem. To address this, we propose the Advanced Deep Ensemble and Progressively Training Strategy (ADEPTS), an optimized training strategy designed for ball detection systems. ADEPTS combines multi-scale feature fusion with a progressive learning approach, allowing networks to capture the trajectory features of high-speed ball movement more accurately while improving training efficiency. Experimental results show that ADEPTS reduces training time by about 26.14% while retaining high-resolution outputs, and it additionally enables the network to achieve even better localization accuracy, making it a practical and effective solution for real-world applications.
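To make the progressive half of the strategy concrete, below is a minimal sketch of progressive-resolution training for a heatmap-based ball localizer, written in PyTorch. Everything in it is an illustrative assumption rather than the thesis's actual code: TinyHeatmapNet, the (128, 256, 512) stage schedule, the synthetic frames, and the progressive_train helper are hypothetical stand-ins for the generative, U-Net-style architecture the abstract describes. The sketch only shows the general idea that cheap low-resolution stages run first and high-resolution stages fine-tune.

```python
# A minimal sketch of progressive-resolution training for a heatmap-based
# ball localizer, assuming a PyTorch setup. The model, stage schedule, and
# synthetic data below are illustrative assumptions, not the thesis's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyHeatmapNet(nn.Module):
    """Toy fully convolutional encoder-decoder predicting a ball heatmap.

    Being fully convolutional, it accepts any input resolution, which is
    what lets a progressive schedule feed it growing image sizes.
    """

    def __init__(self, in_ch=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # 1-channel heatmap logits
        )

    def forward(self, x):
        return self.dec(self.enc(x))


def progressive_train(model, frames, heatmaps, sizes=(128, 256, 512),
                      epochs_per_stage=2, lr=1e-3):
    """Train at low resolution first, then fine-tune at higher resolutions.

    `sizes` is an assumed schedule; early low-resolution stages are cheap
    and give the network coarse trajectory features that the later
    high-resolution stages only need to refine.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for size in sizes:
        # Resize both the input frames and the target heatmaps to the
        # current stage resolution.
        x = F.interpolate(frames, size=(size, size), mode="bilinear",
                          align_corners=False)
        y = F.interpolate(heatmaps, size=(size, size), mode="bilinear",
                          align_corners=False)
        for epoch in range(epochs_per_stage):
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            print(f"stage {size}px epoch {epoch}: loss={loss.item():.4f}")


if __name__ == "__main__":
    # Synthetic stand-ins for video frames and Gaussian ball heatmaps.
    frames = torch.rand(4, 3, 512, 512)
    heatmaps = torch.rand(4, 1, 512, 512)
    progressive_train(TinyHeatmapNet(), frames, heatmaps)
```

The training-time savings in a schedule like this come from the early stages processing 4x to 16x fewer pixels than full resolution; only the roughly 26.14% reduction reported in the abstracts is a figure from the thesis itself.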