
Graduate Student: Chen, Po-Yen (陳柏諺)
Thesis Title: Application of Object Relation Modeling with Transformer in Basketball Analytics (基於Transformer物件關聯模型應用於籃球賽事分析)
Advisor: Lin, Cheng-Hung (林政宏)
Committee Members: Lai, Ying-Hui (賴穎暉); Chen, Yung-Chih (陳勇志); Lin, Cheng-Hung (林政宏)
Oral Defense Date: 2024/01/17
Degree: Master
Department: Department of Electrical Engineering
Year of Publication: 2024
Academic Year of Graduation: 112 (ROC calendar)
Language: Chinese
Number of Pages: 47
Keywords: Deep Learning, Input Optimization, Self-Attention, Object Correlation (深度學習、輸入資訊最佳化、自注意機制、物體關聯性)
Research Method: Experimental Design
DOI URL: http://doi.org/10.6345/NTNU202400137
Thesis Type: Academic Thesis
Usage Statistics: 62 views; 0 downloads

Abstract:
    In basketball game analysis, accurately identifying the ball handler and determining scoring moments are crucial for pinpointing the scorer. Traditional analysis methods, such as object-overlap and relative-distance measurements, often carry a high risk of misjudging ball possession and scoring moments.
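    As an illustration only, the overlap-and-distance heuristic criticized above reduces to a few lines. The following is a minimal sketch under assumed conventions (axis-aligned (x1, y1, x2, y2) boxes; the function names are our own), not the thesis's implementation:

        import numpy as np

        def iou(box_a, box_b):
            # Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).
            x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
            x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
            area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
            union = area_a + area_b - inter
            return inter / union if union > 0 else 0.0

        def naive_ball_handler(player_boxes, ball_box):
            # Traditional heuristic: the player whose box overlaps the ball most
            # is taken as the handler; if nobody overlaps, fall back to center
            # distance. Assumes at least one player box. In crowded or contested
            # scenes this rule misfires, which is the failure mode the abstract
            # refers to.
            overlaps = [iou(p, ball_box) for p in player_boxes]
            if max(overlaps) > 0:
                return int(np.argmax(overlaps))
            ball_c = np.array([(ball_box[0] + ball_box[2]) / 2,
                               (ball_box[1] + ball_box[3]) / 2])
            centers = np.array([[(p[0] + p[2]) / 2, (p[1] + p[3]) / 2]
                                for p in player_boxes])
            return int(np.argmin(np.linalg.norm(centers - ball_c, axis=1)))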
    To address this issue, we improved the input features of the Transformer-based Object Relationship Finder (ORF) architecture previously proposed by our team, focusing on several key factors: the players closely associated with the ball, those players' poses, and the different object types involved. This strategy significantly increased the architecture's accuracy on complex actions and contested-ball situations, raising ball-handler identification accuracy from 80.79% to 86.18% and demonstrating the importance of precise feature selection. Moreover, we applied the same ORF architecture to detect the timing of scoring moments and combined it with the identity of the last player to touch the ball, thereby effectively determining the scorer. Compared to traditional methods, this raised scorer-identification accuracy from 63.89% to 87.50%, highlighting the strong performance and broad applicability of the ORF in basketball analysis.
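    The two decision steps described above, filtering the input down to the players most closely associated with the ball and resolving the scorer as the last ball handler before a detected score, can be sketched as follows. The top-k filtering, the frame-indexed data layout, and all names are illustrative assumptions (reusing the iou helper from the previous sketch), not the thesis's code:

        def select_candidate_players(player_boxes, ball_box, k=3):
            # Keep only the k players whose boxes overlap the ball most,
            # mirroring the idea of feeding the model just the players
            # closely associated with the ball (k is an assumed value).
            ranked = sorted(range(len(player_boxes)),
                            key=lambda i: iou(player_boxes[i], ball_box),
                            reverse=True)
            return ranked[:k]

        def determine_scorer(handler_by_frame, scoring_frame):
            # Scorer = the last identified ball handler at or before the
            # detected scoring frame. handler_by_frame maps a frame index
            # to a player id, or None when no one holds the ball.
            for frame in range(scoring_frame, -1, -1):
                handler = handler_by_frame.get(frame)
                if handler is not None:
                    return handler
            return None  # no handler observed before the score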
    Finally, we developed an application tool that integrates these technologies. This not only enables coaches and analysts to understand the game more comprehensively but also lays a solid foundation for future basketball research and technological development.

Table of Contents:
    Acknowledgments
    Chinese Abstract
    English Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Overview of Research Methods
      1.4 Research Contributions
      1.5 Thesis Organization
    Chapter 2: Literature Review
      2.1 YOLOv7
      2.2 TrackerKCF
      2.3 ViTPose
      2.4 IoU
      2.5 Transformer
        2.5.1 Self-Attention and Multi-Head Attention
        2.5.2 Feedforward Network
        2.5.3 Position Encoding
        2.5.4 Residual Connection
        2.5.5 Layer Normalization
      2.6 Transformer-based Object Relationship Finder
    Chapter 3: Research Methods
      3.1 Ball-Handler Input Feature Selection
        3.1.1 Optimizing Ball-Handler Analysis with IoU
        3.1.2 Selecting Player Positions and Skeletal Keypoints
      3.2 Scorer Determination
        3.2.1 When a Score Occurs
        3.2.2 Identifying the Specific Player Who Scored
      3.3 Perspective Transformation
    Chapter 4: Experimental Results
      4.1 K-Fold Cross-Validation
      4.2 Evaluation Metrics
        4.2.1 Ball-Handler Evaluation Metrics
        4.2.2 Scorer Evaluation Metrics
      4.3 Comparison Results
      4.4 Visualization Analysis
      4.5 Application
    Chapter 5: Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References
    Autobiography
    Academic Achievements

    P.-Y. Huang, P.-Y. Chou, and C.-H. Lin, "A Transformer-based Object Relationship Finder for Object Status Analysis," in Proc. ICCE-Taiwan, Taiwan, Jul. 2023, pp. 563-564, doi: 10.1109/ICCE-Taiwan58799.2023.10226887.
    A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Information Processing Systems (NIPS), Lake Tahoe, NV, 2012, pp. 1097-1105.
    K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in arXiv preprint, arXiv:1409.1556, 2014.
    K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in arXiv preprint, arXiv:1512.03385, 2015.
    J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in arXiv preprint, arXiv:1506.02640, 2015.
    Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields," in arXiv preprint, arXiv:1812.08008, 2019.
    H.-S. Fang, J. Li, H. Tang, C. Xu, H. Zhu, Y. Xiu, Y.-L. Li, and C. Lu, "AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time," in arXiv preprint, arXiv:2211.03375, 2022.
    J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, W. Liu, and B. Xiao, "Deep High-Resolution Representation Learning for Visual Recognition," in arXiv preprint, arXiv:1908.07919, 2020.
    Y. Xu, J. Zhang, Q. Zhang, and D. Tao, "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation," in arXiv preprint, arXiv:2204.12484, 2022.
    D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," in arXiv preprint, arXiv:1312.6114, 2013.
    I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to Sequence Learning with Neural Networks," in arXiv preprint, arXiv:1409.3215, 2014.
    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention Is All You Need," in arXiv preprint, arXiv:1706.03762, 2017.
    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in arXiv preprint, arXiv:1810.04805, 2018.
    A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving Language Understanding by Generative Pre-Training," OpenAI Technical Report, 2018.
    C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in arXiv preprint, arXiv:2207.02696, 2022.
    N. Wojke, A. Bewley, and D. Paulus, "Simple Online and Realtime Tracking with a Deep Association Metric," in arXiv preprint, arXiv:1703.07402, 2017.
    P. Bergmann, T. Meinhardt, and L. Leal-Taixe, "Tracking Without Bells and Whistles," in arXiv preprint, arXiv:1903.05625, 2019.
    P. Sun et al., "TransTrack: Multiple Object Tracking with Transformer," in arXiv preprint, arXiv:2012.15460, 2020.
    T. Meinhardt, A. Kirillov, L. Leal-Taixe, and C. Feichtenhofer, "TrackFormer: Multi-Object Tracking with Transformers," in arXiv preprint, arXiv:2101.02702, 2021.
    J. Cao, X. Weng, R. Khirodkar, J. Pang, and K. Kitani, "Observation-centric sort: Rethinking sort for robust multi-object tracking," in arXiv preprint, arXiv:2203.14360, 2022.
    A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," in arXiv preprint, arXiv:2004.10934, 2020.
    C.-Y. Wang, "CSPNet: A New Backbone that can Enhance Learning Capability of CNN," in arXiv preprint, arXiv:1911.11929, 2019.
    T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," in arXiv preprint, arXiv:1612.03144, 2017.
    S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path Aggregation Network for Instance Segmentation," in arXiv preprint, arXiv:1803.01534, 2018.
    X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, "RepVGG: Making VGG-style ConvNets Great Again," in arXiv preprint, arXiv:2101.03697, 2021.
    G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," in arXiv preprint, arXiv:1608.06993, 2018.
    R. Pascanu et al., "How to Construct Deep Recurrent Neural Networks," in arXiv preprint, arXiv:1312.6026, 2013.
    H. Sak, A. Senior, and F. Beaufays, "Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition," in arXiv preprint, arXiv:1402.1128, 2014.
    Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov, "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context," in arXiv preprint, arXiv:1901.02860, 2019.
    Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized Autoregressive Pretraining for Language Understanding," in arXiv preprint, arXiv:1906.08237, 2020.
    Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A Robustly Optimized BERT Pretraining Approach," in arXiv preprint, arXiv:1907.11692, 2019.
    W. Fedus, B. Zoph, and N. Shazeer, "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity," in arXiv preprint, arXiv:2101.03961, 2022.
    N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," in arXiv preprint, arXiv:2005.12872, 2020.
    K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked Autoencoders Are Scalable Vision Learners," in arXiv preprint, arXiv:2111.06377, 2021.
    Z. Xia, X. Pan, S. Song, L. E. Li, and G. Huang, "Vision Transformer with Deformable Attention," in arXiv preprint, arXiv:2201.00520, 2022.
    X. Zhai, A. Kolesnikov, N. Houlsby, and L. Beyer, "Scaling Vision Transformers," in arXiv preprint, arXiv:2106.04560, 2022.
    L. Dong, S. Xu and B. Xu, "Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 2018, pp. 5884-5888, doi: 10.1109/ICASSP.2018.8462506.
    Q. Zhang, H. Lu, H. Sak, A. Tripathi, E. McDermott, S. Koo, and S. Kumar, "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss," in arXiv preprint, arXiv:2002.02562, 2020.
    A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, "Conformer: Convolution-augmented Transformer for Speech Recognition," in arXiv preprint, arXiv:2005.08100, 2020.
    Y. Zhang, J. Qin, D. S. Park, W. Han, C.-C. Chiu, R. Pang, Q. V. Le, and Y. Wu, "Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition," in arXiv preprint, arXiv:2010.10504, 2022.
    B. Li et al., "Scaling End-to-End Models for Large-Scale Multilingual ASR," in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia, 2021, pp. 1011-1018, doi: 10.1109/ASRU51503.2021.9687871.
    G. Bebis and M. Georgiopoulos, "Feed-forward neural networks," in IEEE Potentials, vol. 13, no. 4, pp. 27-31, Oct.-Nov. 1994, doi: 10.1109/45.329294.
    J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer Normalization," in arXiv preprint, arXiv:1607.06450, 2016.
    N. Khatri, A. Dasgupta, Y. Shen, X. Zhong, and F. Y. Shih, "Perspective Transformation Layer," in arXiv preprint, arXiv:2201.05706, 2022.
    S. Yadav and S. Shukla, "Analysis of k-Fold Cross-Validation over Hold-Out Validation on Colossal Datasets for Quality Classification," in IEEE 6th International Conference on Advanced Computing (IACC), Bhimavaram, India, 2016, pp. 78-83, doi: 10.1109/IACC.2016.25.

    Electronic full text embargoed until 2028/01/17.