
Student: Huang, Ding-Jie (黃鼎傑)
Thesis Title: A Long-Range Stable Feature based Multi-Target Multi-Camera Pedestrian Tracking System without Spatial Relationships (基於穩定長期特徵之無空間關係之多目標多相機行人追蹤系統)
Advisor: Lin, Cheng-Hung (林政宏)
Committee Members: Chen, Yung-Chih (陳勇志); Lai, Ying-Hui (賴穎暉); Lin, Cheng-Hung (林政宏)
Oral Defense Date: 2023/07/24
Degree: Master
Department: Department of Electrical Engineering
Publication Year: 2023
Graduation Academic Year: 111
Language: Chinese
Pages: 49
Keywords (Chinese): 多目標多相機行人追蹤、卡爾曼濾波器、ID切換
Keywords (English): multi-target multi-camera pedestrian tracking, Kalman filters, identity switching
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202301060
Thesis Type: Academic thesis
Chinese Abstract: The goal of a multi-target multi-camera pedestrian tracking system is to track multiple people using multiple cameras. Current methods typically take camera positions and video frames as input: they first perform single-camera tracking for each camera with a Kalman filter based on a linear motion model, extract person appearance features during tracking with an exponential moving average (EMA), and finally match people across cameras using the camera positions together with the appearance features. However, we found that this kind of Kalman filter is prone to identity switching (ID switching). Moreover, a tracking system that depends on camera positions must have its program logic rewritten to integrate the camera layout whenever it is deployed to a site with more cameras, which incurs a huge cost. If camera positions are not used as input, extracting stable and effective appearance features becomes all the more important. In this study, we therefore propose a Kalman filter that differs from previous methods and a new long-term feature storage method that produces more stable appearance features, solving the ID switching problem that long-term features stored with an EMA can cause. In addition, no camera position information is used during matching, which makes the system easier to port. To evaluate the proposed method, we built our own dataset containing about 40,000 frames of 1080p, 30 fps video. Experimental results show that our method handles the above problems better: in multi-camera tracking, our IDF1 score improves by about 15% over previous methods, and in single-camera tracking we successfully recover 75% of switched IDs.
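To make the storage difference concrete, here is a minimal Python sketch contrasting the conventional EMA update with a gallery that keeps features from several past frames and scores a query by its best cosine similarity. It is an illustration under our own assumptions, not the thesis implementation: the names ema_update and FeatureGallery, the smoothing factor ALPHA, and the gallery size K are all hypothetical.

    import numpy as np

    ALPHA = 0.9  # EMA smoothing factor (hypothetical value)
    K = 30       # per-track gallery size in frames (hypothetical value)

    def ema_update(stored, new_feat, alpha=ALPHA):
        # Conventional long-term feature: every observation is blended into
        # a single vector, so a run of occluded or blurred frames gradually
        # corrupts the stored identity.
        blended = alpha * stored + (1.0 - alpha) * new_feat
        return blended / np.linalg.norm(blended)

    class FeatureGallery:
        # Alternative storage: keep up to k per-frame features for a track
        # and score a query against the most similar one, so a few bad
        # frames cannot overwrite the whole appearance history.
        def __init__(self, k=K):
            self.k = k
            self.feats = []

        def add(self, feat):
            self.feats.append(feat / np.linalg.norm(feat))
            if len(self.feats) > self.k:
                self.feats.pop(0)  # discard the oldest stored frame

        def similarity(self, query):
            # Cosine similarity of the query to its best gallery match.
            q = query / np.linalg.norm(query)
            return max(float(f @ q) for f in self.feats)

Cross-camera matching would then assign a detection to the track with the highest gallery similarity above a threshold; choosing that threshold is the subject of the experiments in Section 4.3.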

English Abstract: The objective of a multi-target multi-camera pedestrian tracking system is to track multiple individuals using multiple cameras. Current methods use Kalman filters for trajectory prediction based on linear motion, and match observed individuals with predicted trajectories using camera positions and long-term features stored with an exponential moving average. However, these methods suffer from identity switching (ID switching) under non-linear motion and significant appearance changes. In addition, moving to a site with a large number of cameras incurs high costs because the program logic must be rewritten. In our approach, we use a different Kalman filter and propose a new method for stable feature storage that eliminates ID switching and the need for camera position information. To evaluate the proposed approach, we created our own dataset consisting of approximately 40,000 frames of 1080p, 30 fps video. Experimental results show that our method effectively addresses the aforementioned issues: in multi-camera tracking, the proposed method improves the IDF1 metric by about 15% compared to previous methods, and in single-camera tracking, we successfully recover 75% of switched IDs.
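For reference, the headline numbers in both abstracts can be read against the metric definitions below. The IDF1 formula is the standard identity-aware F1 score; the ID recovery rate is written here as the fraction of switched IDs that are restored, which is our reading of the metric the thesis defines in Section 4.1.4:

    \mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}}
    \qquad
    \text{ID recovery rate} = \frac{\#\,\text{recovered IDs}}{\#\,\text{switched IDs}}

Here IDTP, IDFP, and IDFN count identity-level true positives, false positives, and false negatives over the whole sequence.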

Table of Contents:
Acknowledgements ... i
Chinese Abstract ... ii
English Abstract ... iii
Table of Contents ... iv
List of Figures ... vi
List of Tables ... vii
Chapter 1 Introduction ... 1
  1.1 Research Motivation and Background ... 1
  1.2 Research Objectives ... 4
  1.3 Overview of the Research Method ... 4
  1.4 Research Contributions ... 5
Chapter 2 Literature Review ... 6
  2.1 Multi-Person Multi-Camera Tracking Systems ... 6
  2.2 Object Detection Systems ... 9
  2.3 Person Re-Identification Models ... 11
  2.4 Multi-Object Tracking Systems ... 13
Chapter 3 Methodology ... 18
  3.1 System Pipeline ... 18
  3.2 Single-Camera Tracking Algorithm ... 18
  3.3 Long-Term Feature Storage Algorithm ... 21
  3.4 Feature-Based ID Recovery Algorithm ... 23
Chapter 4 Experimental Results ... 25
  4.1 Evaluation Metrics ... 25
    4.1.1 ID F1-score ... 25
    4.1.2 ID Switch ... 26
    4.1.3 Multi-cam ID Switch ... 27
    4.1.4 ID Recovery Rate ... 27
  4.2 Dataset ... 27
  4.3 Re-ID Model Selection and Threshold Setting Experiments ... 29
  4.4 Stored Frame Count Experiments ... 30
  4.5 Single-Camera Tracking Experiments ... 31
  4.6 Multi-Camera Tracking Experiments ... 32
  4.7 Visualization of Output Results ... 33
Chapter 5 Conclusion and Future Work ... 43
  5.1 Conclusion ... 43
  5.2 Future Work ... 43
References ... 44
Autobiography ... 49

