
Graduate Student: Tsai, Yu-Chuan (蔡妤涓)
Thesis Title: Cetacean Individual Identification System Based on Deep Learning (基於深度學習之鯨豚個體身分辨識系統)
Advisor: Fang, Chiung-Yao (方瓊瑤)
Oral Defense Committee: Chen, Sei-Wang (陳世旺); Huang, Chung-I (黃仲誼); Luo, An-Chun (羅安鈞); Fang, Chiung-Yao (方瓊瑤)
Oral Defense Date: 2024/01/20
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2024
Academic Year of Graduation: 112 (i.e., 2023-2024)
Language: English
Number of Pages: 43
Keywords: Whale, Dolphin, Individual Identification, Deep Learning, Image Retrieval, Additive Angular Margin Loss, Sub-center Additive Angular Margin Loss with Dynamic Margin
Research Method: Experimental Design
DOI URL: http://doi.org/10.6345/NTNU202400346
Thesis Type: Academic thesis
    This study proposes a deep-learning-based cetacean individual identification system. By identifying individual cetaceans, the system can track their migration paths and estimate population sizes, which in turn helps assess and protect the health of marine ecosystems. The research targets two problems: recognizing the biological features that distinguish individuals of the same species, and handling the differences in image features of the same cetacean photographed in different environments. Because the cetacean dataset suffers from unstable image quality and a highly uneven number of images per individual, this study tackles these issues through data preprocessing, proposed model improvements, and testing methods covering different aspects.
    The system first preprocesses the cetacean dataset, then performs cetacean detection, and finally carries out individual identification. Data preprocessing consists of data cleaning and data augmentation, which aim to resolve potential problems in the dataset. In the detection stage, YOLOv5 localizes cetaceans and filters out background noise to speed up model training. In the identification stage, a backbone model extracts features from cetacean images, and a head model predicts individual identity. This study adopts EfficientNetV1-B4 as the backbone model and the Additive Angular Margin Loss (ArcFace) as the head model, and modifies the head model to address the dataset problems and raise identification accuracy. Adding sub-center vectors to ArcFace resolves the feature differences of the same cetacean across shooting environments, thereby improving identification accuracy. In addition, a dynamic margin is introduced to cope with the highly uneven number of images per individual during training and to speed up model convergence.
    Experimental results show that the improved sub-center Additive Angular Margin Loss achieves mAP scores of 68.63%, 81.60%, and 35.70% on three testing aspects: the real-world application setting, the majority synthetic dataset, and the partial synthetic dataset (individuals with three or more images), improvements of 4.83%, 6.08%, and 8.19% over the original ArcFace. Moreover, applying the dynamic margin to the sub-center loss reduces training time by 28% while maintaining comparable accuracy. These results indicate that the proposed improvements handle the dataset problems appropriately and raise the accuracy of cetacean individual identification.
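    For reference, the losses named above have standard published formulations; the notation below is a reconstruction from the ArcFace and dynamic-margin literature, not an excerpt from the thesis. ArcFace adds an angular margin m to the ground-truth logit of a cosine classifier; the sub-center variant keeps K weight vectors per identity and scores each identity by its nearest sub-center; one published dynamic-margin rule derives a per-identity margin m_c from that identity's image count n_c (a, λ, and b are tuning constants):

        \mathcal{L}_{\text{ArcFace}}
          = -\frac{1}{N}\sum_{i=1}^{N}
            \log\frac{e^{s\cos(\theta_{y_i}+m)}}
                     {e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}},
        \qquad
        \cos\theta_j=\max_{k\in\{1,\dots,K\}}
            \frac{W_{j,k}^{\top}x_i}{\lVert W_{j,k}\rVert\,\lVert x_i\rVert},
        \qquad
        m_c = a\,n_c^{-\lambda}+b.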

    This research presents a deep-learning-based system for individual cetacean identification. It aims to track cetacean migration paths and estimate population sizes in order to assess and maintain the health of marine ecosystems. The study focuses on distinguishing the biological characteristics of different individuals within the same species and on handling images of the same cetacean captured in different environments. To address dataset issues such as unstable image quality and a highly uneven distribution of images per individual, the research covers data preprocessing, model improvement, and comprehensive testing methods.
    First, the cetacean dataset is preprocessed (data cleaning and data augmentation) to obtain clean training data. Cetacean detection is then performed using YOLOv5 to localize cetaceans and filter background noise, followed by individual identification. EfficientNetV1-B4 is chosen as the backbone model, and the Additive Angular Margin Loss (ArcFace) is adopted for the head model. Incorporating sub-centers into ArcFace addresses the differing image features of the same cetacean under varying environments, improving identification accuracy. Moreover, introducing dynamic margins into sub-center ArcFace deals with the uneven distribution of images per individual during training and speeds up the model's convergence.
    Experimental results show that the improved sub-center ArcFace achieves mAP scores of 68.63%, 81.60%, and 35.70% across three testing scenarios: real-world application, the majority synthetic dataset, and the partial synthetic dataset (individuals with three or more images). Compared with the original ArcFace, mAP improves by 4.83%, 6.08%, and 8.19%, respectively. Additionally, sub-center ArcFace with dynamic margins maintains a similar accuracy level while reducing training time by 28%. These findings indicate that the proposed improvements handle the dataset issues effectively and improve the accuracy of cetacean individual identification.
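    As a concrete illustration of the head model described in the abstract, the following is a minimal PyTorch sketch of a sub-center ArcFace layer with a per-class dynamic margin. It is a reconstruction under stated assumptions, not the thesis's implementation: the class and argument names (SubCenterArcFace, class_counts, k, s, a, lam, b) are illustrative, and the margin rule m_c = a * n_c^(-lam) + b is one published formulation under which rarer identities receive larger margins.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SubCenterArcFace(nn.Module):
            """Sub-center ArcFace head with a per-class dynamic margin (sketch)."""

            def __init__(self, in_features, num_classes, class_counts,
                         k=3, s=30.0, a=0.5, lam=0.25, b=0.05):
                super().__init__()
                self.k = k  # sub-centers per identity
                self.s = s  # logit scale
                # One weight vector per (identity, sub-center) pair.
                self.weight = nn.Parameter(torch.empty(num_classes * k, in_features))
                nn.init.xavier_uniform_(self.weight)
                # Dynamic margin: rarer identities get a larger margin.
                counts = torch.as_tensor(class_counts, dtype=torch.float32)
                self.register_buffer("margins", a * counts.pow(-lam) + b)

            def forward(self, embeddings, labels):
                # Cosine similarity between normalized embeddings and weights.
                cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
                # Keep only the best-matching sub-center for each identity.
                cos = cos.view(-1, self.margins.numel(), self.k).max(dim=2).values
                # Add the angular margin to the ground-truth class only.
                theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
                margin = self.margins[labels].unsqueeze(1)
                one_hot = F.one_hot(labels, self.margins.numel()).float()
                logits = torch.cos(theta + one_hot * margin)
                return F.cross_entropy(self.s * logits, labels)

    In training, embeddings from the backbone (here, EfficientNetV1-B4) and the ground-truth identity labels would be passed to this layer, which returns the classification loss to back-propagate; at retrieval time only the normalized embeddings are used.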

    Table of Contents

    1 Introduction
      1.1 Research Motivation
      1.2 Research Difficulties
      1.3 Research Contribution
    2 Related Work
      2.1 Research Background and Method
      2.2 YOLO Series for Object Detection
      2.3 Backbone Series for Feature Extraction
        2.3.1 Densely Connected Convolutional Network (DenseNet)
        2.3.2 ConvNeXt Series
        2.3.3 EfficientNet Series
      2.4 Head Series for Individual Identification
        2.4.1 Additive Angular Margin Loss (ArcFace)
    3 Research Method
      3.1 System Overview
      3.2 Data Preprocessing
        3.2.1 Data Cleaning
        3.2.2 Data Augmentation
      3.3 Examination of Backbone and Head Models
        3.3.1 Examination of Backbone Model
        3.3.2 Modification of Head Model
      3.4 Comprehensive Evaluation Methods for Imbalanced Data
        3.4.1 Training, Validation, and Testing Phases of ArcFace
        3.4.2 Evaluation Methods of Imbalanced Data
    4 Experimental Results
      4.1 Research Environment and Equipment Settings
      4.2 Cetacean Dataset
      4.3 Evaluation for Image Retrieval
      4.4 Backbone Examination Analysis
      4.5 Head Improvement Analysis
    5 Conclusion and Future Works
      5.1 Conclusion
      5.2 Future Works
    References
    Appendix

