Graduate student: | 林旭政 LIN, Hsu-Cheng |
---|---|
Thesis title: | 以深度學習為基礎之野生動物辨識系統 A Wildlife Recognition System Based on Deep Learning |
Advisor: | 方瓊瑤 Fang, Chiung-Yao |
Oral defense committee: | 陳世旺 黃仲誼 羅安鈞 許之凡 |
Oral defense date: | 2021/07/23 |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Publication year: | 2021 |
Graduation academic year: | 109 |
Language: | Chinese |
Number of pages: | 63 |
Chinese keywords: | 陷阱相機、物種辨識、物件辨識類神經網路、深度學習、野生動物物種辨識 |
English keywords: | camera trap, species recognition, object detection neural network, deep learning, wildlife recognition |
DOI URL: | http://doi.org/10.6345/NTNU202101057 |
Thesis type: | Academic thesis |
近年來人們對於動物保育的意識抬頭,越來越多人加入保護野生動物的行列,對於野生動物生態系統進行觀察可以為人們提供保育的方針。目前主要觀察的方法是使用陷阱相機,因為其能夠長時間運行且不對野生動物造成影響。但由於陷阱相機擷取影像數量太過龐大,分析影像變成耗費大量的時間與勞力的枯燥工作。故本研究擬開發一套以深度學習為基礎之野生動物辨識系統,能夠自動辨識與計算影像中動物種類與數量,以期達到輔助分析之結果。
野生動物辨識系統使用陷阱相機所攝影像來進行動物物種辨識,透過物件偵測類神經網路模型來辨識影像中野生動物的種類以及數量。本研究使用RefineDet物件偵測模型的改良版來進行動物物種辨識,將野生動物影像輸入本系統後,經由物件偵測類神經網路模型進行動物物種辨識以及數量統計。在改良RefineDet原型架構方面,本研究引入彈性非極值抑制演算法、受納域模塊等改良來提升RefineDet模型對野生動物的辨識能力;另外加入批量正規化技術來加速模型的訓練過程,提升RefineDet的整體效能。
本研究使用賽倫蓋蒂陷阱相機資料集(Snapshot Serengeti dataset)[2]所蒐集的影像進行訓練及測試,辨識的野生動物種類共有11種,分別是水牛(Buffalo)、非洲象(African Elephant)、葛氏瞪羚(Grant's Gazelle)、湯氏瞪羚(Thomson's Gazelle)、長頸鹿(Giraffe)、黑斑羚(Impala)、灰頸鷺鴇(Kori Bustard)、獅(Lion)、牛羚(Wildebeest)、斑馬(Zebra)等。實驗結果為本野生動物辨識系統VOC mAP為83.29%,顯示出本研究所提出之野生動物辨識系統確實能夠準確偵測出影像中野生動物物種及數量。
In recent years, awareness of animal conservation has grown, and more and more people have joined efforts to protect wildlife. Observing wildlife ecosystems can provide guidelines for conservation. At present, the main observation method is the camera trap, because it can operate for long periods without disturbing wildlife. However, because camera traps capture an enormous number of images, analyzing them becomes a tedious task that consumes a great deal of time and labor. This research therefore aims to develop a wildlife recognition system based on deep learning that automatically identifies the species and counts the animals in each image, in order to assist image analysis.
The wildlife recognition system takes images captured by camera traps as input and uses an object detection neural network model to recognize the species and number of wild animals in each image. This research uses an improved version of the RefineDet object detection model for species recognition and counting. To improve the original RefineDet architecture, this research introduces the soft non-maximum suppression (Soft-NMS) algorithm and the receptive field block to strengthen the model’s ability to recognize wildlife; in addition, batch normalization is added to accelerate model training and improve RefineDet’s overall performance.
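The soft non-maximum suppression step mentioned above can be illustrated with a minimal sketch. It assumes the Gaussian score-decay variant described by Bodla et al. [Bod17] and axis-aligned boxes in `[x1, y1, x2, y2]` form; the function names, `sigma`, and `score_thresh` are illustrative choices, not the settings used in the thesis.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them.

    Returns the indices of the kept boxes (in selection order) and their final scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []
    while remaining.size > 0:
        # Select the highest-scoring box that has not been selected yet.
        best = remaining[np.argmax(scores[remaining])]
        if scores[best] < score_thresh:
            break  # every remaining box has decayed below the threshold
        keep.append(int(best))
        kept_scores.append(float(scores[best]))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Down-weight the remaining boxes according to their overlap with the selected box.
        overlaps = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
    return keep, kept_scores

# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
keep, kept_scores = soft_nms(boxes, scores)
```

In this example the second box is not discarded, as it would be under hard NMS with a typical IoU threshold; its score is only reduced, so two animals standing close together can both survive a later confidence cut-off if their original scores are high enough.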
In this study, images from the Snapshot Serengeti dataset [2] were used for training and testing. A total of 11 wildlife species were recognized, including buffalo, African elephant, Grant's gazelle, Thomson's gazelle, giraffe, impala, kori bustard, lion, wildebeest, and zebra. The system achieves a VOC mAP of 83.29%, which shows that the proposed wildlife recognition system can accurately detect the species and number of wild animals in an image.
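As a reminder of what the reported VOC mAP measures, the following minimal sketch computes average precision (AP) for one class and averages it over classes. It assumes the all-point interpolation used by later PASCAL VOC evaluations (the abstract does not state which VOC variant was used) and assumes each detection has already been marked as a true or false positive, typically by matching it to ground truth at an IoU of at least 0.5. The function names and sample data are hypothetical.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # sort by descending confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Add sentinel points at both ends of the curve.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(per_class):
    """Mean of per-class AP; per_class maps a class name to (scores, tp_flags, num_gt)."""
    return float(np.mean([average_precision(*args) for args in per_class.values()]))

# Hypothetical detections for two classes.
per_class = {
    "zebra": ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], 4),
    "lion": ([0.9, 0.8], [1, 1], 2),
}
m_ap = mean_average_precision(per_class)
```

Because AP is computed per class and then averaged, a rare species contributes as much to the final score as a common one, which matters for camera trap data where class frequencies are highly unbalanced.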
[Sch18] S. Schneider, G. W. Taylor, and S. Kremer, “Deep Learning Object Detection Methods for Ecological Camera Trap Data,” Proceedings of 2018 15th Conference on Computer and Robot Vision (CRV), Toronto, ON, 2018, pp. 321-328.
[Ceb15] G. Ceballos, P. R. Ehrlich, A. D. Barnosky, A. García, R. M. Pringle, and T. M. Palmer, “Accelerated Modern Human–Induced Species Losses: Entering the Sixth Mass Extinction,” Science Advances, vol. 1, no. 5, May 2015.
[Hua19] Y. C. Huang, I. N. Liao, C. H. Chen, C. W. Yi, and W. C. Peng, “TrackNet: A Deep Learning Network for Tracking High-speed and Tiny Objects in Sports Applications,” Proceedings of 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 2019, pp. 1-8.
[Hao19] J. Hao, F. Jiang, R. Zhang, X. Lin, B. Leng, and G. Song, “Scale Pyramid Attention for Single Shot MultiBox Detector,” IEEE Access, vol. 7, pp. 138816-138824, 2019.
[Liu15] X. Liu, D. Tao, M. Song, L. Zhang, J. Bu, and C. Chen, “Learning to Track Multiple Targets,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 5, pp. 1060-1073, May 2015.
[He15] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 1 Sept. 2015.
[Pat20] A. Patel, L. Cheung, N. Khatod, I. Matijosaitiene, A. Arteaga, and J. W. Gilkey Jr., “Real-Time Recognition of Galápagos Snake Species Using Deep Learning,” Animals, vol. 10, p. 806, 2020.
[Hua19] X. Huang, Z. Hu, X. Wang, X. Yang, J. Zhang, and D. Shi, “An Improved Single Shot Multibox Detector Method Applied in Body Condition Score for Dairy Cows,” Animals, vol. 9, p. 470, 2019.
[Liu16] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. Berg, “SSD: Single Shot MultiBox Detector,” arXiv preprint arXiv:1512.02325, 2016.
[Nin17] C. Ning, H. Zhou, Y. Song, and J. Tang, “Inception Single Shot MultiBox Detector for Object Detection,” Proceedings of 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, 2017, pp. 549-554.
[Zha18] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, “Single-Shot Refinement Neural Network for Object Detection,” Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, 2018, pp. 4203-4212.
[Zha20] S. Zhang, L. Wen, Z. Lei, and S. Z. Li, “RefineDet++: Single-Shot Refinement Neural Network for Object Detection,” IEEE Transactions on Circuits and Systems for Video Technology, 2020.
[Wea19] O. R. Wearn and P. Glover-Kapfer, “Snap Happy: Camera Traps are an Effective Sampling Tool when Compared with Alternative Methods,” Royal Society Open Science, published online Mar. 6, 2019.
[He17] C. Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, “DSSD: Deconvolutional Single Shot Detector,” arXiv preprint arXiv:1701.06659, 2017.
[Fu15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” arXiv preprint arXiv:1512.03385, 2015.
[Gir15] R. Girshick, “Fast R-CNN,” Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440-1448, 2015.
[Ren17] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 1 June 2017.
[Red16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788, 2016.
[Liu18] S. Liu, D. Huang, and Y. Wang, “Receptive Field Block Net for Accurate and Fast Object Detection,” Proceedings of the European Conference on Computer Vision (ECCV), pp. 385-400, 2018.
[Gir16] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-Based Convolutional Networks for Accurate Object Detection and Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142-158, 1 Jan. 2016.
[Red17] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517-6525, 2017.
[Red18] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018.
[Boc20] A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv preprint arXiv:2004.10934, 2020.
[Bod17] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, “Soft-NMS — Improving Object Detection with One Line of Code,” Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, pp. 5562-5570, 2017.
[Iof15] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” Proceedings of The 32nd International Conference on Machine Learning, pp. 448-456, 2015.
[內18] 內政部營建署臺灣國家公園 (National Parks of Taiwan, Construction and Planning Agency, Ministry of the Interior), “為什麼生物多樣性很重要?” (Why Is Biodiversity Important?), 2018. Retrieved from https://np.cpami.gov.tw/youth/index.php?option=com_content&view=article&id=2754&Itemid=106
[Wik18] Wikipedia, Camera trap, https://en.wikipedia.org/wiki/Camera_trap
[Wik20] Wikipedia, 鱟試劑 (Limulus amebocyte lysate), https://zh.wikipedia.org/wiki/%E9%B2%8E%E8%AF%95%E5%89%82
[1] IUCN, The IUCN Red List of Threatened Species, https://www.iucnredlist.org/
[2] LILA BC, Labeled Information Library of Alexandria: Biology and Conservation, http://lila.science/
[3] A. Swanson, M. Kosmala, C. Lintott, R. Simpson, A. Smith, and C. Packer, “Snapshot Serengeti, High-Frequency Annotated Camera Trap Images of 40 Mammalian Species in an African Savanna,” Scientific Data, vol. 2, 150026, 2015.