| Student | 黃世龍 Huang, Shih-Long |
|---|---|
| Thesis title | Modified Faster R-CNN with Applications to Cat and Dog Image Detection |
| Advisor | 樂美亨 Yueh, Mei-Heng |
| Oral examination committee | 樂美亨 Yueh, Mei-Heng; 郭岳承 Kuo, Yueh-Cheng; 黃聰明 Huang, Tsung-Ming |
| Oral defense date | 2024/07/22 |
| Degree | Master |
| Department | Department of Mathematics |
| Year of publication | 2024 |
| Academic year of graduation | 112 (ROC calendar) |
| Language | English |
| Number of pages | 58 |
| Keywords | Deep learning, Object detection, Faster R-CNN |
| Research methods | Experimental design, comparative study, observational study |
| DOI URL | http://doi.org/10.6345/NTNU202401217 |
| Thesis type | Academic thesis |
With the rapid development of deep learning, neural networks have steadily improved in both the range of object detection applications they cover and their detection performance. This thesis builds on the Faster R-CNN framework, adjusting its parameters and its backbone convolutional neural network, and applies it to detecting cats and dogs in a Kaggle image dataset. By observing how performance changes and by using statistical resampling methods to examine how the dataset affects the model's precision and recall, the thesis shows how resampling and parameter adjustments influence these two metrics. After tuning to the best-performing parameters, the thesis demonstrates the effectiveness of the ResNet-based Faster R-CNN model for object feature extraction and bounding box regression, and compares the accuracy of one-stage and two-stage object detectors. Experimental results indicate that ResNet, used as the feature-extraction network in Faster R-CNN, performs well on this dataset, and that the two-stage detector achieves higher accuracy than the one-stage detector on this dataset.
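The abstract states that statistical resampling was used to study how the dataset affects precision and recall, but it does not specify the procedure. The following is a minimal sketch under stated assumptions, not the thesis's method: detections are matched to ground-truth boxes greedily by descending score at an IoU threshold of 0.5, precision and recall are pooled over test images, and a per-image bootstrap gives a rough 95% percentile interval for each metric. The function names (`iou`, `match_image`, `precision_recall`, `bootstrap_pr`) and the toy cat/dog boxes are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format
Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_image(preds: Sequence[Tuple[Box, str, float]],
                gts: Sequence[Tuple[Box, str]],
                thr: float = 0.5) -> Counts:
    """Assumed matching rule: greedy one-to-one, highest score first, same label only."""
    used = [False] * len(gts)
    tp = fp = 0
    for box, label, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_j, best_iou = -1, thr
        for j, (gbox, glabel) in enumerate(gts):
            if used[j] or glabel != label:
                continue
            overlap = iou(box, gbox)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[best_j] = True  # each ground-truth box can be matched once
            tp += 1
        else:
            fp += 1
    fn = used.count(False)  # ground-truth boxes nobody detected
    return tp, fp, fn


def precision_recall(counts: List[Counts]) -> Tuple[float, float]:
    """Pool per-image counts, then compute precision and recall."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_pr(counts: List[Counts], n_boot: int = 1000, seed: int = 0):
    """Resample test images with replacement; return rough 95% percentile intervals."""
    rng = random.Random(seed)
    n = len(counts)
    ps, rs = [], []
    for _ in range(n_boot):
        sample = [counts[rng.randrange(n)] for _ in range(n)]
        p, r = precision_recall(sample)
        ps.append(p)
        rs.append(r)
    ps.sort()
    rs.sort()
    lo, hi = int(0.025 * n_boot), int(0.975 * n_boot) - 1
    return (ps[lo], ps[hi]), (rs[lo], rs[hi])


if __name__ == "__main__":
    # Made-up toy detections for two images (illustration only, not thesis data).
    img1 = match_image(
        preds=[((10, 10, 50, 50), "cat", 0.9), ((60, 60, 90, 90), "dog", 0.4)],
        gts=[((12, 12, 50, 52), "cat")],
    )
    img2 = match_image(
        preds=[((0, 0, 40, 40), "dog", 0.8)],
        gts=[((0, 0, 40, 40), "dog"), ((50, 50, 80, 80), "cat")],
    )
    print(precision_recall([img1, img2]))
    print(bootstrap_pr([img1, img2], n_boot=200))
```

On the toy data, the first image yields one true positive and one false positive, and the second yields one true positive and one missed cat, so pooled precision and recall are both 2/3. Resampling at the image level rather than the box level keeps detections from the same image together, which is one common design choice; other schemes (for example, stratifying by class) are equally plausible for a cat/dog dataset.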