Graduate Student: 黃貞棠 Huang, Zhen-Tang
Thesis Title: Weakly Supervised Object Localization Using A Self-Training Approach
Advisor: 葉梅珍 Yeh, Mei-Chen
Committee Members: 林嘉文 Lin, Chia-Wen; 朱威達 Chu, Wei-Ta; 葉梅珍 Yeh, Mei-Chen
Oral Defense Date: 2022/07/01
Degree: Master (碩士)
Department: 資訊工程學系 Department of Computer Science and Information Engineering
Publication Year: 2023
Graduation Academic Year: 111
Language: English
Pages: 27
Chinese Keywords: 深度學習 (deep learning), 弱監督物件定位 (weakly supervised object localization), 半監督學習 (semi-supervised learning)
English Keywords: Weakly Supervised, Object Localization
DOI URL: http://doi.org/10.6345/NTNU202300790
Thesis Type: Academic thesis
Abstract (translated from Chinese):
Object localization and detection have attracted continued attention in deep learning research in recent years, and localization techniques are widely applied in industry and daily life. To make these techniques practical, we must account for the lack of complete annotations and the cost of annotation in real-world applications; model accuracy and generalization ability are likewise key measures.
Weakly supervised object localization (WSOL) studies the task of localizing objects when the training data lack location-level labels and only image-level labels are available. A WSOL protocol was recently proposed that includes a small validation set with location-level labels, allowing researchers to tune hyperparameters and train a robust localization model. In this thesis, we apply semi-supervised learning to the WSOL task. We decouple the localization model from the classification model, which addresses few-shot object localization and prevents classification from degrading localization performance. The proposed localization model is trained via semi-supervised learning: we first train it with a very small amount of labeled data and then use it to generate pseudo labels for the unlabeled data. We further propose a pseudo-label selection algorithm that exploits the complementary nature of two different localization results to select high-quality pseudo labels, while addressing data imbalance and differences in sample difficulty. Finally, we retrain the model on the selected pseudo labels and pair it with a classifier pre-trained on the training samples for classification and recognition. The proposed method makes effective use of weakly annotated training data and reduces annotation cost. It not only improves model accuracy, but also demonstrates generalization ability when the model is tested on different datasets.
Abstract (English):
Object localization and detection have received increasing attention in computer vision and enable a wide range of applications. In the development of such deep learning techniques, we must consider the scale and cost of data annotations, as well as the accuracy and generalization ability of the developed model. Weakly supervised object localization deals with the lack of location-level labels in the training data---only image-level labels are available to train a model for inferring object locations. Recently, researchers found this problem ill-posed and proposed a benchmark that provides a small amount of full supervision for training a model. In this paper, we present a semi-supervised learning method to address the problem. In particular, we decompose the task into class-agnostic object detection and image classification, reducing the number of samples needed to train a robust detector. The proposed localization model is developed via self-training: we use a small amount of data with full supervision to train the class-agnostic detector, and then use it to generate pseudo bounding boxes for data with weak supervision. Furthermore, we propose a selection algorithm to discover high-quality pseudo labels, along with a loss function to deal with data imbalance. We show that the proposed semi-supervised learning method is competitive with state-of-the-art methods. Moreover, it has the potential to generalize well on different datasets.
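The pseudo-label selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual algorithm: every function and variable name here is hypothetical, and agreement between two localizers' boxes (measured by IoU) is used as a simplified stand-in for the complementary-result selection criterion.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(preds_a, preds_b, iou_thresh=0.5):
    """Keep a pseudo box only when two localization results agree.

    preds_a / preds_b map image ids to (box, confidence) pairs produced
    by two complementary localizers; high mutual IoU is used as a proxy
    for pseudo-label quality.  Images where the two results disagree are
    left unlabeled rather than given a noisy box.
    """
    selected = {}
    for img_id, (box_a, conf_a) in preds_a.items():
        if img_id not in preds_b:
            continue
        box_b, conf_b = preds_b[img_id]
        if iou(box_a, box_b) >= iou_thresh:
            # Keep the more confident of the two agreeing boxes.
            selected[img_id] = box_a if conf_a >= conf_b else box_b
    return selected
```

In a self-training loop, the surviving boxes would then serve as location-level labels for retraining the class-agnostic detector, while rejected images stay weakly labeled.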