Graduate Student: 趙怡華 Chao, Yi-Hua
Thesis Title: 屬性增強於零樣本物件偵測 (Attribute Augmentation for Zero-Shot Object Detection)
Advisor: 葉梅珍 Yeh, Mei-Chen
Oral Defense Committee: 彭彥璁 Peng, Yan-Tsung; 方瓊瑤 Fang, Chiung-Yao; 葉梅珍 Yeh, Mei-Chen
Oral Defense Date: 2024/01/30
Degree: Master
Department: 資訊工程學系 Department of Computer Science and Information Engineering
Publication Year: 2024
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 31
Keywords: zero-shot object detection, generative adversarial learning, visual-semantic relationship, attribute augmentation
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202400244
Thesis Type: Academic thesis
Abstract: In this study, we investigate the problem of zero-shot object detection, a task that involves predicting the locations and labels of objects in target images, regardless of whether those labels belong to seen or unseen categories. Many generative approaches to zero-shot object detection combine the semantic attributes of categories with Gaussian noise to synthesize visual features. By generating samples for unseen categories, this strategy transforms the zero-shot object detection problem into an approximately supervised one. However, current generative models condition on a single, complete semantic attribute vector that encodes all attribute information of a category. Visual features generated this way do not sufficiently cover the distribution of real visual features, which often lack some of that attribute information. Training a classifier only on such complete synthesized features is therefore insufficient for classifying real visual features. In light of this, we propose a method that augments the semantic attributes to generate diversified features that better simulate the distribution of real visual features, thereby improving the classifier within the object detection model. We evaluate our method on two common object detection datasets: MS COCO and PASCAL VOC.
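To make the attribute-augmentation idea concrete, below is a minimal PyTorch sketch of one plausible realization, not the thesis implementation: a conditional generator consumes a class attribute vector concatenated with Gaussian noise, and a masking step randomly zeroes attribute dimensions so the synthesized features mimic real instances that lack some attribute information. All names here (`ConditionalGenerator`, `augment_attributes`, `drop_p`, and the dimension sizes) are hypothetical.

```python
# Illustrative sketch of attribute augmentation for generative
# zero-shot detection (hypothetical names and dimensions).
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Maps [semantic attributes ; Gaussian noise] to a visual feature."""
    def __init__(self, attr_dim: int, noise_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attr_dim + noise_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, feat_dim),
            nn.ReLU(),  # region features from a CNN backbone are non-negative
        )

    def forward(self, attrs: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([attrs, noise], dim=1))

def augment_attributes(attrs: torch.Tensor, drop_p: float = 0.3) -> torch.Tensor:
    """Randomly zero a subset of attribute dimensions per sample,
    simulating real instances in which some attributes are absent."""
    keep_mask = torch.bernoulli(torch.full_like(attrs, 1.0 - drop_p))
    return attrs * keep_mask

# Usage: synthesize diversified features for one unseen class.
attr_dim, noise_dim, feat_dim, batch = 300, 300, 1024, 64
G = ConditionalGenerator(attr_dim, noise_dim, feat_dim)
class_attrs = torch.rand(1, attr_dim).expand(batch, -1)  # one class, repeated
aug_attrs = augment_attributes(class_attrs)              # diversified conditions
noise = torch.randn(batch, noise_dim)
fake_feats = G(aug_attrs, noise)                         # synthetic region features
```

In a full pipeline, synthetic unseen-class features such as `fake_feats` would be mixed with real seen-class region features to train the detector's classification head, approximating a supervised setting.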