| Field | Value |
|---|---|
| Author | 陳彥合 (Chen, Yan-He) |
| Thesis title | 通過間接視覺語義對齊改進廣義零樣本學習的視覺表徵 (Refining Visual Representation for Generalized Zero-Shot Learning via Soft Visual-Semantic Alignment) |
| Advisor | 葉梅珍 (Yeh, Mei-Chen) |
| Committee | 林嘉文 (Lin, Chia-Wen), 朱威達 (Chu, Wei-Ta), 葉梅珍 (Yeh, Mei-Chen) |
| Defense date | 2022/07/01 |
| Degree | Master (碩士) |
| Department | Department of Computer Science and Information Engineering (資訊工程學系) |
| Publication year | 2022 |
| Academic year | 110 |
| Language | Chinese (中文) |
| Pages | 34 |
| Chinese keywords | 廣義零樣本學習、細粒度視覺辨識、視覺語義嵌入、間接對齊、圓損失函數 |
| English keywords | Generalized Zero-Shot Learning, Fine-Grained Visual Recognition, Visual-Semantic Embedding, Soft Alignment, Circle Loss |
| Research method | Experimental design (實驗設計法) |
| DOI URL | http://doi.org/10.6345/NTNU202200883 |
| Thesis type | Academic thesis (學術論文) |
We address the problem of generalized zero-shot learning, where the task is to predict the label of a target image regardless of whether that label belongs to a seen or an unseen class. Most existing methods learn a joint embedding space in which image features and their corresponding class prototypes are aligned. Such a direct alignment can be difficult because of the inherent gap between the visual and the semantic space. We propose to relax the alignment requirement with a learning framework that avoids pair-wise comparisons between image and class embeddings. The soft visual-semantic alignment is instead performed by classifying a concatenated feature vector consisting of the refined visual features and the class prototype of the target class. Furthermore, we employ circle loss to optimize the embedding model, which allows different penalty strengths for different within-class and between-class similarities. Our extensive experiments show that this indirect alignment is more flexible for learning discriminative and generalized visual features. We demonstrate the effectiveness of the proposed method, whose performance is on par with the state of the art on five benchmarks.
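As a rough illustration of the two ideas in the abstract — classifying a concatenated [visual; prototype] vector instead of pair-wise matching, and circle loss — here is a minimal NumPy sketch. The refinement layer, all dimensions, and the two-layer classifier are illustrative assumptions; the thesis's actual architecture is not specified in the abstract.

```python
import numpy as np

def refine(v, W):
    """Toy visual-feature refinement: one linear layer + ReLU (assumed;
    the actual refinement module is not described in the abstract)."""
    return np.maximum(W @ v, 0.0)

def compatibility(v_refined, prototype, W1, w2):
    """Soft alignment: rather than a pair-wise distance between the image
    embedding and the class prototype, classify their concatenation."""
    z = np.concatenate([v_refined, prototype])  # joint [visual; semantic] vector
    h = np.maximum(W1 @ z, 0.0)                 # hidden layer of the classifier
    return float(w2 @ h)                        # scalar compatibility score

def circle_loss(sp, sn, gamma=80.0, m=0.25):
    """Circle loss (Sun et al., CVPR 2020). sp: within-class similarities,
    sn: between-class similarities, both expected in [-1, 1]."""
    ap = np.maximum(1.0 + m - sp, 0.0)  # adaptive weight: push weak positives harder
    an = np.maximum(sn + m, 0.0)        # adaptive weight: push strong negatives harder
    dp, dn = 1.0 - m, m                 # decision margins for positives / negatives
    lp = np.exp(-gamma * ap * (sp - dp)).sum()
    ln = np.exp(gamma * an * (sn - dn)).sum()
    return float(np.log1p(ln * lp))

# Well-separated similarities give a small loss; overlapping ones a large loss.
well_separated = circle_loss(np.array([0.9]), np.array([-0.5]))
overlapping = circle_loss(np.array([0.2]), np.array([0.8]))
```

Because the classifier sees visual and semantic features jointly, it can learn a cross-modal decision function without forcing the two embeddings to coincide in a common metric space, which is the relaxation the abstract refers to.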