
Student: 陳冠穎 (Chen, Guan-Ying)
Thesis Title: 深度視覺語義嵌入模型於生成式多標籤零樣本學習 (Deep Visual-Semantic Embedding Model for Generative Multi-Label Zero-Shot Learning)
Advisor: 葉梅珍 (Yeh, Mei-Chen)
Oral Defense Committee: 葉梅珍 (Yeh, Mei-Chen), 陳祝嵩 (Chen, Chu-Song), 彭彥璁
Oral Defense Date: 2021/07/30
Degree: Master
Department: 資訊工程學系 (Department of Computer Science and Information Engineering)
Year of Publication: 2021
Academic Year of Graduation: 109
Language: Chinese
Number of Pages: 37
Chinese Keywords: 多標籤零樣本學習, 視覺語義嵌入模型, 生成對抗網路
English Keywords: Multi-Label, Zero-Shot Learning, visual semantic embedding model, GAN, generative adversarial network
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202101371
Thesis Type: Academic thesis
Views: 135; Downloads: 0
Abstract:

Zero-shot learning refers to classification in which the classifier can recognize not only objects seen during training but also objects it has never seen. In multi-label zero-shot learning, each instance may contain more than one object, which makes the recognition task even harder.

Previous methods typically project the labels' attribute embeddings and the visual features extracted from images into a common space, and then search for the labels closest to the image features; alternatively, they build relations between labels from a knowledge graph or knowledge base and use those relations to aid recognition. However, when a dataset lacks attribute annotations, the word embeddings commonly used as a substitute are not as discriminative as attribute embeddings, and relation-based methods tend to over-trust the knowledge base, imposing relations on the labels while ignoring the information contained in the image itself. With the recent rise of Generative Adversarial Networks (GANs), a more efficient and more accurate approach to unseen classes has emerged: first learn, from seen classes, how image features and their corresponding attributes relate, and then generate image features from the attribute labels. Based on this observation, we propose a deep learning model that combines a GAN with word embeddings. It generates image features from word embeddings, and it transforms image features into classifiers that are mapped into the word-embedding space to find the labels belonging to the image. By mapping image features and word embeddings into each other's spaces, the model better predicts unseen classes, and by exploiting the relation between image features and classifiers, it converts the multi-label task into a single-label task.
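The abstract describes two learnable mappings: a conditional generator that synthesizes visual features from a label's word embedding, and a projection that maps visual features into the embedding space, where nearby label embeddings give the predictions. The PyTorch sketch below illustrates only that structure under stated assumptions; it is not the thesis's implementation. The dimensions (300-d GloVe-style embeddings, 2048-d CNN features), layer sizes, and names such as FeatureGenerator, EmbeddingProjector, and rank_labels are illustrative, and the WGAN critic used for adversarial training is omitted.

```python
# Minimal sketch (assumed dimensions: 300-d word embeddings, 2048-d CNN
# visual features; all layer sizes are illustrative, not from the thesis).
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, FEAT_DIM, NOISE_DIM = 300, 2048, 100

class FeatureGenerator(nn.Module):
    """Synthesizes a visual feature from a word embedding plus noise
    (the conditional-generator half; the adversarial critic is omitted)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB_DIM + NOISE_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, FEAT_DIM),
            nn.ReLU(),  # CNN features are typically non-negative
        )

    def forward(self, word_emb, noise):
        return self.net(torch.cat([word_emb, noise], dim=1))

class EmbeddingProjector(nn.Module):
    """Maps a visual feature into the word-embedding space, so labels can
    be ranked by similarity (DeViSE-style scoring)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, EMB_DIM),
        )

    def forward(self, feat):
        return self.net(feat)

def rank_labels(image_feat, label_embs, projector):
    """Cosine-score every candidate label (seen or unseen) for each image
    feature; each label is scored independently of the others."""
    proj = F.normalize(projector(image_feat), dim=-1)   # (B, EMB_DIM)
    labels = F.normalize(label_embs, dim=-1)            # (L, EMB_DIM)
    return proj @ labels.T                              # (B, L) scores

# Usage: score 20 candidate labels for a batch of 4 generated features.
gen, proj = FeatureGenerator(), EmbeddingProjector()
fake_feat = gen(torch.randn(4, EMB_DIM), torch.randn(4, NOISE_DIM))
scores = rank_labels(fake_feat, torch.randn(20, EMB_DIM), proj)
print(scores.shape)  # torch.Size([4, 20])
```

Because each label is scored independently against the projected feature, a multi-label prediction can be read off by thresholding or top-k selection over the per-label scores, which mirrors the abstract's reduction of the multi-label task to single-label scoring.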

Table of Contents:

List of Tables
List of Figures
Chapter 1: Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Research Objectives
    1.4 Thesis Organization
Chapter 2: Related Work
    2.1 Zero-Shot Learning
        2.1.1 Semantic-Embedding-Based Method (DeViSE)
        2.1.2 Semantic-Autoencoder-Based Method (SAE)
    2.2 Multi-Label Tasks
    2.3 Word Embeddings
    2.4 AutoEncoder
        2.4.1 VAE
    2.5 Generative Adversarial Networks
        2.5.1 CGAN
        2.5.2 WGAN
        2.5.3 VAEGAN
    2.6 Relation to Previous Methods
Chapter 3: Methods and Procedures
    3.1 Problem Definition
    3.2 Model Architecture
        3.2.1 Discriminator
        3.2.2 Generator
        3.2.3 Multi-Label Classifier
Chapter 4: Experimental Results
    4.1 Datasets
    4.2 Evaluation Metrics
    4.3 Ablation Study
    4.4 Experiment 1: VOC2007 ZSL
    4.5 Experiment 2: VOC2007 GZSL
    4.6 Experiment 3: NUS-WIDE ZSL
    4.7 Experiment 4: NUS-WIDE GZSL
    4.8 Experimental Analysis
Chapter 5: Conclusion
References

    [1] Wei Wang, Vincent W. Zheng, Han Yu, and Chunyan Miao. A Survey of Zero-Shot Learning: Settings, Methods, and Applications. ACM Trans. Intell. Syst. Technol., 10(2), Article 13, 2019.

    [2] Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov. DeViSE: A Deep Visual-Semantic Embedding Model. In NIPS, 2013.

    [3] Elyor Kodirov, Tao Xiang, Shaogang Gong. Semantic Autoencoder for Zero-Shot Learning. In CVPR, 2017.

    [4] Mehdi Mirza, Simon Osindero. Conditional Generative Adversarial Nets. arXiv preprint arXiv:1411.1784, 2014.

    [5] Martin Arjovsky, Léon Bottou. Towards Principled Methods for Training Generative Adversarial Networks. In ICLR, 2017.

    [6] Martin Arjovsky, Soumith Chintala, Léon Bottou. Wasserstein Generative Adversarial Networks. arXiv preprint arXiv:1701.07875, 2017.

    [7] Xianwen Yu, Xiaoning Zhang, Yang Cao, and Min Xia. VAEGAN: A Collaborative Filtering Framework based on Adversarial Variational Autoencoders. In IJCAI, 2019.

    [8] Yongqin Xian, Saurabh Sharma, Bernt Schiele, Zeynep Akata. f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning. In CVPR, 2019.

    [9] Meng Ye, Yuhong Guo. Multi-Label Zero-Shot Learning with Transfer-Aware Label Embedding Projection. arXiv preprint arXiv:1808.02474, 2018.

    [10] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In ICLR Workshop, 2013.

    [11] Jeffrey Pennington, Richard Socher, Christopher D. Manning. GloVe: Global Vectors for Word Representation. In EMNLP, 2014.

    [12] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018.

    [13] M. L. Menéndez, J. A. Pardo, L. Pardo, M. C. Pardo. The Jensen-Shannon divergence. J. Frankl. Inst., 1997.

    [14] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 88(2):303–338, 2010.

    [15] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yantao Zheng. NUS-WIDE: A Real-World Web Image Database from National University of Singapore. In CIVR, 2009.

    [16] Y. Zhang, B. Gong, and M. Shah. Fast Zero-Shot Image Tagging. In CVPR, 2016.

    [17] M. B. Sariyildiz and R. G. Cinbis. Gradient Matching Generative Networks for Zero-Shot Learning. In CVPR, 2019.

    [18] J. Lu, J. Li, Z. Yan, and C. Zhang. Zero-Shot Learning by Generating Pseudo Feature Representations. arXiv preprint arXiv:1703.06389, 2017.

    [19] Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S Corrado, and Jeffrey Dean. Zero-shot learning by convex combination of semantic embeddings. arXiv preprint arXiv:1312.5650, 2013.

    [20] Y. Xian, Z. Akata, G. Sharma, Q. Nguyen, M. Hein, and B. Schiele. Latent embeddings for zero-shot classification. In CVPR, 2016.

    [21] Y. Fu, Y. Yang, T. M. Hospedales, T. Xiang, and S. Gong. Transductive multi-label zero-shot learning. In BMVC, 2014.

    [22] Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. Label-Embedding for Image Classification. TPAMI, 2015.

    [23] Dat Huynh and Ehsan Elhamifar. A shared multi-attention framework for multi-label zero-shot learning. In CVPR, 2020.

    [24] Akshita Gupta, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Ling Shao, and Joost van de Weijer. Generative Multi-Label Zero-Shot Learning. arXiv preprint arXiv:2101.11606, 2021.

Full Text: download unavailable; electronic full text embargoed until 2026/09/07.