Graduate student: 陳俊彥
Chen, Chun-Yen
Thesis title: 整合全局場景與局部注意的自監督多標籤分類
From Whole to Parts: Integrating Global Context and Local Attention for Self-Supervised Multi-Label Classification
Advisor: 葉梅珍
Yeh, Mei-Chen
Oral defense committee: 王鈺強
Wang, Yu-Chiang
Kang, Li-Wei
Yeh, Mei-Chen
Oral defense date: 2023/07/24
Degree: Master's
Department: 資訊工程學系
Department of Computer Science and Information Engineering
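Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 43
Keywords (Chinese): 自監督學習, 對比學習, 多標籤分類
Keywords (English): Self-supervised learning, Contrastive learning, Multi-label classification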
DOI URL: http://doi.org/10.6345/NTNU202301210
Thesis type: Academic thesis
  • Self-supervised learning has achieved strong results across a wide range of computer vision tasks. However, little work has specifically addressed the challenges of multi-label classification. The area remains relatively underexplored, and further research is needed to fully exploit self-supervised learning techniques for multi-label classification tasks.
    In this thesis, we present GOLANG, a multi-level representation learning framework for self-supervised multi-label classification that captures image context and object information simultaneously. Our approach combines global context learning and local alignment to capture different levels of semantic information in images. The global context learning module learns from the whole image, while the local alignment module eliminates object-irrelevant nuisances by learning where to learn.
    By integrating both modules, our model learns diverse levels of semantic information to facilitate multi-label classification. To further strengthen the model's ability to extract object-scene relationships, we introduce cross-level prediction, which captures the interplay between objects and scenes within an image. The GOLANG framework achieves state-of-the-art performance on self-supervised multi-label classification tasks, demonstrating its effectiveness in modeling the relationships between multiple objects and scenes in images.

    1. Introduction 1
    2. Related Work 6
      2.1 Self-supervised learning 6
      2.2 Multi-label classification 11
    3. Method 14
      3.1 Global context 18
      3.2 Local attention 23
        3.2.1 Per-pixel projection 25
        3.2.2 Shuffling the local views 28
      3.3 Cross-view prediction 29
      3.4 Loss function 31
    4. Experiments 33
      4.1 Linear evaluation 34
      4.2 Transfer learning to other downstream tasks 36
      4.3 Ablation study 36
    5. Conclusion 40
    Reference 41

