Graduate student: 林詠閎
Lin, Yung-Hung
Thesis title: 具泛化能力的槽位表示之自監督學習
Self-Supervised Learning with Generalized Slot Representation
Advisor: 葉梅珍
Yeh, Mei-Chen
Oral defense committee: 葉梅珍
Yeh, Mei-Chen
吳志強
Wu, Jhih-Ciang
陳柏琳
Chen, Ber-Lin
Oral defense date: 2025/01/02
Degree: Master
Department: 資訊工程學系
Department of Computer Science and Information Engineering
Year of publication: 2025
Academic year of graduation: 113
Language: Chinese
Number of pages: 37
Keywords (Chinese): 自監督學習、局部增強、物件偵測、實例分割
Keywords (English): Self-supervised learning, Local augmentation, Object detection, Instance segmentation
Thesis type: Academic thesis
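  • Self-supervised learning has attracted considerable attention in recent years. Contrastive learning in particular learns representations from unlabeled data by pulling together different views of the same image and pushing apart views of other images. Scene-centric image datasets have recently come into use for pre-training, and learning that focuses on local regions performs better on them. Most of these methods rely on dense matching mechanisms or on Selective Search to find candidate objects. More recent methods learn by clustering pixels into groups (called slots), assigning pixels with the same semantics to the same slot and letting the learned slots adapt to the data. We observe that global augmentation cannot be tailored to individual slots. We therefore propose a local feature augmentation method that augments each slot at the feature level, so that the slots learn more of the variation and patterns in the data and generalize better.
    We evaluate our self-supervised method on downstream tasks including object detection, semantic segmentation, and multi-label classification. The method adds no trainable parameters and improves performance on every downstream task.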
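
    The contrastive objective summarized in the abstract (pull views of the same image together, push views of other images apart) can be illustrated with a small sketch. This record does not state which loss the thesis uses, so the InfoNCE form, the temperature value, the function names, and the toy embeddings below are all assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (epsilon guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss for one anchor (assumed loss form, not taken from the thesis).

    The loss is the negative log-softmax of the anchor-positive similarity
    against the similarities to all negatives.
    """
    a = l2_normalize(anchor)
    logits = [dot(a, l2_normalize(positive)) / temperature]
    logits += [dot(a, l2_normalize(n)) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Hypothetical toy embeddings: two views of image A and one view of image B.
view_a1 = [1.0, 0.1]
view_a2 = [0.9, 0.2]
view_b = [-0.2, 1.0]

# Correct pairing: the positive is the other view of the same image.
loss_aligned = info_nce(view_a1, view_a2, [view_b])
# Wrong pairing: a view of another image is treated as the positive.
loss_misaligned = info_nce(view_a1, view_b, [view_a2])
```

    In this toy setting the loss is small when the positive really is another view of the same image and large when it is not, which is the behavior the pull/push description implies.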

    Self-supervised learning has received a great deal of attention in recent years. In particular, contrastive learning learns representations by pulling views of the same image closer together and pushing away views of other images. Recently, scene-centric image datasets have been used for pre-training, since region-level learning performs better than image-level learning on scene-centric images; most of these methods rely on dense matching mechanisms or Selective Search to find possible objects. Recent methods learn by clustering pixels into groups (referred to as slots), assigning pixels with the same semantics to the same slot, and allowing the learned slots to adjust dynamically with the data. We propose a local feature augmentation method that augments features at the slot level, so that the semantic slots can learn more variations and patterns of the data and thereby generalize better.
    We evaluate the performance of the feature-augmented self-supervised approach on downstream tasks such as object detection, semantic segmentation, and multi-label classification. The proposed approach adds no trainable parameters and improves performance on each downstream task.
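
    The slot idea in the abstract (group pixels with the same semantics into a slot, then augment features per slot) can be sketched roughly as follows. This record does not describe the actual grouping rule or augmentation used in the thesis. The softmax assignment to fixed prototypes, the weighted-mean pooling, and the Gaussian-noise perturbation below are stand-in assumptions, and every name is hypothetical. The noise step is parameter-free, which is at least consistent with the statement that the approach adds no trainable parameters.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def assign_pixels(pixels, prototypes, temperature=0.1):
    """Soft-assign each pixel feature to slots by similarity to slot prototypes."""
    return [softmax([dot(p, s) / temperature for s in prototypes]) for p in pixels]


def pool_slots(pixels, assignments, num_slots):
    """Each slot feature is the assignment-weighted mean of the pixel features."""
    dim = len(pixels[0])
    slots = []
    for k in range(num_slots):
        weight = sum(a[k] for a in assignments) + 1e-12
        slots.append([
            sum(a[k] * p[d] for a, p in zip(assignments, pixels)) / weight
            for d in range(dim)
        ])
    return slots


def augment_slots(slots, sigma=0.1, rng=None):
    """Assumed slot-level augmentation: independent Gaussian noise per slot feature."""
    rng = rng or random.Random(0)
    return [[x + rng.gauss(0.0, sigma) for x in slot] for slot in slots]


# Hypothetical toy data: four pixel features and two slot prototypes.
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]

assignments = assign_pixels(pixels, prototypes)
slots = pool_slots(pixels, assignments, num_slots=2)
augmented = augment_slots(slots, sigma=0.1, rng=random.Random(0))
```

    Because the perturbation is applied to each slot separately rather than to the whole image, each semantic group is varied independently, which is the contrast with global augmentation that the abstract draws.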

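    Abstract (Chinese)
    Abstract (English)
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Thesis Organization
    Chapter 2 Related Work
        2.1 Self-Supervised Learning
            2.1.1 Image-Level Contrastive Self-Supervised Learning
            2.1.2 Pixel-Level Contrastive Self-Supervised Learning
            2.1.3 Region-Level Contrastive Self-Supervised Learning
            2.1.4 Object-Level Contrastive Self-Supervised Learning
            2.1.5 Image-Reconstruction-Based Self-Supervised Learning
        2.2 Feature Augmentation
    Chapter 3 Method
        3.1 Model Architecture
        3.2 Slot Attention
        3.3 Semantic Grouping with pixel level
        3.4 Slot-level Feature Augment
        3.5 Loss Function
    Chapter 4 Experiments
        4.1 Experimental Setup
            4.1.1 Datasets
            4.1.2 Evaluation Metrics
            4.1.3 Implementation Details
            4.1.4 Evaluation Settings
        4.2 Experimental Results
        4.3 Ablation Study
            4.3.1 Local-Aware Feature Augmentation
            4.3.2 Effect of Gradients
            4.3.3 Additional Semantic Grouping
            4.3.4 Selection of Slots to Augment
            4.3.5 Semantic Slot Visualization
            4.3.6 Slot Embedding Visualization
    Chapter 5 Conclusion
    References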


    Electronic full text under delayed release until 2030/01/02