Graduate Student: 吳建霖 Wu, Jian-Lin
Thesis Title: A Visual Analytics System for Understanding and Comparing Transformer Models
Advisor: 王科植 Wang, Ko-Chih
Committee Members: 王科植 Wang, Ko-Chih; 張鈞法 Chang, Chun-Fa; 林士勛 Lin, Shih-Syun
Oral Defense Date: 2022/09/01
Degree: Master
Department: Department of Computer Science and Information Engineering
Publication Year: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 41
Keywords (English): Visualization, Transformer
DOI URL: http://doi.org/10.6345/NTNU202201639
Thesis Type: Academic thesis
Abstract:

In recent years, natural language processing (NLP) technology has made great progress, and transformer-based models have performed well on a wide range of NLP problems. However, a natural language task can be accomplished by multiple models with slightly different architectures, such as different numbers of layers and attention heads. In addition to quantitative metrics, many users also consider a model's ability to understand language and the computing resources it requires when selecting a model. Yet comparing and deeply analyzing two transformer-based models with different numbers of layers and attention heads is not easy, because there is no inherent one-to-one match between the models. Comparing models with different architectures is therefore a crucial and challenging task when users train, select, or improve models for their NLP tasks. In this paper, we propose a visual analytics system that explores the differences between language models and helps users select a model or find out where a model could be improved. Our system supports the comparison of two models: users can interactively explore specific layers or heads in each model and identify similarities and differences. With our tool, users can not only see what linguistic features a model has learned, but also deeply analyze the subtle differences between two transformer-based models with different numbers of layers and heads. Use cases and user feedback show that our tool helps people gain insight into models and facilitates model comparison tasks.
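The core obstacle the abstract names, matching attention heads across models that differ in depth and head count, can be made concrete with a small script. Below is a minimal sketch assuming the HuggingFace transformers, PyTorch, and scipy packages; the thesis does not disclose its implementation, so the model names and the Jensen-Shannon matching criterion here are illustrative assumptions rather than the system's actual method.

```python
# Sketch: align attention heads across two BERT-style models of different
# sizes by comparing their attention distributions on the same sentence.
import numpy as np
import torch
from scipy.spatial.distance import jensenshannon
from transformers import AutoModel, AutoTokenizer

SENT = "The quick brown fox jumps over the lazy dog."

def head_attentions(model_name: str, text: str) -> np.ndarray:
    """Return all attention maps, shape (layers * heads, seq, seq)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)
    model.eval()
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    # out.attentions is one (1, heads, seq, seq) tensor per layer;
    # concatenate along the head dimension and drop the batch axis.
    return torch.cat(out.attentions, dim=1).squeeze(0).numpy()

# Two models with different depths and head counts (illustrative choices);
# both are assumed to share BERT WordPiece tokenization, so the token
# sequences, and hence the attention-map shapes, line up.
a = head_attentions("prajjwal1/bert-tiny", SENT)  # 2 layers x 2 heads
b = head_attentions("bert-base-uncased", SENT)    # 12 layers x 12 heads

# Pairwise head distance: mean Jensen-Shannon distance between the
# per-token attention distributions (each attention row sums to 1).
dist = np.zeros((a.shape[0], b.shape[0]))
for i, ha in enumerate(a):
    for j, hb in enumerate(b):
        dist[i, j] = np.mean([jensenshannon(p, q) for p, q in zip(ha, hb)])

# For each head of the small model, its closest counterpart in the large one.
print(dist.argmin(axis=1))
```

Low values in `dist` suggest head pairs with similar attention behavior despite the mismatched architectures; a visual analytics front end like the one described above could render this matrix as a heatmap to guide interactive layer- and head-level exploration.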