
Graduate Student: 王泓壬 (Wang, Hung-Ren)
Thesis Title: 會議語音辨識之上下文語言模型 Reranking 研究
(Contextualize Language Model Reranking for Meeting Speech Recognition)
Advisor: 陳柏琳 (Chen, Berlin)
Oral Defense Committee: 陳冠宇 (Chen, Guan-Yu), 陳柏琳 (Chen, Berlin), 曾厚強 (Tseng, Hou-Chiang), 洪志偉 (Hung, Chih-Wei)
Oral Defense Date: 2023/07/21
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2023
Academic Year of Graduation: 111
Language: Chinese
Number of Pages: 44
Chinese Keywords: Automatic Speech Recognition, Language Modeling, Conversational Speech, N-Best Lists, List Information, Reranking, Cross-Utterance Information, Large Generative Language Models, ChatGPT
English Keywords: Automatic Speech Recognition, Language Modeling, Conversational Speech, N-Best Lists, List Information, Large Generative Language Models, ChatGPT
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202301357
Thesis Type: Academic thesis
ASR N-best reranking is a technique used in automatic speech recognition (ASR) systems to improve the accuracy of the transcription output. For each input audio segment, an ASR system generates multiple candidate hypotheses, known as an N-best list. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown remarkable performance on a variety of natural language processing (NLP) tasks such as text classification, named entity recognition, and question answering. Because BERT can capture contextual information and produce high-quality representations of input text, it has been applied to ASR N-best reranking. To further strengthen BERT's predictions, we explore enhanced semantic information and training objectives, broadly divided into four parts: (1) effective methods to incorporate grammaticality information about each hypothesis into the model; (2) effective methods to indirectly incorporate information about the entire N-best list into the model; (3) the feasibility of classification, ranking, and multitask training objectives for model training; and (4) strengthening the textual information extracted by the model.
Large generative language models (LLMs) have demonstrated excellent generalization ability across a wide range of language-related tasks. In this study, we evaluate the feasibility of applying LLMs such as ChatGPT to the ASR N-best reranking task.
We conduct a series of experiments on the AMI meeting corpus. The results show that the proposed methods are effective in reducing word error rate (WER), achieving up to a 1.37% absolute WER reduction compared with the baseline ASR system.
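As a minimal illustration of the BERT-based N-best reranking described above, the following Python sketch scores each hypothesis with a BERT sequence classifier (assumed to have been fine-tuned for hypothesis quality) and interpolates that score with the ASR score before re-sorting the list. The model name, interpolation weight, and example hypotheses are illustrative assumptions, not the thesis's exact setup.

```python
# Sketch of BERT-based N-best reranking: score each hypothesis with BERT and
# interpolate with the ASR decoder score. All names/values here are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)
model.eval()

def rerank(nbest, asr_scores, alpha=0.5):
    """Rerank an N-best list by interpolating ASR and BERT scores.

    nbest:      list of candidate transcripts (strings)
    asr_scores: list of log-likelihood scores from the ASR decoder
    alpha:      interpolation weight (hypothetical value)
    """
    with torch.no_grad():
        enc = tokenizer(nbest, padding=True, truncation=True, return_tensors="pt")
        bert_scores = model(**enc).logits.squeeze(-1)  # one scalar score per hypothesis
    combined = alpha * torch.tensor(asr_scores) + (1 - alpha) * bert_scores
    order = torch.argsort(combined, descending=True)
    return [nbest[i] for i in order.tolist()]

# Example: pick the best of three competing hypotheses for one utterance.
print(rerank(["we need a remote control",
              "we need a remote patrol",
              "we kneed a remote control"],
             asr_scores=[-12.3, -12.1, -13.0])[0])
```

This only shows the scoring-and-reordering skeleton; how the semantic features and training objectives listed above are folded into the scorer varies by method.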

ASR (automatic speech recognition) N-best reranking aims to improve the accuracy of ASR systems by re-ranking their candidate outputs, known as N-best lists: the most likely transcription is selected from the N-best list on the basis of additional contextual information. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown remarkable performance in various natural language processing (NLP) tasks, and it is used for ASR N-best reranking because of its ability to capture contextual information and generate high-quality representations of input text. We explore enhanced semantic information and training objectives, broadly divided into four parts: (1) effective methods to incorporate grammaticality information about each hypothesis into the model; (2) effective methods to indirectly incorporate information about the whole N-best list into the model; (3) the feasibility of classification, ranking, and multitask training objectives for model training; and (4) strengthening the textual information extracted by the model. Large generative language models (LLMs) have demonstrated excellent generalization ability in various language-related tasks, and in this study we also evaluate the feasibility of utilizing LLMs such as ChatGPT for the ASR N-best reranking task. We conduct a series of experiments on the AMI meeting corpus, and the results show that the proposed methods are effective in reducing word error rate (WER), yielding up to a 1.37% absolute WER reduction.
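As a hedged illustration of the LLM-based reranking idea evaluated in this work, the sketch below prompts a ChatGPT-style model to choose the most plausible hypothesis from an N-best list. The prompt wording, model name, and answer parsing are assumptions for illustration, not the thesis's actual protocol.

```python
# Sketch: ask a generative LLM to select the best hypothesis from an N-best list.
# The prompt, model choice, and parsing below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_pick_best(nbest):
    numbered = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    prompt = (
        "The following are candidate transcriptions of the same utterance from a "
        "meeting recording. Reply with only the number of the most likely one.\n"
        f"{numbered}"
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Parse a numeric answer such as "2" or "2." (naive parsing for the sketch).
    choice = int(reply.choices[0].message.content.strip().split()[0].rstrip("."))
    return nbest[choice - 1]

print(llm_pick_best(["so we agreed on the yellow case",
                     "so we agreed on the yellow lace",
                     "so we a greed on the yellow case"]))
```

In practice the LLM's free-form answer would need more robust handling (e.g., constraining it to the given candidates), which is exactly the kind of issue such an evaluation has to address.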

Chapter 1 Introduction
  1.1 Research Background
  1.2 Motivation and Research Contributions
Chapter 2 Literature Review
  2.1 BERT for ASR N-Best Reranking (PBERT)
  2.2 BERT with Cross-Utterance Context for ASR N-Best Reranking
  2.3 The Alpaca-LoRA Large Generative Language Model
  2.4 ChatGPT for ASR Error Correction
Chapter 3 Methods
  3.1 Grammar Detection
  3.2 Semantic Similarity Features
    3.2.1 Contextual Understanding (Global Similarity)
    3.2.2 In-List Semantic Relations (Local Similarity)
  3.3 Training Objectives
    3.3.1 Cross-Entropy-Based Training Objective
    3.3.2 ListNet
    3.3.3 LambdaRank
    3.3.4 Multitask
  3.4 Textual Information Enhancement
    3.4.1 Attention Pooling Ensemble
Chapter 4 Experiments
  4.1 Experimental Setup
    4.1.1 AMI Corpus and ASR System Setup
    4.1.2 BERT Fine-Tuning Setup
  4.2 Results with BERT-Based Language Models
    4.2.1 Grammar Detection Enhancement
    4.2.2 Semantic Similarity Features
    4.2.3 Training Objectives
    4.2.4 Textual Information Enhancement
    4.2.5 Analysis of Combined Semantic Feature Enhancements
  4.3 Results with Generative LLMs
    4.3.1 LLM Experimental Setup
    4.3.2 LLM Experimental Results
Chapter 5 Conclusions and Future Work
References
Appendix
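Section 3.3 of the outline above lists ListNet among the ranking-based training objectives. As a minimal sketch of the listwise idea (not necessarily the thesis's implementation), the ListNet loss can be written as the cross entropy between top-one probability distributions derived from predicted and target scores; using negative WER as the target relevance signal is an assumption for illustration.

```python
# Minimal ListNet-style listwise loss over one N-best list. Variable names and the
# choice of negative WER as the target signal are illustrative assumptions.
import torch
import torch.nn.functional as F

def listnet_loss(pred_scores, target_scores):
    """Cross entropy between top-one probability distributions over an N-best list.

    pred_scores:   (N,) scores produced by the reranking model
    target_scores: (N,) target quality scores, e.g. negative WER per hypothesis
    """
    pred_log_dist = F.log_softmax(pred_scores, dim=-1)
    target_dist = F.softmax(target_scores, dim=-1)
    return -(target_dist * pred_log_dist).sum()

# Example: the model prefers hypothesis 0, while hypothesis 1 has the lowest WER.
pred = torch.tensor([2.0, 1.5, 0.3])
target = torch.tensor([-0.10, -0.05, -0.30])  # negative WER: higher is better
print(listnet_loss(pred, target))
```

In a multitask setup such as the one listed in Section 3.3.4, a listwise term like this could be combined with a classification term, though the exact weighting used in the thesis is not specified here.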

