Graduate student: 鄭皓天
Cheng, Hao-Tien
Thesis title: 多口音英語語音辨識
Multi-accent English Speech Recognition
Advisor: 陳柏琳
Chen, Berlin
Oral defense committee: 陳柏琳
Chen, Berlin
Hung, Jeih-Weih
Chiang, Chen-Yu
Oral defense date: 2024/01/20
Degree: Master's
Department: 資訊工程學系
Department of Computer Science and Information Engineering
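Year of publication: 2024
Academic year of graduation: 112
Language: Chinese
Number of pages: 41
Chinese keywords: 語音辨識, 口音, 多任務學習, 資料視覺化, 模型探測, 轉換器
English keywords: Speech Recognition, Accent, Multi-task Learning, Data Visualization, Model Probing, Adapter
Research method: Experimental design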
DOI URL: http://doi.org/10.6345/NTNU202400347
Thesis type: Academic thesis
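  • Abstract (translated from the Chinese): With the trend toward globalization, the role of English as an international common language is growing in importance. However, owing to differences in native-language background, region, and culture, the diversity of English accents has grown accordingly. This makes recognizing variously accented English a challenge for speech recognition systems.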

    With globalization, English has become increasingly important as an international lingua franca. However, differences in native language, region, and culture have produced a wide variety of English accents, which makes accented speech difficult for speech recognition systems. This thesis investigates how to improve the Conformer model for multi-accent English speech recognition when accented data is limited, by strengthening the model's ability to discriminate between accents. It proposes integrating an accent classification task into the speech recognition model to increase the model's sensitivity to different accents. The results show a lower word error rate on accented English speech than traditional methods. The study also visualizes accent features at different layers of the model encoder to explore what information each layer represents. In addition, the thesis examines how the extensively trained Whisper model performs on multi-accent English speech recognition in its English-only and multilingual versions and at different model sizes. Finally, it compares training the model with LoRA against full fine-tuning, with the aim of providing clearer guidance for model selection.
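
The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the thesis or its evaluation scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); not the thesis's own scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"):
    # 2 errors over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.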

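    Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Research Contributions
    Chapter 2 Literature Review
    2.1 Background
    2.2 Multi-accent Speech Recognition Methods
    2.2.1 Acoustic Models and Speech Processing
    2.2.2 Independent Accent-specific Acoustic Models
    2.2.3 Multi-accent Deep Neural Networks
    2.2.4 Generative Adversarial Training
    2.2.5 Accent Embeddings
    2.2.6 Residual Adapters
    2.2.7 Accent Classification Task Features
    2.2.8 General Encoder and Accent Encoder
    Chapter 3 Methods and Procedures
    3.1 Conformer Model
    3.1.1 Introduction
    3.1.2 Conformer Model Architecture
    3.1.3 Pre-training and Fine-tuning
    3.1.4 Auxiliary Multi-task Learning
    3.2 Whisper Model
    3.2.1 Introduction to the Whisper Model
    3.2.2 LoRA
    Chapter 4 Experiments and Results
    4.1 Datasets
    4.1.1 LibriSpeech Dataset
    4.1.2 AESRC2020 Dataset
    4.2 Evaluation Metrics
    4.3 Experimental Results
    4.3.1 LibriSpeech Pre-training and Fine-tuning
    4.3.2 Multi-task Learning with Accent Classification
    4.3.3 Weight of the Accent Classification Task Loss
    4.3.4 Visualization of Internal Conformer Features
    4.3.5 Analysis of Adding a Language Model
    4.3.6 Adding a Domain Adversarial Training (DAT) Auxiliary Loss
    4.3.7 Out-of-domain Data Testing
    4.3.8 Fine-tuning Different Whisper Models on the Multi-accent Task
    4.3.9 Training Whisper Models with LoRA
    Chapter 5 Conclusion and Future Work
    References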

