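Graduate student: 賴敏軒
Thesis title: Empirical Comparisons of Various Discriminative Language Models for Speech Recognition (實證探究多種鑑別式語言模型於語音辨識之研究)
Advisor: Chen, Berlin (陳柏琳)
Degree: Master's
Department: Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e., the 2010–2011 academic year)
Language: Chinese
Number of pages: 68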
Keywords: speech recognition, discriminative language model, margin, training criterion
Thesis type: Academic thesis
    Language modeling (LM) lies at the heart of most automatic speech recognition (ASR) systems. A language model captures the regularities of a natural language, and its parameters are estimated from a large amount of training text. N-gram language models (especially bigram and trigram models), which estimate the conditional probability of a word given its preceding n-1 words, are the most widely used. However, n-gram models are normally trained with the maximum likelihood (ML) criterion, which does not directly minimize the recognition error rate, the metric on which ASR systems are ultimately evaluated. To address this mismatch, a range of discriminative language modeling (DLM) methods have been proposed in recent years and have shown varying degrees of success. Rather than merely fitting the distribution of the training data, these methods aim to correctly single out the best hypothesis among the candidate recognition hypotheses. In this thesis, we first present an empirical investigation of several leading DLM methods designed to improve speech recognition performance. We then propose a novel use of various margin-based DLM training methods that penalize each incorrect recognition hypothesis in proportion to the difference between its word error rate (WER) and that of the desired hypothesis (the oracle), i.e., the hypothesis with the minimum WER. Experiments on a large vocabulary continuous speech recognition (LVCSR) task demonstrate the performance merits of the methods instantiated from our DLM framework compared with other existing DLM methods.
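As a minimal illustration of the maximum likelihood n-gram estimation described above, the following Python sketch computes bigram probabilities as relative frequencies, P(w | v) = c(v, w) / c(v). The function name `train_bigram_ml` and the `<s>`/`</s>` sentence-boundary tokens are illustrative assumptions and do not come from the thesis. No smoothing is applied.

```python
from collections import Counter

def train_bigram_ml(sentences):
    """Return prob(prev, word) = c(prev, word) / c(prev) estimated by maximum likelihood."""
    history = Counter()  # how often each token occurs as a bigram history
    bigram = Counter()   # how often each (history, word) pair occurs
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]  # assumed boundary markers
        for prev, word in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigram[(prev, word)] += 1

    def prob(prev, word):
        # Unseen histories get probability 0 because this sketch has no smoothing.
        if history[prev] == 0:
            return 0.0
        return bigram[(prev, word)] / history[prev]

    return prob

if __name__ == "__main__":
    prob = train_bigram_ml(["a b", "a c"])
    print(prob("a", "b"), prob("<s>", "a"))
```

Unsmoothed ML estimates assign zero probability to unseen bigrams, so practical LVCSR systems add smoothing such as Katz back-off or Kneser-Ney. This sketch leaves smoothing out for clarity.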
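The margin-based idea can be sketched as N-best reranking with a linear model. The following Python sketch is a hedged illustration, not the thesis's actual MDLM formulation. Each hypothesis is scored as a baseline recognizer score plus learned weights on unigram and bigram count features. A WER-rescaled hinge criterion then requires the oracle to outscore each incorrect hypothesis by a margin proportional to their word-error difference, and a perceptron-style update is applied whenever that margin is violated. All names, including `train_margin_dlm`, `rho`, `lr`, and the feature set, are hypothetical.

```python
from collections import Counter

def ngram_features(words, n=2):
    """Count 1..n-gram features of a hypothesis (hypothetical feature set)."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for k in range(1, n + 1):
        for i in range(len(tokens) - k + 1):
            feats[" ".join(tokens[i:i + k])] += 1
    return feats

def word_errors(hyp, ref):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    row = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        diag, row[0] = row[0], i
        for j, r in enumerate(ref, 1):
            cur = min(row[j] + 1, row[j - 1] + 1, diag + (h != r))
            diag, row[j] = row[j], cur
    return row[len(ref)]

def score(weights, feats, base_score):
    """Baseline recognizer score plus the DLM's linear feature score."""
    return base_score + sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_margin_dlm(data, epochs=5, lr=0.1, rho=1.0):
    """data: list of (nbest, ref); nbest is a list of (words, base_score) pairs."""
    weights = {}
    for _ in range(epochs):
        for nbest, ref in data:
            errs = [word_errors(words, ref) for words, _ in nbest]
            best = min(range(len(nbest)), key=errs.__getitem__)  # oracle index
            o_words, o_base = nbest[best]
            o_feats = ngram_features(o_words)
            for (words, base), e in zip(nbest, errs):
                extra = e - errs[best]  # error distance from the oracle
                if extra <= 0:
                    continue
                feats = ngram_features(words)
                gap = score(weights, o_feats, o_base) - score(weights, feats, base)
                if gap < rho * extra:  # margin violated: push oracle up, h down
                    for f, v in o_feats.items():
                        weights[f] = weights.get(f, 0.0) + lr * v
                    for f, v in feats.items():
                        weights[f] = weights.get(f, 0.0) - lr * v
    return weights

def rerank(weights, nbest):
    """Return the word sequence of the highest-scoring hypothesis."""
    return max(nbest, key=lambda h: score(weights, ngram_features(h[0]), h[1]))[0]

if __name__ == "__main__":
    ref = "a b c".split()
    nbest = [("a x c".split(), 0.0), ("a b c".split(), -1.0)]
    print(rerank({}, nbest))  # baseline alone prefers the wrong hypothesis
    w = train_margin_dlm([(nbest, ref)])
    print(rerank(w, nbest))
```

Raw word-error counts stand in for WER here. Within a single utterance, the two differ only by the constant factor 1/len(ref), so that factor can be absorbed into `rho`.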

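    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Overview of Speech Recognition
      1.3 Contributions of the Thesis
      1.4 Thesis Organization
    Chapter 2 Literature Review and Methods
      2.1 N-gram Language Models and Language Models at Different Levels
      2.2 Ideas and Objectives
      2.3 Definition of Training
      2.4 Discriminative Language Models
        2.4.1 Minimum Squared Error
        2.4.2 Minimum Expected Error Rate
        2.4.3 Maximum Log Conditional Probability
        2.4.4 Considering Relationships among Sentences
      2.5 Properties of Discriminative Language Models
      2.6 Other Related Work
    Chapter 3 Margin-based Discriminative Language Models (MDLM)
      3.1 Margin Estimation Principle
      3.2 Margin-based Discriminative Language Model (MDLM)
    Chapter 4 Experimental Setup and Results
      4.1 Experimental Setup
        4.1.1 The NTNU Large Vocabulary Continuous Speech Recognition System
        4.1.2 Experimental Corpora
        4.1.3 Language Model Evaluation
      4.2 Baseline Experimental Results
      4.3 Experimental Results of Various Discriminative Language Models
      4.4 Experimental Results of Margin-based Discriminative Language Models
      4.5 Feature Selection for Discriminative Language Models
      4.6 Linear Combination of Various Discriminative Language Models
    Chapter 5 Conclusions and Future Work
    References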

