| Author: | 劉鳳萍 |
|---|---|
| Thesis Title: | 使用鑑別式語言模型於語音辨識結果重新排序 (Applying Discriminative Language Models to Reranking of M-best Speech Recognition Results) |
| Advisor: | 陳柏琳 |
| Degree: | Master (碩士) |
| Department: | Department of Computer Science and Information Engineering (資訊工程學系) |
| Thesis Publication Year: | 2009 |
| Academic Year: | 97 |
| Language: | Chinese |
| Number of pages: | 84 |
| Keywords (in Chinese): | 鑑別式語言模型、語言模型調適、關鍵詞自動擷取方法 |
| Keywords (in English): | discriminative language model, language model adaptation, Boosting, Perceptron, Minimum Sample Risk, keyword extraction |
| Thesis Type: | Academic thesis/dissertation |
A language model (LM) is designed to represent the regularity of a given language. When applied to speech recognition, it can alleviate problems caused by acoustic confusability, guide the search through multiple candidate word strings, and quantify the acceptability of the final word string output by the recognizer. However, the regularity of a language changes over time and across domains, so a static, invariable language model cannot meet practical demands. Language model adaptation offers a solution: a small amount of contemporaneous or in-domain data is used to adapt the original language model for better performance. The discriminative language model is one representative approach to language model adaptation in speech recognition. It first derives a set of indicative features, each with an associated weight, to characterize the sentences or word strings of a language, and then builds a sentence-scoring mechanism on the basis of these features and their weights. This mechanism is used to rerank the M-best recognition results produced by a baseline recognizer, so that the most correct candidate word string is expected to rise to the top of the list. This thesis proposes taking the results of a fast automatic keyword extraction method as additional features for the discriminative language model. The method extracts keywords by counting repeated co-occurrences of characters or words in the corpus, so that the extracted keywords may capture the regularity of the language in use. A useful property is that it requires no lexicon, and can therefore extract newly coined words and terms absent from the lexicon. This property might benefit discriminative training; however, the experimental results show no significant improvement.
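To make the reranking mechanism concrete, here is a minimal Python sketch of a linear feature-weight reranker trained with the perceptron algorithm (one of the methods named in the English keywords). The n-gram feature template, function names, and update schedule are illustrative assumptions, not the thesis's exact formulation.

```python
from collections import defaultdict

def extract_features(word_string):
    """Toy feature map: n-gram counts of a candidate word string.
    (The thesis also uses keyword-based features; this template is
    an illustrative assumption.)"""
    feats = defaultdict(float)
    words = word_string.split()
    for w in words:
        feats[("unigram", w)] += 1.0
    for a, b in zip(words, words[1:]):
        feats[("bigram", a, b)] += 1.0
    return feats

def score(weights, feats):
    # Linear score: sum of feature values times their learned weights.
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_train(nbest_lists, oracles, epochs=5):
    """Collins-style perceptron training for reranking.
    nbest_lists[i] holds the M-best candidates for utterance i;
    oracles[i] is the index of the least-errorful candidate."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates, oracle_idx in zip(nbest_lists, oracles):
            feats = [extract_features(c) for c in candidates]
            top = max(range(len(candidates)),
                      key=lambda j: score(weights, feats[j]))
            if top != oracle_idx:
                # Move weights toward the oracle, away from the current top.
                for f, v in feats[oracle_idx].items():
                    weights[f] += v
                for f, v in feats[top].items():
                    weights[f] -= v
    return weights

def rerank(weights, candidates):
    """Reorder an M-best list so the highest-scoring candidate comes first."""
    return sorted(candidates,
                  key=lambda c: score(weights, extract_features(c)),
                  reverse=True)
```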
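The lexicon-free keyword extraction idea can be sketched the same way: count repeated character n-grams in a corpus and keep the maximal frequent strings, so that new words outside any lexicon can still surface. The length and frequency thresholds and the maximality filter below are illustrative assumptions; the fast extraction method used in the thesis may differ in detail.

```python
from collections import Counter

def extract_keywords(texts, max_len=6, min_count=3):
    """Lexicon-free keyword extraction by counting repeated
    character n-grams (thresholds are illustrative assumptions)."""
    counts = Counter()
    for text in texts:
        for n in range(2, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    # Keep only maximal strings: drop a substring when some longer
    # frequent string contains it with the same count, since the
    # substring then never occurs on its own.
    keywords = [
        (s, c) for s, c in frequent.items()
        if not any(s != longer and s in longer and frequent[longer] == c
                   for longer in frequent)
    ]
    return sorted(keywords, key=lambda kc: -kc[1])
```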