Graduate student: | 郝柏翰 |
---|---|
Thesis title: | 運用鄰近與概念資訊於語言模型調適之研究 (Leveraging Proximity Cues and Concept Information for Language Model Adaptation in Speech Recognition) |
Advisor: | 陳柏琳 |
Degree: | Master's |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2014 |
Academic year of graduation: | 102 (ROC calendar; 2013–2014) |
Language: | Chinese |
Number of pages: | 64 |
Keywords (translated from Chinese): | speech recognition, language model, proximity information, concept information |
Keywords (English): | Automatic Speech Recognition, Language Modeling, Proximity Cues, Concept Information |
Thesis type: | Academic thesis |
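This thesis studies language model adaptation techniques for Mandarin Chinese large vocabulary continuous speech recognition, and its main contributions fall into two parts. The first part explores extensions and improvements of topic models: beyond relaxing the bag-of-words assumption, it incorporates proximity information with the aim of giving topic models better predictive performance. The second part proposes the concept language model (CLM), whose main purpose is to approximate the concept a user has in mind and thereby identify more relevant words; the thesis also explores several ways of estimating the concept model. Experiments are evaluated in terms of character error rate (CER) and perplexity, and the results show that the proposed methods clearly improve recognition performance.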
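The abstract names character error rate (CER) and perplexity as the evaluation criteria. As a rough illustration only, the Python sketch below computes both under their standard textbook definitions: CER as the character-level Levenshtein distance divided by the number of reference characters, and perplexity as the exponentiated average negative log-probability of the test words. The function names are illustrative and not taken from the thesis; the abstract does not specify which scoring tool was used.

```python
import math
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, deletions and insertions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete ref[i-1]
                curr[j - 1] + 1,         # insert hyp[j-1]
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = character edit distance / number of reference characters."""
    # Mandarin is scored per character, so spaces left by word segmentation are ignored.
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def perplexity(word_probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_i log P(w_i | history_i))."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))
```

For example, `character_error_rate("今天天氣很好", "今天天汽很好")` counts one substitution over six reference characters, giving 1/6, and a model that assigns probability 0.25 to every test word has perplexity 4.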
This thesis investigates and develops language model adaptation techniques for Mandarin large vocabulary continuous speech recognition (LVCSR). Its main contributions are twofold. First, the so-called “bag-of-words” assumption of conventional topic models is relaxed by additionally incorporating word proximity cues into the model formulation, so that the resulting topic models can achieve better predictive performance in LVCSR. Second, we propose a novel concept language modeling (CLM) approach for modeling the relationship between the search history and an upcoming word. CLM can be instantiated at different levels of granularity, such as words and document clusters. A series of experiments on an LVCSR task demonstrates that the proposed language models offer substantial improvements over the baseline N-gram system and achieve performance competitive with, or better than, several state-of-the-art language models.
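The abstract does not give the model formulas. Purely as an illustrative sketch, and not the thesis's actual formulation, the Python code below shows one common way proximity cues can relax the bag-of-words assumption in a PLSA-style topic language model: instead of counting every history word once, each occurrence is weighted by a Gaussian kernel of its distance to the upcoming word before the topic posteriors P(k | H) are estimated by folding-in EM, and the resulting topic probability is linearly interpolated with a baseline N-gram probability. The kernel shape, `sigma`, the interpolation weight `lam`, the toy topics, and all function names are hypothetical; the topic-word distributions P(w | k) are assumed to be trained beforehand.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: each topic k is a dict mapping word -> P(w | k).
Topic = Dict[str, float]


def proximity_counts(history: Sequence[str], sigma: float) -> Dict[str, float]:
    """Soft counts c(w, H): an occurrence d positions before the upcoming word
    contributes exp(-d^2 / (2 sigma^2)) instead of the flat 1 of bag-of-words."""
    counts: Dict[str, float] = {}
    n = len(history)
    for pos, w in enumerate(history):
        d = n - pos  # 1 = the word immediately preceding the prediction point
        counts[w] = counts.get(w, 0.0) + math.exp(-d * d / (2.0 * sigma * sigma))
    return counts


def fold_in_topics(counts: Dict[str, float], topics: List[Topic],
                   iterations: int = 50) -> List[float]:
    """Estimate P(k | H) by PLSA-style folding-in EM with P(w | k) held fixed."""
    num_topics = len(topics)
    p_k = [1.0 / num_topics] * num_topics
    for _ in range(iterations):
        new_p_k = [0.0] * num_topics
        for w, c in counts.items():
            # E-step: P(k | w, H) is proportional to P(k | H) * P(w | k)
            joint = [p_k[k] * topics[k].get(w, 0.0) for k in range(num_topics)]
            z = sum(joint)
            if z == 0.0:
                continue  # a word unseen under every topic contributes nothing
            for k in range(num_topics):
                new_p_k[k] += c * joint[k] / z
        # M-step: renormalize the expected soft counts
        norm = sum(new_p_k)
        if norm == 0.0:
            break  # no history word is covered by any topic; keep current estimate
        p_k = [v / norm for v in new_p_k]
    return p_k


def topic_prob(word: str, p_k: List[float], topics: List[Topic]) -> float:
    """P_topic(w | H) = sum_k P(w | k) * P(k | H)."""
    return sum(p * t.get(word, 0.0) for p, t in zip(p_k, topics))


def adapted_prob(word: str, history: Sequence[str], topics: List[Topic],
                 p_ngram: float, sigma: float = 3.0, lam: float = 0.3) -> float:
    """Linear interpolation: lam * P_topic(w | H) + (1 - lam) * P_ngram(w | h)."""
    p_k = fold_in_topics(proximity_counts(history, sigma), topics)
    return lam * topic_prob(word, p_k, topics) + (1.0 - lam) * p_ngram


if __name__ == "__main__":
    # Toy topics, made up for illustration only.
    toy_topics = [
        {"stock": 0.5, "market": 0.4, "rain": 0.1},  # "finance"
        {"rain": 0.5, "wind": 0.4, "market": 0.1},   # "weather"
    ]
    for hist in (["rain", "wind", "stock", "market"],
                 ["stock", "market", "rain", "wind"]):
        post = fold_in_topics(proximity_counts(hist, sigma=1.0), toy_topics)
        print(hist, [round(p, 3) for p in post])
```

With a small `sigma`, reversing the order of the same four history words changes which topic dominates, whereas a very large `sigma` makes every weight close to 1 and recovers the order-insensitive bag-of-words estimate.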