Author: | 吳佳樺 Wu, Chia-Hua |
---|---|
Thesis title: | 探究有效偵測及修正語音辨識錯誤技術之研究 A Study on Effective Detection and Correction Techniques for Speech Recognition Errors |
Advisor: | 陳柏琳 Chen, Berlin |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2020 |
Academic year of graduation: | 108 |
Language: | Chinese |
Pages: | 38 |
Keywords: | Speech Recognition, Recognition Errors, Error Detection, Error Correction, Out-of-Vocabulary Words |
DOI URL: | http://doi.org/10.6345/NTNU202000395 |
Thesis type: | Academic thesis |
This thesis studies several important aspects of speech recognition errors, especially the out-of-vocabulary (OOV) word problem that arises when a generic speech recognition system is applied to a specific application domain. To this end, we propose a two-stage method consisting of error detection and error correction. In the error detection stage, we compare several sequence labeling methods for detecting errors of different types. In the error correction stage, the errors detected in the first stage are corrected by phone-level matching against a domain-specific keyword list. Extensive experiments on four application domains (educational issues, industrial technology-related interviews, speech memos, and meeting recordings) show that the proposed method can boost the performance of a general speech recognition system in these domains to some extent.
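The correction stage can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the matching algorithm, so the sketch assumes a plain Levenshtein distance over phone sequences, normalized by the longer sequence. The function names, the `lexicon` and `keyword_phones` dictionaries, the ARPAbet-like phone symbols, and the `max_norm_dist` threshold are all hypothetical. The 0/1 error labels stand in for the output of the first-stage sequence labeler, which is not modeled here.

```python
from typing import Dict, List, Optional, Sequence


def phone_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (pa != pb)))  # substitution or match
        prev = cur
    return prev[-1]


def best_keyword(span_phones: Sequence[str],
                 keyword_phones: Dict[str, List[str]],
                 max_norm_dist: float) -> Optional[str]:
    """Return the closest domain keyword, or None if none is close enough."""
    best_word, best_dist = None, None
    for word, phones in keyword_phones.items():
        denom = max(len(span_phones), len(phones), 1)
        dist = phone_edit_distance(span_phones, phones) / denom
        if dist <= max_norm_dist and (best_dist is None or dist < best_dist):
            best_word, best_dist = word, dist
    return best_word


def apply_corrections(tokens: Sequence[str],
                      labels: Sequence[int],
                      lexicon: Dict[str, List[str]],
                      keyword_phones: Dict[str, List[str]],
                      max_norm_dist: float = 0.34) -> List[str]:
    """Replace each run of tokens flagged as errors (label 1) with the
    best-matching keyword; keep the original tokens when nothing matches."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if not labels[i]:
            out.append(tokens[i])
            i += 1
            continue
        j = i
        while j < len(tokens) and labels[j]:
            j += 1  # extend the span over consecutive flagged tokens
        span = list(tokens[i:j])
        # Concatenate the pronunciations of the flagged words.
        phones = [p for t in span for p in lexicon.get(t, [])]
        match = best_keyword(phones, keyword_phones, max_norm_dist) if phones else None
        out.extend([match] if match else span)
        i = j
    return out
```

For example, with a hypothetical lexicon mapping `call` to `k ao l` and `dee` to `d iy`, and a keyword list containing `Kaldi` as `k ao l d iy`, the flagged span `call dee` has distance 0 to the keyword and is replaced by it. Normalizing by length keeps short spans from matching long keywords too easily; the threshold would need tuning on held-out data for each domain.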