Graduate student: | 賴子婷 Lai, Tzu-Ting |
---|---|
Thesis title: | 英文初學者發音自動評分之研究 The Research of Automatic Pronunciation Evaluation for Beginners |
Advisor: | 李忠謀 Lee, Chung-Mou |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 103 |
Language: | Chinese |
Pages: | 44 |
Keywords (Chinese): | 語音辨識、語言學習、字串相似度、發音評估 |
Keywords (English): | Speech Recognition, Language Learning, String Matching, Pronunciation Evaluation |
DOI URL: | https://doi.org/10.6345/NTNU202203532 |
Thesis type: | Academic thesis |
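Computer Assisted Pronunciation Training (CAPT) is a commonly used approach to language learning. It gives beginners feedback on their English pronunciation so that they can practice repeatedly. This study uses speech recognition and string-similarity matching to build a recognition model suited to beginners' English pronunciation, in order to support their pronunciation practice.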
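The study has two parts. The first part builds the speech recognition model: an initial model is built from the JTES corpus recorded for this study, and speech from the better-performing beginners in JTJS is then selected for model adaptation, giving the overall speech recognition model. The second part is evaluation by string matching: similarity is computed with the Levenshtein Distance-Like measure proposed in this study, and a cubic polynomial fit is used to find the thresholds for four levels (good, fair, needs improvement, re-record).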
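The experimental results show that, with four levels, the agreement between human scoring and system scoring is 75%, which indicates that the system is reasonably accurate. The Pearson coefficient gives a correlation of 0.71 between human scoring and system scoring, showing that the two are correlated. The feedback given by the system is therefore reasonably trustworthy for beginners, who can use it to improve their speaking skills.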
"Computer Assisted Pronunciation Training" programs are primarily designed to assist students in language learning. Such a program provides feedback on each learner's pronunciation and helps beginners practice proper pronunciation repeatedly. This research uses speech recognition and string matching to build a speech recognition model with which beginners can practice pronunciation.
The research consists of two main parts. The first part builds the speech recognition model: an initial model is built from the JTES corpus recorded for this research, and the best speech samples in the JTJS corpus are then selected for model adaptation. The second part evaluates speech with a string-matching method. We propose a Levenshtein Distance-Like approach and use a cubic polynomial fit to find the thresholds. Together, these approaches separate the evaluation into four levels (excellent, average, inferior, and re-recording).
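The string-matching evaluation described above can be sketched roughly as follows. The abstract does not define the Levenshtein Distance-Like measure, so this sketch substitutes the standard Levenshtein edit distance, normalized to a similarity in [0, 1]. The threshold values in `THRESHOLDS` are hypothetical placeholders; the thesis derives its thresholds from a cubic polynomial fit that is not reproduced here. The function names are likewise illustrative assumptions.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Standard edit distance: minimum number of insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(reference: Sequence[str], recognized: Sequence[str]) -> float:
    """Normalize the edit distance to a similarity score in [0, 1]."""
    longest = max(len(reference), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(reference, recognized) / longest


# Hypothetical thresholds, checked from highest to lowest.
# The thesis obtains its own values via a cubic polynomial fit.
THRESHOLDS = [(0.85, "excellent"), (0.60, "average"), (0.35, "inferior")]


def grade(score: float) -> str:
    """Map a similarity score to one of the four levels."""
    for threshold, label in THRESHOLDS:
        if score >= threshold:
            return label
    return "re-recording"


if __name__ == "__main__":
    reference = ["g", "uh", "d", "m", "ao", "r", "n", "ih", "ng"]
    recognized = ["g", "uh", "d", "m", "ao", "n", "ih", "n"]
    score = similarity(reference, recognized)
    print(f"similarity={score:.2f} level={grade(score)}")
```

The sequences can hold characters, phonemes, or words; which unit the recognizer output is compared on is a design choice that the abstract leaves open.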
The experimental results show that, when scores are separated into four levels, the system's evaluation agrees with human evaluation about 75% of the time. The Pearson correlation between human and system evaluation is 0.71, which means the two are correlated. The system's feedback is therefore credible enough for beginners to use in learning and improving their speaking skills.
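The two agreement figures reported above, accuracy across the four levels and the Pearson correlation, can be computed as in the following minimal sketch. The grade lists are made-up illustrative data, not the thesis's experimental data, and the integer encoding of the levels (0 = re-recording up to 3 = excellent) is an assumption.

```python
from math import sqrt
from typing import Sequence


def accuracy(human: Sequence[int], system: Sequence[int]) -> float:
    """Fraction of utterances where the system level equals the human level."""
    if len(human) != len(system) or not human:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(h == s for h, s in zip(human, system)) / len(human)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up grades for eight utterances (0 = re-recording ... 3 = excellent).
    human_levels = [3, 2, 1, 0, 3, 2, 1, 0]
    system_levels = [3, 2, 1, 1, 3, 1, 1, 0]
    print(f"accuracy = {accuracy(human_levels, system_levels):.2f}")
    print(f"pearson  = {pearson(human_levels, system_levels):.2f}")
```

Accuracy counts only exact level matches, while the correlation also credits near misses, so the two figures measure different aspects of agreement.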