
Graduate student: 蘇鈺婷
Thesis title: 以噪音分類為基礎之深度學習噪音消除法提升人工電子耳使用者之語音理解度表現
A noise classification-based deep learning noise reduction approach to improve speech intelligibility for cochlear implant recipients
Advisors: 葉榮木 (Yeh, Zong-Mu)
賴穎暉 (Lai, Ying-Hui)
Degree: Master
Department: Department of Mechatronic Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Number of pages: 74
Chinese keywords: 人工電子耳, 噪音消除, DDAE, 噪音分類器, 深度學習
English keywords: cochlear implant, noise reduction, deep denoising autoencoder, noise classification, deep learning
DOI URL: https://doi.org/10.6345/NTNU202202926
Thesis type: Academic thesis
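    The cochlear implant (CI) is currently the only technology that can help profoundly deaf patients hear sound again. Previous studies have shown that CIs effectively improve patients' speech understanding in quiet communication environments. In noisy environments, however, their benefit still leaves considerable room for improvement, and more effective signal processing is needed to raise user satisfaction. Recently, a noise reduction method based on deep learning theory, the deep denoising autoencoder (DDAE), was proposed. Its results showed that DDAE noise reduction significantly improves speech intelligibility in CI simulation tests. For real CI users, however, there is still no research evidence of the benefit of DDAE. This thesis therefore improves on DDAE noise reduction and proposes a new noise reduction method, noise classification+DDAE (NC+DDAE), and verifies the clinical benefit of the proposed method with real CI users. Objective electroacoustic measures and speech recognition tests show that, in noisy environments, NC+DDAE yields better speech intelligibility than two common conventional noise reduction methods (logMMSE and KLT), particularly when the noise is known. More specifically, across the test conditions, NC+DDAE improves speech intelligibility over the other methods by up to 41.5 % when the noise scenario is known and by up to 17.5 % when it is unknown. These results show that the proposed NC+DDAE noise reduction method can effectively improve the listening benefit for CI users in noisy conditions.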

    Cochlear implants (CIs) are currently the only technology that can help individuals with profound hearing loss hear sound again. Previous studies have demonstrated that CI technology enables many users to achieve a high level of speech understanding in quiet; however, for most CI users, listening under noisy conditions remains challenging, and more effective signal processing is needed to address this problem. Recently, a deep learning-based noise reduction (NR) approach, the deep denoising autoencoder (DDAE), was proposed and shown to be effective in various NR tasks. A previous study further indicated that DDAE-based NR yielded higher intelligibility scores than conventional NR techniques in CI simulation; however, the efficacy of DDAE-based NR for real CI recipients has not been evaluated. This study therefore evaluates the performance of DDAE-based NR in real CI subjects. It also proposes a new DDAE-based NR model, NC+DDAE, to further improve intelligibility for CI users. The results of the objective evaluation and the listening test indicate that, under challenging listening conditions, the proposed NC+DDAE NR approach yields higher intelligibility scores than two classical NR techniques (logMMSE and KLT), especially under matched training conditions. More specifically, NC+DDAE improves speech recognition by up to 41.5 % when the test noise was included in the training phase and by up to 17.5 % when it was not. These findings demonstrate that, under challenging listening conditions, NC+DDAE improves speech recognition more effectively than conventional NR techniques. Furthermore, the results show that NC+DDAE has superior noise suppression capability and introduces less distortion of speech envelope information for Mandarin-speaking CI recipients than conventional techniques. The proposed NC+DDAE NR approach can therefore potentially be integrated into existing CI processors to overcome the degradation of speech perception caused by noise.
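
    As an illustration of the idea summarized in the abstract, in which a noise classifier chooses the noise-specific denoising model to apply, the following minimal Python sketch may help. It is not the thesis implementation: the scalar-centroid classifier, the stub denoisers standing in for trained DDAE models, the noise-type names `babble` and `street`, and all function names are assumptions made for illustration. The sketch also assumes that an unseen noise is simply routed to the closest known class, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]
Denoiser = Callable[[Sequence[float]], Frame]


def make_stub_denoiser(noise_floor: float) -> Denoiser:
    """Hypothetical stand-in for a trained, noise-specific DDAE model.

    It subtracts a fixed noise floor from each value and clips at zero.
    A real DDAE would map noisy spectral features to clean ones.
    """
    def denoise(frame: Sequence[float]) -> Frame:
        return [max(x - noise_floor, 0.0) for x in frame]
    return denoise


def classify_noise(frame: Sequence[float], centroids: Dict[str, float]) -> str:
    """Toy noise classifier (assumption): nearest centroid on the mean level."""
    mean_level = sum(frame) / len(frame)
    return min(centroids, key=lambda name: abs(centroids[name] - mean_level))


def nc_ddae(
    frame: Sequence[float],
    centroids: Dict[str, float],
    denoisers: Dict[str, Denoiser],
) -> Tuple[str, Frame]:
    """Classify the noise type, then apply the denoiser trained for that type."""
    label = classify_noise(frame, centroids)
    return label, denoisers[label](frame)


# Hypothetical noise types and levels, chosen only for this example.
centroids = {"babble": 0.5, "street": 2.0}
denoisers = {
    "babble": make_stub_denoiser(0.5),
    "street": make_stub_denoiser(2.0),
}

label, enhanced = nc_ddae([2.5, 1.5, 3.0, 1.0], centroids, denoisers)
print(label, enhanced)
```

    In this sketch the classifier output acts only as a switch between models, so the denoisers can be trained and replaced independently of the classifier.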

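    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research motivation
        1.2 Research objectives
        1.3 Thesis organization
    Chapter 2 Literature Review
        2.1 The auditory system
        2.2 Causes and classification of hearing loss
        2.3 Introduction to cochlear implant devices
            2.3.1 The auditory system of cochlear implant users
            2.3.2 Cochlear implant speech processors
        2.4 Machine learning
            2.4.1 Artificial neural networks
            2.4.2 Deep neural networks
            2.4.3 Acoustic scene classification with deep neural networks
        2.5 Single-microphone noise reduction strategies
            2.5.1 Unsupervised methods
            2.5.2 Supervised noise reduction: DDAE
        2.6 Subjective and objective evaluation methods
            2.6.1 Classifier accuracy
            2.6.2 Objective electroacoustic analysis: NCM
            2.6.3 Speech recognition test
    Chapter 3 Methods
        3.1 Proposed method: noise classification-based deep learning noise reduction
        3.2 Experiment 1: objective electroacoustic evaluation
            3.2.1 Noise classifier
            3.2.2 Deep learning-based noise reduction
        3.3 Experiment 2: clinical evaluation
            3.3.1 Clinical test method
            3.3.2 Scoring method
            3.3.3 Clinical test procedure
        3.4 Subjects
        3.5 Speech materials
            3.5.1 Training materials
            3.5.2 Test materials
            3.5.3 Practice materials
    Chapter 4 Results and Discussion
        4.1 Classifier recognition performance
        4.2 Speech spectrogram and envelope analysis
        4.3 Objective electroacoustic evaluation: NCM
        4.4 Clinical test results
    Chapter 5 Conclusions and Future Work
        5.1 Conclusions
        5.2 Future work
    References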

