
Author: 陳建勛
Chen, Chien-Hsun
Thesis title: 以非負矩陣分解法提升維納濾波器架構之噪聲消除效能
Improving the performance of Wiener filtering technology noise reduction based on non-negative matrix factorization
Advisors: 葉榮木
Yeh, Zong-Mu
賴穎暉
Lai, Ying-Hui
Degree: Master
Department: 機電工程學系
Department of Mechatronic Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 58
Chinese keywords: 語音增強, 雜訊追蹤, 噪聲消除
English keywords: Speech Enhancement, Noise Tracking, Noise Reduction
DOI URL: https://doi.org/10.6345/NTNU202204429
Thesis type: Academic thesis
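  • Speech is the most direct means by which people convey information in daily life, and it is also an extremely important source of information. However, speech is easily corrupted by noise, which in turn degrades quality of life. For this reason, many noise reduction algorithms have been proposed over the past several decades to remove background noise and improve speech quality. The most widely used noise reduction algorithms today follow an unsupervised architecture; successful examples include the Wiener filter, LogMMSE, and KLT. Previous studies indicate that unsupervised noise reduction performs very well under stationary noise (for example, stationary low-frequency noise and pink noise), but reducing non-stationary noise (for example, human voice noise) remains challenging. In recent years, many researchers have turned to supervised noise reduction to overcome the shortcomings of unsupervised methods; a successful example is the deep denoising autoencoder (DDAE). Given a sufficient amount of training data, DDAE achieves better noise reduction than unsupervised methods. However, when a large amount of training data is hard to obtain, its applicability is limited. This study therefore proposes a new noise reduction algorithm, called Adaptive Wiener-NMF (AWNMF), to address the above shortcomings of unsupervised and supervised noise reduction, namely (1) poor effectiveness under non-stationary noise and (2) the need for a large amount of training data. Several objective sound-quality measures (PESQ, SSNRI, SDI) show that, across a variety of noise environments (for example, infant crying and sirens), the proposed AWNMF algorithm achieves better noise reduction than currently common methods (for example, LogMMSE, KLT, Wiener, and NMF-based methods). In addition, when very few training utterances are available, AWNMF achieves better noise reduction than DDAE. In summary, these results indicate that AWNMF is an innovative and effective noise reduction method.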

    Speech is one of the most direct ways for humans to communicate. However, speech is often corrupted by noise, sometimes to the extent that quality of life is affected. Over the past few decades, a variety of noise reduction (NR) algorithms have been developed to suppress background noise and improve speech quality. Unsupervised algorithms, such as the Wiener filter, logMMSE, and KLT, are among the most widely used and successful NR techniques. Numerous studies report that unsupervised algorithms perform very well under stationary noise, e.g., low-frequency noise and pink noise. Even so, unsupervised noise reduction still faces many challenges under non-stationary noise, such as vocal noise. Recently, many researchers have adopted supervised noise reduction techniques to handle non-stationary noise and to overcome the disadvantages of unsupervised methods. With a sufficiently large amount of training data, the deep denoising autoencoder (DDAE) method has been shown to perform well on noise reduction. However, when training data are scarce, its applicability is limited. In this work, we propose a new noise reduction algorithm, called Adaptive Wiener-NMF (AWNMF), to address the problems of both unsupervised and supervised methods: poor performance under non-stationary noise and the need for large amounts of training data. Objective evaluation measures (PESQ, SSNRI, SDI) show that AWNMF outperforms common noise reduction methods, and that it outperforms DDAE when training data are limited. In conclusion, we have developed an innovative and effective noise reduction method.
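
The abstract names the two ingredients of AWNMF (a Wiener filter and non-negative matrix factorization) but this record does not give the algorithm itself. The sketch below is therefore not the thesis's method. It only illustrates one generic way the two ideas can be combined for a single spectral frame: fixed speech and noise dictionaries are assumed to be available, activations are fitted with the standard multiplicative update for the Euclidean cost, and the resulting speech and noise estimates are plugged into a Wiener-style gain. All function names, the toy dictionaries, and the iteration count are hypothetical.

```python
# Illustrative sketch only: a generic NMF-informed Wiener gain for one frame.
# Names and toy dictionaries are hypothetical and are not taken from the thesis.

EPS = 1e-12  # guards against division by zero


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*matrix)]


def estimate_activations(bases, frame, iterations=200):
    """Fit non-negative activations h so that bases @ h approximates frame.

    The bases W are held fixed and h is refined with the multiplicative
    update for the Euclidean cost: h <- h * (W^T v) / (W^T W h).
    """
    bases_t = transpose(bases)
    numerator = matvec(bases_t, frame)  # W^T v, constant across iterations
    h = [1.0] * len(bases_t)
    for _ in range(iterations):
        denominator = matvec(bases_t, matvec(bases, h))  # W^T (W h)
        h = [hk * n / (d + EPS) for hk, n, d in zip(h, numerator, denominator)]
    return h


def nmf_wiener_frame(noisy_mag, speech_bases, noise_bases):
    """Return (gain, enhanced magnitude) per frequency bin for one frame."""
    n_speech = len(speech_bases[0])
    # Concatenate the dictionaries column-wise: W = [W_speech | W_noise].
    bases = [s_row + n_row for s_row, n_row in zip(speech_bases, noise_bases)]
    h = estimate_activations(bases, noisy_mag)
    speech_est = matvec(speech_bases, h[:n_speech])
    noise_est = matvec(noise_bases, h[n_speech:])
    # Wiener-style gain with squared magnitudes standing in for power spectra.
    gain = [s * s / (s * s + n * n + EPS) for s, n in zip(speech_est, noise_est)]
    enhanced = [g * v for g, v in zip(gain, noisy_mag)]
    return gain, enhanced


# Toy example: 4 frequency bins, one speech basis (low bins), one noise basis (high bins).
speech_bases = [[1.0], [1.0], [0.0], [0.0]]
noise_bases = [[0.0], [0.0], [1.0], [1.0]]
noisy_frame = [2.0, 2.0, 0.5, 0.5]
gain, enhanced = nmf_wiener_frame(noisy_frame, speech_bases, noise_bases)
```

In this toy case the speech and noise bases occupy disjoint bins, so the gain is close to 1 in the speech bins and close to 0 in the noise bins. With overlapping dictionaries learned from real spectrograms the gain would take intermediate values, and an adaptive scheme would need further design choices that this record does not specify.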

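    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Preface
        1.2 Research Objectives
        1.3 Thesis Organization
    Chapter 2 Literature Review
        2.1 Unsupervised Noise Reduction Framework
            2.1.1 Noise Tracking
            2.1.2 Gain Function
        2.2 Supervised Noise Reduction Systems
            2.2.1 Non-negative Matrix Factorization
            2.2.2 Deep Denoising Autoencoder Algorithm
        2.3 Optimal Parameter Search Algorithm: Genetic Algorithm
    Chapter 3 Methodology
        3.1 Noise Estimation Based on Non-negative Matrix Factorization
        3.2 Gain Model
        3.3 Parameter Curve
    Chapter 4 Experimental Design and Results
        4.1 Experimental Environment
        4.2 Study 1: Comparison of AWNMF with Common Existing Noise Reduction Methods
        4.3 Study 2: Effect of Signal-to-Noise Ratio on AWNMF
        4.4 Study 3: Performance of AWNMF under Various Non-stationary Noises
        4.5 Study 4: Performance with a Small Amount of Training Data
    Chapter 5 Conclusions and Future Work
    References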

