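Graduate student: 陳韋豪
Thesis title (Chinese): 使用空間-時間之特徵分布資訊於強健性語音辨識之研究
Thesis title (English): Feature Normalization Exploiting Spatial-Temporal Distribution Characteristics for Robust Speech Recognition
Advisor: 陳柏琳
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2010
Graduation academic year: 98 (ROC calendar, i.e. 2009–2010)
Language: Chinese
Number of pages: 99
Keywords: robust speech recognition, histogram equalization
Thesis type: Academic thesis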
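  • Histogram equalization (HEQ) is a conceptually simple yet effective technique for robust speech recognition. In conventional approaches, each dimension of the speech feature vector is normalized independently; that is, most methods normalize a feature dimension using only the statistics of that dimension and its corresponding distribution. Moreover, each HEQ variant has its own notable drawback. For example, table-lookup histogram equalization (THEQ) requires considerably more memory than quantile-based histogram equalization (QHEQ), whereas QHEQ requires more processor computation. In this thesis, we first investigate the spatial and temporal relationships between the feature distributions of speech signals and of robust speech signals, and use these relationships to propose spatial-temporal distribution characteristics histogram equalization (STHEQ), which reduces the mismatch caused by different acoustic environments. STHEQ also attempts to remove an effect that conventional HEQ cannot handle, namely the influence of the random behavior of noise on speech. In addition, compared with the two conventional methods above, STHEQ requires significantly less memory and computation. Furthermore, building on joint spatial-temporal distribution information (JSTDI), we propose a more general speech feature normalization framework, called the spatial-temporal distribution-based normalization framework (STDNF). This framework not only combines different normalization methods effectively but also uses different criteria for estimating the spatial transformation function to improve the effectiveness of feature normalization. The recognition experiments are conducted on the Aurora-2 corpus. The results show that, under clean-condition training, the proposed methods significantly reduce the word error rate relative to the baseline and outperform other conventional robustness techniques.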
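The conventional per-dimension HEQ that the abstract contrasts with STHEQ can be illustrated with a minimal sketch. It assumes a standard Gaussian reference distribution and maps each feature value through its rank-based empirical CDF within one utterance. A table-lookup variant (THEQ) would instead read reference values from a stored table, which is presumably where its larger memory footprint comes from. The function names `heq_dimension` and `heq` are hypothetical, and this is not the thesis's STHEQ, which additionally exploits spatial-temporal distribution information.

```python
from statistics import NormalDist

# Assumed reference distribution: zero-mean, unit-variance Gaussian.
REFERENCE = NormalDist(mu=0.0, sigma=1.0)


def heq_dimension(values):
    """Equalize one feature dimension of one utterance.

    Each value is replaced by the reference quantile at its empirical
    CDF position, so only this dimension's own statistics are used.
    """
    n = len(values)
    # Frame indices sorted by feature value (ties keep frame order).
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, i in enumerate(order):
        # Midpoint plotting position keeps the CDF strictly inside (0, 1),
        # so inv_cdf never receives 0 or 1.
        cdf = (rank + 0.5) / n
        equalized[i] = REFERENCE.inv_cdf(cdf)
    return equalized


def heq(frames):
    """Apply conventional HEQ to a list of frames (each a list of D features).

    Every dimension is equalized independently of the others.
    """
    if not frames:
        return []
    dims = len(frames[0])
    # Split the utterance into per-dimension columns.
    columns = [[frame[d] for frame in frames] for d in range(dims)]
    equalized = [heq_dimension(col) for col in columns]
    # Reassemble the equalized columns into frames.
    return [[equalized[d][t] for d in range(dims)] for t in range(len(frames))]
```

For example, `heq([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0], [5.0, 50.0]])` maps both dimensions to the same four values (about ±0.32 and ±1.15): the second dimension is simply ten times the first, and because each dimension is equalized only from its own ranks, the scale difference disappears while any cross-dimension or cross-frame structure is ignored.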

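    1. Introduction
       1.1 Research background
       1.2 Robust speech recognition techniques
       1.3 Research content and contributions
       1.4 Thesis organization
    2. Literature Review
       2.1 Feature transformation methods
          2.1.1 Data-dependent linear feature-space transformation
          2.1.2 Feature normalization
             2.1.2.1 Relative spectral processing (RASTA)
             2.1.2.2 Moment normalization
             2.1.2.3 Histogram equalization (HEQ)
             2.1.2.4 Quantile-based histogram equalization (QHEQ)
             2.1.2.5 Polynomial-fit histogram equalization (PHEQ)
             2.1.2.6 Autoregressive moving average (ARMA) filtering
       2.2 Feature compensation methods
          2.2.1 Codeword-dependent cepstral normalization (CDCN)
          2.2.2 SNR-dependent cepstral normalization (SDCN)
          2.2.3 Probabilistic optimum filtering (POF)
          2.2.4 Stereo-based piecewise linear compensation (SPLICE)
          2.2.5 Stochastic vector mapping (SVM)
          2.2.6 Vector Taylor series (VTS) for robust speech recognition
       2.3 Feature reconstruction methods
          2.3.1 Missing-feature reconstruction in front-end feature extraction
          2.3.2 Missing-feature reconstruction in back-end decoding
    3. Experimental Corpus and Baseline Results
       3.1 Experimental corpus
       3.2 Experimental setup
       3.3 Evaluation of recognition performance
       3.4 Baseline results
    4. Proposed Methods and Experimental Results
       4.1 Spatial-temporal distribution compensation
          4.1.1 Spatial-temporal distribution characteristics histogram equalization (STHEQ)
          4.1.2 Experimental results for STHEQ
       4.2 Kernel smoothing
          4.2.1 Gaussian-kernel sliding-window smoothing (GKSWS)
          4.2.2 Experimental results for GKSWS
    5. Generalized Extension of STHEQ
       5.1 Spatial-temporal distribution-based normalization
          5.1.1 Speech feature normalization
          5.1.2 Objective functions
          5.1.3 Procedure of the spatial-temporal distribution-based normalization framework
       5.2 Experimental results for the spatial-temporal distribution-based normalization framework
       5.3 Experimental results using different objective functions in the framework
    6. Conclusions and Future Work
       6.1 Conclusions
       6.2 Future work
    7. References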
