Author: | 郭人瑋 Jen-Wei Kuo |
---|---|
Thesis Title: | 最小化音素錯誤鑑別式聲學模型學習於中文大詞彙連續語音辨識之初步研究 An Initial Study on Minimum Phone Error Discriminative Learning of Acoustic Models for Mandarin Large Vocabulary Continuous Speech Recognition |
Advisor: | 陳柏琳 Chen, Berlin |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Thesis Publication Year: | 2005 |
Academic Year: | 93 |
Language: | Chinese |
Number of pages: | 154 |
Keywords (in Chinese): | 最小化音素錯誤 、大詞彙連續語音辨識 、聲學模型訓練 、聲學模型調適 、最大化交互資訊 |
Keywords (in English): | MPE, LVCSR, Acoustic Model Training, Acoustic Model Adaptation, MMI |
Thesis Type: | Academic thesis/ dissertation |
Discriminative training of acoustic models has been an active research topic in automatic speech recognition (ASR) in recent years. This thesis extends Minimum Phone Error (MPE) training and adaptation of acoustic models and applies them to Mandarin large vocabulary continuous speech recognition (LVCSR). All experiments were carried out on the field-reporter speech of Public Television Service (PTS) news in the Mandarin broadcast news corpus (MATBN). The acoustic models were first trained with the Maximum Likelihood (ML) criterion, and two discriminative training methods, MPE and Maximum Mutual Information (MMI), were then compared. Relative to the ML-trained models, the MPE-trained models reduced the syllable error rate (SER) by 15.52%, the character error rate (CER) by 12.33%, and the word error rate (WER) by 10.02%, clearly outperforming MMI training. Unsupervised adaptation of acoustic models through MPE-estimated linear transformations, in either the model space or the feature space, was also studied. However, because no correct reference transcript is available for computing the raw accuracy that MPE requires, the top-one recognition output has to be used in its place. As a result, unsupervised MPE-based adaptation cannot estimate the acoustic model parameters well, and recognition performance degrades substantially. To alleviate this problem, the thesis proposes a Raw Accuracy Prediction Model (RAPM) to improve unsupervised MPE-based adaptation, which yields slight gains in recognition performance.
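The dependence of MPE on a reference transcript can be made concrete with the criterion as it is commonly written in the MPE literature. This is a sketch, not necessarily the exact formulation used in the thesis. The symbols $\lambda$ (model parameters), $O_r$ (observations of utterance $r$), $s_r$ (its reference transcript), $\kappa$ (acoustic scaling factor), and $A(\cdot,\cdot)$ (raw phone accuracy) are notation assumed here, and conventions differ on whether the language model probability is scaled as well.

```latex
\mathcal{F}_{\mathrm{MPE}}(\lambda)
  = \sum_{r=1}^{R} \sum_{s} P_{\lambda}^{\kappa}(s \mid O_r)\, A(s, s_r),
\qquad
P_{\lambda}^{\kappa}(s \mid O_r)
  = \frac{p_{\lambda}(O_r \mid s)^{\kappa}\, P(s)}
         {\sum_{u} p_{\lambda}(O_r \mid u)^{\kappa}\, P(u)}
```

Because $A(s, s_r)$ is measured against the reference $s_r$, replacing $s_r$ with the top-one recognition output in unsupervised adaptation carries any recognition errors into the accuracy term. This is the weakness that the RAPM is intended to mitigate.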