
Graduate Student: Kuan-Yu Chen (陳冠宇)
Thesis Title: Improved Topic Modeling Techniques for Speech Recognition (主題模型於語音辨識使用之改進)
Advisor: Berlin Chen (陳柏琳)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2010
Graduation Academic Year: 98 (ROC calendar)
Language: Chinese
Pages: 175
Chinese Keywords: 中文大詞彙連續語音辨識、共同出現關係、語言模型
English Keywords: large vocabulary continuous speech recognition, co-occurrence relationships, language model
Thesis Type: Academic thesis
    This thesis investigates the co-occurrence relationships between words in natural language under a variety of conditions, derives a number of language models to characterize them, and applies these models to Mandarin large vocabulary continuous speech recognition. To explore the co-occurrence relationship between two words in a language, the conventional approach is to count, over the entire training corpus, how often the two words occur together within a fixed-size moving window, and to estimate the joint probability distribution of the two words from these counts. Rather than inferring the relationship between a pair of words solely from such corpus-wide co-occurrence counts, this thesis analyzes how two words co-occur under different conditions, for example, whether they frequently co-occur within the same topic, document, or document cluster, and accordingly derives several language models, together with their estimation methods, for describing word-to-word relationships. The experimental corpus was collected from Mandarin broadcast news in Taiwan, and a series of large vocabulary continuous speech recognition experiments shows that each of the proposed language models noticeably improves the performance of the baseline recognition system.

    This thesis investigates word-word co-occurrence relationships embedded in natural language and leverages a variety of language models derived from such relationships for Mandarin large vocabulary continuous speech recognition (LVCSR). When measuring the co-occurrence relationship between a given pair of words, the most common approach is to estimate the joint probability of the two words simply by counting how many times they occur within some fixed-size window of each other as the window moves along the entire training corpus. Beyond this, we study the co-occurrence relationships between word pairs under various conditions, such as topics, documents, and document clusters, and derive several language models to characterize such relationships. All experiments are conducted on a Mandarin broadcast news corpus compiled in Taiwan, and the results demonstrate the feasibility of the proposed approaches.
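    To make the fixed-size moving-window estimate described above concrete, here is a minimal Python sketch under stated assumptions: the corpus is already word-segmented into token lists, and the window size, the use of unordered pairs, and the function name cooccurrence_joint_prob are illustrative choices rather than the thesis's actual implementation.

    from collections import Counter

    def cooccurrence_joint_prob(corpus_sentences, window_size=5):
        """Estimate joint probabilities P(w1, w2) from the number of times
        two words co-occur within a fixed-size window moved along the
        training corpus (the conventional approach the abstract contrasts
        against); an illustrative sketch, not the thesis's code."""
        pair_counts = Counter()
        total = 0
        for tokens in corpus_sentences:
            for i in range(len(tokens)):
                # Pair each word with the words inside the window to its right.
                for j in range(i + 1, min(i + window_size, len(tokens))):
                    pair = tuple(sorted((tokens[i], tokens[j])))  # unordered pair
                    pair_counts[pair] += 1
                    total += 1
        # Relative frequency as a maximum-likelihood estimate of the joint probability.
        return {pair: count / total for pair, count in pair_counts.items()}

    # Toy usage with two short, pre-segmented "documents".
    corpus = [["語音", "辨識", "語言", "模型"], ["語言", "模型", "調適"]]
    print(cooccurrence_joint_prob(corpus, window_size=3))

    By contrast, the models proposed in the thesis condition such co-occurrence statistics on topics, documents, or document clusters instead of pooling them over the whole corpus.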

    Chapter 1  Introduction
      1.1. Statistical Speech Recognition
        1.1.1. Feature Extraction
        1.1.2. Acoustic Modeling
        1.1.3. Language Modeling
        1.1.4. Linguistic Decoding
      1.2. Statistical Language Modeling
        1.2.1. Language Model Research
        1.2.2. Evolution of Language Models
        1.2.3. Language Model Adaptation
        1.2.4. Language Model Smoothing
      1.3. Research Content, Contributions, and Results of This Thesis
      1.4. Organization of the Thesis
    Chapter 2  Overview of Common Language Models
      2.1. Word-Regularity Models
        2.1.1. N-gram Language Model
        2.1.2. Skip Model
        2.1.3. Cache Model
        2.1.4. Class-based N-gram Model
        2.1.5. Aggregate Markov Model (AMM)
        2.1.6. Trigger-pair Model
        2.1.7. Mixture-based Language Model
      2.2. Topic Models
        2.2.1. Latent Semantic Analysis (LSA)
        2.2.2. Probabilistic Latent Semantic Analysis (PLSA)
        2.2.3. Latent Dirichlet Allocation (LDA)
        2.2.4. Word Topic Model (WTM)
      2.3. Continuous Language Models
        2.3.1. Neural Probabilistic Language Model
        2.3.2. Gaussian Mixture Language Model (GMLM)
      2.4. Discriminative Language Models
        2.4.1. Probabilistic Discriminative Model Adaptation Methods
        2.4.2. Discriminative Score Methods
    Chapter 3  Language Models Exploiting Word Vicinity Information
      3.1. Introduction
      3.2. Word Vicinity Language Models for Large Vocabulary Speech Recognition
        3.2.1. Word Association Model (WAM)
        3.2.2. Word Association Mixture Model (WAMM)
        3.2.3. Word Vicinity Model (WVM)
        3.2.4. Vicinity Feature Language Model (VFLM)
        3.2.5. Dirichlet Vicinity Model (DVM)
        3.2.6. Comparison of the Models
      3.3. Proximity Information for Speech Recognition
      3.4. Hybrid Topic Model (HTM)
    Chapter 4  Experimental Setup and Results for Large Vocabulary Continuous Speech Recognition
      4.1. Experimental Setup
        4.1.1. The LVCSR System
          4.1.1.1. Front-end Processing and Acoustic Models
          4.1.1.2. Lexicon Construction
          4.1.1.3. Lexical Tree Copying and Search
          4.1.1.4. Word Graph Search
        4.1.2. Language Model Evaluation Metrics
          4.1.2.1. Perplexity
          4.1.2.2. Recognition Error Rate
        4.1.3. Experimental Corpus
      4.2. Baseline Experimental Results
        4.2.1. N-gram Language Models
        4.2.2. Skip Model
        4.2.3. Cache Model
        4.2.4. Class-based N-gram Language Model
        4.2.5. Mixture-based Model
        4.2.6. Latent Semantic Analysis
        4.2.7. Probabilistic Latent Semantic Analysis
        4.2.8. Latent Dirichlet Allocation
        4.2.9. Word Topic Model
      4.3. Experimental Results and Analysis of the Proposed Language Models
        4.3.1. Word Association Model (WAM)
        4.3.2. Word Association Mixture Model (WAMM)
        4.3.3. Word Vicinity Model (WVM)
        4.3.4. Vicinity Feature Language Model (VFLM)
        4.3.5. Dirichlet Vicinity Model (DVM)
        4.3.6. Hybrid Topic Model (HTM)
    Chapter 5  Conclusions and Future Work
    References
    Author's Related Academic Publications

