
Student: 陳憶文
Thesis title (Chinese): 探索虛擬關聯回饋技術和鄰近資訊於語音文件檢索與辨識之改進
Thesis title (English): Exploring Effective Pseudo-Relevance Feedback and Proximity Information for Speech Retrieval and Transcription
Advisor: 陳柏琳 (Chen, Berlin)
Degree: Master
Department: Department of Computer Science and Information Engineering
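Year of publication: 2013
Academic year of graduation: 101 (ROC calendar, i.e., the 2012–2013 academic year)
Language: English
Number of pages: 71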
Keywords: Spoken document retrieval, speech recognition, language modeling, pseudo-relevance feedback, proximity information
Thesis type: Academic thesis
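Chinese abstract (translated): Pseudo-relevance feedback is currently the most common paradigm for query reformulation. It assumes that the top-ranked documents returned by the initial round of retrieval are all relevant, so that all of them can be used for query expansion. However, the documents obtained in the initial retrieval are very likely to contain both redundant and non-relevant information, so the reformulated query may not retrieve well. In view of this, this thesis explores using different kinds of information to select suitable relevant documents from the spoken documents returned by the initial retrieval when building the query representation, so that spoken document retrieval results become more accurate. On the other hand, although the relevance model (RM) simplifies model derivation and estimation through the bag-of-words assumption, it may oversimplify the problem, particularly for language models used in speech recognition. To adapt the relevance model, this thesis makes two contributions. First, it incorporates word proximity information into the relevance model to mitigate the poor fit of the bag-of-words assumption for speech recognition. Second, it further explores topic proximity information to strengthen the proximity-based relevance model framework. Experimental results show that the proposed methods effectively improve on existing methods in both spoken document retrieval and speech recognition.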
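The feedback-document selection and relevance-model estimation outlined in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's method: the diversity cue is swapped in as a greedy Maximal Marginal Relevance (MMR) rule over term-frequency cosine similarity, the relevance model follows the standard RM1 form with maximum-likelihood document models and a uniform document prior, and the function names, the trade-off `lam`, and the interpolation weight `alpha` are hypothetical choices for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_feedback_mmr(docs, scores, k, lam=0.7):
    """Greedily pick k feedback documents with an MMR-style rule (assumed cue).

    docs   -- tokenized top-ranked documents from the initial retrieval
    scores -- positive initial-retrieval scores, e.g. query likelihoods P(Q|D)
    lam    -- hypothetical trade-off between relevance and redundancy
    """
    tfs = [Counter(d) for d in docs]
    top = max(scores)
    relevance = [s / top for s in scores]  # scale relevance into (0, 1]
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((cosine(tfs[i], tfs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


def relevance_model(docs, scores, selected):
    """RM1-style estimate: P(w|R) proportional to sum_D P(w|D) P(Q|D).

    Bag-of-words: word positions inside each document are ignored.
    """
    z = sum(scores[i] for i in selected)
    rm = Counter()
    for i in selected:
        tf, length = Counter(docs[i]), len(docs[i])
        for w, c in tf.items():
            rm[w] += (scores[i] / z) * c / length
    return dict(rm)


def expand_query(query, rm, alpha=0.5):
    """Interpolate the maximum-likelihood query model with the relevance model."""
    tf = Counter(query)
    vocab = set(tf) | set(rm)
    return {w: alpha * tf[w] / len(query) + (1 - alpha) * rm.get(w, 0.0) for w in vocab}


docs = [
    ["speech", "retrieval", "query", "expansion"],
    ["speech", "retrieval", "query", "expansion"],  # near-duplicate of doc 0
    ["speech", "recognition", "language", "model"],
]
scores = [3.0, 2.9, 2.8]
chosen = select_feedback_mmr(docs, scores, k=2)
rm = relevance_model(docs, scores, chosen)
new_query = expand_query(["speech", "retrieval"], rm)
print(chosen)  # [0, 2]
```

In the toy data the sketch skips the near-duplicate document 1 in favour of document 2 even though document 1 has the higher initial score, which is the effect a diversity cue is meant to have; density or non-relevance cues could plausibly be added as further terms in the `mmr` score.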

English abstract: Pseudo-relevance feedback is by far the most commonly used paradigm for query reformulation in spoken document retrieval. It assumes that a small number of top-ranked feedback documents obtained from the initial retrieval are relevant and can be used for query expansion. Nevertheless, simply taking all of the top-ranked feedback documents from the initial retrieval for query modeling does not necessarily work well, especially when those documents contain many redundant or non-relevant cues. In view of this, we explore different kinds of information cues for selecting helpful feedback documents to further improve retrieval. On the other hand, the relevance model (RM), whose bag-of-words assumption facilitates derivation and estimation, may be oversimplified for the task of language modeling in speech recognition. Hence, we also enhance RM in two significant aspects. First, the bag-of-words assumption of RM is relaxed by incorporating word proximity information into the RM formulation. Second, topic-based proximity information is additionally explored to further enhance the proximity-based RM framework. Experiments conducted on both a spoken document retrieval task and a speech recognition task indicate that the proposed approaches are competitive with, and can improve upon, existing methods.
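The idea of relaxing the bag-of-words assumption with word proximity can also be sketched. The block below is an illustrative assumption in the spirit of positional relevance models, not the formulation used in the thesis: each occurrence of a word in a feedback document is weighted by a Gaussian kernel of its distance to the nearest query-term occurrence, so words near the query contribute more than distant ones. The function name `proximity_rm`, the Gaussian kernel, and the width `sigma` are hypothetical; in a speech-recognition setting the `query_terms` would presumably be drawn from the already-decoded word history.

```python
import math
from collections import defaultdict


def proximity_rm(feedback_docs, doc_weights, query_terms, sigma=3.0):
    """Relevance model with word-proximity weighting (hypothetical sketch).

    Each occurrence of a word contributes in proportion to a Gaussian kernel
    of its distance to the nearest query-term occurrence in the same document,
    instead of every occurrence counting equally (bag-of-words).
    """
    qset = set(query_terms)
    rm = defaultdict(float)
    for doc, weight in zip(feedback_docs, doc_weights):
        q_positions = [i for i, w in enumerate(doc) if w in qset]
        if not q_positions:
            continue  # no query term to anchor proximity in this document
        kernel = [
            math.exp(-min(abs(i - j) for j in q_positions) ** 2 / (2 * sigma ** 2))
            for i in range(len(doc))
        ]
        total = sum(kernel)
        for w, k in zip(doc, kernel):
            # Proximity-weighted P(w|D), scaled by the document's weight.
            rm[w] += weight * k / total
    z = sum(rm.values())
    return {w: v / z for w, v in rm.items()} if z else {}


doc = ["stock", "market", "fell", "today", "weather", "sunny", "beach"]
rm = proximity_rm([doc], [1.0], ["market"], sigma=1.0)
for w in sorted(rm, key=rm.get, reverse=True)[:3]:
    print(w, round(rm[w], 3))
```

As `sigma` grows, every kernel value tends to 1 and each document's contribution reduces to its maximum-likelihood bag-of-words estimate, so `sigma` acts as a knob between a proximity-aware and a plain relevance model.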

Table of contents:
1 Introduction
  1.1 Motivation
    1.1.1 Spoken Document Retrieval
    1.1.2 Speech Recognition
  1.2 Contribution
  1.3 Outline of the Thesis
2 Related Work
  2.1 Language Modeling for Spoken Document Retrieval
    2.1.1 Retrieval Modeling Approaches
    2.1.2 Pseudo-Relevance Feedback
    2.1.3 Query Modeling
  2.2 Language Modeling for Speech Recognition
    2.2.1 N-gram Language Model
    2.2.2 Topic-based Language Models
    2.2.3 Trigger-based Language Model
    2.2.4 Recurrent Neural Network Language Model vs. Discriminative Language Model
    2.2.5 Relevance Modeling
3 Effective Pseudo-Relevance Feedback & Proximity Information
  3.1 Diversity Measure
  3.2 Density Measure
  3.3 Non-Relevance Measure
  3.4 Proximity Information for RM
  3.5 Topic-based Proximity Information for RM
4 Experiments on Spoken Document Retrieval
  4.1 Spoken Document Collections & Evaluation Metrics
  4.2 Subword-level Index Features
  4.3 Baseline Experiments
  4.4 Using Effective Pseudo-Relevance Feedback
  4.5 IDF-Based Term Weighting
  4.6 Fusion of Different Levels of Indexing Features
5 Experiments on Speech Recognition
  5.1 Speech Recognition Corpus & Evaluation Metrics
  5.2 Baseline Experiments
  5.3 Using Proximity Information
  5.4 Using Latent Topic Proximity Information
6 Conclusion and Future Work
Bibliography

