Author: | 葉懿萱 |
---|---|
Thesis Title: | 網頁搜尋結果重要面向事實內容自動擷取之研究 Search Results Summarization for Multiple Query Aspects |
Advisor: | 柯佳伶 |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Thesis Publication Year: | 2014 |
Academic Year: | 102 |
Language: | Chinese |
Number of pages: | 84 |
Keywords (in Chinese): | 文字探勘 、文件摘要 、面向事實擷取 |
Keywords (in English): | text mining, document summarization, facet retrieval |
Thesis Type: | Academic thesis/ dissertation |
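The main goal of this thesis is to automatically summarize, from a large number of returned search results, the important facet information for a user-given query term and user-specified facet keywords, so that users can quickly obtain the facet information they need. To avoid the considerable time needed to download and process every result document, the thesis uses the snippets returned with the search results as the data source for mining query-related information. The study proposes a method called SR-Summarization, which uses the distribution of terms across the search results returned for each facet query to define formulas for a term's general facet score and facet representativeness score with respect to the query keywords, and from these evaluates the general facet score and facet representativeness score of a sentence. In addition, the method defines a formula for evaluating the factual informativeness of a sentence and uses a machine learning method to assess sentence quality. Finally, sentences are selected by a mechanism that combines the informativeness and the content diversity of the summary, producing a "general facet information" summary for the query and a "facet fact information" summary for each specified facet. Experimental results show that the method effectively extracts important facet-related factual content from web search results, and a user questionnaire shows that users are more satisfied with the summaries produced by this method than with those produced by the method of related work.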
The purpose of this thesis is to automatically summarize the important query-focused facet information in a large set of search results, given a query and multiple facet terms supplied by the user, so that users can quickly obtain the facet information they need. Rather than spending time downloading every result document for the query, this study uses the snippets of the search results as its data source. A method called SR-Summarization is proposed. First, the general score and facet score of a segment are estimated from the distribution of its terms among the search results of the multiple aspects. Next, a machine learning method is used to estimate the completeness quality of the segment. The weighted sum of these two scores is the informative score of the segment. Finally, both the informative score and a diversity score are considered when selecting segments to generate a general summary and multiple facet summaries for the search results. The experimental results show that the SR-Summarization method can effectively extract important facet information from search results. A user survey shows that users rate the informative facet summaries generated by this approach higher than those generated by the related method.
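The last two steps of the abstract (a weighted-sum informative score, then selection that balances informativeness against diversity) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the greedy MMR-style selection rule, the Jaccard word-overlap similarity, the mixing weights `alpha` and `lam`, and all function names are assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two token sets, in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def informative_score(facet: float, quality: float, alpha: float = 0.5) -> float:
    """Weighted sum of a term-distribution score and a quality score.

    alpha is a hypothetical mixing weight, not a value from the thesis.
    """
    return alpha * facet + (1.0 - alpha) * quality


def select_segments(
    segments: List[str],
    scores: Dict[str, float],
    k: int = 2,
    lam: float = 0.7,
) -> List[str]:
    """Greedy MMR-style selection (an assumed stand-in for the thesis's rule).

    Each round picks the candidate with the highest informative score after
    subtracting a penalty for its overlap with already selected segments.
    """
    tokens = {s: set(s.lower().split()) for s in segments}
    chosen: List[str] = []
    candidates = list(segments)
    while candidates and len(chosen) < k:

        def gain(s: str) -> float:
            # Redundancy is the largest similarity to any selected segment.
            redundancy = max(
                (jaccard(tokens[s], tokens[c]) for c in chosen), default=0.0
            )
            return lam * scores[s] - (1.0 - lam) * redundancy

        best = max(candidates, key=gain)
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

With a larger `lam` the selection leans toward the highest-scoring segments; with a smaller `lam` it favors segments that add new words, so near-duplicate snippets are less likely to appear together in a summary.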