Author: 葉懿萱
Thesis Title (in Chinese): 網頁搜尋結果重要面向事實內容自動擷取之研究
Thesis Title (in English): Search Results Summarization for Multiple Query Aspects
Advisor: 柯佳伶
Degree: Master
Department: Department of Computer Science and Information Engineering
Thesis Publication Year: 2014
Academic Year: 102
Language: Chinese
Number of pages: 84
Keywords (in Chinese): 文字探勘, 文件摘要, 面向事實擷取
Keywords (in English): text mining, document summarization, facet retrieval
Thesis Type: Academic thesis/dissertation
Reference times: Clicks: 165, Downloads: 4
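  • The main goal of this thesis is to automatically summarize, from a large number of returned search results, the important facet information for a user-given query and user-specified facet keywords, so that users can quickly obtain the facet information they need. Because downloading and processing every result document would take considerable time, this thesis uses the snippets returned with the search results as the data source for mining query-related information. The study proposes a method called SR-Summarization that exploits how terms are distributed across the search results returned for each facet query. It proposes formulas for a term's general facet score and facet representativeness score with respect to the query keywords, and from these it evaluates the general facet score and facet representativeness score of a sentence. The method also proposes a formula for the factual informativeness of a sentence and uses machine learning to assess sentence quality. Finally, sentences are selected by a mechanism that combines the amount of information in the summary with the diversity of its content, producing a "general facet information" summary for the query and a "facet fact information" summary for each specified facet. Experimental results show that the method can effectively extract important facet facts from web search results, and a user questionnaire shows that users are more satisfied with its summaries than with those of a related method.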

    The purpose of this thesis is to automatically summarize the important query-focused facet information from a large number of search results, given a query and multiple facet terms supplied by the user, so that users can quickly obtain the facet information they need. Rather than spending considerable time downloading all the search results for the query, this study uses the snippets of the search results as its data source. A method called SR-Summarization is proposed. First, the general score and facet score of a segment are estimated from the distribution of the terms it contains among the search results of multiple aspects. Next, a machine learning method is used to estimate the completeness quality of the segment. The weighted sum of these two scores gives the informative score of a segment. Finally, both the informative score and a diversity score are considered when selecting segments to generate a general summary and multiple facet summaries for the search results. The experimental results show that SR-Summarization can effectively extract important facet information from search results, and a user survey shows that it outperforms a related method at generating informative facet summaries.
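
    The selection step described in the abstract (combine an informative score with a diversity consideration) can be illustrated with a small greedy sketch. This is not the thesis's actual formulation: the record does not give the formulas, so the Jaccard word-overlap redundancy measure, the weights `alpha` and `lam`, the precomputed `relevance` and `quality` values, and all function names below are assumptions made only for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def informative_score(relevance: float, quality: float, alpha: float = 0.5) -> float:
    """Weighted sum of a distribution-based score and a quality score.

    alpha is a hypothetical weight; the thesis's own weighting is not stated here.
    """
    return alpha * relevance + (1.0 - alpha) * quality


def select_segments(segments: List[Dict], k: int = 2, lam: float = 0.7) -> List[str]:
    """Greedily pick up to k segments, trading informativeness against redundancy."""
    tokens = [set(s["text"].lower().split()) for s in segments]
    info = [informative_score(s["relevance"], s["quality"]) for s in segments]
    chosen: List[int] = []
    while len(chosen) < min(k, len(segments)):
        best, best_score = -1, float("-inf")
        for i in range(len(segments)):
            if i in chosen:
                continue
            # Redundancy = highest similarity to any segment already selected.
            redundancy = max((jaccard(tokens[i], tokens[j]) for j in chosen), default=0.0)
            score = lam * info[i] - (1.0 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return [segments[i]["text"] for i in chosen]
```

    With `lam` close to 1 the selection follows the informative score alone; lowering it penalizes segments that repeat what has already been chosen, so a near-duplicate of the top segment can lose to a less informative but novel one.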

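    Abstract (in Chinese)
    Abstract (in English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Motivation
    1.2 Research Objectives
    1.3 Research Scope and Limitations
    1.4 Thesis Method
    1.5 Thesis Organization
    Chapter 2 Literature Review
    2.1 Document Summarization
    2.2 Web Search Result Summarization
    2.3 Studies on Factual Information in Text Content
    Chapter 3 Research Method and Procedure
    Chapter 4 General Facet Information Summarization Method
    4.1 Data Preprocessing
    4.2 Sentence Representative Term Extraction
    4.3 General Facet Score Computation
    4.4 Text Content Quality Evaluation
    4.5 Summary Sentence Selection
    Chapter 5 Facet Fact Information Summarization Method
    5.1 Facet Representativeness Score Computation
    5.2 Factual Informativeness Score Computation
    5.3 Text Content Quality Evaluation
    5.4 Summary Sentence Selection
    Chapter 6 Experimental Results and Discussion
    6.1 Datasets
    6.2 Parameter Tuning Experiments for the Summarization Formulas
    6.3 Text Segment Content Quality Evaluation Experiments
    6.4 Summary Result Evaluation
    Chapter 7 Conclusions and Future Work
    7.1 Conclusions
    7.2 Future Work
    References
    Appendix 1
    Appendix 2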

