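Graduate student: | 王馨蘭 Hsin-Lan Wang |
---|---|
Thesis title: | 商品對比意見摘要技術之研究 An Effective Approach for Finding Comparative Sentence Pairs from Contrastive Opinioned Text |
Advisor: | 柯佳伶 Koh, Jia-Ling |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | Chinese |
Number of pages: | 59 |
Keywords (translated from Chinese): | contrastive opinion summarization, association graph model, representative score, comparative score, dynamic updating of sentence clusters |
Keywords (English): | summarize comparative sentence pairs, association graph, updating algorithm |
Thesis type: | Academic thesis |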
This thesis uses collections of opinionated review sentences from web forums as its data source and studies how to automatically summarize representative comparative sentence pairs from contrastive opinionated text. Users discussing a product usually comment on specific features or functions, which are mostly expressed as nouns. Accordingly, each opinion sentence is represented by a feature vector built from the nouns it contains, and the sentences are clustered on these vectors so that sentences likely to discuss the same topic are grouped together. Within each cluster, the sentences are separated into a positive group and a negative group. For each group, an association graph of sentences is constructed from their pairwise similarity, and a representative (importance) score is computed for each sentence within its group. Candidate comparative pairs are then formed by pairing positive and negative sentences from the same cluster. The comparative score of each candidate pair is a weighted combination of the representative scores of its two sentences and the similarity between them, and the pair with the highest comparative score in a cluster is selected as that cluster's comparative sentence pair. In addition, we propose a cluster updating algorithm that inserts newly added opinion sentences into the existing clusters incrementally, so that comparative sentence pair selection needs to be rerun only on the updated clusters. Experimental results show that the proposed method extracts comparative sentence pairs more effectively than the methods of related work, and that the cluster updating algorithm significantly improves the efficiency of processing newly added opinion sentences.
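The scoring step described in the abstract can be illustrated with a minimal Python sketch for a single cluster. The abstract does not specify the similarity measure, the graph scoring rule, or the weights, so everything concrete here is an assumption made for illustration: cosine similarity over bag-of-nouns vectors, a representative score equal to each sentence's normalized total similarity to the other sentences in its group (weighted degree centrality, standing in for whatever graph-based score the thesis uses), and a single hypothetical weight `alpha` that balances representativeness against pair similarity. The function names are likewise hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-nouns vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def representative_scores(vectors):
    """Score each sentence by its total similarity to the other sentences in
    the same group, normalized to sum to 1. This weighted degree centrality is
    an assumption; the thesis may use a different graph-based score."""
    if not vectors:
        return []
    totals = [
        sum(cosine(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    norm = sum(totals)
    if not norm:
        # No sentence is similar to any other: treat all as equally representative.
        return [1.0 / len(vectors)] * len(vectors)
    return [t / norm for t in totals]


def best_pair(pos, neg, alpha=0.5):
    """Return (score, i, j) for the positive/negative pair with the highest
    comparative score. `alpha` is a hypothetical weight; both groups must be
    non-empty."""
    rep_pos = representative_scores(pos)
    rep_neg = representative_scores(neg)
    return max(
        (alpha * (rep_pos[i] + rep_neg[j]) / 2 + (1 - alpha) * cosine(pos[i], neg[j]), i, j)
        for i, j in product(range(len(pos)), range(len(neg)))
    )


# Toy cluster: each sentence is reduced to the nouns it contains.
pos = [Counter(["battery", "life"]), Counter(["battery"]), Counter(["screen"])]
neg = [Counter(["battery", "life"]), Counter(["price"])]
print(best_pair(pos, neg))
```

Under these assumptions, a pair ranks highly when both sentences are central to their own polarity group and also discuss the same nouns, which matches the intent of combining representative scores with pair similarity.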