
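Graduate student: 王馨蘭
Hsin-Lan Wang
Thesis title: 商品對比意見摘要技術之研究
An Effective Approach for Finding Comparative Sentence Pairs from Contrastive Opinioned Text
Advisor: 柯佳伶
Koh, Jia-Ling
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99
Language: Chinese
Number of pages: 59
Keywords (translated from Chinese): contrastive opinion summarization, association graph model, importance score, contrastiveness score, dynamic updating of sentence clusters
Keywords (English): summarize comparative sentence pairs, association graph, updating algorithm
Thesis type: Academic thesis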
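  • This thesis uses sets of opinionated review sentences from online forums as its data and studies how to automatically summarize representative contrastive opinion sentence pairs from them. Because users discussing a product mostly comment on specific functions or features, and these are mostly nouns, the thesis first builds a feature vector for each opinion sentence from the nouns it contains and clusters the sentences, so that sentences likely to discuss the same topic are grouped together. Within each cluster, the sentences are then split into positive and negative opinions. For each polarity, an association graph model over the sentences is built from the pairwise similarity values of same-polarity sentences, and the importance score of each sentence within that polarity group of the cluster is computed. Next, positive and negative sentences in each cluster are paired to form candidate contrastive pairs. A contrastiveness score for each candidate pair is computed as a weighted combination of the importance scores of its two sentences and the similarity between them, and this score is the basis for selecting contrastive pairs. In addition, we propose an algorithm for dynamically updating sentence clusters: when data is added, new sentences are inserted into the existing opinion sentence clusters, and contrastive pair extraction needs to be rerun only on the updated clusters. Experimental results show that the proposed contrastive opinion summarization technique extracts contrastive opinion sentence pairs more effectively than the methods proposed in related work, and that the dynamic cluster updating algorithm markedly improves the efficiency of processing newly added opinion sentences.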
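    The scoring steps in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, since the abstract does not give the formulas: cosine similarity over noun counts, an importance score taken as each sentence's normalized total similarity to the other sentences of the same polarity (a simple stand-in for the association graph model), and a pair score of the form `alpha * (mean importance) + (1 - alpha) * similarity`. The function names and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two noun-count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def importance_scores(vectors):
    """Assumed scoring: each sentence's total similarity to the other
    sentences in its group, normalized to sum to 1 (uniform if all zero)."""
    n = len(vectors)
    raw = [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    total = sum(raw)
    return [r / total if total else 1.0 / n for r in raw]


def best_pair(pos, neg, alpha=0.5):
    """Return (score, pos_index, neg_index) of the highest-scoring pair.
    The weighted-sum form and alpha are illustrative assumptions."""
    imp_pos = importance_scores(pos)
    imp_neg = importance_scores(neg)
    best = None
    for i, p in enumerate(pos):
        for j, q in enumerate(neg):
            score = alpha * (imp_pos[i] + imp_neg[j]) / 2 + (1 - alpha) * cosine(p, q)
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


# One cluster: positive and negative sentences reduced to their nouns.
positive = [Counter(["battery", "life"]), Counter(["battery"]), Counter(["screen"])]
negative = [Counter(["battery", "charger"]), Counter(["price"])]
result = best_pair(positive, negative)
```

    In this toy cluster the selected pair is the positive sentence about "battery" and the negative sentence about "battery" and "charger", because they share a noun and the positive sentence is also similar to another positive sentence.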

    In this thesis, opinionated reviews from web forums are used as the data source. Our goal is to provide an effective approach for automatically summarizing comparative sentence pairs from contrastive opinionated text. Users usually comment on a product's features or functions, which are typically expressed as nouns. Accordingly, each opinionated sentence is represented by a noun feature vector built from the nouns appearing in the sentence. To gather sentences that describe the same topic, the opinionated sentences are clustered according to their noun feature vectors. Then, within each cluster, the positive and negative sentences are separated into two groups. In each group, an association graph of the sentences is constructed from their pairwise similarity, and the representative score of each sentence is computed. For each pair of one positive and one negative sentence selected from a cluster, the comparative score of the pair is a weighted sum of the representative scores of the two sentences and the similarity between them. The pair with the highest comparative score in a cluster is selected as a comparative sentence pair. Moreover, we propose an efficient updating algorithm that incrementally inserts a new opinionated sentence into the existing sentence clusters, so that comparative sentence pair selection needs to be performed only on the updated cluster. The experimental results show that the proposed comparative sentence pair extraction method is more effective than the methods in related work. In particular, the proposed cluster updating algorithm significantly improves execution efficiency when processing newly inserted opinionated sentences.
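    The incremental update idea can be sketched as follows. This is not the thesis's algorithm, which the abstract does not spell out. It is a simple nearest-centroid stand-in: a new sentence joins the most similar existing cluster if the cosine similarity reaches a threshold, otherwise it starts a new cluster. The function name `insert_sentence`, the cluster layout, and the `threshold` value are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two noun-count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def insert_sentence(clusters, nouns, threshold=0.3):
    """Insert one new sentence (given as its list of nouns) into `clusters`.

    Each cluster is a dict with a "centroid" Counter (summed noun counts)
    and a "members" list. Returns the index of the cluster that changed,
    so only that cluster needs its comparative pair re-selected.
    """
    vec = Counter(nouns)
    best_idx, best_sim = None, 0.0
    for idx, cluster in enumerate(clusters):
        sim = cosine(vec, cluster["centroid"])
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx is None or best_sim < threshold:
        # No sufficiently similar cluster: start a new one.
        clusters.append({"centroid": Counter(vec), "members": [nouns]})
        return len(clusters) - 1
    # Summed counts work as a centroid because cosine ignores vector scale.
    clusters[best_idx]["centroid"].update(vec)
    clusters[best_idx]["members"].append(nouns)
    return best_idx


clusters = []
first = insert_sentence(clusters, ["battery", "life"])
second = insert_sentence(clusters, ["screen", "size"])
third = insert_sentence(clusters, ["battery", "charger"])
```

    The returned index identifies the single cluster touched by the insertion, which matches the abstract's point that pair selection is rerun only on the updated cluster rather than on the whole collection.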

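    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Objectives
      1.3 Scope and Limitations
      1.4 Approach
      1.5 Thesis Organization
    Chapter 2 Literature Review
      2.1 Data Clustering Methods
      2.2 Document Clustering Methods
      2.3 Opinion Mining
      2.4 Document Summarization
    Chapter 3 System Workflow
    Chapter 4 Data Feature Extraction
      4.1 Dataset Preprocessing
      4.2 Text Content Processing
      4.3 Term Feature Extraction
    Chapter 5 Contrastive Sentence Pair Extraction and Dynamic Updating
      5.1 Clustering Method
      5.2 Contrastive Sentence Pair Extraction Method
      5.3 Dynamic Updating of Contrastive Sentence Pairs
    Chapter 6 Experimental Results and Discussion
      6.1 Evaluation of Opinion Sentence Clustering Results
      6.2 Evaluation of Contrastive Sentence Summarization Results
      6.3 Evaluation of the Dynamic Sentence Cluster Updating Algorithm
    Chapter 7 Conclusions and Future Work
      7.1 Conclusions
      7.2 Future Work
    References
    Appendix A Stop word list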

