Graduate student: | 蔡謹安 Tsai, Jim-An |
---|---|
Thesis title: | 賣場文字評論內容自動面向摘要之研究 Aspect Auto-Extraction Summary of Online Store Reviews |
Advisor: | 柯佳伶 Koh, Jia-Ling |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2018 |
Graduation academic year: | 106 (ROC calendar) |
Language: | Chinese |
Pages: | 76 |
Keywords (Chinese): | review summary, TF-IDF features, topic features, keyword features |
Keywords (English): | Review Summary, TF-IDF feature, topic model feature, keywords feature |
DOI URL: | http://doi.org/10.6345/THE.NTNU.DCSIE.027.2018.B02 |
Thesis type: | Academic thesis |
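Reviews on online marketplaces usually describe either the product or the seller, but with hundreds of reviews per listing, users cannot easily read through them one by one. Summarizing these reviews would help users choose products effectively. This thesis builds a summarization system for online-marketplace reviews that determines whether a review describes the product or the seller, and proposes three types of features for building the classification model. The first type uses unigram frequency: the TF-IDF value of every word in each text snippet is used as a feature value. The second type uses topic-model analysis: the degree values of each snippet over the topics, computed for different numbers of topics, serve as the snippet's topic features. The third type relies on manually labeling snippets with different aspects; the words with high chi-square values within the snippets of each aspect, or the words under each LDA topic, are taken as keywords, and Word2Vec is then used to compute the similarity between a snippet and each keyword. After classification, the snippets in each class are grouped into aspects according to LDA analysis, and semantically similar snippets within an aspect are merged via Word2Vec to produce the summary. Experimental results show that keyword features give better classification for seller snippets, while topic features combined with keyword features give better classification for product snippets, so seller and product snippets can be separated effectively; the resulting summaries help users browse the marketplace's information efficiently.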
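The TF-IDF unigram features and the chi-square keyword selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: snippets are assumed to be already segmented into word tokens (Chinese text would first need a word segmenter), TF is the raw count divided by snippet length and IDF is log(N/df) (one common variant; the thesis's exact weighting is not given here), and only words positively associated with an aspect are kept as keywords. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def tfidf_features(snippets):
    """Return one {word: tf-idf} dict per snippet; snippets are token lists (assumed pre-segmented)."""
    n = len(snippets)
    # Document frequency: number of snippets containing each word.
    df = Counter(word for snip in snippets for word in set(snip))
    features = []
    for snip in snippets:
        counts = Counter(snip)
        total = len(snip)
        # TF = count / snippet length, IDF = log(N / df) (one common variant).
        features.append({w: (c / total) * math.log(n / df[w]) for w, c in counts.items()})
    return features

def contingency(snippets, labels, word, aspect):
    """2x2 counts: (word & aspect, word & not aspect, no word & aspect, neither)."""
    a = b = c = d = 0
    for snip, label in zip(snippets, labels):
        has_word, in_aspect = word in snip, label == aspect
        if has_word and in_aspect:
            a += 1
        elif has_word:
            b += 1
        elif in_aspect:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 contingency table: N(ad - bc)^2 / row and column totals."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def top_keywords(snippets, labels, aspect, k):
    """Top-k words by chi-square, keeping only words that co-occur with the aspect more than expected."""
    scores = {}
    for w in {w for snip in snippets for w in snip}:
        a, b, c, d = contingency(snippets, labels, w, aspect)
        if a * d > b * c:  # positive association only (assumption)
            scores[w] = chi_square(a, b, c, d)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

if __name__ == "__main__":
    snippets = [["seller", "shipped", "fast"], ["seller", "replied", "quickly"],
                ["fabric", "is", "soft"], ["fabric", "color", "nice"]]
    labels = ["seller", "seller", "product", "product"]
    print(tfidf_features(snippets)[0])
    print(top_keywords(snippets, labels, "seller", 3))
```

On the toy data, "seller" appears in every seller snippet and in no product snippet, so it gets the highest chi-square score for the seller aspect, while a word such as "fabric" is excluded there despite an equally large raw statistic, because its association with that aspect is negative.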
Online marketplace reviews usually describe sellers or products, but there are often so many of them that users cannot easily browse them all. A system that summarizes these reviews would help users choose products efficiently. In this thesis, we construct a system that summarizes the snippets of marketplace reviews about sellers or products. We use three types of features to build a classification model that distinguishes seller reviews, product reviews, and other reviews. The first type is unigram frequency: we calculate the TF-IDF value of every word in each snippet as a feature. The second type is topic model features: the degree to which each snippet belongs to each LDA topic forms the features. The third type is keyword features: the chi-square test and LDA topic words are used to select keywords, and Word2Vec is then used to calculate the similarity between a snippet and each selected keyword to generate the feature values. After the snippets are classified into seller reviews and product reviews, we use LDA to cluster the snippets into aspect topics. Finally, semantically similar snippets in the same topic are merged according to their Word2Vec representations to generate the summary. The experimental results show that keyword features achieve higher precision for classifying seller reviews, while combining topic model features with keyword features gives better classification results for product reviews. The system helps users browse marketplace reviews more efficiently.
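The Word2Vec-based keyword features and the merging of semantically similar snippets can be sketched as follows. This is a minimal sketch, not the thesis's code, and it rests on several assumptions: word vectors are supplied as a plain dict (in practice they would come from a trained Word2Vec model), a snippet is represented by the average of its in-vocabulary word vectors, and merging is a simple greedy threshold clustering against each group's first snippet. The function names, the 0.9 threshold and the toy 3-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def snippet_vector(tokens, vectors):
    """Average the vectors of in-vocabulary tokens (zero vector if none are known)."""
    known = [vectors[t] for t in tokens if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def keyword_features(tokens, keywords, vectors):
    """One feature per keyword: cosine similarity between the snippet and that keyword."""
    sv = snippet_vector(tokens, vectors)
    return [cosine(sv, vectors[k]) if k in vectors else 0.0 for k in keywords]

def merge_similar(snippets, vectors, threshold=0.9):
    """Greedily put each snippet into the first group whose representative
    (the group's first snippet) has cosine similarity >= threshold."""
    groups = []  # list of (representative vector, member snippets)
    for tokens in snippets:
        sv = snippet_vector(tokens, vectors)
        for rep, members in groups:
            if cosine(sv, rep) >= threshold:
                members.append(tokens)
                break
        else:
            groups.append((sv, [tokens]))
    return [members for _, members in groups]

if __name__ == "__main__":
    # Toy vectors standing in for a trained Word2Vec model (hypothetical values).
    vectors = {"fast": [1.0, 0.0, 0.0], "quick": [0.9, 0.1, 0.0],
               "shipping": [0.8, 0.2, 0.0], "soft": [0.0, 0.0, 1.0],
               "fabric": [0.0, 0.1, 0.9]}
    snippets = [["fast", "shipping"], ["quick", "shipping"], ["soft", "fabric"]]
    print(keyword_features(snippets[0], ["quick", "fabric"], vectors))
    print(merge_similar(snippets, vectors))
```

Here the two shipping-speed snippets end up in one group and the fabric snippet in another. Averaging word vectors is only one simple way to embed a snippet; a greedy single-pass grouping keeps the sketch short but depends on input order, so a real system might prefer a proper clustering method.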