Author: 許先緯 (Hsu, Hsien-Wei)
Thesis title: 旅遊評論關注面向與不一致性分析研究 / Analysis of Aspects and Inconsistency from Travel Reviews
Advisor: 侯文娟 (Hou, Wen-Juan)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 67
Keywords (Chinese): 不一致性, 旅遊評論, 面向意見探勘, 非監督式學習, 自然語言處理
Keywords (English): Inconsistency, Travel reviews, Aspect opinion mining, Unsupervised learning, NLP
DOI URL: http://doi.org/10.6345/THE.NTNU.DCSIE.019.2018.B02
Thesis type: Academic thesis
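Abstract (translated from the Chinese):
    The convenience of the Internet has changed both consumers' buying habits and merchants' business models. Many people look up reviews online before deciding whether to buy, hoping that the purchase will deliver the expected benefit. Merchants, in turn, hope that customers will leave reviews online after their shopping experience, because these reviews attract more attention and show merchants where to maintain and improve quality. A review usually contains a star rating and a written opinion. As the number of reviews grows, one can observe that in some reviews the star rating does not match the written opinion: for example, a user gives a positive five-star rating but writes a comment consisting largely of shortcomings and complaints. This is the so-called inconsistency phenomenon.
    The data used in this thesis come from TripAdvisor, an international travel review site, and the experimental data are drawn from seven well-known hotels in Taipei City. The research has two objectives. The first is to expand the vocabulary of the sentiment dictionary: with a self-built, expanded sentiment lexicon and the proposed sentiment computation module, each word is automatically assigned a sentiment score, and the inconsistency of reviews is analyzed so that travelers can be given useful review opinions as a reference. The second is to find the aspect terms in the reviews, map all aspect terms into a vector space, and group them with clustering algorithms, in the hope that words with similar meanings fall into the same cluster and that a representative word can be found for each cluster. A user who wants to read reviews on a particular aspect then does not need to read every review, but can use the representative words to find the reviews on that aspect quickly; the positive and negative evaluations of each aspect can also be analyzed in finer detail.
    This study proposes three statistical algorithms, each based on a different rule, for identifying inconsistency in reviews; with the rule that discards the lowest score and takes the arithmetic mean, the system reaches an accuracy of 85.7%. For the aspect analysis, Word2vec is used to produce word vectors, K-Means and Fuzzy C-Means are used to cluster the aspect terms, and a representative word is found for each cluster. The results show that the representative words found with Fuzzy C-Means clustering distinguish the different aspects better.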

Abstract (English):
    The convenience of the Internet has changed both people's buying behavior and retailers' business models. Many people research reviews online before deciding on a purchase, hoping that the goods will meet their expectations. Retailers, for their part, hope that consumers will leave reviews online, which draw more attention and indicate where product and service quality can be improved. A review generally includes a rating score and a comment. As the number of reviews grows, however, some rating scores turn out not to match the accompanying comments. For example, a user may give a five-star rating while the comment complains about the service and product quality. This study calls such cases inconsistencies.
    This thesis uses review data for seven well-known hotels in Taipei, Taiwan, downloaded from TripAdvisor, an international travel review site. The research has two key objectives. The first is to expand the emotional vocabulary list and to present a formula that takes the emotional vocabulary as its parameter and assigns a score to each word. The study uses these scores to analyze each comment and its inconsistency, and in turn to provide travelers with reliable opinions. The second objective is to find the aspect terms in the reviews and project them into a vector space, where a clustering algorithm groups them. The aim of this step is to find a core term that represents each group of similar words. Users who want to check reviews on the topics they care about most then do not need to read every review thoroughly: they can use the core terms to find the reviews about a given aspect, and the positive and negative reviews for each aspect can be analyzed in more detail.
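As an illustration of the second objective, the following is a minimal sketch of grouping aspect-term vectors and picking one core term per group. It is not the thesis's implementation: the terms, the 2-D vectors, the function names, the deterministic initialization, and the "nearest to centroid" rule for choosing a core term are all assumptions made for this example. The thesis works with Word2vec vectors and reports using K-Means and Fuzzy C-Means; only a plain k-means is shown here.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain k-means. Initial centroids are the first k vectors (an assumption
    made so that the sketch is deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representative_terms(terms, vectors, labels, centroids):
    """For each cluster, return the term whose vector lies closest to the
    centroid (a hypothetical rule for choosing the core term)."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [
            (math.dist(v, centroid), t)
            for t, v, lab in zip(terms, vectors, labels)
            if lab == c
        ]
        if members:
            reps[c] = min(members)[1]
    return reps


# Hypothetical aspect terms with made-up 2-D vectors; real Word2vec vectors
# would typically have 100 or more dimensions.
terms = ["room", "breakfast", "bed", "buffet", "bathroom", "coffee"]
vectors = [
    (0.90, 0.10), (0.10, 0.90), (0.80, 0.20),
    (0.20, 0.80), (0.85, 0.15), (0.15, 0.85),
]

labels, centroids = kmeans(vectors, k=2)
reps = representative_terms(terms, vectors, labels, centroids)
print(labels, reps)
```

With real data the initialization and the number of clusters matter far more than in this toy case, so a library implementation with several restarts would be the more usual choice.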
    The research offers three methods for recognizing inconsistency in comments. The third method calculates the average score after removing the lowest scores, and with it the system reaches an accuracy of 85.7%. For the aspect part, the study uses Word2vec to produce word vectors and then applies K-Means and Fuzzy C-Means to group the terms and find the core term of each group. The results show that Fuzzy C-Means yields core terms that distinguish the different aspects better than those from K-Means.
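The averaging rule mentioned above can be sketched as follows. The abstract only states that the lowest score is removed before taking the arithmetic mean; everything else here is an assumption made for illustration: that one score is dropped, that per-sentence sentiment scores lie in [-1, 1], the star thresholds, and the function names.

```python
def mean_without_lowest(scores):
    """Arithmetic mean after discarding the single lowest score.
    Dropping exactly one score is an assumption of this sketch."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    remaining = sorted(scores)[1:]
    return sum(remaining) / len(remaining)


def is_inconsistent(star_rating, sentiment_scores):
    """Flag a review whose star polarity disagrees with its text polarity.

    Assumed conventions (not taken from the thesis):
    - 4 or 5 stars count as positive, 1 or 2 stars as negative, 3 as neutral;
    - sentiment scores are in [-1, 1], with 0 as the neutral point;
    - neutral ratings or neutral text are never flagged.
    """
    if star_rating >= 4:
        star_polarity = 1
    elif star_rating <= 2:
        star_polarity = -1
    else:
        star_polarity = 0

    text_score = mean_without_lowest(sentiment_scores)
    text_polarity = (text_score > 0) - (text_score < 0)

    return (
        star_polarity != 0
        and text_polarity != 0
        and star_polarity != text_polarity
    )


# A five-star review whose sentences are mostly negative is flagged;
# a five-star review with one mildly negative sentence is not.
print(is_inconsistent(5, [-0.8, -0.6, -0.5, 0.2]))
print(is_inconsistent(5, [0.9, 0.7, -0.2, 0.8]))
```

Discarding the lowest value keeps a single strongly negative sentence from dominating the average of an otherwise positive review, which is one plausible reason for using such a rule.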

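    Table of contents
    Abstract (Chinese)
    Abstract (English)
    List of Tables
    List of Figures
    Chapter 1: Introduction
      Section 1: Research Background and Motivation
      Section 2: Research Objectives
      Section 3: Thesis Organization
    Chapter 2: Literature Review
      Section 1: SemEval-2014 Task 4
      Section 2: Sentiment Analysis
      Section 3: Chinese Word Segmentation System
      Section 4: NTUSD
      Section 5: CVAW
      Section 6: Clustering Algorithms
    Chapter 3: Research Method
      Section 1: Introduction
      Section 2: Experimental Data
      Section 3: Word Segmentation and Part-of-Speech Tagging
      Section 4: Sentiment Analysis
      Section 5: Self-Built Corpus
      Section 6: Quantifying the Sentiment Intensity of Words
      Section 7: Word2vec
      Section 8: Clustering
    Chapter 4: Experimental Results and Analysis
      Section 1: Expanding the Dictionary with New Words
      Section 2: Quantifying Text Sentiment
      Section 3: Inconsistency Analysis
      Section 4: Extracting Aspect Terms
      Section 5: Word Vectorization
      Section 6: Clustering Results
    Chapter 5: Conclusions and Future Work
    References
    Appendix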

