Author: | 陳傳生 Chen, Chuan-Sheng |
---|---|
Thesis title: | 使用廣義知網於情感詞彙之極性分析研究 Polarity Analysis of Sentiment Vocabulary Using E-HowNet |
Advisor: | 侯文娟 Hou, Wen-Juan |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 103 |
Language: | Chinese |
Pages: | 99 |
Chinese keywords: | 自然語言處理, 情緒分析, 中文處理, 廣義知網, 情感詞典 |
English keywords: | NLP, sentiment analysis, Chinese parser, E-HowNet, semantic dictionary |
DOI URL: | https://doi.org/10.6345/NTNU202203534 |
Document type: | Academic thesis |
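In recent years, the rapid growth of the Internet has made it easy to find all kinds of information to suit our needs. Before making a purchase, people tend to collect reviews and analyses for reference, and the sentiment words that appear in those reviews are key indicators that shape a reader's opinion. Identifying opinion words by hand is highly accurate, but it consumes considerable time and labor and can never keep pace with the rate at which information is produced.
This thesis proposes an unsupervised method that requires no human intervention. The main goal is to analyze review articles in the movie domain, extract the sentiment-bearing words, and assign each a polarity. The thesis handles the problem in two parts. In the first part, Chinese syntactic rules are used to locate the positions where sentiment words are likely to appear; the words found at those positions are collected as seeds and then expanded through E-HowNet (廣義知網). The study counts the positive and negative sentiment labels that E-HowNet provides for some of its words, and assigns the same polarity to every member of a class.
In the second part, the entries of the National Taiwan University Sentiment Dictionary, NTUSD (old version), are word-segmented and again expanded through E-HowNet to find further candidate sentiment words. Words whose polarity cannot be obtained from E-HowNet's partial sentiment labels are matched exactly against NTUSD (old version) in an attempt to include more of the expanded words. Finally, the overall class polarities obtained in the preceding steps are used to classify the polarity of words with complex concept structures.
After combining the two parts, the method was tested on 980 manually selected sentiment words; the experimental results show an accuracy of 92.48%.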
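The class-level labeling step described above (count the positive and negative labels among a class's members, then give every member the same polarity) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the toy word classes, and the choice to leave tied or unlabeled classes unassigned are all assumptions made for illustration, and the data is not taken from E-HowNet.

```python
from collections import Counter

def class_polarity(members, known_labels):
    """Majority polarity of one class, or None if tied or unlabeled.

    Returning None on a tie is an assumption of this sketch.
    """
    counts = Counter(known_labels[w] for w in members if w in known_labels)
    pos, neg = counts["positive"], counts["negative"]
    if pos == neg:
        return None
    return "positive" if pos > neg else "negative"

def expand_seeds(seeds, word_to_class, class_members, known_labels):
    """Expand seed words to their whole class and label every member."""
    result = {}
    for seed in seeds:
        cls = word_to_class.get(seed)
        if cls is None:
            continue  # seed is not covered by the (hypothetical) ontology
        polarity = class_polarity(class_members[cls], known_labels)
        if polarity is None:
            continue  # no usable majority for this class
        for word in class_members[cls]:
            result[word] = polarity
    return result

# Toy data (hypothetical, for illustration only).
class_members = {
    "good": ["精彩", "出色", "優秀"],
    "dull": ["無聊", "乏味"],
}
word_to_class = {w: c for c, ws in class_members.items() for w in ws}
known_labels = {"精彩": "positive", "出色": "positive", "乏味": "negative"}

expanded = expand_seeds(["優秀", "無聊"], word_to_class, class_members, known_labels)
```

In this toy run, the unlabeled seeds 優秀 and 無聊 inherit the majority polarity of their classes, which is the effect the abstract attributes to the expansion step.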
Sentiment words are among the most influential elements of a review in shaping a reader's opinion. Classifying their polarity by hand is time-consuming and labor-intensive, and it cannot keep pace with the rate at which information is produced on the World Wide Web.
This thesis proposes an unsupervised method for polarity classification. The goal is to analyze reviews in the movie domain, find the sentiment words, and classify their polarity. The research consists of two main parts. In the first part, Chinese syntactic rules are built to find the positions where sentiment words may appear. The words at those positions are collected as seeds, and E-HowNet is then used to expand the set of sentiment words.
In the second part, the terms in NTUSD are segmented and used as seeds, and E-HowNet is again employed to expand them. The terms in NTUSD are also used to determine the polarity of words that cannot be classified in the preceding steps. Finally, the polarity of each class is used to classify the structural words in E-HowNet.
Combining the two parts, the method is evaluated on 980 manually chosen sentiment words. The results show an accuracy of 92.48%.
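The fallback described in the second part, where words left unclassified by the class labels are matched exactly against NTUSD, could look roughly like the following Python sketch. The function name, the representation of NTUSD as two sets, and the decision to leave a word unresolved when it appears in both sets are assumptions of this sketch; the example words are illustrative and are not claimed to be NTUSD entries.

```python
def resolve_polarity(word, class_labels, lexicon_positive, lexicon_negative):
    """Return 'positive', 'negative', or None for a single word.

    class_labels: polarity already obtained from class-level labeling.
    lexicon_positive / lexicon_negative: sets standing in for a sentiment
    dictionary such as NTUSD (hypothetical contents).
    """
    polarity = class_labels.get(word)
    if polarity is not None:
        return polarity  # an earlier step already classified the word

    in_pos = word in lexicon_positive
    in_neg = word in lexicon_negative
    if in_pos and not in_neg:
        return "positive"
    if in_neg and not in_pos:
        return "negative"
    return None  # absent from both sets, or ambiguous (assumption)

# Toy data (hypothetical, for illustration only).
class_labels = {"精彩": "positive"}
lexicon_positive = {"感人", "精彩"}
lexicon_negative = {"沉悶"}

results = {
    w: resolve_polarity(w, class_labels, lexicon_positive, lexicon_negative)
    for w in ["精彩", "感人", "沉悶", "電影"]
}
```

Exact matching keeps the fallback conservative: a word is only labeled when the dictionary contains that exact string, so neutral words such as 電影 remain unlabeled.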