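Author: 陳傳生 (Chen, Chuan-Sheng)
Thesis title: Polarity Analysis of Sentiment Vocabulary Using E-HowNet (使用廣義知網於情感詞彙之極性分析研究)
Advisor: 侯文娟 (Hou, Wen-Juan)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar, i.e. 2014-2015)
Language: Chinese
Number of pages: 99
Keywords (translated from Chinese): natural language processing, sentiment analysis, Chinese language processing, E-HowNet, sentiment dictionary
Keywords (English): NLP, sentiment analysis, Chinese parser, E-HowNet, semantic dictionary
DOI URL: https://doi.org/10.6345/NTNU202203534
Document type: Academic thesis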
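  • With the rapid growth of the Internet in recent years, it has become easy to find all kinds of information to suit one's needs. Before making a purchase, people habitually collect reviews and analyses for reference, and the sentiment words that appear in those reviews are key indicators that shape a reader's opinion. Identifying opinion words manually is highly accurate, but it consumes considerable time and labor and can never keep pace with the rate at which information is produced.

    This thesis proposes an unsupervised method that requires no manual intervention. The main goal is to analyze review articles in the movie domain, find the words that carry sentiment, and assign them a polarity. The problem is handled in two parts. The first part uses Chinese syntactic rules to find the positions where sentiment words are likely to appear, collects the words found at those positions as seeds, and then expands them through E-HowNet (廣義知網). The study counts the positive and negative sentiment tags that E-HowNet provides for some words and assigns the same polarity to every member of a class.

    The second part segments the entries of the National Taiwan University Sentiment Dictionary, NTUSD (old version), and expands them through E-HowNet once more to find further candidate sentiment words. Words whose polarity cannot be obtained from E-HowNet's partial sentiment tags are matched exactly against NTUSD (old version) in an attempt to bring in more expanded words. Finally, the class-level polarities obtained in the preceding steps are used to classify the polarity of words that have complex concept structures.

    Combining the two parts, the study was tested on 980 manually selected sentiment words, and the experimental results show an accuracy of 92.48%.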

    Sentiment words are the most influential cue shaping a user's opinion in reviews. Classifying their polarity manually is time-consuming and labor-intensive, and it cannot keep up with the speed at which information is produced on the World Wide Web.

    The thesis proposes an unsupervised method for polarity classification. The goal is to analyze reviews in the movie domain, find the sentiment words, and classify their polarity. The research consists of two main parts. In the first part, Chinese syntactic rules are built to find the positions where sentiment words may appear. The words at those positions are collected as seeds, and E-HowNet is then used to expand the set of sentiment words.

    In the second part, the terms in NTUSD are segmented and used as seeds, and E-HowNet is again applied for expansion. The terms in NTUSD are also used to determine the polarity of words that could not be classified in the preceding steps. Finally, the polarity of each class is used to classify the structured words in E-HowNet.

    Combining the two parts, 980 manually chosen sentiment words were used as test data. The method reaches an accuracy of 92.48%.
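The abstract describes counting the positive and negative sentiment tags inside an E-HowNet class, giving every member of the class the same polarity, and falling back to an exact match against NTUSD for words that remain unclassified. The following is a minimal sketch of that idea only, not the thesis's implementation. The function names, the toy classes, the tag dictionary, and the treatment of ties (leaving the class unclassified) are all assumptions made for illustration; no real E-HowNet or NTUSD data or API is used.

```python
from collections import Counter


def class_polarity(members, tagged):
    """Majority vote over the class members that carry a sentiment tag.

    Returns "positive", "negative", or None when the vote is tied or no
    member is tagged (assumed behaviour; the abstract does not specify ties).
    """
    counts = Counter(tagged[w] for w in members if w in tagged)
    pos, neg = counts["positive"], counts["negative"]
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None


def assign_polarity(classes, tagged, fallback_lexicon):
    """Give each class member the class polarity; otherwise try an exact
    lookup in a fallback lexicon (standing in for NTUSD)."""
    result = {}
    for members in classes.values():
        label = class_polarity(members, tagged)
        for word in members:
            if label is not None:
                result[word] = label
            elif word in fallback_lexicon:
                result[word] = fallback_lexicon[word]
    return result


# Hypothetical toy data: class name -> member words.
classes = {
    "class_a": ["wonderful", "brilliant", "superb"],
    "class_b": ["awful", "boring"],
    "class_c": ["dull", "ordinary"],
}
# Hypothetical partial sentiment tags for some words.
tagged = {"wonderful": "positive", "brilliant": "positive", "awful": "negative"}
# Hypothetical fallback lexicon used for exact matching.
fallback = {"dull": "negative"}

result = assign_polarity(classes, tagged, fallback)
for word, label in sorted(result.items()):
    print(word, label)
```

In this toy run, "superb" and "boring" inherit the polarity of their classes even though they carry no tag themselves, "dull" is resolved through the fallback lexicon, and "ordinary" stays unclassified because neither source covers it.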

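    Front matter: Abstract (Chinese); Abstract (English); Table of Contents; List of Tables; List of Figures
    Chapter 1 Introduction: Section 1 Research Motivation; Section 2 Thesis Organization
    Chapter 2 Related Work: Section 1 Sentiment Semantic Analysis; Section 2 Chinese Word Segmentation System; Section 3 HowNet; Section 4 E-HowNet; Section 5 NTUSD
    Chapter 3 Methodology: Section 1 Overview; Section 2 Experimental Data; Section 3 Seed Word Selection; Section 4 Seed Word Expansion (Same Class); Section 5 Seed Word Expansion (Same Level); Section 6 Expansion Using NTUSD; Section 7 The Dual-Polarity Problem and Words Included but Not Classifiable by Polarity; Section 8 Polarity Classification of Structured Words in E-HowNet
    Chapter 4 Experimental Results and Analysis: Section 1 Analysis of Seed Word Selection; Section 2 Analysis of Seed Word Expansion (Same Class); Section 3 Analysis of Seed Word Expansion (Same Level); Section 4 Analysis of Segmenting and Expanding NTUSD; Section 5 Analysis of the Dual-Polarity Problem and Words Included but Not Classifiable by Polarity; Section 6 Analysis of Polarity Classification of Structured Words in E-HowNet; Section 7 Error Analysis
    Chapter 5 Conclusions and Future Work: Section 1 Conclusions; Section 2 Future Work
    References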

