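Graduate student: | 張庭瑋 Chang, Ting-Wei |
---|---|
Thesis title: | 基於標籤類別的權重之情感分析分類器 Label-based Supervised Term Weighting for Sentiment Analysis |
Advisor: | 蔡碧紋 Tsai, Pi-Wen |
Oral defense committee: | 蔡碧紋 Tsai, Pi-Wen; 丘政民 Chiu, Jeng-Min; 呂翠珊 Lu, Tsui-Shan |
Oral defense date: | 2024/06/05 |
Degree: | Master |
Department: | Department of Mathematics |
Year of publication: | 2024 |
Academic year of graduation: | 112 (ROC calendar) |
Language: | English |
Number of pages: | 26 |
Chinese keywords: | 情感分析 (sentiment analysis), 單純貝氏分類器 (Naive Bayes classifier), 監督式權重調整 (supervised weight adjustment) |
English keywords: | sentiment analysis, Naive Bayes, supervised term weighting |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202400544 |
Thesis type: | Academic thesis |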
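Sentiment analysis is a subfield of natural language processing that aims to classify documents according to the positive or negative sentiment they express. Multinomial Naive Bayes, Complement Naive Bayes, and Support Vector Machines are three methods commonly used in sentiment analysis. To improve the results of these classifiers, many supervised and unsupervised term weighting methods are available; they assign each word a different weight according to its distribution across all documents. This thesis proposes a label-based supervised term weighting to further improve these classifiers. In addition, we propose using the AFINN lexicon to map text to lower-dimensional sentiment features for sentiment analysis, which avoids the heavy computation that very high dimensionality brings. We use the F1-score, the ROC curve, and the area under the curve (AUC) to assess whether the proposed weighting method helps the classifiers perform better.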
Sentiment analysis is a subfield of natural language processing that aims to determine the sentiment expressed in text. Multinomial Naive Bayes, Complement Naive Bayes, and Support Vector Machines are three popular methods for classifying documents into positive or negative categories. Several supervised and unsupervised term weighting methods have been developed to adjust the weight assigned to each word. In this thesis, we propose a label-based supervised term weighting that considers the sentiment label (positive or negative) of a document not only when computing the adjusted term weights but also when applying these weights to the whole data set. Doing so captures more sentiment information. Additionally, we propose using the AFINN lexicon together with these adjusted term weights to further improve the classifiers. We apply our methods to three data sets and report the corresponding F1-scores, ROC curves, and AUC values.
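To make the idea of a supervised term weight concrete, the following is a minimal Python sketch. It is not the weighting proposed in the thesis, whose formula is not given in this record. It assumes a generic label-aware weight: the smoothed log-ratio of a term's document frequency in positive versus negative documents. The function names, the smoothing constant, and the toy corpus are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def supervised_term_weights(docs, labels, smoothing=1.0):
    """Assumed weighting (not the thesis's formula): smoothed log-ratio of the
    share of positive documents containing a term to the share of negative
    documents containing it. Labels are 1 (positive) or 0 (negative)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == 1 else df_neg
        target.update(set(tokens))  # count each term at most once per document
    vocab = set(df_pos) | set(df_neg)
    weights = {}
    for term in vocab:
        p = (df_pos[term] + smoothing) / (n_pos + 2 * smoothing)
        q = (df_neg[term] + smoothing) / (n_neg + 2 * smoothing)
        weights[term] = math.log(p / q)
    return weights


def score(tokens, weights):
    """Sum the weights of the tokens; unseen terms contribute nothing."""
    return sum(weights.get(t, 0.0) for t in tokens)


def f1_score(y_true, y_pred):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: tokenized documents with sentiment labels.
docs = [["great", "movie"], ["great", "acting"],
        ["boring", "movie"], ["boring", "plot"]]
labels = [1, 1, 0, 0]

weights = supervised_term_weights(docs, labels)
preds = [1 if score(tokens, weights) > 0 else 0 for tokens in docs]
print(sorted(weights.items(), key=lambda kv: kv[1]))
print("F1 on the toy training data:", f1_score(labels, preds))
```

In this toy corpus, a term that appears equally often in both classes (here "movie") receives a weight of zero, while terms concentrated in one class receive weights of the corresponding sign. In practice such weights would typically rescale term counts before they are passed to a classifier such as Multinomial Naive Bayes, and evaluation would use held-out data rather than the training documents.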