Author: 江宜勳 (Chiang, I-Hsun)
Thesis title: Using Parse Tree Structures for Mining Matching Relationships between Features and Opinion Words from Forum Reviews (利用剖析樹結構探討論壇評論之特徵與意見詞配對關係)
Advisor: 侯文娟 (Hou, Wen-Juan)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Number of pages: 99
Keywords: opinion mining, parse tree structure, forum reviews, PVC figure model
DOI URL: https://doi.org/10.6345/NTNU202202823
Document type: Academic thesis
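  • With the rapid growth of the Internet, consumers increasingly shop online. Without seeing the physical item, however, buyers are easily misled by a vendor's flattering product photos and descriptions, because vendors write with a promotional aim and do not spell out a product's real strengths and weaknesses. Reviews written by other users are therefore highly valuable as a reference, and this is the main reason this study analyzes reviews in order to produce product recommendations.
    This study collects reviews of a product from the Bahamut forum (巴哈姆特) and analyzes them one by one with the Academia Sinica (CKIP) parser, extracting words tagged in the Head Na series (called feature words in this study) and words tagged in the VH and A series (called opinion words). Because online reviews are mostly written in informal Chinese, a sentence is admitted to the corpus as long as it contains at least one feature word or one opinion word. The feature word database is built by voting. The opinion word database is built by matching against the National Taiwan University Sentiment Dictionary (NTUSD) and is supplemented by a "birds of a feather flock together" grouping method (物以類聚法), the Ministry of Education Revised Dictionary, and manual annotation. The resulting databases are used for tasks such as clustering and score assignment. Following the core idea of Aspect Based Sentiment Analysis (ABSA), feature words and opinion words are paired by means of the parse tree. For each review of the product, the output gives the user the feature words, the opinion words, the sentiment scores of the opinion words, the feature-opinion pairs, and an overall product score, so that the key information in the reviews is readily available.
    In the final experiments, feature word clustering reaches 81.8% accuracy and opinion word clustering reaches 87.71% accuracy, while feature-opinion pairing reaches 87.13% accuracy. Compared with the recommend or not-recommend outcome on Amazon Japan, the system shows 90% similarity on star ratings and 70% similarity on IDF values.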
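
    The extraction and pairing steps described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation. The bracketed input is a simplified, hypothetical rendering of CKIP-style parser output in which every leaf has the form `role:POS:word`. The pairing rule used here (attach each opinion word to the feature word whose lowest common ancestor lies deepest in the tree) is an assumption standing in for the thesis's actual rule, and all function names are made up for the example.

```python
import re


class Node:
    """One node of a parse tree; leaves carry a word, phrases carry children."""

    def __init__(self, role, tag, word="", parent=None):
        self.role, self.tag, self.word = role, tag, word
        self.parent, self.children = parent, []


TOKEN = re.compile(r"[^()|]+")


def parse(text):
    """Parse a simplified tree such as 'S(theme:NP(Head:Nab:塗裝)|Head:VH11:漂亮)'."""
    pos = 0

    def read_node(parent):
        nonlocal pos
        match = TOKEN.match(text, pos)
        head, pos = match.group(0), match.end()
        if pos < len(text) and text[pos] == "(":   # phrase: [role:]label(child|child)
            role, _, label = head.rpartition(":")
            node = Node(role, label, parent=parent)
            pos += 1                               # skip "("
            while True:
                node.children.append(read_node(node))
                closing = text[pos] == ")"
                pos += 1                           # skip "|" or ")"
                if closing:
                    return node
        role, tag, word = head.split(":", 2)       # leaf: role:POS:word
        return Node(role, tag, word, parent)

    return read_node(None)


def leaves(node):
    """Yield the word-bearing leaves in left-to-right order."""
    if node.word:
        yield node
    for child in node.children:
        yield from leaves(child)


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca_depth(a, b):
    """Depth (root = 0) of the lowest common ancestor of a and b."""
    above_a = {id(n) for n in path_to_root(a)}
    for n in path_to_root(b):
        if id(n) in above_a:
            return len(path_to_root(n)) - 1
    return -1


def pair_features_with_opinions(root):
    words = list(leaves(root))
    # Feature words: Head role with an Na* tag. Opinion words: VH* or A tags.
    # Treating the "A series" as the single tag "A" is an assumption.
    features = [n for n in words if n.role == "Head" and n.tag.startswith("Na")]
    opinions = [n for n in words if n.tag == "A" or n.tag.startswith("VH")]
    pairs = []
    for opinion in opinions:
        if features:
            nearest = max(features, key=lambda f: lca_depth(f, opinion))
            pairs.append((nearest.word, opinion.word))
    return pairs


# Hypothetical two-clause review: "the face is cute, the base is ordinary".
tree = parse("S(S(theme:NP(Head:Nab:臉)|Head:VH11:可愛)|S(theme:NP(Head:Nab:底座)|Head:VH11:普通))")
print(pair_features_with_opinions(tree))  # [('臉', '可愛'), ('底座', '普通')]
```

    Using tree distance rather than word order keeps an opinion word inside its own clause, which matters for informal reviews that chain several short clauses together.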

    With the development of the Internet, people's consumption habits are shifting toward online shopping. However, without seeing the real items, we are often deceived by attractive pictures and descriptions. Manufacturers write with a marketing purpose, which makes it hard to tell which of the claimed advantages are real, so we analyze the comments written by netizens in the forum instead. This is why we choose to explore forum comments in this study.
    In this thesis, we retrieve comments from the Bahamut Forum and parse the reviews with the CKIP (Chinese Knowledge Information Processing) parser. We extract words tagged 'Head Na' as feature words and words tagged 'VH' or 'A' as opinion words. Forum comments are usually informal, so sentences may be incomplete; the system therefore extracts any sentence that contains at least one feature word or opinion word. The study uses a majority vote strategy to construct the Feature_Words_Database. The Opinion_Words_Database is constructed from NTUSD, the distance from Positive_Words to Negative_Words, and the dictionary revised by the Ministry of Education. These databases are used for classification and scoring tasks. Based on the concept of ABSA (Aspect Based Sentiment Analysis), pairs of feature words and opinion words are generated. The output includes the feature words, the opinion words, the score of the product, and the feature-opinion word pairs, which are offered to users for reference.
    The experiments show that the precision of feature word classification is 81.8% and that of opinion word classification is 87.71%. The precision of pair matching is 87.13%. Finally, the similarity between the system and amazon.jp is 90% on star ratings and 70% on IDF values.
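
    The majority vote and scoring steps mentioned in the abstract can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the abstract does not say who votes or on what, so the votes are taken to be category labels proposed by several annotators; the lexicon values are invented polarities in [-1, 1], not entries from NTUSD; and averaging per feature and then across features is only one plausible way to reach an overall product score.

```python
from collections import Counter, defaultdict
from statistics import mean


def majority_vote(votes):
    """Return the label with the most votes (ties go to the label seen first)."""
    return Counter(votes).most_common(1)[0][0]


def score_pairs(pairs, lexicon):
    """Average opinion polarities per feature, then average across features."""
    per_feature = defaultdict(list)
    for feature, opinion in pairs:
        if opinion in lexicon:                 # skip opinion words with unknown polarity
            per_feature[feature].append(lexicon[opinion])
    feature_scores = {f: mean(scores) for f, scores in per_feature.items()}
    overall = mean(list(feature_scores.values())) if feature_scores else 0.0
    return feature_scores, overall


# Hypothetical votes from three annotators on the category of one feature word.
category = majority_vote(["appearance", "appearance", "price"])

# Hypothetical polarity lexicon and feature-opinion pairs.
lexicon = {"漂亮": 1.0, "可愛": 1.0, "粗糙": -1.0}
pairs = [("塗裝", "漂亮"), ("塗裝", "粗糙"), ("臉", "可愛")]
feature_scores, overall = score_pairs(pairs, lexicon)

print(category)        # appearance
print(feature_scores)  # {'塗裝': 0.0, '臉': 1.0}
print(overall)         # 0.5
```

    Averaging per feature first keeps a heavily discussed feature from dominating the overall score; a simple mean over all pairs would weight features by how often they are mentioned.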

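    Chapter 1 Introduction
        Section 1 Research Motivation
        Section 2 Thesis Organization
    Chapter 2 Literature Review
        Section 1 SemEval-2015 Task 12 (ABSA)
        Section 2 Chinese Parsing System
        Section 3 NTUSD
        Section 4 Ministry of Education Revised Chinese Dictionary
    Chapter 3 Methods and Procedures
        Section 1 Introduction
        Section 2 Experimental Data
        Section 3 Selection of Feature Words
        Section 4 Selection of Opinion Words
        Section 5 Pairing
        Section 6 Scoring Mechanism
    Chapter 4 Experimental Results and Analysis
        Section 1 Analysis and Discussion of Feature Words
        Section 2 Analysis and Discussion of Opinion Words
        Section 3 Analysis and Discussion of the Pairing Method
        Section 4 Analysis and Discussion of the Scoring Mechanism
    Chapter 5 Conclusion
        Section 1 Summary
        Section 2 Future Work
    References
    Appendix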

