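Graduate student: | 李昇陽 |
---|---|
Thesis title: | A Study on Analyzing Chinese Opinions Using Part of Speech and Term Weights (利用詞性與詞權重分析中文意見之研究) |
Advisor: | 侯文娟 |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC academic year) |
Language: | Chinese |
Number of pages: | 39 |
Chinese keywords: | natural language processing (自然語言處理), opinion identification (意見判別), information retrieval (資訊檢索) |
English keywords: | NLP, Information Retrieval, opinion |
Thesis type: | Academic thesis |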
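In an age of information explosion, it is easy to find the public's thoughts and experiences online, and extracting this valuable information quickly has become an important problem. Research in this area is gradually emerging, and several large conferences now host evaluation tasks on the topic. We propose an opinion model that we hope will benefit research on determining whether a document contains an opinion.
This study uses the query string of a topic to find documents that contain opinions. For term weighting, we first segment every document into words and compute the PMI (pointwise mutual information) of each word with respect to the query topic. We then combine the PMI of the topic-related words in a document with its BM25 score to obtain a topic-relevance score. The opinion-relevance score is obtained by combining the weights of the opinion-related words in the document and a distance weight with the topic-relevance score. For part of speech, we add a step that filters topic-related words down to nouns, and we extend the opinion-related vocabulary beyond the original lexicon with words that have a high PMI and are tagged as intransitive verbs.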
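The two term-weighting ingredients named in the abstract, PMI and BM25, can be sketched as follows. This is a minimal illustration, not the thesis's implementation. It assumes documents are already segmented into token lists, that PMI is estimated from document-level co-occurrence, and that BM25 uses the common defaults `k1=1.2` and `b=0.75`. The function names and the sample `docs` are hypothetical. The abstract does not say how the two values are combined into the topic-relevance score, so that step is left out.

```python
import math
from collections import Counter


def pmi(term, query_term, docs):
    """Document-level PMI: log2(P(term, query) / (P(term) * P(query))).

    Assumption: probabilities are estimated as document frequencies
    over a collection of pre-segmented documents (lists of tokens).
    """
    n = len(docs)
    df_term = sum(1 for d in docs if term in d)
    df_query = sum(1 for d in docs if query_term in d)
    df_both = sum(1 for d in docs if term in d and query_term in d)
    if df_both == 0:
        # The two words never co-occur, so PMI is negative infinity.
        return float("-inf")
    return math.log2(df_both * n / (df_term * df_query))


def bm25(query_terms, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a list of query terms.

    Assumption: the IDF uses the always-positive variant
    log((N - df + 0.5) / (df + 0.5) + 1).
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for q in query_terms:
        df = sum(1 for d in docs if q in d)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[q] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[q] * (k1 + 1) / norm
    return score


# Hypothetical, already-segmented sample collection.
docs = [
    ["手機", "螢幕", "漂亮"],
    ["手機", "電池", "耐用"],
    ["天氣", "很", "好"],
    ["螢幕", "破裂"],
]

print(pmi("電池", "手機", docs))
print(bm25(["手機"], docs[0], docs))
```

In this sample, "電池" appears only in documents that also contain "手機", so its PMI with the query word is positive, while a document that never mentions the query word receives a BM25 score of zero.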