Graduate student: | Yi-Ling Tai (戴衣菱) |
---|---|
Thesis title: | Semantic Association Analysis for Organizing Related Sentences of Multiple Domain-Specific Terms (多個專有詞彙概念解釋句語意關連自動分析組織之研究) |
Advisor: | Koh, Jia-Ling (柯佳伶) |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Language: | Chinese |
Number of pages: | 71 |
Chinese keywords: | 資料探勘, 資訊檢索, 句子分群, 自動摘要 |
English keywords: | Data Mining, Information Retrieval, Sentence Clustering, Automatic Summarization |
Thesis type: | Academic thesis |
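This thesis uses e-books as its content source and automatically extracts, clusters, and organizes the concept-explaining sentences of two domain-specific terms. Traditional feature vectors built from term frequencies ignore implicit semantic relations. To overcome this drawback, the thesis computes the semantic similarity between every word appearing in a sentence and a set of selected feature words, builds an MI feature vector for each sentence from these values, and clusters the sentences on that basis. From the clustering results, labels that can represent the concept of each cluster are selected and used to reorganize the concept structure, and comparative sentences that can represent the two terms are picked from each cluster.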
In this thesis, we use PDF textbooks as the data source and focus on comparing the conceptual sentences of two domain-specific terms. We first calculate the mutual information between every word in a sentence and the selected feature words to build an MI vector space model. The vector space model is used to evaluate the similarity of two sentences for the hierarchical clustering algorithm. After clustering, we choose representative labels and a comparative sentence pair for every cluster. According to the representative labels, clusters that share the same labels are grouped into a new concept hierarchy.
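To make the pipeline in the abstract concrete, the following Python sketch builds an MI-style vector for each sentence and clusters the vectors hierarchically. It is only an illustration under assumptions that do not come from the thesis: mutual information is approximated by positive pointwise mutual information over sentence-level co-occurrence, each vector dimension is the average score between the sentence's words and one feature word, similarity is cosine, and clustering is average-linkage agglomeration with a hypothetical stopping threshold of 0.5. The toy sentences, the feature words, and all function names are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Positive PMI from sentence-level co-occurrence (an assumed MI estimate)."""
    n = len(sentences)
    word_count, pair_count = Counter(), Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    pmi = {}
    for (a, b), c in pair_count.items():
        value = math.log(c * n / (word_count[a] * word_count[b]))
        pmi[(a, b)] = pmi[(b, a)] = max(value, 0.0)
    for w, c in word_count.items():
        pmi[(w, w)] = math.log(n / c)  # PMI of a word with itself
    return pmi


def mi_vector(tokens, features, pmi):
    """One dimension per feature word: mean PMI between sentence words and it."""
    if not tokens:
        return [0.0] * len(features)
    return [sum(pmi.get((w, f), 0.0) for w in tokens) / len(tokens)
            for f in features]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold):
    """Average-linkage agglomerative clustering; stops below the threshold."""
    groups = [[i] for i in range(len(vectors))]
    while len(groups) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(groups)), 2):
            sims = [cosine(vectors[a], vectors[b])
                    for a in groups[i] for b in groups[j]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, pair = avg, (i, j)
        if best < threshold:
            break
        i, j = pair
        groups[i] += groups.pop(j)  # i < j, so index i stays valid
    return groups


# Toy concept sentences for two terms ("stack" and "queue"), already tokenized.
sentences = [
    ["stack", "push", "pop", "lifo"],
    ["stack", "lifo", "order"],
    ["queue", "enqueue", "dequeue", "fifo"],
    ["queue", "fifo", "order"],
]
features = ["lifo", "fifo"]  # hypothetical selected feature words

pmi = pmi_table(sentences)
vectors = [mi_vector(s, features, pmi) for s in sentences]
groups = cluster(vectors, threshold=0.5)
print(groups)  # sentences 0-1 and 2-3 end up in separate clusters
```

Because each dimension aggregates word-to-feature association rather than raw term counts, the sentence `["stack", "push", "pop", "lifo"]` and the sentence `["stack", "lifo", "order"]` receive vectors pointing in the same direction even though they share only two words. Label selection and comparative sentence pairing would operate on the resulting clusters and are not sketched here.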