Author: Yi-Ling Tai (戴衣菱)
Thesis title: Semantic Association Analysis for Organizing Related Sentences of Multiple Domain-Specific Terms (多個專有詞彙概念解釋句語意關連自動分析組織之研究)
Advisor: Koh, Jia-Ling (柯佳伶)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: Chinese
Number of pages: 71
Keywords (Chinese): 資料探勘, 資訊檢索, 句子分群, 自動摘要
Keywords (English): Data Mining, Information Retrieval, Sentence Clustering, Automatic Summarization
Thesis type: Academic thesis
  • In this thesis, we use e-books (PDF textbooks) as the data source and focus on automatically extracting, clustering, and organizing the concept-explaining sentences of two domain-specific terms. Traditional feature vectors built from term frequencies ignore implicit semantic relations between words. To overcome this drawback, we calculate the mutual information (MI) between every word in a sentence and a set of selected feature words, and use it to build an MI vector space model. The vector space model is used to evaluate the similarity of two sentences for the hierarchical clustering algorithm. After clustering, we choose representative labels and a comparative sentence pair for every cluster. According to the representative labels, the clusters that share the same labels are grouped into a new concept hierarchy.
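The pipeline described in the abstract (MI feature vectors, sentence similarity, hierarchical clustering) can be illustrated with a minimal sketch. The record does not give the thesis's exact formulas, so everything below is an assumption made for illustration: MI is taken to be positive pointwise mutual information estimated from sentence-level co-occurrence, each vector component is the average MI between the sentence's words and one feature word, similarity is cosine, and clustering is average-linkage agglomerative with a stopping threshold. The function names (`pmi_table`, `mi_vector`, `agglomerative`) and the toy sentences are hypothetical. Label selection and comparative-sentence selection are not sketched.

```python
import math
from itertools import combinations


def pmi_table(sentences):
    """Return a function pmi(a, b) estimated from sentence-level co-occurrence.

    Assumption: positive PMI (negative values clipped to 0); the thesis may
    define its MI measure differently.
    """
    n = len(sentences)
    df = {}  # number of sentences containing each word
    co = {}  # number of sentences containing both words of a pair
    for tokens in sentences:
        vocab = sorted(set(tokens))
        for w in vocab:
            df[w] = df.get(w, 0) + 1
        for a, b in combinations(vocab, 2):
            co[(a, b)] = co.get((a, b), 0) + 1

    def pmi(a, b):
        if a == b:
            joint = df.get(a, 0)
        else:
            joint = co.get((min(a, b), max(a, b)), 0)
        if joint == 0:
            return 0.0
        return max(0.0, math.log(joint * n / (df[a] * df[b])))

    return pmi


def mi_vector(tokens, features, pmi):
    """One component per feature word: average PMI between the sentence's
    distinct words and that feature word (aggregation is an assumption)."""
    words = set(tokens)
    if not words:
        return [0.0] * len(features)
    return [sum(pmi(w, f) for w in words) / len(words) for f in features]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering over cosine similarity.

    Merging stops once the best pair of clusters is less similar than
    `threshold`. Returns a list of clusters, each a list of sentence indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Toy data: tokenized explanatory sentences about two hypothetical terms.
sentences = [
    ["stack", "push", "pop", "lifo"],
    ["stack", "pop", "lifo"],
    ["queue", "enqueue", "dequeue", "fifo"],
    ["queue", "dequeue", "fifo"],
]
features = ["lifo", "fifo"]  # hypothetical selected feature words

pmi = pmi_table(sentences)
vectors = [mi_vector(s, features, pmi) for s in sentences]
clusters = agglomerative(vectors, threshold=0.5)
print(clusters)
```

In this sketch a sentence receives a nonzero component for a feature word even when it does not contain that word, as long as its words co-occur with the feature word elsewhere in the corpus. That is the property the abstract contrasts with plain term-frequency vectors.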

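    Table of Contents  i
    List of Tables  ii
    List of Figures  iii
    Chapter 1 Introduction  1
      1.1 Research Motivation  1
      1.2 Research Objectives  2
      1.3 Research Scope and Methods  5
      1.4 Thesis Organization  6
    Chapter 2 Literature Review  7
      2.1 Introduction to Question Answering Systems  7
      2.2 Document Feature Extraction  10
      2.3 Semantic Association  12
      2.4 Document Clustering and Summarization  14
    Chapter 3 System Architecture and Workflow  16
    Chapter 4 Data Preprocessing and Index Construction  19
      4.1 Data Preprocessing  19
      4.2 Building the Document Index  23
    Chapter 5 Clustering of Explanatory Sentences  28
      5.1 Answer Sentence Ranking  28
      5.2 Building Feature Vectors for Explanatory Sentences  30
      5.3 Clustering Method for Explanatory Sentences  32
    Chapter 6 Concept Organization of Explanatory Sentences  36
      6.1 Representative Cluster Labels  36
      6.2 Organizing Concept Clusters  37
      6.3 Comparative Sentence Selection  39
    Chapter 7 Experimental Results and Discussion  41
      7.1 Experimental Evaluation  41
      7.2 Analysis and Discussion  49
    Chapter 8 Conclusion and Future Work  52
    References  53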

