Graduate student: | 謝聿承 Yu-Cheng, Hsieh |
---|---|
Thesis title: | 兩個專有詞彙概念關聯句自動擷取技術之研究 Automatic Sentence Pairs Retrieval for Describing Common Concepts of Two Domain-Specific Terms |
Advisor: | 柯佳伶 Koh, Jia-Ling |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2011 |
Academic year of graduation: | 99 |
Language: | Chinese |
Number of pages: | 51 |
Chinese keywords: | 共同概念詞 、概念關聯代表句組 、擴展段落 、專有詞彙 、語意關聯度 、電子書 |
English keywords: | common concepts, concept related sentence pairs, expanded paragraph, domain-specific term, semantic relatedness, e-Books |
Thesis type: | Academic thesis |
This thesis studies how to automatically extract concept-related sentence pairs for two domain-specific terms, using the e-books of a specific professional domain as the document collection and two terms entered by the reader as the query. The extracted sentence pairs are meant to help readers see how the two query terms are similar and how they differ under each of their common concepts. First, the sentences that contain either query term are retrieved from the e-books. The semantic relatedness degree of each candidate common concept term is then evaluated from two factors: the relatedness between the candidate and the two query terms, and the semantic consistency of the sentence set associated with the candidate. The candidates with the top-k highest semantic relatedness degrees are taken as the common concept terms. Next, for each common concept term, one sentence is selected from the sentence set of each query term such that the two selected sentences together have the highest semantic relatedness both to their respective query term and to the common concept term; these two sentences form a concept-related sentence pair. Because a single sentence can express only limited content, we also propose a method for finding a semantically related expanded paragraph in the book for each representative sentence.

The experimental results show that the proposed method effectively extracts common concept terms related to the two query terms. After the sentence pairs are filtered by their relatedness scores, most of the remaining concept-related sentence pairs help users clarify the similarities and differences between the two query terms. In particular, users' understanding of the two terms improves further once the expanded paragraphs are provided.
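The pipeline outlined in the abstract (sentence retrieval, common concept ranking, sentence pair selection, paragraph expansion) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's method: the abstract does not give the scoring formulas, so co-occurrence frequency stands in for the semantic relatedness degree, Jaccard token overlap stands in for the sentence pair score, and a run of neighbouring sentences that mention the query term stands in for the expanded paragraph. The function names, the `STOPWORDS` list, and the toy `book` are all hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical stopword list, just large enough for the toy data below.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "is", "this", "because"}

def tokens(sentence):
    """Lower-cased set of alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def retrieve(book, term):
    """Sentences of the book that mention the query term."""
    return [s for s in book if term in tokens(s)]

def common_concepts(sents_a, sents_b, term_a, term_b, k=1):
    """Top-k words shared by both sentence sets (stand-in relatedness:
    the smaller of the two per-set sentence frequencies)."""
    freq_a = Counter(w for s in sents_a for w in tokens(s))
    freq_b = Counter(w for s in sents_b for w in tokens(s))
    skip = STOPWORDS | {term_a, term_b}
    scores = {
        w: min(freq_a[w] / len(sents_a), freq_b[w] / len(sents_b))
        for w in freq_a.keys() & freq_b.keys()
        if w not in skip
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

def jaccard(x, y):
    return len(x & y) / len(x | y)

def best_pair(sents_a, sents_b, concept):
    """One sentence per query term, both mentioning the concept, chosen
    to maximise token overlap (stand-in for the pair score)."""
    cand_a = [s for s in sents_a if concept in tokens(s)]
    cand_b = [s for s in sents_b if concept in tokens(s)]
    return max(product(cand_a, cand_b),
               key=lambda p: jaccard(tokens(p[0]), tokens(p[1])))

def expand(book, sentence, term):
    """Contiguous run of sentences around `sentence` that mention `term`
    (stand-in for the expanded paragraph)."""
    lo = hi = book.index(sentence)
    while lo > 0 and term in tokens(book[lo - 1]):
        lo -= 1
    while hi + 1 < len(book) and term in tokens(book[hi + 1]):
        hi += 1
    return book[lo:hi + 1]

# Toy "e-book" as a list of sentences (invented for illustration).
book = [
    "A stack removes elements in last-in first-out order.",
    "Because of this order, a stack suits backtracking.",
    "A stack supports push and pop operations.",
    "A queue removes elements in first-in first-out order.",
    "Because of this order, a queue suits scheduling.",
    "A queue supports enqueue and dequeue operations.",
    "A tree is a hierarchical structure.",
]

sents_a, sents_b = retrieve(book, "stack"), retrieve(book, "queue")
concepts = common_concepts(sents_a, sents_b, "stack", "queue", k=1)
pair = best_pair(sents_a, sents_b, concepts[0])
paragraph_a = expand(book, pair[0], "stack")
paragraph_b = expand(book, pair[1], "queue")
```

On the toy data, the two query terms `stack` and `queue` share the concept `order`, and the selected pair places the last-in first-out sentence next to the first-in first-out sentence, which is the kind of side-by-side contrast the sentence pairs are meant to give the reader. A real system would replace each stand-in score with a proper semantic relatedness measure.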