
Student: Yu-Cheng Hsieh (謝聿承)
Thesis title: Automatic Sentence Pairs Retrieval for Describing Common Concepts of Two Domain-Specific Terms (Chinese title: 兩個專有詞彙概念關聯句自動擷取技術之研究)
Advisor: Jia-Ling Koh (柯佳伶)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e., the 2010-2011 academic year)
Language: Chinese
Number of pages: 51
Keywords: common concept terms, concept-related representative sentence pairs, expanded paragraphs, domain-specific terms, semantic relatedness, e-books
Thesis type: Academic thesis
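    This thesis takes e-books in a specific professional domain as its document collection. Given two domain-specific terms entered by a reader as query terms, it automatically extracts concept-related sentence pairs for the two terms, helping the reader see how the two query terms are similar and how they differ with respect to each common concept term. After extracting from the e-books the sentences that contain each query term, we evaluate the representative semantic relatedness of every common candidate concept term from two factors: its term relatedness to the two query terms and the semantic consistency of each of its sentence groups. From this score, the common concept terms that are highly semantically related to both query terms are identified automatically. Next, for each common concept term, we select from each query term's sentence set a sentence that is highly semantically related to both that query term and the common concept term; the two selected sentences form the related representative sentence pair of the two query terms under that common concept. In addition, because a single sentence can express only limited content, we also propose a technique for finding, within the book, a semantically related expanded paragraph for each representative sentence. The experimental results show that the proposed method effectively extracts common concept terms related to the two domain-specific terms. After filtering by sentence-pair score, most of the resulting concept-related sentence pairs help users clarify the similarities and differences between the two query terms, and providing the expanded paragraphs further improves users' understanding of the two terms.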
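The concept-term scoring step described above can be sketched roughly as follows. The record does not give the thesis's formulas, so this is a minimal sketch under loudly stated assumptions: each sentence is represented as the set of concept terms it contains; a concept's relatedness to a query term is assumed to be the share of that query term's sentences that also contain the concept; the "semantic consistency" of a sentence group is approximated by mean pairwise Jaccard similarity; and the two are combined by a simple product. The function names, the product, and the toy networking data are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each sentence is the set of concept terms it contains.
SENTENCES = [
    frozenset({"TCP", "connection", "port"}),
    frozenset({"TCP", "reliability", "port"}),
    frozenset({"UDP", "port", "datagram"}),
    frozenset({"UDP", "connection"}),
]


def sentences_with(term, sentences):
    """Sentences that contain the given term."""
    return [s for s in sentences if term in s]


def relatedness(concept, group):
    """Assumed relatedness: share of the group's sentences that contain `concept`."""
    return sum(concept in s for s in group) / len(group) if group else 0.0


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consistency(group):
    """Assumed semantic consistency: mean pairwise Jaccard similarity of a group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 1.0 if group else 0.0  # a single sentence is trivially consistent
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def top_k_common_concepts(q1, q2, sentences, k=3):
    """Rank concept terms co-occurring with both query terms (hypothetical scoring)."""
    s1, s2 = sentences_with(q1, sentences), sentences_with(q2, sentences)
    # Common candidate concept terms appear in sentences of both query terms.
    candidates = (set().union(*s1) & set().union(*s2)) - {q1, q2}
    scores = {}
    for c in candidates:
        rel = relatedness(c, s1) * relatedness(c, s2)
        cons = (consistency(sentences_with(c, s1)) + consistency(sentences_with(c, s2))) / 2
        scores[c] = rel * cons
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


print(top_k_common_concepts("TCP", "UDP", SENTENCES, k=2))
# prints [('port', 0.375), ('connection', 0.25)]
```

Here "port" outranks "connection" because it co-occurs with every TCP sentence, even though its TCP sentence group is less internally consistent; a real system would substitute the thesis's own relatedness and consistency measures for these stand-ins.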

    This thesis studies strategies for automatically extracting concept-related sentence pairs for two domain-specific query terms from domain-specific e-books. The sentence pairs are meant to show users how the two query terms are similar and how they differ with respect to their common concepts. First, the sentences that contain either of the two query terms are retrieved from the e-books. Then the semantic relatedness degree of each common concept term is obtained by evaluating not only the relatedness between the concept term and the two query terms but also the semantic consistency of the sentence set corresponding to the concept term. The k common concept terms with the highest semantic relatedness degrees are then extracted. Next, for each extracted common concept, one sentence is selected from the sentence set of each query term so that the two sentences have the highest total semantic relatedness to their respective query terms and to the common concept term; together they form a pair of concept-related sentences. Because a single sentence conveys limited semantics, we also propose a method to discover an expanded paragraph for each concept-related sentence. The experimental results show that the proposed method effectively extracts common concept terms related to the two query terms. Moreover, after the sentence pairs are filtered by their semantic relatedness scores, most of the discovered concept-related sentence pairs help users clarify the similarities and differences between the two query terms. In particular, users' understanding of the two query terms improves further after they read the expanded paragraphs provided for the concept-related sentence pairs.
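For the sentence-pair step described above, here is a minimal sketch. It assumes a distributional stand-in for the thesis's semantic relatedness measure, which this record does not specify: a term's profile is the bag of terms pooled from all sentences containing it, and a sentence's relatedness to a term is the cosine similarity between the sentence's term vector and that profile. Summing the scores against the query term and the common concept, taking the best sentence on each side, and the `threshold` filter (mirroring the score-based filtering mentioned in the abstract) are illustrative choices; `profile`, `sentence_score`, and `best_sentence_pair` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical toy data: each sentence is the set of concept terms it contains.
SENTENCES = [
    frozenset({"TCP", "connection", "port"}),
    frozenset({"TCP", "reliability", "port"}),
    frozenset({"UDP", "port", "datagram"}),
    frozenset({"UDP", "connection"}),
]


def profile(term, sentences):
    """Assumed term profile: term counts pooled over every sentence containing `term`."""
    vec = Counter()
    for s in sentences:
        if term in s:
            vec.update(s)
    return vec


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_score(sentence, query, concept, sentences):
    """Relatedness of one sentence to both its query term and the common concept."""
    vec = Counter(sentence)
    return cosine(vec, profile(query, sentences)) + cosine(vec, profile(concept, sentences))


def best_sentence_pair(q1, q2, concept, sentences, threshold=0.0):
    """Best sentence per query term under `concept`; None if missing or below threshold."""
    picks, total = [], 0.0
    for q in (q1, q2):
        pool = [s for s in sentences if q in s and concept in s]
        if not pool:
            return None  # the concept never co-occurs with this query term
        scored = [(sentence_score(s, q, concept, sentences), s) for s in pool]
        best_score, best = max(scored, key=lambda x: x[0])
        picks.append(best)
        total += best_score
    return (picks[0], picks[1], total) if total >= threshold else None


pair = best_sentence_pair("TCP", "UDP", "port", SENTENCES)
print(sorted(pair[0]), sorted(pair[1]))
```

Returning `None` when a concept has no sentence on one side keeps pairs strictly "common"; raising `threshold` trades recall for pairs whose sentences are more tightly tied to the concept.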

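    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Objectives
      1.3 Research Scope and Limitations
      1.4 Methodology
      1.5 Thesis Organization
    Chapter 2 Literature Review
      2.1 Question Answering Systems
      2.2 Named Entity Extraction
      2.3 Semantic Relatedness Analysis of Documents
        2.3.1 Semantic Relatedness between Terms
        2.3.2 Mining Semantic Relatedness between Terms and Sentences
      2.4 Contrastive Opinion Summarization
    Chapter 3 System Workflow
      3.1 System Overview
    Chapter 4 Generating and Evaluating Common Concept Terms
      4.1 Data Preprocessing
        4.1.1 Extracting the Text Content of E-Books
        4.1.2 Sentence Segmentation
        4.1.3 Part-of-Speech Tagging
        4.1.4 Extracting the Concept Term Set of Each Sentence
        4.1.5 Preprocessing the Text Content
      4.2 Building and Searching the Sentence Index
      4.3 Extracting Common Concept Terms with High Semantic Relatedness to the Query Terms
        4.3.1 Generating Common Candidate Concept Terms
        4.3.2 Selecting Common Candidate Concept Terms with High Semantic Relatedness to the Query Terms
    Chapter 5 Extracting Related Representative Sentence Pairs and Expanded Paragraphs
      5.1 Generating Related Representative Sentence Pairs
      5.2 Generating Expanded Paragraphs
        5.2.1 Evaluating the Semantic Relatedness of the Immediately Preceding and Following Sentences to the Representative Sentence
        5.2.2 Selecting Other Expansion Sentences
    Chapter 6 Experimental Evaluation
      6.1 Experimental Data
      6.2 Experimental Evaluation
        6.2.1 Evaluating the Correctness of Common Concept Terms
        6.2.2 Evaluating the Quality of Related Representative Sentences
        6.2.3 Evaluating the Quality of Expanded Paragraphs
      6.3 Analysis and Discussion
    Chapter 7 Conclusions and Future Work
      7.1 Conclusions
    References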
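Sections 5.2.1 and 5.2.2 of the outline indicate that a representative sentence is expanded by first checking its immediately preceding and following sentences and then selecting further expansion sentences. The following is a hypothetical sketch of that idea, not the thesis's algorithm: sentences are assumed to be term sets in reading order, a neighbor is absorbed while its Jaccard similarity to the paragraph's accumulated terms stays at or above `threshold`, and growth in each direction stops at the first unrelated sentence or after `max_extra` sentences. The `threshold` and `max_extra` defaults and the function names are assumptions.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_paragraph(doc, idx, threshold=0.2, max_extra=2):
    """Return (start, end) indices of an expanded paragraph around doc[idx].

    doc: sentences in reading order, each assumed to be a set of concept terms.
    """
    start = end = idx
    terms = set(doc[idx])  # terms covered by the paragraph so far
    for step in (-1, 1):  # grow backward first, then forward
        pos, added = idx + step, 0
        while 0 <= pos < len(doc) and added < max_extra:
            if jaccard(doc[pos], terms) < threshold:
                break  # first unrelated neighbor ends expansion in this direction
            terms |= doc[pos]
            if step < 0:
                start = pos
            else:
                end = pos
            pos += step
            added += 1
    return start, end


doc = [{"a", "b"}, {"x", "y"}, {"p", "q", "r"}, {"p", "q", "s"}, {"q", "s", "t"}, {"z"}]
print(expand_paragraph(doc, 3))  # prints (2, 4)
```

Comparing each candidate against the accumulated paragraph terms, rather than only against the representative sentence, lets the paragraph follow a topic that drifts gradually; stopping at the first unrelated sentence keeps the result contiguous, as a paragraph in the book would be.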

