Graduate student: | 鍾昇宏 Sheng-Hong Chung |
---|---|
Thesis title: | 兩個專有詞彙關聯句自動擷取之研究 Associated Sentences Retrieval for Two Domain-Specific Terms |
Advisor: | 柯佳伶 |
Degree: | 碩士 Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2012 |
Academic year of graduation: | 100 |
Language: | Chinese |
Number of pages: | 70 |
Chinese keywords: | 專有詞彙、問題分類、句型樣式、語意關聯度、關聯句、關聯句組 |
English keywords: | domain-specific term, query classification, lexical pattern, relatedness degree, associated sentence, associated sentence pair |
Thesis type: | Academic thesis |
Given two domain-specific terms entered by a user, this thesis studies strategies for automatically extracting associated sentences or associated sentence pairs from a reliable text data source, according to the relationship between the two terms. The goal is to help users compare the concepts behind the two terms. Two categories of term relationship are defined: contained and not-contained. The system submits each query term separately to a web search engine and collects the top-k snippets for each term. The probability of each query term appearing in the top-k snippets of the other term is used as a feature, and a trained term-relationship classifier assigns the query pair to one of the two categories.

If the two query terms are classified as having a contained relationship, the sentences containing both terms are retrieved as candidate sentences. For each candidate sentence, an associated score is evaluated by matching its lexical pattern against the associated sentence rule model and by computing its semantic relatedness degree with the query terms. The sentence with the highest associated score is selected as the associated sentence. If the two query terms are classified as having a not-contained relationship, the sentences containing either query term are retrieved as candidate sentences, and common concept terms, which have high semantic relatedness with both query terms, are extracted from them. The sentences are grouped by common concept term. Within each group, the representation score of each candidate sentence pair is evaluated by computing the semantic relatedness of the sentences with the concept term and the semantic relatedness between the two sentences. The sentence pair with the highest representation score is selected as an associated sentence pair.

The experimental results show that the proposed method can effectively classify the relationship between query terms. Moreover, the associated sentences retrieved by combining lexical-pattern and semantic-relatedness scores help users understand the relationship between the two query terms, and most of the associated sentence pairs selected by the pair score help users clarify where the two query terms are similar and where they differ with respect to a given concept.
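The cross-snippet feature described in the abstract can be illustrated with a minimal sketch. It assumes the snippets have already been fetched, and that "appearing" means a case-insensitive substring match; the function names `occurrence_rate` and `cross_snippet_features` and the sample snippets are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple


def occurrence_rate(term: str, snippets: List[str]) -> float:
    """Fraction of snippets that mention `term`.

    Assumption: a case-insensitive substring match counts as an occurrence.
    """
    if not snippets:
        return 0.0
    needle = term.lower()
    hits = sum(1 for snippet in snippets if needle in snippet.lower())
    return hits / len(snippets)


def cross_snippet_features(
    term_a: str,
    term_b: str,
    snippets_a: List[str],
    snippets_b: List[str],
) -> Tuple[float, float]:
    """Return (rate of term_b in term_a's snippets, rate of term_a in term_b's snippets)."""
    return (
        occurrence_rate(term_b, snippets_a),
        occurrence_rate(term_a, snippets_b),
    )


# Hypothetical top-k snippets for the query terms "tree" and "binary tree".
snippets_tree = [
    "A binary tree is a tree with at most two children per node.",
    "A tree is a connected acyclic graph.",
    "Trees store hierarchical data.",
    "B-tree indexes speed up lookups.",
]
snippets_binary_tree = [
    "A binary tree has at most two children per node.",
    "Binary tree traversal can be in-order.",
    "Binary search trees keep keys sorted.",
]

features = cross_snippet_features(
    "tree", "binary tree", snippets_tree, snippets_binary_tree
)
print(features)  # (0.25, 1.0)
```

In this toy input the two rates are strongly asymmetric. A feature pair like this would be passed to the relationship classifier; the sketch does not attempt to reproduce the classifier itself or the thresholds it learns.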