
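Graduate student: 蔡秉翰
Thesis title: Biomedical Related Question Answering System Based on Answer Validation Approach (Chinese title: 以答案驗證方法為基礎之生醫相關問答系統)
Advisor: Hou, Wen-Juan (侯文娟)
Degree: Master's
Department: Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar, i.e., 2012-2013)
Language: Chinese
Number of pages: 77
Keywords: Answer validation, QA4MRE (Question Answering for Machine Reading Evaluation), CLEF, Query expansion, OMIM (Online Mendelian Inheritance in Man), Alzheimer
Thesis type: Academic thesis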
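  • This thesis takes Alzheimer’s disease as its subject and implements a question answering system. The goal is to read a test document and answer questions about it: the system must correctly understand the meaning of each test question, extract relevant sentences from the document for scoring, and derive the correct answer from them, thereby achieving a high-precision question answering system.
    The test data consist of four test sets on Alzheimer’s disease. Each test set contains one test document and 10 questions about it; each question has five options, and exactly one option is correct. We also use a background knowledge base whose sources include articles on Alzheimer’s disease from the Medical Literature Analysis and Retrieval System Online (Medline), obtained from PubMed Central, and biological articles and abstracts on Alzheimer’s disease provided by the Massachusetts Alzheimer’s Disease Research Center. In addition, we searched the Online Mendelian Inheritance in Man (OMIM) website with Alzheimer’s disease as the keyword, extracted the gene names associated with the disease, and used the linked content to build gene relations.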

    In this study, we take Alzheimer’s disease as the subject domain and implement a question answering system. The system reads a document and identifies the answers to a set of questions about that document. It aims to understand the meaning of each question and to extract related sentences from the document. Our goal is to select the correct answers and thereby achieve a high-precision question answering system.
    The test set consists of four reading tests. Each reading test contains one document, 10 questions, and five answer choices per question, exactly one of which is correct. We also use background collections drawn from articles in the Medical Literature Analysis and Retrieval System Online (Medline) and from the Massachusetts Alzheimer’s Disease Research Center. In addition, we query the Online Mendelian Inheritance in Man (OMIM) website with the keyword “Alzheimer” to extract gene names, and then use the linked content to build gene-gene relations.
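The abstract does not specify how the OMIM content becomes gene-gene relations or how those relations feed query expansion. One plausible reading, offered only as a sketch, is to link two genes when one gene's entry text mentions the other's symbol, then append related genes to any query that already names a known gene. The function names (`build_gene_relations`, `expand_query`) and the toy entry texts are hypothetical; they are not OMIM records or the thesis's code.

```python
import re
from collections import defaultdict


def build_gene_relations(entries: dict[str, str]) -> dict[str, set[str]]:
    """Link two genes when one gene's entry text mentions the other's symbol.

    Co-mention is an assumed heuristic, not the thesis's documented rule.
    """
    symbols = {g.upper() for g in entries}
    relations = defaultdict(set)
    for gene, text in entries.items():
        g = gene.upper()
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        for other in symbols & tokens:
            if other != g:
                relations[g].add(other)
                relations[other].add(g)  # treat relations as symmetric
    return dict(relations)


def expand_query(terms: list[str], relations: dict[str, set[str]]) -> list[str]:
    """Append genes related to any gene symbol already present in the query."""
    expanded = list(terms)
    for term in terms:
        for related in sorted(relations.get(term.upper(), set())):
            if related not in expanded:
                expanded.append(related)
    return expanded


# Toy, hand-written texts (NOT real OMIM records).
toy_entries = {
    "APP": "Amyloid precursor protein; cleaved by a complex containing PSEN1.",
    "PSEN1": "Presenilin-1; part of gamma-secretase that processes APP.",
    "APOE": "Apolipoprotein E; the e4 allele is a risk factor.",
}
relations = build_gene_relations(toy_entries)
print(expand_query(["app", "amyloid"], relations))  # ['app', 'amyloid', 'PSEN1']
```

Co-mention is a cheap, noisy signal; a real system would likely filter relations further, but the sketch shows how a gene list can widen a query before sentence retrieval.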
    First, our system mimics how a person answers a multiple-choice question: on receiving a question, it reads the document and retrieves the sentences that may be related to the question, then reads all the choices and selects the one most similar to those related sentences. Second, we apply an Answer Validation approach: the question and each answer choice are combined into a hypothesis, and the document is searched for support for that hypothesis. Relevant sentences are retrieved from the associated document based on the TF-IDF weights of the matching words; the higher a hypothesis scores, the more consistent it is with the content of the test document. Finally, we compute each hypothesis’s score from the weights of its related sentences, and the hypothesis with the highest score is taken as the most confident answer. Experiments are run with both words and phrases as the matching unit. In addition, we use the background collections and OMIM terms as additional resources for query expansion.
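The scoring pipeline just described can be sketched as follows. This is a minimal illustration, not the thesis's implementation, and its weighting may differ from the thesis's own formulas. It assumes words (not phrases) as the unit, a tiny hypothetical stop list with no stemming, IDF computed over the sentences of the test document, a sentence score equal to the summed TF-IDF weights of the hypothesis words it contains, and a hypothesis score equal to the sum of its top-k sentence scores. The helper names and the toy document are invented for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (assumption); a real system would use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "which", "what", "to", "and", "by"}


def tokenize(text: str) -> list[str]:
    """Lower-case, split on non-alphanumerics, drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def idf_table(sentences: list[list[str]]) -> dict[str, float]:
    """IDF over the document's sentences; +1 keeps words seen in every sentence non-zero."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}


def sentence_score(hypothesis: set[str], sentence: list[str], idf: dict[str, float]) -> float:
    """Sum of TF-IDF weights of hypothesis words that occur in the sentence."""
    tf = Counter(sentence)
    return sum(tf[w] * idf[w] for w in hypothesis if w in tf)


def answer(document: str, question: str, choices: list[str], top_k: int = 1) -> int:
    """Return the index of the choice whose question+choice hypothesis scores highest."""
    sentences = [tokenize(s) for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    idf = idf_table(sentences)
    best, best_score = 0, float("-inf")
    for i, choice in enumerate(choices):
        hypothesis = set(tokenize(question)) | set(tokenize(choice))
        scores = sorted((sentence_score(hypothesis, s, idf) for s in sentences), reverse=True)
        total = sum(scores[:top_k])  # keep only the top-k related sentences
        if total > best_score:
            best, best_score = i, total
    return best


doc = ("Mutations in PSEN1 cause early-onset Alzheimer disease. "
       "The APOE e4 allele raises late-onset risk. "
       "Amyloid plaques accumulate in the brain.")
question = "Which gene is linked to late-onset risk?"
choices = ["PSEN1", "APOE", "tau", "insulin", "APP"]
print(choices[answer(doc, question, choices)])  # APOE
```

On the toy document, the APOE hypothesis matches "apoe" on top of the question words in the second sentence, so it outscores the other choices; raising `top_k` lets more supporting sentences contribute to each hypothesis.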
    In total, our experiments comprise 23 methods. The first few experiments reach an accuracy of only about 10 to 20 percent because they ignore important information in the answer options. Applying Answer Validation raises the accuracy. We then add phrases, top related sentence selection, and query expansion, and evaluate each addition and its impact. Accuracy gradually rises again, approaching about 50 percent, which compares favorably with other studies that use the same test set.
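As a worked reading of these figures, assume accuracy is simply the fraction of the 40 test questions (4 reading tests of 10 questions each) answered correctly; the thesis's exact evaluation measure is given in its evaluation chapter and may differ, for example in how unanswered questions are counted.

```latex
\[
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{questions}}}, \qquad
N_{\text{questions}} = 4 \times 10 = 40, \qquad
\frac{1}{40} = 2.5\%, \qquad
\mathbb{E}\left[\text{Accuracy}_{\text{random}}\right] = \frac{1}{5} = 20\%.
\]
```

Under this assumption, about 50 percent corresponds to roughly 20 correct answers out of 40, while the early 10 to 20 percent results are at or below what uniform random guessing among five choices would be expected to achieve.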

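    List of Tables
    List of Figures
    Chapter 1 Introduction
        Research Motivation
        Research Objectives
        Thesis Organization
    Chapter 2 Related Work
        Literature Review
    Chapter 3 Research Methods
        Section 1 Overview
        Section 2 Experimental Data
        Section 3 Architecture of Research Method 1
        Section 4 Description of Research Method 1
        Section 5 Architecture of Research Method 2
        Section 6 Description of Research Method 2
    Chapter 4 Experiments and Results
        Section 1 Evaluation Measures
        Section 2 Experimental Results
        Section 3 Analysis and Discussion of Results
        Section 4 Supplementary Notes on Equations (9) and (14)
    Chapter 5 Conclusions and Future Work
    References
    Appendix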

    Ask Jeeves. Available from http://www.ask.com.

    Attardi, Giuseppe, Atzori, Luca and Simi, Maria (2012). Index Expansion for Machine Reading and Question Answering. QA4MRE Pilot Task – Machine Reading of Biomedical Texts about Alzheimer’s Disease at CLEF 2012.

    Bhaskar, Pinaki, Pakray, Partha, Banerjee, Somnath, Banerjee, Samadrita, Bandyopadhyay, Sivaji and Gelbukh, Alexander (2012). Question Answering System for QA4MRE@CLEF 2012. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2012.

    Bhattacharya, Sanmitra and Toldo, Luca (2012). Question Answering for Alzheimer Disease Using Information Retrieval. QA4MRE Pilot Task – Machine Reading of Biomedical Texts about Alzheimer’s Disease at CLEF 2012.

    Cao, Ling, Qiu, Xipeng and Huang, Xuanjing (2011). Deep Question Answering for Single Document with Lexical Chains. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2011.

    CLEF2012. Available from http://clef2012.org/

    Fellbaum, Christiane (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

    GDep Parser. Available from http://people.ict.usc.edu/~sagae/parser/gdep/index.html

    LA-PDFText. Available from http://code.google.com/p/lapdftext/

    Miller, G.A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, 38(11): 39-41.

    Morante, Roser, Krallinger, Martin, Valencia, Alfonso and Daelemans, Walter (2012). Machine Reading of Biomedical Texts about Alzheimer’s Disease. QA4MRE Pilot Task – Machine Reading of Biomedical Texts about Alzheimer’s Disease at CLEF 2012.

    Online Mendelian Inheritance in Man. Available from http://omim.org/

    Pakray, Partha, Bhaskar, Pinaki, Banerjee, Somnath, Pal, Bidhan Chandra, Bandyopadhyay, Sivaji and Gelbukh, Alexander (2011). A Hybrid Question Answering System based on Information Retrieval and Answer Validation. Main Task of Question Answering for Machine Reading Evaluation at CLEF 2011.

    Phan, Xuan-Hieu (2006). CRF Chunker: CRF English Phrase Chunker. PACLIC.

    Porter, M.F. (1980). An algorithm for suffix stripping. Program, 14(3): 130-137.

    Porter’s Stemmer. Available from http://tartarus.org/martin/PorterStemmer/

    QA4MRE. Available from http://celct.fbk.eu/ResPubliQA/

    Qiu, Yonggang and Frei, H.P. (1993). Concept Based Query Expansion. In Proceedings of ACM SIGIR International Conference on Research and Development in Information Retrieval, pp.160-169.

    Ramakrishnan, C., Patnia, A., Hovy, E. and Burns, G. (2012). Layout-Aware Text Extraction from Full-text PDF of Scientific Articles. Source Code for Biology and Medicine, 7(1): 7.

    Sagae, K. and Tsujii, J. (2007). Dependency parsing and domain adaptation with LR models and parser ensembles. Proceedings of the CoNLL 2007 Shared Task. Joint Conferences on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL'07). Prague, Czech Republic.

    Stop Word List. Available from http://www.lextek.com/manuals/onix/stopwords1.html

    Wren, Jonathan D. (2011). Question answering systems in biology and medicine – the time is now. Bioinformatics, 27(14): 2025-2026.

    Zhou, Guangyou, Cai, Li, Zhao, Jun and Liu, Kang (2011). Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp.653-662.
