Graduate student: 李柏勳 (Lee, Bo-Syun)
Thesis title: Automatic Pattern Extraction of Disease-Drug Association from Biomedical Texts (生醫文獻中疾病與藥物關係之樣式自動化擷取)
Advisor: 侯文娟 (Hou, Wen-Juan)
Degree: Master
Department: Department of Computer Science and Information Engineering
Publication year: 2017
Graduation academic year: 105 (ROC calendar, i.e. 2016-2017)
Language: Chinese
Number of pages: 57
Keywords: disease-drug association, pattern extraction, biomedical literature, chi-square test
DOI URL: https://doi.org/10.6345/NTNU202202301
Thesis type: Academic thesis
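    This study attempts to identify associations between human diseases and drugs in the biomedical literature and to derive rules or relationships between them. If the association between diseases and drugs can be predicted automatically from the literature, biomedical researchers reading the disease and drug literature can use it to understand the relationship between a disease and a drug quickly. This saves labor and time and can accelerate the development of biomedicine.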
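    The data used in this study are disease-drug pairs from completed studies listed on the official US Clinical trials website (https://clinicaltrials.gov/), together with biomedical abstracts from the PubMed database (https://www.ncbi.nlm.nih.gov/pubmed/). In this thesis, sentences in PubMed abstracts that contain a disease and a drug paired in Clinical trials are first extracted as positive sentences, and sentences that contain the same disease but a different drug are taken as negative sentences. Two models are used to analyze the verbs, nouns, and other information between the disease and the drug: in the first, the disease appears before the drug in the sentence; in the second, the drug appears before the disease. These words are divided into pure association words, pure non-association words, and mixed (neutral) words; the neutral words that meet a threshold are classified again using the chi-square test, yielding pattern rules for disease-drug relationships. Finally, the pattern rules are matched against the test data and evaluated. The best experimental result is a precision of 100%, a recall of 89%, and an F-score of 94%.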
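The labeling step and the two word-order models can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the gold pair, the drug list, and the names `tokenize` and `extract` are hypothetical, mentions are assumed to be single lowercase words matched exactly, and only the first mention of each term is used. A sentence is labeled positive when its disease-drug pair is a gold (Clinical trials) pair and negative when it mentions a gold disease with a different drug; model 1 covers sentences where the disease comes first, model 2 those where the drug comes first.

```python
import re
from typing import List, Set, Tuple

# Hypothetical stand-in for Clinical trials disease-drug pairs (illustrative, not thesis data).
GOLD_PAIRS: Set[Tuple[str, str]] = {("asthma", "budesonide")}
GOLD_DISEASES = {disease for disease, _ in GOLD_PAIRS}
# Hypothetical drug lexicon; single-word names only, to keep the sketch short.
DRUG_TERMS = {"budesonide", "aspirin"}


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and keep word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def extract(sentence: str) -> List[Tuple[str, int, List[str]]]:
    """Return (label, model, words_between) for each gold-disease/drug co-occurrence.

    label: "positive" for a gold pair, "negative" for a gold disease with another drug.
    model: 1 if the disease precedes the drug, 2 if the drug precedes the disease.
    """
    tokens = tokenize(sentence)
    results = []
    for disease in sorted(GOLD_DISEASES):
        if disease not in tokens:
            continue
        for drug in sorted(DRUG_TERMS):
            if drug not in tokens:
                continue
            label = "positive" if (disease, drug) in GOLD_PAIRS else "negative"
            i, j = tokens.index(disease), tokens.index(drug)  # first mentions only
            model = 1 if i < j else 2
            lo, hi = sorted((i, j))
            results.append((label, model, tokens[lo + 1:hi]))
    return results


print(extract("Budesonide significantly improved asthma control in children."))
# [('positive', 2, ['significantly', 'improved'])]
print(extract("In asthma patients, budesonide was given."))
# [('positive', 1, ['patients'])]
print(extract("Aspirin may worsen asthma."))
# [('negative', 2, ['may', 'worsen'])]
```

The words returned for each sentence are the raw material for the later classification step; a fuller system would need multi-word term matching instead of exact single-token lookup.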

    The objectives of this study are to identify associations between human diseases and medications in the biomedical literature and to find rules or relationships between them. If these associations can be identified automatically from the literature, biomedical researchers studying the disease and medication literature can use them to understand the relationships between diseases and drugs and to collect information more efficiently. This would save both human resources and time, and accelerate the development of biomedical science.
    The data in this study consist of disease-drug pairs from completed studies on the official US Clinical trials website (https://clinicaltrials.gov/) and biomedical literature from PubMed (https://www.ncbi.nlm.nih.gov/pubmed/). In this thesis, we first search PubMed abstracts for sentences that mention a disease and a drug paired on the Clinical trials website and label them as positive sentences. We then find sentences that mention the same diseases but different medications and label them as negative sentences. To analyze the verbs, nouns, and other words that appear between diseases and medications, two models with different word orders are established.
    The first model covers sentences in which the disease is mentioned before the medication; the second covers sentences with the reverse order. The words between the two mentions are then classified as pure association, pure no-association, or neutral words, and the neutral words that meet a threshold are further classified using the chi-square test. The resulting disease-drug associations are called patterns. Finally, the patterns are applied to the test data to extract disease-drug pairs. The best experimental results show a precision of 100%, a recall of 89%, and an F-score of 94%.
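The word classification and chi-square step can be sketched the same way. The abstract does not specify the counting scheme, threshold, or significance level, so this Python sketch assumes: each word is counted at most once per between-mention word span; a neutral word is tested only if it occurs in at least `MIN_COUNT` spans (a hypothetical threshold); the test uses a 2×2 table of word presence versus positive/negative sentences with Yates's continuity correction (listed in the references) and the conventional critical value 3.841 (1 degree of freedom, alpha = 0.05); and a significant word is assigned to whichever class it is over-represented in. The names `yates_chi_square` and `classify_words` are hypothetical, and building the final pattern rules from the classified words is not shown.

```python
from collections import Counter
from typing import Dict, List

CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
MIN_COUNT = 2             # hypothetical frequency threshold for testing neutral words


def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square with Yates's continuity correction for the 2x2 table [[a, b], [c, d]].

    Rows: word present / absent; columns: positive / negative spans.
    The correction is floored at zero so it never overshoots.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / denom


def classify_words(pos_spans: List[List[str]], neg_spans: List[List[str]]) -> Dict[str, str]:
    """Label each word 'association', 'no_association', or 'neutral'."""
    # Count each word at most once per span.
    pos_df = Counter(w for span in pos_spans for w in set(span))
    neg_df = Counter(w for span in neg_spans for w in set(span))
    n_pos, n_neg = len(pos_spans), len(neg_spans)
    classes: Dict[str, str] = {}
    for word in pos_df.keys() | neg_df.keys():
        a, b = pos_df[word], neg_df[word]  # Counter returns 0 for unseen words
        if b == 0:
            classes[word] = "association"      # pure association word
        elif a == 0:
            classes[word] = "no_association"   # pure no-association word
        elif a + b >= MIN_COUNT:
            c, d = n_pos - a, n_neg - b
            if yates_chi_square(a, b, c, d) >= CRITICAL_1DF_005:
                # ad > bc means the word is relatively more frequent in positive spans
                classes[word] = "association" if a * d > b * c else "no_association"
            else:
                classes[word] = "neutral"
        else:
            classes[word] = "neutral"
    return classes


# Toy spans (made up): 10 positive and 10 negative between-mention word lists.
pos = [["improved", "was"]] * 5 + [["improved"]] * 4 + [["reduced"]]
neg = [["was"]] * 5 + [["worsen"]] * 4 + [["improved"]]
print(sorted(classify_words(pos, neg).items()))
# [('improved', 'association'), ('reduced', 'association'), ('was', 'neutral'), ('worsen', 'no_association')]
```

Here "improved" appears in both classes, but its corrected statistic (9.8) exceeds 3.841, so it is moved to the association side; "was" is evenly split and stays neutral.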

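    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        Section 1 Research Background
        Section 2 Research Objectives
        Section 3 Thesis Organization
    Chapter 2 Related Work
        Section 1 Literature Review
        Section 2 Introduction to the Diseases
        Section 3 Stanford Parser
        Section 4 Drug Bank
        Section 5 Stemming
    Chapter 3 Methods and Procedures
        Section 1 Overview
        Section 2 Background Knowledge Base
        Section 3 Preprocessing
        Section 4 Research Method Framework
        Section 5 Postprocessing
    Chapter 4 Experiments and Results
        Section 1 Experimental Data
        Section 2 Evaluation Metrics
        Section 3 Experimental Results
        Section 4 Analysis and Discussion
    Chapter 5 Conclusions and Future Work
    References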

    Introduction to COPD: http://epaper.ntuh.gov.tw/health/201509/health_2.html
    Drug Bank: https://www.drugbank.ca/
    Jang, D., Lee, S., Lee, J., Kim, K., & Lee, D. (2016). Inferring new drug indications using the complementarity between clinical disease signatures and drug effects. Journal of Biomedical Informatics, 59, 248-257.
    MeSH terms: https://www.ncbi.nlm.nih.gov/mesh/
    Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.
    Porter stemmer: https://tartarus.org/martin/PorterStemmer/
    PubMed database: https://www.ncbi.nlm.nih.gov/pubmed/
    Stanford Parser: http://nlp.stanford.edu/software/lex-parser.shtml
    Xu, R., & Wang, Q. (2013). Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing. BMC Bioinformatics, 14(1), 181.
    Introduction to the chi-square test: http://amebse.nchu.edu.tw/new_page_659.htm
    Introduction to non-small cell lung cancer: http://www2.cch.org.tw/lungcancer/LC_path.htm
    Yates's continuity correction: http://terms.naer.edu.tw/detail/1312488/
