Author: | 陳佩瑄 Chen, Pei-Hsuan |
---|---|
Thesis Title: | 以混合式方法自生醫文獻擷取藥物-藥物交互作用之研究 A Hybrid Method for Drug-Drug Interaction Extraction from Biomedical Literature |
Advisor: | 侯文娟 Hou, Wen-Juan |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Thesis Publication Year: | 2017 |
Academic Year: | 105 |
Language: | Chinese |
Number of pages: | 79 |
Keywords (in Chinese): | 藥物-藥物交互作用 、生醫文獻 、機器學習 、規則為基 |
Keywords (in English): | Drug-Drug Interaction, Biomedical Literature, Machine Learning, Rule-based |
DOI URL: | https://doi.org/10.6345/NTNU202202907 |
Thesis Type: | Academic thesis/dissertation |
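A disease is often accompanied by many different symptoms, and each symptom is usually treated with one drug. For example, a cold brings symptoms such as coughing, nasal congestion, or headache, so several drugs are needed to cure the disease. If undesirable effects arise between drugs while they are being taken, such as an excessively strong effect or mutual antagonism, the treatment may fail and, in severe cases, the patient may die; this is what is known as a drug-drug interaction (DDI). Many drug-drug interactions are still hidden in the large body of biomedical literature, waiting to be discovered by researchers. Natural language processing (NLP) techniques for extraction and analysis could uncover large numbers of these hidden interactions and reduce the time researchers spend searching for them.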
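The data used in this thesis come from the corpus provided by SemEval 2013 Task 9, which comprises MedLine abstracts and the DrugBank database. The task (SemEval 2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts) is to extract drug-drug interactions from biomedical literature and assign each drug pair to one of five classes: Advice, Effect, Mechanism, Int, and no interaction. Systems are evaluated by computing precision, recall, and F1-measure for detection and for classification.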
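This study uses a hybrid method for detection and classification, combining a machine learning method with a rule-based method. Because the five classes in the corpus are imbalanced, a two-stage approach is used. The first stage detects whether an interaction exists for each drug pair and obtains an F1-measure of 70.8%. The second stage classifies the drug pairs detected as interacting and obtains an F1-measure of 62.5%. The FBK-irst team achieved the best performance in the competition, with F1-measures of 80.0% for detection and 65.1% for classification, while the participating teams averaged 68.1% and 51.8%, respectively. Although the detection and classification results do not surpass those of FBK-irst, they exceed the team average by 2.7 and 10.7 percentage points, respectively. In the future, the machine learning and rule-based methods could be applied to information extraction research in other domains.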
A disease is often accompanied by many different symptoms, and each symptom is usually treated with a drug. For example, someone with a cold usually has symptoms such as coughing, a stuffy nose, or a headache, so several kinds of drugs are needed to cure the disease. A Drug-Drug Interaction (DDI) occurs when drugs taken together during treatment produce undesirable results: the interaction may increase or decrease a drug's effect and may even cause death. At present, many DDIs are still hidden in the large body of biomedical literature, and finding them takes researchers a lot of time. Natural Language Processing (NLP) extraction and analysis techniques can uncover large numbers of hidden DDIs and reduce researchers' search time.
The corpus used in this thesis is provided by SemEval 2013 Task 9 and includes MedLine abstracts and the DrugBank database. SemEval 2013 Task 9 aims to extract Drug-Drug Interactions from biomedical texts, and DDIs are classified into the following five types: Advice (ADV), Effect (EFF), Mechanism (MEC), Int (INT), and non-interaction. Evaluation results are reported using the standard precision, recall, and F1-measure.
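As a rough illustration of how precision, recall, and F1-measure can be computed separately for detection and for classification, the following is a minimal sketch. It assumes each candidate drug pair has one gold label and one predicted label, with `NONE` as a hypothetical name for the non-interaction class; the pair identifiers and the `score_pairs` helper are invented for this example, and the official task scorer may differ in its details.

```python
from typing import Dict, Tuple

# Positive labels use the abbreviations from the abstract.
# "NONE" is an assumed name for the non-interaction class.
POSITIVE = {"ADV", "EFF", "MEC", "INT"}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts, guarding against zero division."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def score_pairs(gold: Dict[str, str], pred: Dict[str, str],
                check_type: bool) -> Tuple[float, float, float]:
    """Score predictions over drug pairs.

    check_type=False: detection (any positive label matches any positive label).
    check_type=True:  classification (the predicted type must equal the gold type).
    """
    tp = fp = fn = 0
    for pair_id, gold_label in gold.items():
        pred_label = pred.get(pair_id, "NONE")  # a missing prediction counts as negative
        gold_pos = gold_label in POSITIVE
        pred_pos = pred_label in POSITIVE
        if check_type:
            correct = gold_pos and pred_label == gold_label
        else:
            correct = gold_pos and pred_pos
        if correct:
            tp += 1
        else:
            if pred_pos:
                fp += 1  # predicted an interaction that is absent or of the wrong type
            if gold_pos:
                fn += 1  # missed a gold interaction or assigned the wrong type
    return prf(tp, fp, fn)


# Toy data, invented purely for illustration.
gold = {"p1": "EFF", "p2": "NONE", "p3": "MEC", "p4": "ADV"}
pred = {"p1": "EFF", "p2": "ADV", "p3": "EFF", "p4": "NONE"}

detection = score_pairs(gold, pred, check_type=False)
classification = score_pairs(gold, pred, check_type=True)
print("detection P/R/F1:", detection)
print("classification P/R/F1:", classification)
```

On this toy data the pair `p3` counts as correct for detection but as both a false positive and a false negative for classification, which is why classification scores are never higher than detection scores under this scheme.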
This study uses a hybrid method to detect and classify DDIs, combining a machine learning method with a rule-based method. Because the corpus is imbalanced, the study completes detection and classification in two stages. The first stage performs detection over all classes, separating positive from negative drug pairs; the second stage classifies the positive DDIs (i.e., ADV, EFF, MEC, INT). The experiments yield an F-score of 70.8% in detection and 62.5% in classification. Although this performance is still below that of the FBK-irst team in both DDI detection and classification, it is higher than the average performance of all teams. In the future, we hope to apply the hybrid method to other areas of information extraction research.
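The two-stage idea can be sketched as follows. This is only an illustrative outline under stated assumptions: the abstract does not say which stage uses rules and which uses machine learning, nor which features or trigger words are involved, so the cue lists, the `toy_detector` and `toy_type_classifier` stand-ins, and the "rules first, classifier as fallback" combination are all hypothetical choices made for this example.

```python
from typing import Callable, List, Optional

# Hypothetical trigger phrases per type; not taken from the thesis.
TYPE_CUES = [
    ("ADV", ("should not", "avoid", "caution")),
    ("MEC", ("absorption", "metabolism", "clearance")),
    ("EFF", ("increase", "decrease", "enhance")),
    ("INT", ("interact",)),
]


def rule_based_type(sentence: str) -> Optional[str]:
    """Return the first type whose cue phrase occurs in the sentence, else None."""
    text = sentence.lower()
    for label, cues in TYPE_CUES:
        if any(cue in text for cue in cues):
            return label
    return None


def two_stage_predict(sentences: List[str],
                      detector: Callable[[str], bool],
                      type_classifier: Callable[[str], str]) -> List[str]:
    """Stage 1: binary detection. Stage 2: type assignment for detected pairs only."""
    labels = []
    for sentence in sentences:
        if not detector(sentence):
            labels.append("NONE")  # filtered out in stage 1
            continue
        # Assumed combination: try the rules first, fall back to the classifier.
        labels.append(rule_based_type(sentence) or type_classifier(sentence))
    return labels


def toy_detector(sentence: str) -> bool:
    """Stand-in for a trained binary classifier (hypothetical cue list)."""
    text = sentence.lower()
    cues = ("should not", "avoid", "increase", "decrease", "interact", "inhibit")
    return any(cue in text for cue in cues)


def toy_type_classifier(sentence: str) -> str:
    """Stand-in for a trained multi-class model; always returns one class."""
    return "EFF"


examples = [
    "DRUG1 should not be taken with DRUG2.",
    "DRUG1 and DRUG2 were both measured in plasma.",
    "DRUG1 may increase the effect of DRUG2.",
    "DRUG1 inhibits DRUG2.",
]
predictions = two_stage_predict(examples, toy_detector, toy_type_classifier)
print(predictions)
```

Filtering negatives in the first stage means the second stage only has to separate the four positive types, which is one common way to reduce the effect of a dominant negative class.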