
Graduate student: 吳培豪 (Wu, Pei-Hao)
Thesis title: 醫療檢驗報告關鍵字擷取與結構化之研究
Keyword Extraction and Structuralization for Medical Report
Advisor: 柯佳伶 (Koh, Jia-Ling)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2017
Academic year of graduation: 105 (Republic of China calendar; the 2016-17 academic year)
Language: Chinese
Pages: 55
Keywords: keyword extraction, structuralization for medical report, establishment of medical vocabulary dictionary
DOI URL: https://doi.org/10.6345/NTNU202202021
Thesis type: Academic thesis
    Recent advances in medical technology allow patients to undergo more accurate and detailed examinations. Many examination reports, however, are not numerical data. Instead, they are free-text documents in which the examining physicians describe what they observed with instruments and biochemical tests. Converting these unstructured reports into a structured form would help diagnosing physicians understand a patient's status across examination items more efficiently. It would also enable association analysis of the structured data to identify potential factors that affect a disease. This thesis works with pathology examination reports of renal disease. First, part-of-speech (POS) tagging from natural language analysis is used to automatically extract keyword phrases. A medical vocabulary dictionary is then built for each paragraph of the examination report, and it serves as the basis for selecting the terms that make up the structured form of a report. Second, a topic probability model is applied to automatically extract the keywords of the main examination items from the reports. Finally, a system is implemented that uses the medical vocabulary dictionary to structure each type of paragraph in a report according to that paragraph's characteristics. Experimental results show that the proposed methods effectively convert examination reports into a structured form and correctly extract the keywords of common examination items. These techniques can support the automatic processing and analysis of medical text reports.
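    To make the first step concrete, the following Python sketch extracts candidate keyword phrases from POS-tagged sentences and keeps the frequent ones as a per-paragraph dictionary. It then uses that dictionary to filter the terms of a new sentence. This is not the thesis's implementation. Several details are illustrative assumptions: Penn Treebank tags as input (the tag set a tagger such as Stanford CoreNLP emits for English), the "adjectives/nouns ending in a noun" phrase pattern, the `min_count` threshold, the paragraph name `LIGHT MICROSCOPY`, and the helper names `extract_phrases`, `build_paragraph_dictionary`, and `structuralize`.

```python
# Minimal sketch (assumed design, not the thesis's code): POS-pattern keyword
# phrases -> per-paragraph dictionary -> dictionary-filtered structured terms.
from collections import Counter
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # one sentence as [(token, Penn Treebank tag), ...]

ADJ_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_phrases(sentence: Tagged) -> List[str]:
    """Return maximal adjective/noun runs that end in a noun, lower-cased."""
    phrases: List[str] = []
    run: Tagged = []
    # The ("", "END") sentinel forces the final run to be flushed.
    for token, tag in sentence + [("", "END")]:
        if tag in ADJ_TAGS or tag in NOUN_TAGS:
            run.append((token, tag))
            continue
        # Drop trailing adjectives so every phrase ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            phrases.append(" ".join(tok.lower() for tok, _ in run))
        run = []
    return phrases


def build_paragraph_dictionary(
    paragraphs: Dict[str, List[Tagged]], min_count: int = 2
) -> Dict[str, Counter]:
    """For each paragraph type, keep phrases seen at least min_count times."""
    dictionary: Dict[str, Counter] = {}
    for name, sentences in paragraphs.items():
        counts = Counter(p for s in sentences for p in extract_phrases(s))
        dictionary[name] = Counter({p: c for p, c in counts.items() if c >= min_count})
    return dictionary


def structuralize(sentence: Tagged, vocab: Counter) -> List[str]:
    """Keep only the phrases of a new sentence that the dictionary contains."""
    return [p for p in extract_phrases(sentence) if p in vocab]


if __name__ == "__main__":
    # Hypothetical pre-tagged sentences from one paragraph type.
    lm = [
        [("There", "EX"), ("is", "VBZ"), ("mesangial", "JJ"), ("proliferation", "NN")],
        [("Mesangial", "JJ"), ("proliferation", "NN"), ("is", "VBZ"), ("mild", "JJ")],
        [("Glomeruli", "NNS"), ("are", "VBP"), ("sclerotic", "JJ")],
    ]
    vocab = build_paragraph_dictionary({"LIGHT MICROSCOPY": lm})["LIGHT MICROSCOPY"]
    print(dict(vocab))  # {'mesangial proliferation': 2}
    print(structuralize(lm[1], vocab))  # ['mesangial proliferation']
```

    Trimming trailing adjectives keeps predicate descriptors such as "mild" out of the phrase itself. A fuller system might instead record them as attribute values of the phrase they describe.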

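    Table of Contents
    Chapter 1 Introduction
        1.1 Research Motivation
        1.2 Research Objectives
        1.3 Proposed Approach
        1.4 Thesis Organization
    Chapter 2 Literature Review
        2.1 Mining Medical Text Records
        2.2 Text Mining and Extraction
    Chapter 3 Building the Medical Vocabulary Dictionary
        3.1 Preprocessing of Examination Reports
        3.2 Building Paragraph Dictionaries
    Chapter 4 Applications of the Medical Vocabulary Dictionary
        4.1 Structuralization Method
        4.2 Automatic Extraction of Detailed-Item Keywords in Special-Item Paragraphs
    Chapter 5 Experimental Results and Discussion
        5.1 Experimental Data Sources
        5.2 Evaluation of LCS-Based Typo Filtering
        5.3 Evaluation of Candidate Keyword Extraction for Examination Detail Items
        5.4 Evaluation of Examination Report Structuralization
    Chapter 6 Conclusions and Future Research Directions
        6.1 Conclusions
        6.2 Future Directions
    References
    Appendix 1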
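    Section 5.2 of the outline evaluates filtering misspelled terms with the longest common subsequence (LCS). The Python sketch below shows one plausible way to apply LCS to this task. It scores a term against each dictionary entry by LCS length divided by the longer string's length, then maps the term to the best entry at or above a threshold. The normalization, the 0.85 threshold, and the function names `lcs_length`, `lcs_similarity`, and `correct_term` are assumptions for illustration, so the thesis's exact criterion may differ.

```python
# Minimal sketch (assumed rule, not the thesis's exact criterion): map a possibly
# misspelled term to the closest dictionary entry by normalized LCS similarity.
from typing import Iterable, Optional


def lcs_length(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic program, keeping only one row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer string's length (1.0 for two empties)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def correct_term(
    term: str, dictionary: Iterable[str], threshold: float = 0.85
) -> Optional[str]:
    """Return the most similar dictionary entry if it reaches threshold, else None."""
    best: Optional[str] = None
    best_score = 0.0
    for entry in dictionary:
        score = lcs_similarity(term.lower(), entry.lower())
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    vocab = ["glomerulus", "tubule", "interstitium"]  # hypothetical dictionary
    print(correct_term("glomerulos", vocab))  # glomerulus (similarity 9/10)
    print(correct_term("crescent", vocab))  # None
```

    Dividing by the longer of the two lengths penalizes large length differences. As a result, a short fragment does not match a long dictionary entry merely because it is contained in that entry.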

