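| Field | Value |
|---|---|
| Student | 曹榮校 Tsao, Jung-Hsiao |
| Thesis title | 基於主訴語料庫進行急診病患住院預測之研究 Using the Corpus of Chief Complaints for Predicting Emergency Hospital Admissions |
| Advisor | 吳怡瑾 |
| Degree | Master's |
| Department | 圖書資訊學研究所 Graduate Institute of Library and Information Studies |
| Year of publication | 2020 |
| Academic year of graduation | 108 (ROC calendar, i.e., 2019–2020) |
| Language | Chinese |
| Number of pages | 91 |
| Keywords (Chinese) | 主訴 (chief complaint), 語料庫 (corpus), 資料探勘 (data mining), 文件處理 (document processing), 檢傷級數 (triage level) |
| Keywords (English) | Chief complaint, corpus, data mining, text processing, triage |
| DOI URL | http://doi.org/10.6345/NTNU202000020 |
| Thesis type | Academic thesis |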
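The chief complaint (CC) is a short text describing why a patient comes to the emergency department. It is an important basis for assessing the patient's condition and one of the core elements of medical orders. Using six years of emergency department CC data (824,614 records in total) provided by a large hospital in northern Taiwan, this study applied text processing and text mining techniques to build a corpus for each level of the five-level triage system. The main aims are to mine highly discriminative CC keywords and, using the triage-specific corpora, to predict the likelihood of hospital admission at the moment the patient is triaged, so that the hospital can prepare beds early. The study therefore covers CC keyword extraction, statistical testing of keywords for admission prediction, and construction of the CC corpus. Preliminary results show that keywords within the same triage level are highly similar across years. This indicates that CC wording is fairly stable across years and triage levels, so CC keywords should be an important attribute for predicting admission. Applying a chi-square test to admission keywords screened by TF-IDF produced 120 corpus keywords. Computing correlation coefficients produced 107, and information entropy produced 37. These sets provide a basis for subsequent admission-prediction research. The study found that TF-IDF and information entropy yield predictions that better balance admitted and non-admitted patients while using a smaller proportion of CC keywords, with triage level 1 performing best. The study shows that CCs are an important data source for research such as syndromic surveillance and admission prediction, which helps hospitals allocate medical resources. The preliminary results can serve as a reference for future research on clinical decision-making.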
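The keyword-testing step summarized above is TF-IDF scoring followed by a chi-square test of each keyword against admission. It can be illustrated with a minimal Python sketch. This sketch is not the thesis's implementation. It assumes each chief complaint has already been segmented into tokens (Chinese CCs would need word segmentation first) and labeled as admitted or not admitted. The example records, token names, and function names are hypothetical. The TF-IDF variant used here, (count / length) × ln(N / df), is also an assumption, because the exact weighting and screening threshold are not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy data: (tokenized chief complaint, admitted?) pairs.
# The tokens are made up for illustration only.
records = [
    (["chest_pain", "cold_sweat"], True),
    (["chest_pain"], True),
    (["abdominal_pain"], False),
    (["fever"], False),
    (["chest_pain", "dyspnea"], True),
    (["dizziness"], False),
]


def tfidf_scores(docs):
    """Score every term of every tokenized document as
    (term count / document length) * ln(N / document frequency).

    Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def chi_square_2x2(records, keyword):
    """Pearson chi-square statistic (1 degree of freedom, no Yates
    correction) for keyword presence versus admission."""
    a = b = c = d = 0
    for tokens, admitted in records:
        present = keyword in tokens
        if present and admitted:
            a += 1  # keyword present, admitted
        elif present:
            b += 1  # keyword present, not admitted
        elif admitted:
            c += 1  # keyword absent, admitted
        else:
            d += 1  # keyword absent, not admitted
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: the test is undefined
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    docs = [tokens for tokens, _ in records]
    for doc, doc_scores in zip(docs, tfidf_scores(docs)):
        print(doc, {t: round(s, 3) for t, s in doc_scores.items()})
    CRITICAL_005 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
    for kw in ["chest_pain", "fever"]:
        stat = chi_square_2x2(records, kw)
        verdict = "significant" if stat > CRITICAL_005 else "not significant"
        print(kw, round(stat, 3), verdict)
```

With one degree of freedom, a statistic above 3.841 is significant at the 0.05 level. In this toy data, `chest_pain` (6.0) passes and `fever` (1.2) does not.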
Chief complaints (CCs) are short free-text phrases describing patients' reasons for emergency department (ED) visits, and they are important references for medical orders. This research used CC text data from January 1, 2010 to December 31, 2015, amassing 824,614 records from the hospital information system of a representative ED in Taiwan. Text processing and data mining techniques were used to construct a CC corpus for each level of the five-level ED triage system. The aim of this research is to extract CC keywords that predict whether patients will be admitted as inpatients at the time of triage, i.e., at the early stage of the ED process, so that the hospital can prepare beds in time. This research focuses on extracting keywords from CCs, conducting statistical tests on the keywords, constructing a triage-based corpus, and then predicting the likelihood of emergency hospital admission. Our preliminary results show that the CC keywords within each triage level are quite similar across the six-year data collection period. This indicates that CC content is quite stable. We therefore believe the hospital can use ED CCs to predict early in the process whether patients will be admitted. A chi-square test for hospitalization, applied to keywords screened by TF-IDF, yielded 120 CC corpus keywords. Computing correlation coefficients yielded 107 corpus keywords, and information entropy yielded 37. These keyword sets provide a basis for follow-up hospitalization prediction research. The research found that TF-IDF and entropy achieve predictions that better balance admitted and non-admitted patients while using a small proportion of the CC keywords. Triage level 1 showed the best performance. The research shows that CCs can be a data source for syndromic surveillance and inpatient prediction, supporting the arrangement of medical resources. The results serve as a reference model for related ED research on clinical decision support in a similar context.
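The correlation-coefficient and information-entropy rankings mentioned above can be sketched for a single keyword in the same way. This is a minimal illustration, not the author's method. It makes the following assumptions:

- Each record reduces to a binary table of keyword present/absent against admitted/not admitted.
- The phi coefficient (the Pearson correlation of two binary variables) stands in as the correlation measure.
- Information gain in bits serves as the entropy-based score.

The records and function names are hypothetical.

```python
import math

# Hypothetical toy data: (tokenized chief complaint, admitted?) pairs.
records = [
    (["chest_pain", "cold_sweat"], True),
    (["chest_pain"], True),
    (["abdominal_pain"], False),
    (["fever"], False),
    (["chest_pain", "dyspnea"], True),
    (["dizziness"], False),
]


def contingency(records, keyword):
    """Counts (a, b, c, d): keyword present & admitted, present & not,
    absent & admitted, absent & not admitted."""
    a = sum(1 for t, y in records if keyword in t and y)
    b = sum(1 for t, y in records if keyword in t and not y)
    c = sum(1 for t, y in records if keyword not in t and y)
    d = sum(1 for t, y in records if keyword not in t and not y)
    return a, b, c, d


def entropy(pos, neg):
    """Shannon entropy in bits of a binary admitted/not-admitted split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log 0 as 0
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(records, keyword):
    """Reduction in admission entropy from splitting on keyword presence."""
    a, b, c, d = contingency(records, keyword)
    n = a + b + c + d
    before = entropy(a + c, b + d)
    after = ((a + b) / n) * entropy(a, b) + ((c + d) / n) * entropy(c, d)
    return before - after


def phi_coefficient(records, keyword):
    """Pearson correlation between keyword presence and admission."""
    a, b, c, d = contingency(records, keyword)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else (a * d - b * c) / denom


if __name__ == "__main__":
    vocab = sorted({t for tokens, _ in records for t in tokens})
    ranked = sorted(vocab, key=lambda k: information_gain(records, k), reverse=True)
    for kw in ranked:
        print(f"{kw:15s} IG={information_gain(records, kw):.3f} "
              f"phi={phi_coefficient(records, kw):+.3f}")
```

On a 2×2 table these measures are tied to the chi-square test: n·φ² equals the uncorrected chi-square statistic. Ranking keywords by φ² over the same records therefore gives the same order as ranking by chi-square.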