Graduate Student: Tseng, Wei-Hung (曾偉紘)
Thesis Title: Research on Automated Text Classification of Health Psychological Factors (健康心理因素文本自動分類之研究)
Advisor: Shieh, Jiann-Cherng (謝建成)
Oral Examination Committee: Zhu, Yan-Ping (朱延平); Ke, Hao-Ren (柯皓仁); Lee, Cheng-Chi (李正吉); Shieh, Jiann-Cherng (謝建成)
Oral Defense Date: 2024/01/08
Degree: Master
Department: Graduate Institute of Library and Information Studies
Year of Publication: 2024
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 47
Chinese Keywords: 機器學習、深度學習、少樣本微調、自動分類
English Keywords: Machine Learning, Deep Learning, Few-shot Fine-tuning, Automatic Classification
Research Method: Content Analysis
DOI URL: http://doi.org/10.6345/NTNU202400273
Thesis Type: Academic Thesis
Usage: 115 views, 11 downloads
Subjects of psychological research are often complex and require long-term tracking and study. Traditional methods rely on manual labeling and scoring, which are not only time-consuming and labor-intensive but also prone to problems of subjectivity and inconsistency. Most current studies recruit subjects through social media platforms. This study therefore collects research data from social media and uses machine automation to conduct psychological research more efficiently.
This study applies artificial intelligence techniques to mental health texts, automatically classifying them into 11 indicators across 5 aspects, with each indicator scored on a 5-point scale. The objective is a machine prediction accuracy of 0.8 or higher given limited manually labeled training data (at least 60 examples per class); the average human-labeling consistency score is 0.8011, and Macro F1 serves as the primary evaluation metric.
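Since Macro F1 is the study's primary metric, it is worth spelling out: the F1 score is computed separately for each class and then averaged with equal weight, so rare score levels count as much as common ones. The following stdlib-only sketch (labels and values are illustrative, not drawn from the thesis data) shows the computation for 5-point-scale labels:

```python
def macro_f1(y_true, y_pred):
    """Macro F1: compute per-class F1, then average with equal class weight."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels on a 5-point scale, mirroring the thesis's scored indicators
y_true = [1, 2, 3, 3, 2]
y_pred = [1, 3, 3, 3, 1]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.4889
```

Because the per-class scores are averaged unweighted, a classifier that ignores a minority score level is penalized heavily, which is why Macro F1 is a stricter target than plain accuracy for skewed label distributions.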
The techniques employed include classical machine learning, BERT, SetFit, GPT-3, and GPT-4. In this study's results, classical machine learning and BERT incur low execution time costs, but their performance falls short of the target 0.8 on every indicator. GPT-4, perhaps because it was applied through prompting alone to a task too complex for that setting, could not match the accuracy of fine-tuned models and also failed to reach the target. GPT-3 and SetFit performed well on most indicators: GPT-3 reached the target on 5 indicators, and SetFit on 7, with SetFit's two remaining indicators falling short by only 1 to 2 percentage points.
Considering that GPT-3's execution time cost is heavy (chiefly because each request can predict only one item), whereas SetFit's cost lies mainly in training and its prediction speed is very fast, SetFit is a high-accuracy, low-prediction-cost choice for the automatic classification of mental health texts.
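The time-cost contrast above can be made concrete with a back-of-envelope estimate. All latency figures here are hypothetical assumptions for illustration (the thesis does not report them): GPT-3 must issue one API round-trip per text, while a trained SetFit model scores texts locally in batches.

```python
# Back-of-envelope prediction-time comparison; every latency figure below
# is a hypothetical assumption, not a measurement from the thesis.
N_TEXTS = 1000                # texts to classify
GPT3_SECS_PER_REQUEST = 1.5   # one API round-trip classifies one text
SETFIT_BATCH = 64             # texts SetFit scores per local forward pass
SETFIT_SECS_PER_BATCH = 0.5

gpt3_secs = N_TEXTS * GPT3_SECS_PER_REQUEST       # 1000 requests, serially
setfit_batches = -(-N_TEXTS // SETFIT_BATCH)      # ceiling division -> 16
setfit_secs = setfit_batches * SETFIT_SECS_PER_BATCH

print(f"GPT-3:  {gpt3_secs:.0f} s")   # prints "GPT-3:  1500 s"
print(f"SetFit: {setfit_secs:.1f} s") # prints "SetFit: 8.0 s"
```

Under these assumed numbers the one-prediction-per-request constraint makes GPT-3 roughly two orders of magnitude slower at inference, which is the tradeoff the conclusion turns on: SetFit pays its cost once at training time, then amortizes it over fast batch prediction.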
Ackermann, T. J. (2020). GPT-3: a robot wrote this entire article. Are you scared yet, human?. Artificial Intelligence: ANI, LogicGate Computing, AGI, ASI.
Alshahrani, A., Ghaffari, M., Amirizirtol, K., & Liu, X. (2020, July). Identifying optimism and pessimism in twitter messages using xlnet and deep consensus. In 2020 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
Anguita, D., Ghelardoni, L., Ghio, A., Oneto, L., & Ridella, S. (2012, April). The 'K' in K-fold cross validation. In ESANN (pp. 441-446).
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
Breiman, L. (1996). Bias, variance, and arcing classifiers (Technical Report 460). Statistics Department, University of California, Berkeley.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
Burkov, A. (2019). The hundred-page machine learning book. Self-published.
Chao, H. J., Lien, Y. J., Kao, Y. C., Tsai, I. C., Lin, H. S., & Lien, Y. Y. (2020). Mental health literacy in healthcare students: An expansion of the mental health literacy scale. International Journal of Environmental Research and Public Health, 17(3). doi:10.3390/ijerph17030948
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273-297.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Galderisi, S., Heinz, A., Kastrup, M., Beezhold, J., & Sartorius, N. (2015). Toward a new definition of mental health. World psychiatry, 14(2), 231.
Giachanou, A., & Crestani, F. (2016). Like It or Not: A Survey of Twitter Sentiment Analysis Methods. ACM Computing Surveys, 49(2), Article No. 28, 1–41. https://doi.org/10.1145/2938640
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Hern, A., & Bhuiyan, J. (2023). OpenAI says New Model GPT-4 is more creative and less likely to invent facts. The Guardian.
Hugging Face. (2022). SetFit: Efficient Few-Shot Learning Without Prompts. Retrieved November 6, 2023, from https://huggingface.co/blog/setfit.
Kearns, M. (1988). Thoughts on hypothesis boosting. Unpublished manuscript, 45, 105.
Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM (JACM), 41(1), 67-95.
Kumar, K. P., & Gavrilova, M. L. (2019, September). Personality traits classification on twitter. In 2019 16th ieee international conference on advanced video and signal based surveillance (avss) (pp. 1-8). IEEE.
Lien, Y. J., Chen, L., Cai, J., Wang, Y. H., & Liu, Y. Y. (2023). The power of knowledge: How mental health literacy can overcome barriers to seeking help. American Journal of Orthopsychiatry.
Lien, Y. J., & Chen, L. (2023). Validation of a theoretically based mental health literacy framework: A meta-analytic structural equation modeling approach. European Psychiatry, 66(Suppl 1), S480.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1-35.
Lu, C. M., Lien, Y. J., Chao, H. J., Lin, H. S., & Tsai, I. C. (2021). A structural equation modeling of mental health literacy in healthcare students. International Journal of Environmental Research and Public Health, 18(24). doi:10.3390/ijerph182413264
Mathur, A., Kubde, P., & Vaidya, S. (2020, June). Emotional analysis using twitter data during pandemic situation: Covid-19. In 2020 5th international conference on communication and electronics systems (ICCES) (pp. 845-848). IEEE.
Nayak, P. (2019). Understanding searches better than ever before. Google.
OpenAI. (2023). GPT-4. OpenAI Research. Retrieved November 7, 2023, from https://web.archive.org/web/20230314174531/https://openai.com/research/gpt-4
OpenAI. (n.d.). Models. Retrieved November 8, 2023, from https://platform.openai.com/docs/models
Ray, T. (2020). OpenAI's gigantic GPT-3 hints at the limits of language models for AI. ZDNet. Retrieved November 7, 2023, from https://www.zdnet.com/article/openais-gigantic-gpt-3-hints-at-the-limits-of-language-models-for-ai/
Rish, I. (2001, August). An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence (Vol. 3, No. 22, pp. 41-46).
Rogers, A., Kovaleva, O., & Rumshisky, A. (2021). A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8, 842-866.
Saha, K., Torous, J., Ernala, S. K., Rizuto, C., Stafford, A., & De Choudhury, M. (2019). A computational study of mental health awareness campaigns on social media. Translational behavioral medicine, 9(6), 1197-1207.
Tam, S., Said, R. B., & Tanriöver, Ö. Ö. (2021). A ConvBiLSTM deep learning model-based approach for Twitter sentiment classification. IEEE Access, 9, 41283-41293.
Towards Data Science. (n.d.). What is K-fold Cross Validation?. Retrieved April 27, 2023, from https://towardsdatascience.com/what-is-k-fold-cross-validation-5a7bb241d82f.
Tunstall, L., Reimers, N., Jo, U. E. S., Bates, L., Korat, D., Wasserblat, M., & Pereg, O. (2022). Efficient few-shot learning without prompts. arXiv preprint arXiv:2209.11055.
Twitter. (n.d.). What’s new with Twitter API v2?. Retrieved April 27, 2023, from https://developer.twitter.com/en/docs/twitter-api/migrate/whats-new.
Wiggers, K. (2023). OpenAI releases GPT-4, a multimodal AI that it claims is state-of-the-art. TechCrunch. Retrieved November 7, 2023, from https://techcrunch.com/2023/03/14/openai-releases-gpt-4-ai-that-it-claims-is-state-of-the-art/
Lien, Y. J. (2023). 結合多重研究策略驗證具理論基礎之心理健康素養架構 [Validating a theory-based mental health literacy framework by combining multiple research strategies]. Retrieved April 27, 2023, from https://scholar.lib.ntnu.edu.tw/zh/projects/%E7%B5%90%E5%90%88%E5%A4%9A%E9%87%8D%E7%A0%94%E7%A9%B6%E7%AD%96%E7%95%A5%E9%A9%97%E8%AD%89%E5%85%B7%E7%90%86%E8%AB%96%E5%9F%BA%E7%A4%8E%E4%B9%8B%E5%BF%83%E7%90%86%E5%81%A5%E5%BA%B7%E7%B4%A0%E9%A4%8A%E6%9E%B6%E6%A7%8B-3
Yuan, G. X., Ho, C. H., & Lin, C. J. (2012). Recent advances of large-scale linear classification. Proceedings of the IEEE, 100(9), 2584-2603.
Tseng, Y. H. (2020). The feasibility of automated topic analysis: An empirical evaluation of deep learning techniques applied to skew-distributed Chinese text classification. Journal of Educational Media & Library Sciences, 57(1), 121-144.
Zhou, Z. H. (2012). Ensemble methods: foundations and algorithms. CRC press.