Graduate student: | 林郁綺 Lin, Yu-Chi |
---|---|
Thesis title: | 利用人工智慧技術偵測中文假新聞 Exploring Artificial Intelligence Technologies for Fake News Detection |
Advisor: | 曾元顯 |
Degree: | Master |
Department: | 圖書資訊學研究所 Graduate Institute of Library and Information Studies |
Year of publication: | 2021 |
Academic year of graduation: | 109 |
Language: | Chinese |
Number of pages: | 98 |
Chinese keywords: | 假新聞偵測, 人工智慧, 假新聞語料, 知識推論, 文字生成 |
English keywords: | Fake News Detection, Artificial Intelligence, Fake News Corpus, Knowledge Inference, Writing Model |
DOI URL: | http://doi.org/10.6345/NTNU202100039 |
Thesis type: | Academic thesis |
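In an era when information spreads rapidly, rampant fake news plagues the entire world. This study examines how information technology can be used to quickly filter out false information in an age of information explosion.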
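To examine how humans and computers actually perform at detecting fake news in Chinese, this study conducts three experiments, taking humans and computers as the respective starting points. The first, "Distinguishing Fake News with Natural Language Models", builds a Chinese fake news corpus with knowledge inference annotations from the Taiwanese fake news platform "CoFacts 真的假的", and then uses Naïve Bayes, SVM, and BERT to predict whether news is real or fake. The second, "Human Discrimination of a Fake News Writing Model", uses GPT2-Chinese to generate fake news from a corpus of 經濟日報 (Economic Daily News) articles and asks participants to tell real news from fake. The third, "Evaluation of the Automatic Fake News Writing Model", combines the first two experiments by testing whether a classifier can identify the fake news automatically generated by GPT2-Chinese and comparing its results with those of the participants. The conclusions are as follows: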
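1. BERT predicts real versus fake news with a Micro F1 of 0.8184 and a Macro F1 of 0.7686, which shows that computers can assist manual fake news detection to a certain extent but do not truly understand the semantics.
2. When participants judged fake news automatically generated by GPT2-Chinese, the average credibility was 3.68 for real news and 2.54 for fake news, showing that readers can tell real from fake. Participants without background knowledge found this harder, and the more relevant a news item was to a participant, the more willing that participant was to forward it.
3. On the 30 news articles in the questionnaire, BERT reached a Micro F1 and a Macro F1 of 0.93 each, with only 2 errors, whereas humans misjudged 5 articles, and the two sets of misjudged articles did not overlap at all. This shows that computers can identify computer-generated fake news and that computers and humans can complement each other.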
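The Micro F1 and Macro F1 figures in the conclusions above can be illustrated with a small calculation. The following is a minimal sketch, not the thesis's evaluation code: the function name `f1_scores` is hypothetical, and the label split (15 real and 15 fake articles, with one error in each class) is an assumption chosen only to reproduce the "30 articles, 2 errors" situation, since the abstract does not state how the 30 articles or the 2 errors were distributed.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro F1 computes F1 per class and takes the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# ASSUMED data: 15 real + 15 fake articles, one misclassified in each class.
y_true = ["real"] * 15 + ["fake"] * 15
y_pred = ["real"] * 14 + ["fake"] + ["fake"] * 14 + ["real"]

micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.2f}, macro F1 = {macro:.2f}")
```

In single-label classification, Micro F1 equals accuracy, so 28 correct out of 30 gives about 0.93 regardless of the split. Macro F1 depends on how the errors fall across classes; under the assumed balanced split it also comes out at about 0.93.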
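Overall, this study builds a fake news corpus with knowledge inference annotations and evaluates classifiers on it, takes the reverse approach of training a fake news writing model, and tests both humans and computers against that model. Together these lay a foundation for future fake news research, and it is hoped that more researchers will take up this topic.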
In the era of information explosion, fake news is rampant worldwide. This study explores how artificial intelligence technologies can be used to distinguish fake news.
To evaluate how humans and computers perform at detecting fake news in Chinese, this study conducts two experiments from different perspectives. The first experiment, "Evaluation of Detection of Fake News by Natural Language Processing", uses BERT to classify content from the Taiwanese fake news website "CoFacts". The second experiment, "Human Distinguish Fake News Write Model", generates fake news with GPT2-Chinese and asks readers to distinguish it from true news. The conclusions are as follows:
1. The highest prediction rate achieved by BERT is 79%.
2. Computers can help identify fake news, but they do not do so through semantic understanding.
3. When readers judged fake news generated by GPT2-Chinese, the average credibility was 3.68 for true news and 2.54 for fake news.
4. Readers can distinguish between true and fake news, but readers without an economics background find it more difficult.
This research not only builds and evaluates a fake news corpus with knowledge inference, but also trains and evaluates a fake news writing model, laying a foundation for future fake news research. I look forward to more researchers taking up this topic in the future.