Graduate student: | 張悅倫 Chang, Yueh-Lun |
---|---|
Thesis title: | 文字生成技術應用於學術論文寫作之評估─以人工智慧領域論文摘要為例 Evaluation of Automated Text Generation Applied to Academic Writing - A Case Study of Abstracts of Papers in the Field of Artificial Intelligence |
Advisor: | 曾元顯 Tseng, Yuen-Hsien |
Oral defense committee: | 李龍豪 Lee, Lung-Hao; 吳怡瑾 Wu, I-Chin; 曾元顯 Tseng, Yuen-Hsien |
Oral defense date: | 2022/06/22 |
Degree: | 碩士 Master |
Department: | 圖書資訊學研究所 Graduate Institute of Library and Information Studies |
Publication year: | 2022 |
Graduation academic year: | 110 |
Language: | Chinese |
Number of pages: | 116 |
Chinese keywords: | 人工智慧、深度學習、自然語言生成、文字生成、學術論文寫作 |
English keywords: | Artificial Intelligence, Deep Learning, Natural Language Generation, Text Generation, Academic Writing |
DOI URL: | http://doi.org/10.6345/NTNU202201451 |
Thesis type: | Academic thesis |
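Text generation technology has matured in recent years, and its influence on the process of academic production should not be underestimated. To gain a preliminary understanding of how this technology affects the publication of academic research, and to explore whether humans and computers can tell computer-generated academic text from human-written text, this study drew on existing open resources and, taking abstracts of papers in the field of artificial intelligence as its scope, conducted two experiments: "human evaluation of computer-generated abstracts" and "automated evaluation of abstract generation models."
Experiment 1 used the language model GPT-2 to generate paper abstracts based on corpora from ACL Anthology and arXiv (cs.AI), and then analyzed how the English grammar checker Grammarly and human participants evaluated them. Experiment 2 used classifiers to test whether a computer could identify computer-generated abstracts, and compared the results with the participants' evaluations. The conclusions are as follows: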
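1. The computer can generate highly realistic abstracts, which outperformed human-written abstracts on Grammarly's evaluation metrics.
2. Participants gave computer-generated abstracts a mean quality score of 3.617 and human-written abstracts 3.622, showing that, when not told that a computer was involved in generation, humans could not clearly tell whether an abstract was computer-generated or human-written.
3. When SciBERT was used to classify 30 abstracts, its Micro and Macro F1 were both 0.93, far higher than the participants' 0.53 and 0.44, showing that the computer is able to identify computer-generated abstracts. Moreover, of the 2 abstracts that SciBERT misclassified, 1 was classified correctly by the human participants, which suggests that computers and humans may be able to complement each other in this task.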
The application of text generation has matured in recent years and is having a growing impact on the process of academic production. To explore its influence on academic publication, and whether humans and computers can distinguish computer-generated from human-written academic articles, this study used abstracts in the field of artificial intelligence to conduct two experiments.
In the first experiment, we generated abstracts with GPT-2, which was fine-tuned on corpora from ACL Anthology and arXiv. We then evaluated the abstracts with both Grammarly and human participants. In the second experiment, we used classifiers to test whether a computer could identify the computer-generated abstracts, and compared the classifier results with the human evaluation results. The conclusions are as follows:
1. The computer can generate high-quality abstracts.
2. The mean quality score that participants gave computer-generated abstracts was 3.617, while that of human-written abstracts was 3.622, indicating that humans could not clearly distinguish whether an abstract was computer-generated or human-written.
3. The Micro and Macro F1 of SciBERT's predictions were both 0.93, considerably higher than those of the human participants (0.53 and 0.44), indicating that the computer is able to identify computer-generated abstracts.
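As a rough illustration of the Micro and Macro F1 metrics reported above, the following minimal Python sketch computes both for a two-class task (human-written vs. computer-generated). The function name `f1_scores` and the toy labels are assumptions made for this example; they are not the thesis data or the author's evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro F1 is the unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy labels invented for illustration only (NOT the thesis data).
y_true = ["human", "human", "human", "machine", "machine", "machine"]
y_pred = ["human", "human", "machine", "machine", "machine", "machine"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In a single-label setting like this one, Micro F1 equals plain accuracy, so 28 correct predictions out of 30 would give 28/30 ≈ 0.93. Macro F1 weights both classes equally and therefore drops further when one class is predicted much worse than the other.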