Graduate student: | 李奕璇 Lee, Yi-Hsuan |
---|---|
Thesis title: | 摘要能力量尺之建置及摘要自動化批改系統之建置與效能評估 Constructing a Summarization Ability Scale for Designing and Evaluating an Automatic Scoring System for Text Summarization |
Advisor: | 宋曜廷 Sung, Yao-Ting |
Oral defense committee: | 陳柏琳 陳柏熹 陳冠宇 趙子揚 宋曜廷 |
Oral defense date: | 2021/09/08 |
Degree: | 博士 Doctor |
Department: | 教育心理與輔導學系 Department of Educational Psychology and Counseling |
Year of publication: | 2021 |
Academic year of graduation: | 109 |
Language: | Chinese |
Number of pages: | 113 |
Keywords (Chinese): | 摘要、自動化摘要、自動化批改、試題反應理論、段落向量、潛在語意分析、變換器之雙向編碼器 |
Keywords (English): | summarization, automatic summarization, automatic summary scoring, item response theory, paragraph embedding, Latent Semantic Analysis, Bidirectional Encoder Representations from Transformers |
Research method: | Quasi-experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202101643 |
Thesis type: | Academic thesis |
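In recent years, with the rollout of the Curriculum Guidelines of 12-Year Basic Education (hereafter the 12-Year Curriculum Guidelines), Taiwan has placed growing emphasis on cultivating competencies. The one that has drawn the most attention is reading comprehension, a cross-disciplinary competency, and with it has come discussion of reading instruction and reading strategies. Many teachers try to build reading comprehension concepts into their teaching and frequently ask students to carry out various reading tasks. Among these tasks, summary writing is regarded as the method that best shows whether readers have grasped the content of a text, and it is often used as a check on reading comprehension. In practice, however, few tools for scoring summaries have been developed, and the existing ones suffer from inconsistent standards and results that cannot be compared across tests. This study therefore set out to construct a summary scoring rubric applicable to a wide range of students, to investigate the development of students' summarization ability, and to build a summarization ability scale through item response theory (IRT) that offers a reference standard with which teachers can gauge students' level. More importantly, in response to the needs of reading instruction, this study explores the feasibility of applying automatic summary scoring to post-reading assessment.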
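The research is divided into two studies. Study 1 collects empirical data to understand how students' summarization ability develops and develops a summary scoring rubric that gives teachers a basis for evaluating that ability. The expert scores produced in this process also serve as the validation criterion for the automatic summary scoring in Study 2. Study 1 selected four texts of different difficulty as test materials and asked participants, after reading, to restate the important ideas of the text by writing a summary. The participants were 2,003 students from grades 2 through 9. To account for grade differences, the researcher assigned texts according to difficulty; each student wrote one or two summaries, for a total of 2,591 summaries. All summaries were scored with the rubric constructed in this study along four dimensions (completeness, key information, integration (condensing and integrating), and wording and phrasing) to give an overall evaluation of students' summarization ability. The raters were all experienced teachers recruited for this study (referred to here as expert raters). Spearman's rank correlation analysis of the two first-reading scores for each text showed that the raters scored highly consistently: inter-rater correlations were at least .85, indicating stable scoring quality.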
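The inter-rater check described above can be illustrated with a minimal sketch of Spearman's rank correlation. The two score lists are made-up numbers, not data from the thesis, and the helper names (`average_ranks`, `pearson`, `spearman_rho`) are hypothetical; the thesis does not say which software computed its coefficients.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative (invented) total scores from two raters for eight summaries.
rater_a = [12, 9, 15, 7, 11, 14, 9, 16]
rater_b = [11, 10, 15, 6, 12, 13, 8, 16]
print(round(spearman_rho(rater_a, rater_b), 3))
```

Working on ranks rather than raw scores suits ordinal rubric scores, since only the ordering of summaries by each rater matters.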
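In addition, because some students wrote two summaries, one for each of two different test texts, the scoring results for all test texts could be equated through a common-person design and then analyzed with IRT, linking performance across all grades and placing students' summarization development on a single scale. The results follow the same trend as the students' raw summary scores: the mean ability estimate of each grade rises with grade level. These results not only support the validity of the teachers' scoring but also allow a summarization ability scale to be built from the mean ability of each grade, providing a reference standard for locating a student's summarization ability.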
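Study 2 focuses on building automatic summary scoring models and examining their performance. Using machine learning, it combines each of three techniques, paragraph embedding, Latent Semantic Analysis (LSA), and Bidirectional Encoder Representations from Transformers (BERT), with density peaks clustering to generate computer summaries. The automatic summary scoring module constructed in this study then evaluates the quality of a student summary by comparing it with the computer summary. To stay close to the needs of classroom practice, the scoring module is built on the rubric from Study 1. It takes the three rubric dimensions that fall within reading comprehension (completeness, key information, and integration) and represents a student summary's performance on them with three indicators, respectively: the proportion of themes covered by the student summary, the proportion of key terms in the student summary, and the semantic similarity between the student summary and the computer summary.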
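The three indicators named above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, themes are represented as sets of cue words, the denominators (number of themes, number of reference key terms) are guesses at how the ratios are normalized, and a bag-of-words cosine stands in for the embedding-based semantic similarity that the thesis computes with paragraph embedding, LSA, or BERT. English tokens are used purely for readability.

```python
import math
from collections import Counter


def theme_coverage(student_tokens, themes):
    """Completeness proxy: share of themes with at least one cue word in the summary."""
    student = set(student_tokens)
    covered = sum(1 for cue_words in themes if student & cue_words)
    return covered / len(themes)


def key_term_ratio(student_tokens, key_terms):
    """Key-information proxy: share of reference key terms found in the summary."""
    return len(set(student_tokens) & key_terms) / len(key_terms)


def cosine_similarity(tokens_a, tokens_b):
    """Integration proxy: cosine of word-count vectors (assumed stand-in for embeddings)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented toy data, not taken from the thesis.
student = "bees pollinate flowers and make honey".split()
computer = "bees pollinate flowers and produce honey in hives".split()
themes = [{"pollinate", "pollination"}, {"honey"}, {"hives", "colony"}]
key_terms = {"bees", "pollinate", "honey", "hives"}

print(theme_coverage(student, themes))
print(key_term_ratio(student, key_terms))
print(round(cosine_similarity(student, computer), 3))
```

Keeping the three indicators as separate functions mirrors the rubric: each dimension gets its own score, which can then be mapped to rubric levels or summed into a total.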
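Performance was examined at two levels. The first is the quality of automatic summary generation. This study used Recall-Oriented Understudy for Gisting Evaluation (ROUGE), the concept-word overlap rate, and the theme coverage rate to check whether the automatic summaries extracted by the three techniques adequately represent the original text. The results show that paragraph embedding and LSA produce good automatic summaries with nearly equal performance, whereas BERT performs relatively poorly. The second focus is the performance of automatic summary scoring. The expert human scores were compared with the summary quality estimated by each of the three scoring models through correlation analysis and accuracy statistics; the model with the highest correlation or accuracy relative to the expert scores was taken to perform best. Spearman's rank correlation analysis shows that the three scoring models' correlations with the expert total scores range from .61 to .68, close to a high correlation; correlations on the individual dimensions are at least .46, and all coefficients are statistically significant. Whichever scoring module is used, the automatic scores therefore follow a trend close to the expert scores and represent them well. In terms of accuracy the three models also perform very well: adjacent accuracy is at least 80% for all three, with little difference among them. In terms of stability, LSA performs best.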
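As a small illustration of the accuracy statistics mentioned above, the sketch below computes exact and adjacent accuracy between expert and model scores. It assumes that "adjacent" means within one score point of the expert score, which is a common convention but is not spelled out in the abstract; the score lists are invented.

```python
def exact_accuracy(expert, model):
    """Proportion of summaries where the model score equals the expert score."""
    return sum(e == m for e, m in zip(expert, model)) / len(expert)


def adjacent_accuracy(expert, model, tolerance=1):
    """Proportion of summaries where the model is within `tolerance` points of the expert.

    tolerance=1 is an assumption about how adjacency is defined.
    """
    return sum(abs(e - m) <= tolerance for e, m in zip(expert, model)) / len(expert)


# Invented scores on a 0-4 rubric scale for ten summaries.
expert_scores = [3, 2, 4, 1, 3, 0, 2, 4, 3, 1]
model_scores = [3, 3, 4, 2, 1, 0, 2, 3, 3, 1]

print(exact_accuracy(expert_scores, model_scores))
print(adjacent_accuracy(expert_scores, model_scores))
```

Adjacent accuracy is more forgiving than exact agreement, which is why it is usually reported alongside a correlation coefficient rather than on its own.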
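This study also introduced extractive summaries compiled by the expert raters and, again through the three-facet scoring module, evaluated student summary quality and computed accuracy statistics. This approach shows not only which model performs better but also how good the three automatic scoring models actually are. The results show that even when the expert summary replaces the computer summary as the basis for comparison, the accuracy of automatic scoring does not differ appreciably, indicating that the automatic summarization techniques adopted here work well and perform close to expert summaries.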
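Compared with existing assessments of summarization ability, the greatest strength of this research is that Study 1, by collecting student summaries across learning stages, not only establishes the validity of the scoring rubric but also places students' summarization performance on a single scale that can be used to track their development over the long term. It also departs from conventional practice by taking the difficulty of the reading material into account so that students' summarization ability can be evaluated accurately. As for Study 2, earlier technical work has mostly focused on how to generate computer summaries effectively, and research on automatic scoring of Chinese summaries is rare. The few systems promoted as automatic summary scoring mostly evaluate summary quality by semantic similarity alone and overlook the other components of summarization ability. This study integrates automatic summarization techniques with a computer scoring module, so that it addresses the specific summarization skills emphasized in classroom practice: completeness, key information, and integration. By linking and comparing its results with expert human scoring, it further examines how well different models perform in automatic summary scoring, which may provide valuable empirical evidence for research and development in related fields.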
Summarization is a key component of reading literacy. Recognizing the importance of assessing students' summarization ability, this research project aimed to improve current summary assessment tools. It is divided into two studies. The goal of the first study was to construct summarization scoring rubrics that could be used for a wide range of students and text types, and to investigate students' summarizing ability. Four reading texts at different difficulty levels were selected as research materials. A total of 2,003 students from second to ninth grade participated. They were assigned to read one or two texts and to restate the main ideas of each text. The difficulty of the assigned texts corresponded to the students' grades. In total, 2,591 summaries were collected and then graded by teachers recruited for this study.
The scoring rubrics developed in the present study comprised four dimensions: completeness, key information, integration, and wording and phrasing. Each summary was graded by two teachers, who assessed its quality on these four dimensions. Spearman's rank correlations between the two scores were .85 or higher, showing that interrater reliability was high and that rating quality was consistent and stable. Some participants were assigned to write two summaries, so the rating results could be linked through common persons. Item response theory analysis was then used to scale students' summarization abilities from second to ninth grade. Analysis with the multidimensional random coefficients multinomial logit model (MRCMLM) indicated that students' abilities (theta) increased with grade, following the same trend as the raw scores. The summarization ability scale and the average theta of each grade provide standards against which students can see how well their summarization skills have developed compared with their peers.
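For readers unfamiliar with the MRCMLM, its response function is usually written in the measurement literature in the following general form. The notation here is the conventional one rather than anything quoted from the thesis, and how the thesis specified its scoring and design matrices is not stated in this abstract.

```latex
% Probability that the response to item i falls in category k,
% given the latent ability vector \theta and item parameter vector \xi.
P\left(X_{ik}=1;\,\mathbf{A},\mathbf{B},\boldsymbol{\xi}\mid\boldsymbol{\theta}\right)
  = \frac{\exp\left(\mathbf{b}_{ik}^{\top}\boldsymbol{\theta}+\mathbf{a}_{ik}^{\top}\boldsymbol{\xi}\right)}
         {\sum_{k'=1}^{K_i}\exp\left(\mathbf{b}_{ik'}^{\top}\boldsymbol{\theta}+\mathbf{a}_{ik'}^{\top}\boldsymbol{\xi}\right)}
```

Here $\mathbf{b}_{ik}$ is the row of the scoring matrix $\mathbf{B}$ that assigns category $k$ of item $i$ its score on each latent dimension, $\mathbf{a}_{ik}$ is the corresponding row of the design matrix $\mathbf{A}$, and $K_i$ is the number of response categories of item $i$. Because ability $\boldsymbol{\theta}$ and item parameters $\boldsymbol{\xi}$ sit on the same logit scale, students who wrote summaries of two texts supply the common-person link that places all texts and grades on one scale.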
The second study focused on designing automatic summary scoring models and evaluating their effectiveness. It combined each of three machine learning techniques, paragraph embedding, Latent Semantic Analysis (LSA), and Bidirectional Encoder Representations from Transformers (BERT), with density peaks clustering to generate automatic summaries (computer summaries). It also designed automatic scoring models corresponding to the dimensions of “completeness,” “key information,” and “integration.” The automatic scoring models compared students' written summaries with the computer summaries to assign a score to each written summary. The automatic ratings were then compared with the scores given by the teachers in Study 1.
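A rough sketch of how density peaks clustering can pick sentences for an extractive summary is given below. It follows the general density peaks idea (a sentence is a good representative when it lies in a dense region and is far from any denser sentence), not the thesis's actual implementation. The 2-D vectors stand in for sentence representations from paragraph embedding, LSA, or BERT; the Gaussian density kernel, the `cutoff` value, and the ranking by density times distance are all assumptions, and vectors are assumed to be non-zero.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def density_peak_scores(vectors, cutoff):
    """Score each sentence by local density (rho) times separation (delta)."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # rho_i: smooth count of neighbours, weighted by a Gaussian of width `cutoff`.
    rho = [
        sum(math.exp(-((dist[i][j] / cutoff) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    scores = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # delta_i: distance to the nearest denser sentence; the densest
        # sentence gets its largest distance to any other sentence.
        delta = min(denser) if denser else max(dist[i])
        scores.append(rho[i] * delta)
    return scores


def select_sentences(vectors, k, cutoff=0.3):
    """Return indices of the k highest-scoring sentences, in document order."""
    scores = density_peak_scores(vectors, cutoff)
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy vectors: sentences 0-2 share one theme, sentences 3-4 another.
sentence_vectors = [(1.0, 0.0), (0.95, 0.1), (0.9, 0.2), (0.0, 1.0), (0.1, 0.95)]
print(select_sentences(sentence_vectors, k=2))
```

Multiplying density by separation discourages picking two near-duplicate sentences from the same theme, which is the property that makes this clustering attractive for extractive summarization.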
To evaluate the performance of automatic summary scoring, this study first investigated the quality of the computer summaries. Three types of indices were used: Recall-Oriented Understudy for Gisting Evaluation (ROUGE), the concept-word overlap rate, and coverage of themes. The results showed that the computer summaries generated by paragraph embedding and LSA were of better quality and that the two performed similarly, while BERT performed relatively poorly.
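The ROUGE index can be illustrated with a minimal ROUGE-N recall computation: the share of reference n-grams that also appear in the candidate summary, with counts clipped. Which ROUGE variants and which reference texts the thesis used are not stated in this abstract, so the function below is a generic sketch with invented English token lists; Chinese text would first need word segmentation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate_tokens, reference_tokens, n=1):
    """Clipped n-gram overlap divided by the number of n-grams in the reference."""
    candidate = ngrams(candidate_tokens, n)
    reference = ngrams(reference_tokens, n)
    total = sum(reference.values())
    if total == 0:
        return 0.0
    # A reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, candidate[gram]) for gram, count in reference.items())
    return overlap / total


reference = "the cat sat on the mat".split()
candidate = "the cat lay on a mat".split()
print(round(rouge_n_recall(candidate, reference, n=1), 3))
print(round(rouge_n_recall(candidate, reference, n=2), 3))
```

Recall is the natural direction here because the question is how much of the reference content the computer summary recovers, not how concise it is.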
As for the other main aspect of Study 2, this paper examined the performance of the automatic scoring models. Each written summary had four scores: one given by experts (the teachers in Study 1) and three produced by the different automatic scoring models. Comparing the experts' ratings with each automatic score, Spearman's rank correlation analysis showed that the correlation coefficients for total scores ranged from .61 to .68, which is close to a high correlation, and all correlation coefficients were statistically significant. This indicates that the ratings from all three automatic scoring models followed a trend similar to the experts' ratings and represented them well. In terms of accuracy, all models performed well, reaching adjacent accuracy above 80%. Among them, the LSA model was the most stable.
Unlike previous assessments of summarization, the present research project not only constructed valid scoring rubrics for assessing multiple dimensions of summarization ability but also established a summarization scale for students across a wide range of grades. The effectiveness of the proposed automatic summary scoring models was also verified. Taken together, the findings of the two studies provide solid evidence and new solutions for assessing and tracking students' summarization abilities in various contexts over the long term.