
Graduate student: 楊旻容 (Yang, Min-Jung)
Thesis title: 科普文章自動化分級模型的建置 (Constructing Automated Leveling Models for Popular Science Articles)
Advisors: 張國恩 (Chang, Kuo-En); 宋曜廷 (Sung, Yao-Ting)
Degree: Master
Department: 資訊教育研究所 (Graduate Institute of Information and Computer Education)
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 92
Keywords (Chinese): 可讀性、科普文章、文本分類、潛在語意分析、支援向量機、特徵選取
Keywords (English): Readability, Popular Scientific Article, Text Classification, Latent Semantic Analysis, Support Vector Machine, Feature Selection
Thesis type: Academic thesis
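  •   Reading is the starting point of learning and an important channel through which people acquire knowledge. Popular science magazines are very useful extracurricular reading: they help the public understand or learn the basic concepts and applications of science, and they cultivate a scientific spirit for thinking about and inquiring into everyday surroundings. For learners, reading is most effective when the material is at a suitable level. Domestic publishers of popular science magazines designate intended readers for the titles they sell, but some magazines define a broad readership for commercial reasons, or, because their target audience includes children who are still learning to read, factor in parents and teachers reading alongside them. As a result, no precise suitable age range is available for reference. This study selected 150 popular science magazine articles written in Chinese from three versions and used readability assessment to estimate the grade level of their suitable readers. Because elementary school pupils and students in junior high school and above have different learning needs with respect to popular science magazines, a two-stage leveling model was built from general linguistic features and concept features. Taking the readability features of domestic grade 1 to 12 textbooks in three subjects (Chinese, social studies, and natural science) as its basis, the model predicts the level of popular science articles within a range from third grade of elementary school to the first year of university. Nine natural science teachers from elementary through senior high school were also invited to judge the reading level of the articles, providing a baseline for comparison with the model's analysis. The automated leveling model for popular science articles built in this study achieved a classification accuracy of 59.73%; when a flexible relaxation based on the teachers' judgments was applied, the accuracy was 73.15%. The model can serve as a preliminary reference for leveling extracurricular natural science reading.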
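The two-stage idea described above can be illustrated with a minimal sketch. It assumes, since the abstract does not spell out the stages, that the first stage picks a coarse band (elementary versus junior high and above) and the second stage picks a grade within that band. The feature names `concept_difficulty` and `avg_sentence_length`, the threshold rules, and the helper `make_two_stage_leveler` are all hypothetical stand-ins; the thesis itself trains support vector machines on readability features.

```python
from typing import Callable, Dict

Features = Dict[str, float]
GradeClassifier = Callable[[Features], int]


def make_two_stage_leveler(
    coarse: Callable[[Features], str],
    fine: Dict[str, GradeClassifier],
) -> GradeClassifier:
    """Chain a coarse band classifier with one grade classifier per band."""

    def level(features: Features) -> int:
        band = coarse(features)        # stage 1: choose the coarse band
        return fine[band](features)    # stage 2: choose a grade inside that band

    return level


# Hypothetical toy rules standing in for trained classifiers.
def toy_coarse(f: Features) -> str:
    return "elementary" if f["concept_difficulty"] < 0.5 else "secondary"


def toy_elementary(f: Features) -> int:
    return 3 if f["avg_sentence_length"] < 12 else 5


def toy_secondary(f: Features) -> int:
    return 8 if f["avg_sentence_length"] < 20 else 11


leveler = make_two_stage_leveler(
    toy_coarse,
    {"elementary": toy_elementary, "secondary": toy_secondary},
)

easy_article = {"concept_difficulty": 0.2, "avg_sentence_length": 10.0}
hard_article = {"concept_difficulty": 0.8, "avg_sentence_length": 25.0}
print(leveler(easy_article))  # grade from the elementary-band rule
print(leveler(hard_article))  # grade from the secondary-band rule
```

Keeping the stages as separate callables means each band can use its own feature set, which fits the abstract's point that younger and older readers have different needs.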

      Reading is the beginning of learning and a primary tool for acquiring knowledge. Popular science magazines are helpful reading material: they help people understand or learn the basic concepts and applications of science, and they cultivate in the public a scientific spirit for thinking about and probing into the phenomena of daily life. Readers benefit most when the level of the reading material matches their ability and purpose. Domestic publishers of popular science magazines indicate the intended readers of their products, but they usually specify a wide range of reader ages, because a broader audience is more profitable. Another reason is that parents and teachers may help students who have difficulty reading the magazines, so publishers widen the target audience; as a result, readers have no precisely leveled material to consult. This study selected 150 popular science articles, written in or translated into Chinese, from three different versions of the magazine. It used readability-based text classification, which combines linguistic features with concept words drawn from a list annotated with degrees of difficulty, to construct two-stage leveling models that address the different reading needs of students in different grades. Readability assessment quantifies the difficulty of a text so that students can choose appropriate articles to read. The corpus for the two-stage automated leveling models is based on textbooks for grades 1 to 12 in three subjects: Chinese, social studies, and natural science. The models can predict the difficulty level of any article in the scientific disciplines, whether from a book or the internet, for readers ranging from 1st grade of primary school to 13th grade at university.
To compare against the results of the leveling models, the researcher also invited nine natural science teachers from primary, junior high, and senior high schools to estimate the suitable reader grade for these popular science articles. The models' accuracy is 59.73% under the strict standard and 73.15% under a less stringent but acceptable standard. The models can offer the public a more precise and verified result.
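The difference between the strict and the less stringent accuracy figures can be made concrete with a small sketch. The abstract does not define the relaxed standard, so the rule used here is an assumption: a prediction counts as acceptable when it falls inside the range of grades the teachers assigned to the article. The function names and all data values below are hypothetical toy values, not results from the thesis.

```python
from typing import Sequence, Tuple


def strict_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Share of articles whose predicted grade equals the reference grade."""
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(predicted)


def relaxed_accuracy(
    predicted: Sequence[int],
    acceptable: Sequence[Tuple[int, int]],
) -> float:
    """Share of articles whose predicted grade lies in the teachers' range.

    Assumed rule: each article has an inclusive (lowest, highest) grade range
    derived from the teachers' ratings.
    """
    hits = sum(lo <= p <= hi for p, (lo, hi) in zip(predicted, acceptable))
    return hits / len(predicted)


# Toy data for four articles (hypothetical values).
predicted_grades = [3, 5, 7, 9]
reference_grades = [3, 6, 7, 10]
teacher_ranges = [(3, 4), (5, 6), (7, 8), (10, 11)]

print(strict_accuracy(predicted_grades, reference_grades))
print(relaxed_accuracy(predicted_grades, teacher_ranges))
```

Under a rule like this the relaxed score can never be lower than the strict one as long as each reference grade lies within its teacher range, which matches the direction of the two reported figures.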

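    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        Section 1 Research Background and Motivation
        Section 2 Research Objectives
    Chapter 2 Literature Review
        Section 1 Readability
        Section 2 Support Vector Machines
        Section 3 Latent Semantic Analysis
        Section 4 Evaluating the Difficulty of Concepts in Domain Knowledge
    Chapter 3 Research Methods
        Section 1 Data Preprocessing: Word Segmentation and Part-of-Speech Tagging
        Section 2 Chinese Text Readability Index Analysis System
        Section 3 Building Domain-Knowledge Concept Words with Latent Semantic Analysis
        Section 4 Training and Testing the Support Vector Machine
    Chapter 4 Experimental Design
        Section 1 Experimental Tools
        Section 2 Experimental Data
        Section 3 Experimental Procedure
        Section 4 Experimental Results
        Section 5 Discussion of Experimental Results
    Chapter 5 Conclusions and Future Work
        Section 1 Conclusions
        Section 2 Future Work
    References
    Appendix 1 Part-of-Speech Tag Set of the Academia Sinica Balanced Corpus
    Appendix 2 Indices of the Automated Text Readability Index Analysis System (CRIE)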

