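Graduate student: | 方家慶 Chia-Ching Fang |
---|---|
Thesis title: | Statistical Analysis and Understanding of Chemistry Learning Progress Using Test Theory (利用測驗理論統計分析及了解化學學習進展) |
Advisor: | 方泰山 Fang, Tai-Shan |
Degree: | Master |
Department: | Department of Chemistry |
Academic year of graduation: | 87 (ROC calendar) |
Language: | Chinese |
Number of pages: | 172 |
Keywords (translated from Chinese): | test theory, item response theory, classical test theory, concept analysis, acids and bases, learning progress, chemistry learning progress |
Keywords (English): | IRT, Item Response Theory, Classical Test Theory, concept analysis, acid and base, the learning process, chemistry learning progress |
Thesis type: | Academic thesis |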
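This thesis uses a testing approach to understand the chemistry learning progress of students in Taiwan. Taking "acids, bases, and salts" as the main concept, a concept analysis diagram and a two-way specification table for "acids, bases, and salts" were drawn up as the basis for item writing. Pilot testing, statistical analysis, and revision followed, yielding the items needed for the assessment. The tests were then administered in public elementary, junior high, and senior high schools in Taipei County and Taipei City, selected by stratified random sampling. Item response theory was then applied to the results of the elementary, junior high, and senior high school subtests: common items designed in advance into every test form served as links, so that all students could be placed on the same scale and compared.
The results show:
1. The elementary school mean score, the mean score of the top one-third of junior high school students (the proportion admitted by examination in Taiwan), and the senior high school mean score are linearly related. This indicates that chemistry learning progress in Taiwan develops steadily under an elite education that aims at excellence.
2. Thirty percent of elementary school students scored above the junior high school mean of 47, and 4% of junior high school students scored above the mean of grade 12 students, which indicates that these students are already qualified to skip grades. If the school system could be adjusted flexibly (elementary school in particular has considerable room for adjustment), the goals of cultivating elite national talent and developing high technology could be achieved.
3. Learning progress in elementary school follows the order comprehension → knowledge → application; in junior high school, knowledge → application → comprehension; and in senior high school, knowledge → comprehension → application.
4. The achievement level of elementary school students is influenced mainly by teachers and parents; the ability scores of junior high school students are governed mainly by interest; and the scores of senior high school students, who have been screened by the joint entrance examination, are influenced mainly by the school.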
The Best Test Methodology is used to understand the chemistry learning progress of Grade 6, 9, and 12 students in northern Taiwan, ROC. The test content is based on the subject "Acid, Base and Salt". A concept analysis tree diagram and a two-way specification table for acids, bases, and salts guided the writing of the test items. Through a series of pretests, statistical analyses, and revisions, the items were further developed to meet the demands of the assessment. Grade 6, 9, and 12 students in primary schools, junior high schools, and senior high schools in Taipei City and Taipei County, selected by stratified random sampling, took the three tests at their respective levels. Item Response Theory (IRT) as well as Classical Test Theory (CTT) was used to analyze the data collected from the tests; the three tests were connected to one scale through previously designed common items, so that all students could be compared with one another on the same scale. The results, interpreted with reference to an SPSS analysis of background variables, describe the learning progress from Grade 6 through Grade 9 to Grade 12 as follows:
1. There is a linear relation among the average scores on the three tests: that of the primary school students, that of the top one-third of the junior high school students (the proportion admitted to senior high schools), and that of the senior high school students. This relation indicates that chemistry learning progress in northern Taiwan has been developing steadily toward the goal of educating students for academic excellence.
2. Thirty percent of the primary school students scored above the junior high school average of 47, and four percent of the junior high school students scored above the senior high school average. This indicates that these high-achieving students are probably qualified to skip a grade. If the length of schooling in the Taiwanese educational system can be made more flexible, especially for more space in high school, these talented students can be specially fostered and will then contribute to high-tech development. Otherwise, the content that students learn should be revised.
3. The learning process of the primary school pupils follows the hierarchy Comprehension → Knowledge → Application; that of the junior high school students is Knowledge → Application → Comprehension; and that of the senior high school students is Knowledge → Comprehension → Application.
4. Teachers and parents are the two major factors influencing the achievement of the primary school students. The junior high school students are governed by interest. For the senior high school students, who have been screened by the joint senior high school entrance examination, learning progress is affected by the school factor.
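The step of connecting the three tests to one scale through common items can be illustrated with a minimal sketch. The abstract does not state which IRT model or linking procedure was used, so the Rasch model and mean-mean linking are assumptions here. The function names and all numbers are hypothetical and are not data from the thesis.

```python
from statistics import mean


def mean_mean_shift(common_base, common_new):
    """Linking constant that moves a new form onto the base scale.

    Assumes a Rasch model, where two separately calibrated scales
    differ only by a translation in logits. Both arguments map a
    common-item id to its estimated difficulty on the given form.
    """
    if common_base.keys() != common_new.keys():
        raise ValueError("both forms must contain the same common items")
    return mean(common_base.values()) - mean(common_new.values())


def to_base_scale(values, shift):
    """Apply the shift to difficulties or abilities from the new form."""
    return [v + shift for v in values]


# Hypothetical difficulties of three common items on two test forms.
base_form = {"c1": -0.5, "c2": 0.25, "c3": 1.0}
new_form = {"c1": -1.5, "c2": -0.75, "c3": 0.0}

shift = mean_mean_shift(base_form, new_form)

# Hypothetical ability estimates from the new form, re-expressed
# on the base form's scale so both groups can be compared.
linked_abilities = to_base_scale([-1.0, 0.5], shift)
```

Once every form has been shifted onto the same base scale, ability estimates from different grade levels can be compared directly, which is the comparison the abstract reports.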