Graduate student: | Su-Pin Hung (洪素蘋) |
---|---|
Thesis title: | Detecting differential item functioning in a framework of cognitive diagnostic measurement |
Advisors: | Chen, Po-Hsi (陳柏熹); Chen, Hsueh-Chih (陳學志) |
Degree: | Doctor |
Department: | Department of Educational Psychology and Counseling |
Year of publication: | 2012 |
Academic year of graduation: | 101 |
Language: | English |
Number of pages: | 157 |
Keywords: | Cognitive diagnostic measurement, Differential item functioning, restricted higher-order reparameterized DINA model, restricted higher-order reparameterized DINO model |
Thesis type: | Academic thesis |
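Differential item functioning (DIF) detection is regarded as an important procedure in test development. As cognitive diagnostic assessment continues to receive attention in both applied and methodological research, DIF issues within the cognitive diagnostic measurement (CDM) framework cannot be ignored. This study has three purposes. First, it proposes model-based DIF detection methods for handling compensatory and non-compensatory data within the cognitive diagnostic assessment framework. Second, it focuses on an issue overlooked in previous DIF research within the CDM framework: tests that are contaminated by biased items. Third, it examines more systematically the factors that may affect the performance of DIF detection methods and builds these factors into the simulation design. A Markov chain Monte Carlo algorithm was used to estimate the parameters of each of the two proposed models, and parameter recovery was compared. The study also examined the Type I error rates and statistical power of the model-based DIF detection method and of the non-parametric MH and LR DIF detection methods under different testing conditions. In addition, a purification procedure was added to the MH and LR methods to examine whether item purification improves DIF detection. Finally, the fourth-grade mathematics assessment of the 2007 Trends in International Mathematics and Science Study (TIMSS) was used as an example to show how the proposed DIF detection methods can be applied in practice. The results showed that parameter recovery for the two proposed model-based DIF detection methods was very good. In the comparison of DIF detection methods, the model-based method showed better Type I error control and higher statistical power than MH and LR under the same testing conditions. Moreover, the simulation results showed that, with cognitive diagnostic data, running DIF detection on a contaminated test without a purification procedure degrades detection and leads to wrong conclusions. Adding a purification procedure helps improve the Type I error control and statistical power of the MH and LR methods under specific conditions. Even with purification, however, these two methods do not resolve the Type I error inflation that arises when the groups' mean abilities differ greatly. Finally, compared with MH and LR, the proposed model-based method allows a finer-grained interpretation of DIF results and shows promise for identifying possible causes of DIF through model extensions.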
Detection of differential item functioning (DIF) has been recognized as an important procedure, especially in test development. As cognitive diagnostic measurements (CDMs) continue to receive attention in both applied and methodological studies, DIF-related issues within the CDM framework remain a concern. This study had three objectives: first, to propose a model-based DIF detection method for compensatory and non-compensatory cognitive diagnostic data; second, to address the contaminated matching criterion, an issue that has been overlooked in past DIF studies within the CDM framework; and third, to investigate additional factors that may affect DIF detection methods and to incorporate them into the simulation design. An MCMC algorithm employing Gibbs sampling was used to estimate the two proposed models, and a simulation study was conducted to examine model recovery, Type I error rates, and power under different testing conditions. For DIF detection, the model-based method was also compared with the Mantel-Haenszel (MH) and LR methods. Furthermore, a purification procedure was applied to the MH and LR methods, and the purified methods were compared with the model-based method to investigate the effectiveness of the DIF detection methods. Finally, the TIMSS 2007 fourth-grade mathematics assessment was used to illustrate the implementation of the new method. Parameter recovery for the proposed models was good. The simulation results indicated that the model-based method outperformed the MH and LR methods in Type I error control and power under comparable testing conditions. Moreover, the results revealed that a biased matching criterion can affect the effectiveness of DIF detection in the CDM framework. The purification procedure improved the Type I error rates and power of the MH and LR methods under specific circumstances. Finally, the model-based method allowed results to be interpreted in more detail than the other DIF methods.
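The abstract does not describe how the MH method or its purification step was implemented. As a rough illustration only, the following Python sketch shows one common way to compute the Mantel-Haenszel chi-square for dichotomous items and to purify the matching score iteratively. The function names (`mh_chi_square`, `mh_with_purification`), the 0.05 significance level, the choice to keep the studied item in its own matching score, and the iteration cap are all assumptions made for this example, not details taken from the thesis.

```python
from collections import defaultdict

# Assumed cutoff: chi-square critical value with 1 df at alpha = 0.05.
CRITICAL = 3.841


def mh_chi_square(responses, groups, item, matching_items):
    """Mantel-Haenszel chi-square (with continuity correction) for one item.

    responses: list of 0/1 lists, one row per examinee
    groups: list with 0 (reference group) or 1 (focal group) per examinee
    item: index of the studied item
    matching_items: item indices summed to form the matching score
    """
    # One 2x2 table per matching-score level:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for resp, g in zip(responses, groups):
        score = sum(resp[j] for j in matching_items)
        cell = (0 if resp[item] == 1 else 1) + (0 if g == 0 else 2)
        strata[score][cell] += 1

    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        if n < 2:
            continue  # the variance term is undefined for a single examinee
        n_ref, n_foc = a + b, c + d
        n_right, n_wrong = a + c, b + d
        sum_a += a
        sum_expected += n_ref * n_right / n
        sum_var += n_ref * n_foc * n_right * n_wrong / (n * n * (n - 1))
    if sum_var == 0:
        return 0.0
    return (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var


def mh_with_purification(responses, groups, max_iter=10):
    """Flag DIF items, rebuilding the matching score without flagged items."""
    n_items = len(responses[0])
    flagged = set()
    for _ in range(max_iter):
        new_flagged = set()
        for item in range(n_items):
            # Matching score: currently unflagged items plus the studied item.
            matching = [j for j in range(n_items)
                        if j not in flagged or j == item]
            stat = mh_chi_square(responses, groups, item, matching)
            if stat > CRITICAL:
                new_flagged.add(item)
        if new_flagged == flagged:
            break  # the flagged set is stable, so purification has converged
        flagged = new_flagged
    return flagged
```

Calling `mh_with_purification(responses, groups)` on a 0/1 response matrix returns the set of item indices flagged after the matching score stabilizes. Without purification (a single pass with every item in the matching score), items carrying DIF remain in the matching criterion, which is the contamination issue the abstract refers to.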