Graduate student: 柯明錦 (Ming-Jin Ke)
Thesis title: Evaluation of Mean and Covariance Structure Analysis Model in Detecting Differential Item Functioning of Polytomous Items: in Comparison with GMH, and Poly-SIBTEST
Advisor: 蔡蓉青 (Tsai, Rung-Ching)
Degree: Master
Department: Department of Mathematics
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 89
Keywords: DIF, MIMIC, MACS, DIFFTEST, GMH, Poly-SIBTEST
Thesis type: Academic thesis

Item bias is a well-known and important issue in educational testing, and the investigation of Differential Item Functioning (DIF) has long proven valuable in evaluating the fairness and quality of an item. In this thesis, a simulation study was conducted for polytomous responses to evaluate the efficacy of the multiple-group Structural Equation Model (SEM) for detecting DIF, in comparison with two nonparametric DIF methods, Poly-SIBTEST and Generalized Mantel-Haenszel (GMH). The multiple-group SEM was used to generate data with five ordered response categories. DIF was introduced into an item by allowing its threshold parameters to differ between the focal and reference groups. Five factors were manipulated to produce the simulation conditions: three test lengths (10, 20, and 30 items), three sample sizes (500, 1000, and 3000), two ability distributions, four percentages of DIF items (0%, 10%, 20%, and 30%), and four size ratios of the focal and reference groups (80/20, 70/30, 60/40, and 50/50). Each condition was replicated 100 times so that Type I error and power could be calculated for the three detection procedures.
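The design described above can be sketched in Python as follows. This is only an illustrative sketch, not the thesis's actual generating code: the factor loading, the baseline thresholds, the size of the threshold shift, the two ability-distribution settings, and the reading of "80/20" as reference share first are all assumptions introduced here, and the function names are hypothetical.

```python
import itertools
import numpy as np

# Factor levels taken from the abstract.
TEST_LENGTHS = (10, 20, 30)
SAMPLE_SIZES = (500, 1000, 3000)
DIF_PERCENTS = (0, 10, 20, 30)
GROUP_SPLITS = ((80, 20), (70, 30), (60, 40), (50, 50))
# ASSUMPTION: the abstract does not specify the two ability distributions;
# these focal-group means are placeholders for illustration only.
ABILITY_DISTS = {"equal": 0.0, "unequal": -0.5}

# Fully crossed design: 3 * 3 * 2 * 4 * 4 = 288 conditions.
conditions = list(itertools.product(
    TEST_LENGTHS, SAMPLE_SIZES, ABILITY_DISTS, DIF_PERCENTS, GROUP_SPLITS))


def simulate_group(n, n_items, n_dif, mean, shift, rng,
                   loading=0.7, thresholds=(-1.5, -0.5, 0.5, 1.5)):
    """Generate five-category ordinal responses from a one-factor model.

    ASSUMPTION: loading, thresholds, and shift are hypothetical values.
    The first n_dif items have all thresholds moved by `shift`.
    """
    theta = rng.normal(mean, 1.0, size=n)              # latent trait
    resid_sd = np.sqrt(1.0 - loading ** 2)             # unit-variance latent response
    ystar = loading * theta[:, None] + rng.normal(0.0, resid_sd, size=(n, n_items))
    tau = np.tile(np.asarray(thresholds, dtype=float), (n_items, 1))
    tau[:n_dif] += shift                               # threshold DIF on studied items
    # Observed category = number of thresholds the latent response exceeds (0..4).
    return (ystar[:, :, None] > tau[None, :, :]).sum(axis=2)


def simulate_condition(condition, rng, dif_shift=0.25):
    """Return (reference_data, focal_data) for one design condition."""
    n_items, n_total, ability, dif_pct, (ref_share, _) = condition
    n_dif = n_items * dif_pct // 100
    n_ref = n_total * ref_share // 100                 # ASSUMPTION: first share is reference
    n_foc = n_total - n_ref
    reference = simulate_group(n_ref, n_items, n_dif, 0.0, 0.0, rng)
    focal = simulate_group(n_foc, n_items, n_dif, ABILITY_DISTS[ability], dif_shift, rng)
    return reference, focal


rng = np.random.default_rng(2010)
reference, focal = simulate_condition((10, 500, "equal", 10, (80, 20)), rng)
print(len(conditions), reference.shape, focal.shape)
```

In a full study each of the conditions would be passed through `simulate_condition` 100 times, with the three detection procedures applied to every replicated data set.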
Our results suggested that the DIFFTEST approach to DIF detection under the multiple-group mean and covariance structure (MG-MACS) model had the smallest overall Type I error rate and comparable or higher power than GMH and Poly-SIBTEST in all conditions. When polytomous data were generated under the MG-MACS model, the DIFFTEST procedure was viable for DIF detection even for tests with as few as ten items. In conclusion, DIFFTEST performed best among the three procedures for detecting DIF in polytomous items, yielding lower Type I error rates and higher power across the different DIF conditions.
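Type I error and power in a study of this kind are typically summarized as rejection rates over replications. The sketch below shows one plausible way to compute them, assuming each procedure returns, for every replication, a flag per item indicating whether DIF was declared at the chosen significance level. The function name and the toy flag matrix are hypothetical and do not come from the thesis.

```python
import numpy as np


def rejection_rates(flags, is_dif):
    """Summarize DIF flags over replications.

    flags  : (n_replications, n_items) boolean array, True = item flagged.
    is_dif : (n_items,) boolean array, True = item was simulated with DIF.
    Returns (type_i_error, power); a rate is NaN when its item set is empty,
    for example power in a 0% DIF condition.
    """
    flags = np.asarray(flags, dtype=bool)
    is_dif = np.asarray(is_dif, dtype=bool)
    no_dif = ~is_dif
    type_i = flags[:, no_dif].mean() if no_dif.any() else float("nan")
    power = flags[:, is_dif].mean() if is_dif.any() else float("nan")
    return float(type_i), float(power)


# Toy example: 4 replications, 4 items, only the first item has DIF.
flags = np.array([[1, 0, 0, 0],
                  [1, 0, 1, 0],
                  [0, 0, 0, 0],
                  [1, 0, 0, 0]], dtype=bool)
is_dif = np.array([True, False, False, False])
type_i, power = rejection_rates(flags, is_dif)
print(f"Type I error = {type_i:.4f}, power = {power:.2f}")
```

Here the DIF item is flagged in 3 of 4 replications (power 0.75) and the DIF-free items are flagged once in 12 tests (Type I error of about 0.083).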

1. Introduction
2. DIF Indices
3. Simulation Study
3.1 Data Generation
3.2 Factors Manipulated
4. Results
4.1 Type I error Study
4.2 Power Study
5. Discussion and Conclusion
6. Limitations and Suggestion for Future Research
Appendix
References
