Author: | 黃維綱 |
---|---|
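Thesis Title: | 多群組離散型驗證性因素分析模型在多元計分試題差異功能檢定之研究 (A Study of Multiple-Group Categorical Confirmatory Factor Analysis Models for Detecting Differential Item Functioning in Polytomous Items) |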
Advisor: | 蔡蓉青 |
Degree: | Master |
Department: | Department of Mathematics |
Thesis Publication Year: | 2012 |
Academic Year: | 100 |
Language: | Chinese |
Number of pages: | 44 |
Keywords (in Chinese): | 試題差異功能 、多群組離散型驗證性因素分析模型 、強韌性卡方差異檢定 、基線模式開放法 、Bonferroni 修正 |
Keywords (in English): | DIF, multiple-group categorical CFA, robust chi-square difference test, free baseline strategy, Bonferroni correction |
Thesis Type: | Academic thesis/ dissertation |
本研究在探討在多群組離散型驗證性因素分析模型下,利用強韌性卡方差異檢定,並且配合基線模式開放法來檢測多元計分題之試題差異功能(DIF) 的有效性。我們利用模擬實驗來調查在不同的樣本數、群組之平均潛在能力差異、DIF 比例、DIF 強度、DIF 類型以及顯著水準類型等因素條件下,該檢測之型一誤差和檢定力的表現,以了解這些因素對檢測有效性的影響。研究結果發現:整體而言,強韌性卡方差異檢定能有效的檢測出DIF 試題;在以下的情境檢定力較高:大樣本數、重度DIF、DIF 類型為僅因素負荷量上有DIF 和因素負荷量和閾值均有DIF 時檢定力較高;是否有群組之平均潛在能力差異、DIF 比例二者則對檢定力影響不顯著;Bonferroni 修正由於過度保守,建議無須特別採用。另外,與過去文獻比較時發現:使用基線模式開放法比基線模式限制法有明顯較低的型一誤差,而基線模式限制法在經過Oort 調整過顯著水準後則有可接受的型一誤差。但不論調整與否,基線模式限制法比基線模式開放法平均來講有較佳的檢定力。再者,分析時視多元計分試題資料為離散型、檢測DIF 前先篩選出不配適的模型並不會使檢定力增加。
This study assesses the effectiveness of multiple-group categorical CFA with the robust chi-square difference test in detecting DIF in polytomous items under the free-baseline strategy. Simulation studies are conducted to examine the empirical type I error and power of DIF detection, and the effects of five factors are investigated: sample size, impact (the between-group difference in mean latent ability), DIF percentage, DIF size, and type of DIF. The results show that the robust chi-square difference test is effective in detecting DIF in polytomous items, especially with large samples, large DIF, and DIF in either the factor loadings alone or both the factor loadings and the thresholds. Impact and DIF percentage do not appear to make a significant difference in power. The Bonferroni correction appears to be too conservative and is therefore not recommended. Compared with past studies that used the constrained-baseline strategy, the free-baseline strategy yields markedly lower type I error rates; however, adjusting the significance level of the constrained-baseline strategy with Oort’s approach brings its type I error to an acceptable level. On average, the constrained-baseline strategy achieves higher power than the free-baseline strategy, whether or not Oort’s adjustment is applied. Furthermore, treating polytomous data as discrete rather than continuous, and screening out poorly fitting models before DIF detection, do not seem to increase power.
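As a rough illustration of the quantities the abstract reports, the sketch below tallies empirical type I error and power from per-item p-values, with and without a Bonferroni-corrected significance level. The function name, the p-values, the item indices, the nominal level of 0.05, and the choice to divide that level by the number of items tested are all illustrative assumptions, not data or settings taken from the thesis.

```python
def rejection_rates(p_values, dif_items, alpha=0.05, bonferroni=False):
    """Return (type_i_error, power) pooled over replications.

    p_values:  list of replications, each a list of per-item p-values
    dif_items: set of item indices simulated with DIF (hypothetical)
    """
    n_items = len(p_values[0])
    # Assumed form of the correction: nominal level / number of items tested.
    level = alpha / n_items if bonferroni else alpha
    false_pos = dif_free_tests = hits = dif_tests = 0
    for rep in p_values:
        for item, p in enumerate(rep):
            rejected = p < level
            if item in dif_items:
                # Rejecting a true DIF item counts toward power.
                dif_tests += 1
                hits += rejected
            else:
                # Rejecting a DIF-free item counts as a type I error.
                dif_free_tests += 1
                false_pos += rejected
    return false_pos / dif_free_tests, hits / dif_tests


# Made-up p-values: 2 replications of a 4-item test, item 3 simulated with DIF.
demo = [[0.40, 0.03, 0.75, 0.001],
        [0.20, 0.60, 0.04, 0.02]]
plain = rejection_rates(demo, dif_items={3})
corrected = rejection_rates(demo, dif_items={3}, bonferroni=True)
print(plain, corrected)
```

In this toy input the corrected level removes both false rejections but also misses one of the two true DIF flags, which is the kind of trade-off a conservative correction produces.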
Chinese-language references
何宗岳(2011)。模擬與實徵試題差異功能之指標效能分析:IRT 法及CFA 法之比較,國立嘉義大學教育學系研究所,博士論文,未出版。
洪秀玉(2007)。以多組群驗證性因素分析探討測量恆等性之模擬研究,國立臺中教育大學教育測驗統計研究所,碩士論文,未出版。
陳冠志(2006)。因素負荷量之測量恆等性檢測模擬研究,國立臺中教育大學教育測驗統計研究所,碩士論文,未出版。
蔡良庭、楊志堅、王文中、施慶麟(2008)。應用MIMIC 模式評估方法以檢定試題差異性之研究。測驗學刊,55,287-312。
English-language references
Asparouhov, T., & Muthén, B. (2006). Robust chi-square difference testing with mean and variance adjusted test statistics. Mplus Web Notes, 10.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456-466.
Camilli, G., & Shepard, L. (1994). Methods for Identifying Biased Test Items. Newbury Park, CA: Sage.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9, 233-255.
Dorans, N., & Holland, P. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Erlbaum.
Drasgow, F., & Kanfer, R. (1985). Equivalence of psychological measurement in heterogeneous populations. Journal of Applied Psychology, 70, 662-680.
Elosua, P. (2011). Assessing measurement equivalence in ordered-categorical data. Psicológica, 32, 403-421.
Elosua, P., & Wells, C. (2008). A comparison of MACS and the IRT likelihood ratio test for identifying DIF. Paper presented at the III European Congress of Methodology, Oviedo.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.
González-Romá, V., Hernández, A., & Gómez-Benito, J. (2006). Power and type I error of the mean and covariance structure analysis model for detecting differential item functioning in graded response items. Multivariate Behavioral Research, 41, 29-53.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test Validity (pp. 129-145). Hillsdale, NJ: LEA.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8 User’s Reference Guide. Chicago, IL: Scientific Software.
Ke, M-J. (2010). Evaluation of Mean and Covariance Structure Analysis Model in Detecting Differential Item Functioning of Polytomous Items: in Comparison with GMH, and Poly-SIBTEST. Unpublished master's thesis, Department of Mathematics, National Taiwan Normal University, Taipei.
Kim, E. S. & Yoon, M. (2011). Testing measurement invariance: A comparison of multiple-group categorical CFA and IRT. Structural Equation Modeling: A Multidisciplinary Journal, 18, 212-228.
Lubke, G. H., & Muthén, B. O. (2004). Applying multigroup confirmatory factor models for continuous outcomes to Likert scale data complicates meaningful group comparisons. Structural Equation Modeling: A Multidisciplinary Journal, 11, 514-534.
MacIntosh, R., & Hashim, S. (2003). Variance estimation for converting MIMIC model parameters to IRT parameters in DIF analysis. Applied Psychological Measurement, 27, 372-379.
Mellenbergh, G. J. (1985). Vraag-onzuiverheid: definitie, detectie en onderzoek. Nederlands Tijdschrift voor de Psychologie, 40, 425-435.
Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127-143.
Maydeu-Olivares, A., & Cai, L. (2006). A cautionary note on using G²(dif) to assess relative model fit in categorical data analysis. Multivariate Behavioral Research, 41, 55-64.
Meade, A. W., & Lautenschlager, G. J. (2004). A comparison of item response theory and confirmatory factor analytic methodologies for establishing measurement equivalence/invariance. Organizational Research Methods, 7, 361-388.
Muthén, B. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-234). Newbury Park, CA: Sage.
Muthén, B. (1985). A method for studying the homogeneity of test items with respect to other relevant variables. Journal of Educational Statistics, 10, 121-132.
Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished manuscript, College of Education, University of California, Los Angeles.
Muthén, B. O., & Muthén, L. K. (1998-2012). Mplus: Statistical analysis with latent variables. Version 4.2. Los Angeles: Statmodel.
Oort, F. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 5, 107-124.
Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517-529.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variable analysis: Applications to developmental research (pp. 399-419). Thousand Oaks, CA: Sage.
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika, 66, 507-514.
Shih, C.-L., & Wang, W.-C. (2009). Differential item functioning detection using the multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33, 184-199.
Muthén, L. K., & Muthén, B. O. (1998-2007). Mplus User’s Guide. Los Angeles: Muthén & Muthén.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology, 27, 229-239.
Sörbom, D. (1982). Structural equation models with structured means. In K. G. Jöreskog & H. Wold (Eds), Systems Under Indirect Observation (pp. 183-195). Amsterdam: North-Holland.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292-1306.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4-69.
Wang, W.-C., & Shih, C.-L. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69, 713-731.
Yang, C. C. (2005). Multiple indicators multiple causes latent class analysis model for alcoholic diagnosis. Structural Equation Modeling: A Multidisciplinary Journal, 12, 138-156.