| Field | Content |
|---|---|
| Graduate student | 吳宜玲 (Wu, Yi-Ling) |
| Thesis title | 跨向度轉換程序對多向度多階段適性測驗測量精準度的影響 (The Influence of Routing Modules Between Dimensions on Measurement Precision in Multidimensional Multistage Adaptive Testing) |
| Advisor | 陳柏熹 (Chen, Po-Hsi) |
| Oral examination committee | 蘇雅蕙 (Su, Ya-Hui); 黃宏宇 (Huang, Hung-Yu); 劉振維 (Liu, Chen-Wei); 陳珈文 (Chen, Chia-Wen); 陳柏熹 (Chen, Po-Hsi) |
| Oral defense date | 2022/08/25 |
| Degree | Doctoral |
| Department | 教育心理與輔導學系 (Department of Educational Psychology and Counseling) |
| Year of publication | 2022 |
| Graduation academic year | 110 (ROC calendar) |
| Language | Chinese |
| Number of pages | 281 |
| Chinese keywords | 試題反應理論 (item response theory)、電腦化適性測驗 (computerized adaptive testing)、多階段適性測驗 (multistage adaptive testing) |
| English keywords | item response theory; computerized adaptive testing; multistage adaptive testing |
| Research method | Experimental design |
| DOI URL | http://doi.org/10.6345/NTNU202201870 |
| Thesis type | Academic thesis |
| Usage statistics | Views: 193; Downloads: 26 |
摘要 (Abstract):
Computerized multistage adaptive testing (MSAT) is a special case of computerized adaptive testing (CAT). It retains the advantages of CAT: compared with a linear test, it can reach measurement precision close to that of CAT with fewer items. This study examined how different routing procedures between dimensions affect measurement precision in a two-dimensional between-item MSAT and a four-dimensional within-item MSAT, and was divided into three sub-studies.
Study 1 proposed a two-dimensional between-item MSAT design. With the examinee sample distribution known and the correlation between the dimensional abilities correctly specified, abilities were estimated with the unidimensional two-parameter logistic model (2PL) and the multidimensional two-parameter IRT model (M2PL); the M2PL model yielded higher measurement precision. In addition, when the routing between dimensions was adapted on the basis of the number-correct score or a regression model, the root mean square error (RMSE) of the examinees' ability estimates was smaller.
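For readers who want the two competing scoring models spelled out, their standard item response functions are shown below; these are the usual textbook forms, and the exact parameterization used in the thesis may differ.

$$P(X_{ij}=1 \mid \theta_j)=\frac{\exp[a_i(\theta_j-b_i)]}{1+\exp[a_i(\theta_j-b_i)]} \qquad \text{(unidimensional 2PL)}$$

$$P(X_{ij}=1 \mid \boldsymbol{\theta}_j)=\frac{\exp(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j+d_i)}{1+\exp(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j+d_i)} \qquad \text{(compensatory M2PL)}$$

Because the M2PL scores the dimensions jointly, a nonzero correlation between the dimensional abilities lets responses on one dimension sharpen the estimate on the other, which is consistent with the smaller RMSE reported for the M2PL here.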
Study 2 proposed a four-dimensional within-item MSAT design and compared different routing procedures between dimensions, again with the examinee sample distribution known and the correlation between the dimensional abilities correctly specified. When a regression model was used for the routing between dimensions, and especially when the correlations among the dimensional abilities were high, the 1–3–3–3 panel design performed best and the 1–3–2–3 panel design second best; both outperformed the design without routing between dimensions and yielded smaller RMSEs of the examinees' ability estimates.
Study 3 examined how measurement precision in the two-dimensional between-item MSAT and the four-dimensional within-item MSAT is affected when the examinee sample distribution is unknown and not necessarily correct, for examinees with different ability combinations. The RMSE of the ability estimates was larger for examinees with extreme abilities and smaller for examinees with moderate abilities; the less an examinee's ability profile matched the assumed correlation between the dimensional abilities, the larger the RMSE of the ability estimates.
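The precision criterion used throughout is the root mean square error of the ability estimates; the standard definition, typically computed per dimension over the N examinees in the simulation, is

$$\mathrm{RMSE}(\hat{\theta})=\sqrt{\frac{1}{N}\sum_{j=1}^{N}\bigl(\hat{\theta}_j-\theta_j\bigr)^{2}},$$

so larger values mean estimates farther from the true (generating) abilities.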
In multidimensional MSAT, adapting on the basis of the correlation between the dimensional abilities effectively reduces the RMSE of the examinees' ability estimates; examinees need to answer only part of the items to reach good measurement precision, which saves testing time.
Abstract:
Computerized multistage adaptive testing (MSAT) is a special case of computerized adaptive testing (CAT). MSAT retains the merits of CAT: with fewer items than a linear test, it achieves measurement precision close to that of CAT. This study explored how different routing modules between dimensions affect measurement precision in a two-dimensional MSAT with a between-item structure and a four-dimensional MSAT with a within-item structure. The study comprised three sub-studies.
In Study 1, a two-dimensional MSAT with a between-item structure was proposed. With the sample distribution known and the correlation between the dimensional abilities correctly specified, the unidimensional two-parameter logistic model (2PL) and the multidimensional item response model (M2PL) were used to estimate the examinees' abilities; the M2PL model was more precise. In addition, for the routing modules between dimensions, the root mean square error (RMSE) of the ability estimates was smaller when either the number-correct score or a regression model was used for adaptation.
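As an illustration of the two between-dimension routing rules compared in Study 1, the sketch below routes an examinee from the dimension-1 stage into a dimension-2 module either by the number-correct score or by a regression prediction of the dimension-2 ability from the provisional dimension-1 estimate. The cut points, regression coefficients, and module labels are hypothetical and are not taken from the thesis.

```python
# Hypothetical sketch of two cross-dimension routing rules (illustration only).

MODULES = ("easy", "medium", "hard")   # second-dimension modules, ordered by difficulty

def route_by_number_correct(num_correct, cut_points=(4, 8)):
    """Choose the next module from the number-correct score on dimension 1."""
    if num_correct < cut_points[0]:
        return MODULES[0]
    if num_correct < cut_points[1]:
        return MODULES[1]
    return MODULES[2]

def route_by_regression(theta1_hat, beta0=0.0, beta1=0.7, cuts=(-0.5, 0.5)):
    """Predict the dimension-2 ability from the provisional dimension-1 estimate
    with a simple linear regression, then choose the closest module."""
    theta2_pred = beta0 + beta1 * theta1_hat   # exploits the between-dimension correlation
    if theta2_pred < cuts[0]:
        return MODULES[0]
    if theta2_pred < cuts[1]:
        return MODULES[1]
    return MODULES[2]

# Example: an examinee answers 9 of 12 dimension-1 items correctly,
# with a provisional dimension-1 ability estimate of 1.2
print(route_by_number_correct(9))            # -> hard
print(route_by_regression(theta1_hat=1.2))   # -> hard (0.7 * 1.2 = 0.84 > 0.5)
```

The regression rule is where the between-dimension correlation enters: the stronger the correlation, the more informative the predicted dimension-2 ability is for choosing an appropriately difficult module.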
In Study 2, a four-dimensional MSAT with a within-item structure was proposed. With the correlation between the dimensional abilities correctly specified and the examinee sample distribution known, it was found that, when a regression model was used for the routing modules between dimensions, the 1–3–3–3 panel design was superior to the 1–3–2–3 panel design, both were better than the design without routing modules between dimensions, and the RMSE of the ability estimates was smaller.
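The 1–3–3–3 and 1–3–2–3 labels describe how many parallel modules each stage of the panel offers. Below is a minimal sketch of routing a predicted ability through the two panel shapes, assuming a hypothetical equal-width binning of the ability scale; the actual module difficulty targets and routing cut scores in the thesis are not reproduced.

```python
import numpy as np

# Number of parallel modules per stage for the two panel designs in Study 2.
PANEL_1333 = (1, 3, 3, 3)   # routing stage, then three stages with 3 modules each
PANEL_1323 = (1, 3, 2, 3)   # the third stage offers only two modules

def pick_module(theta_pred, n_modules, lo=-1.0, hi=1.0):
    """Map a predicted ability onto one of n_modules difficulty-ordered modules
    using equally spaced interior cut points (hypothetical rule, illustration only)."""
    if n_modules == 1:
        return 0                                     # single routing module
    cuts = np.linspace(lo, hi, n_modules + 1)[1:-1]  # interior cut points
    return int(np.digitize(theta_pred, cuts))        # 0 = easiest, n_modules - 1 = hardest

# Route a hypothetical examinee with predicted ability 0.8 through each panel shape
for name, panel in [("1-3-3-3", PANEL_1333), ("1-3-2-3", PANEL_1323)]:
    path = [pick_module(0.8, m) for m in panel]
    print(name, path)   # e.g. 1-3-3-3 [0, 2, 2, 2]
```

In an operational MSAT the ability would be re-estimated after each stage before the next module is chosen; the fixed prediction here only keeps the example short.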
In Study 3, the effects of the two-dimensional between-item MSAT and the four-dimensional within-item MSAT on measurement precision were examined for examinees with different ability combinations when the sample distribution was unknown. The RMSE was largest for examinees with extreme abilities and smallest for examinees with moderate abilities. When an examinee's abilities did not match the between-dimension ability correlation, the RMSE of the ability estimates was larger.
In sum, both MSAT designs can exploit the correlation between dimensions to improve testing efficiency: examinees need to answer only part of the items to reach good measurement precision, which saves testing time.