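Graduate student: | 許純瑜 |
---|---|
Thesis title: | 評分者信心與評分者內變異之相關研究 (A Study of the Correlation between Rater Confidence and Intra-Rater Variation) |
Advisor: | 陳柏熹 |
Degree: | Master |
Department: | Department of Educational Psychology and Counseling (教育心理與輔導學系) |
Year of publication: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | Chinese |
Number of pages: | 66 |
Chinese keywords: | 評分者信心, 評分者內變異, 隨機效果多面向模式 |
English keywords: | rater confidence, intra-rater variation, random-effects facet model |
Thesis type: | Academic thesis |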
This study examines the relationship between raters' confidence during the rating process and the variability of their ratings. It comprises two sub-studies. Study 1 used simulation to identify the factors that affect the accuracy of parameter estimates and intra-rater variance estimates under the random-effects facet model, and its results served as a reference for the data analysis in Study 2. Study 2 applied the random-effects facet model to empirical data from a self-developed computerized creativity test, then examined how rater confidence correlates with intra-rater variance and with estimates of examinee ability.
The results of Study 1 indicated that the number of works rated by each rater, the number of rating criteria, and the distribution of intra-rater variance all affected the estimation accuracy of both the fixed-effect and the random-effect parameters of the random-effects facet model. Estimates were more accurate when each rater rated more works, when there were more rating criteria, or when intra-rater variance was small. They were less accurate when each rater rated too few works, when there were few rating criteria, or when intra-rater variance was large. The number of raters per work (2 versus 4) did not affect the accuracy of either type of parameter. The results of Study 2 indicated that, on the innovativeness criterion, the correlation between raters' self-reported confidence scores and intra-rater variance was not significant. However, once the data of one rater were excluded, the confidence scores of the remaining six raters were positively correlated with the magnitude of intra-rater variance. On the utility criterion, the correlation between self-reported confidence scores and intra-rater variance was likewise not significant. Based on these results, the author offers several suggestions for future research and practice.
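The Study 2 comparison described above (a correlation across all raters, then the same correlation with one rater excluded) can be sketched as follows. This is only a minimal illustration, not the thesis's analysis. The function names `pearson` and `leave_one_out` are hypothetical, and the seven confidence and variance values are made-up placeholders, not data from the thesis. The sketch computes only the correlation coefficient and performs no significance test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(xs, ys):
    """Recompute the correlation with each rater dropped in turn."""
    return [
        pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


# HYPOTHETICAL placeholder values for seven raters (not thesis data):
# mean self-reported confidence and estimated intra-rater variance.
confidence = [3.1, 3.4, 3.8, 4.0, 4.2, 4.5, 4.9]
intra_var = [0.20, 0.26, 0.31, 0.35, 0.41, 0.44, 0.05]

r_all = pearson(confidence, intra_var)
r_loo = leave_one_out(confidence, intra_var)

print(f"all raters: r = {r_all:.3f}")
for i, r in enumerate(r_loo, start=1):
    print(f"without rater {i}: r = {r:.3f}")
```

With so few raters, a single atypical rater can dominate the coefficient, which is why recomputing it with each rater left out is a useful sensitivity check.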