Score Interpretation

Selected Abstracts


Does an Argument-Based Approach to Validity Make a Difference?

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 1 2010
Carol A. Chapelle
Drawing on experience between 2000 and 2007 in developing a validity argument for the high-stakes Test of English as a Foreign Language™ (TOEFL®), this paper evaluates the differences between the argument-based approach to validity as presented by Kane (2006) and that described in the 1999 AERA/APA/NCME Standards for Educational and Psychological Testing. Based on an analysis of four points of comparison (framing the intended score interpretation, outlining the essential research, structuring research results into a validity argument, and challenging the validity argument), we conclude that an argument-based approach to validity introduces some new and useful concepts and practices. [source]


An Investigation of Alternative Methods for Item Mapping in the National Assessment of Educational Progress

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 2 2001
Rebecca Zwick
What is item mapping and how does it aid test score interpretation? Which item mapping technique produces the most consistent results and most closely matches expert opinion? [source]
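For context on the technique under study: in the common response-probability (RP) approach to item mapping, an item is placed at the scale score where the model-implied chance of answering it correctly reaches a fixed criterion, so the item's content describes what examinees at that score can do. Below is a minimal sketch assuming a three-parameter logistic (3PL) IRT model and an illustrative RP criterion of 0.65; the function name and item parameters are hypothetical, not taken from the paper:

    import math

    # Response-probability (RP) item mapping under a 3PL model:
    # solve rp = c + (1 - c) / (1 + exp(-a * (theta - b))) for theta.
    def map_item(a: float, b: float, c: float, rp: float = 0.65) -> float:
        if rp <= c:
            raise ValueError("criterion must exceed the guessing parameter c")
        return b + math.log((rp - c) / (1 - rp)) / a

    # Example: a moderately discriminating multiple-choice item.
    print(f"item maps to theta = {map_item(a=1.2, b=0.4, c=0.20):.2f}")

Different RP criteria (and different treatments of the guessing parameter) shift where items land on the scale, which is one reason alternative mapping methods can produce inconsistent results.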


Reliability of the Clinical Teaching Effectiveness Instrument

MEDICAL EDUCATION, Issue 9 2005
H. H. van der Hem-Stokroos
Introduction: The Clinical Teaching Effectiveness Instrument (CTEI) was developed to evaluate the quality of educators' clinical teaching. Its authors reported evidence supporting content and criterion validity and found favourable reliability findings. We tested the validity and reliability of this instrument in a European context and investigated its reliability as an instrument to evaluate the quality of clinical teaching at group level rather than at the level of the individual teacher. Methods: Students participating in a surgical clerkship were asked to fill in a questionnaire reflecting a student-teacher encounter with a staff member or a resident. We calculated variance components using the urGENOVA program. For individual score interpretation of the quality of clinical teaching, the standard error of estimate was calculated. For group interpretation we calculated the root mean square error. Results: The results did not differ significantly between staff and residents. The average score was 3.42. The largest variance component was associated with rater variance. For individual score interpretation, a reliability of >0.80 was reached with 7 or more ratings. To reach reliable outcomes at group level, 15 or more educators were needed with a single rater per educator. Discussion: The required sample size for appraisal of individual teaching is easily achievable. Reliable findings can also be obtained at group level with a feasible sample size. The results provide additional evidence of the reliability of the CTEI in undergraduate medical education in a European setting. The results also showed that the instrument can be used to measure the quality of teaching at group level. [source]
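The sample-size findings follow from a standard generalizability-theory decision study: when ratings are averaged over raters, rater-related error variance shrinks in proportion to the number of ratings, and the generalizability coefficient is the teacher variance divided by itself plus the averaged error. A minimal sketch with made-up variance components, chosen only so the coefficient crosses 0.80 at seven ratings as in the abstract; they are not the study's urGENOVA estimates:

    # Generalizability coefficient for a mean over n_raters ratings:
    # error variance is averaged, so it shrinks as 1 / n_raters.
    def g_coefficient(var_teacher: float, var_residual: float, n_raters: int) -> float:
        return var_teacher / (var_teacher + var_residual / n_raters)

    var_teacher, var_residual = 0.12, 0.21  # hypothetical components

    for n in (1, 3, 5, 7, 10):
        print(f"{n:2d} ratings -> E rho^2 = "
              f"{g_coefficient(var_teacher, var_residual, n):.2f}")

The same logic drives the group-level result: averaging over educators as well as raters shrinks the error term further, so a reliable group mean becomes reachable with a single rating per educator once enough educators are sampled.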


Summative Assessment in Medicine: The Promise of Simulation for High-stakes Evaluation

ACADEMIC EMERGENCY MEDICINE, Issue 11 2008
John R. Boulet, PhD
Throughout their careers, physicians are exposed to a wide array of assessments, including those aimed at evaluating knowledge, clinical skills, and clinical decision-making. While many of these assessments are used as part of formative evaluation activities, others are employed to establish competence and, as a byproduct, to promote patient safety. In the past 10 years, simulations have been successfully incorporated in a number of high-stakes physician certification and licensure exams. In developing these simulation-based assessments, testing organizations were able to pioneer novel test administration protocols, build enhanced assessment rubrics, advance sophisticated scoring and equating algorithms, and introduce innovative standard-setting methods. Moreover, numerous studies have been conducted to identify potential threats to the validity of test score interpretations. As simulation technology expands and new simulators are invented, this groundbreaking work can serve as a basis for organizations to build or expand their summative assessment activities. Although there will continue to be logistical and psychometric problems, many of which will be specialty- or simulator-specific, past experience with performance-based assessments suggests that most challenges can be addressed through focused research. Simulation, whether it involves standardized patients (SPs), computerized case management scenarios, part-task trainers, electromechanical mannequins, or a combination of these methods, holds great promise for high-stakes assessment. [source]