Classical Test Theory (classical + test_theory)

Distribution by Scientific Domains


Selected Abstracts


A primer on classical test theory and item response theory for assessments in medical education

MEDICAL EDUCATION, Issue 1 2010
André F De Champlain
Context, A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. Objectives, The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments. Methods, The tenets of CCT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Discussion, Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions. Medical Education 2010: 44: 109,117 [source]


(Mis) Conception About Generalizability Theory

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 1 2000
Robert L. Brennan
In what sense is generalizability theory an extension of classical test theory? In what sense is generalizability theory an application of analysis of variance. [source]


A Bayesian predictive analysis of test scores

JAPANESE PSYCHOLOGICAL RESEARCH, Issue 1 2001
Hidetoki Ishii
In the classical test theory, a high-reliability test always leads to a precise measurement. However, when it comes to the prediction of test scores, it is not necessarily so. Based on a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the predicted means, variances, and covariances of predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. Regarding a new subject taking a new test, in this study, higher test reliability led to a large variance when the sample size was smaller than half the number of tests. The classical test theory is reanalyzed from the viewpoint of predictions and some suggestions are made. [source]


Refining the measurement of exposure to violence (ETV) in urban youth

JOURNAL OF COMMUNITY PSYCHOLOGY, Issue 5 2007
Robert T. Brennan
Correlational analysis, classical test theory, confirmatory factor analysis, and multilevel Rasch modeling were used to refine a measure of adolescents' exposure to violence (ETV). Interpersonal violence could be distinguished from other potentially traumatic events; it was also possible to distinguish three routes of exposure (victimization, witnessing, and learning of). Correlations confirmed that ETV subscales are related to measures of aggression, delinquency, and depression/anxiety. Reliability was improved by combining ETV subscales and/or caregiver and youth reports. Valid and reliable measures of ETV are critical to future research in associating violence exposure with common mental health and behavioral outcomes and disorders, and tracking how early violence exposure may affect future outcomes for adolescents. © 2007 Wiley Periodicals, Inc. J Comm Psychol 35: 603,618, 2007. [source]


How Often Do Subscores Have Added Value?

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2010
Results from Operational, Simulated Data
Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman suggested a method based on classical test theory to determine whether subscores have added value over total scores. In this article I first provide a rich collection of results regarding when subscores were found to have added value for several operational data sets. Following that I provide results from a detailed simulation study that examines what properties subscores should possess in order to have added value. The results indicate that subscores have to satisfy strict standards of reliability and correlation to have added value. A weighted average of the subscore and the total score was found to have added value more often. [source]


A family of measures to evaluate scale reliability in a longitudinal setting

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 1 2009
Annouschka Laenen
Summary., The concept of reliability denotes one of the most important psychometric properties of a measurement scale. Reliability refers to the capacity of the scale to discriminate between subjects in a given population. In classical test theory, it is often estimated by using the intraclass correlation coefficient based on two replicate measurements. However, the modelling framework that is used in this theory is often too narrow when applied in practical situations. Generalizability theory has extended reliability theory to a much broader framework but is confronted with some limitations when applied in a longitudinal setting. We explore how the definition of reliability can be generalized to a setting where subjects are measured repeatedly over time. On the basis of four defining properties for the concept of reliability, we propose a family of reliability measures which circumscribes the area in which reliability measures should be sought. It is shown how different members assess different aspects of the problem and that the reliability of the instrument can depend on the way that it is used. The methodology is motivated by and illustrated on data from a clinical study on schizophrenia. On the basis of this study, we estimate and compare the reliabilities of two different rating scales to evaluate the severity of the disorder. [source]


A primer on classical test theory and item response theory for assessments in medical education

MEDICAL EDUCATION, Issue 1 2010
André F De Champlain
Context, A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. Objectives, The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments. Methods, The tenets of CCT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Discussion, Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions. Medical Education 2010: 44: 109,117 [source]