Item Response Theory Models

Selected Abstracts


Assessing Goodness of Fit of Item Response Theory Models: A Comparison of Traditional and Alternative Procedures

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 4 2003
Clement A. Stone
Testing the goodness of fit of item response theory (IRT) models is relevant to validating IRT models, and new procedures have been proposed. These alternatives compare observed and expected response frequencies conditional on observed total scores, and use posterior probabilities for responses across θ levels rather than cross-classifying examinees using point estimates of θ and score responses. This research compared these alternatives with regard to their methods, properties (Type 1 error rates and empirical power), available research, and practical issues (computational demands, treatment of missing data, effects of sample size and sparse data, and available computer programs). Different advantages and disadvantages related to these characteristics are discussed. A simulation study provided additional information about empirical power and Type 1 error rates.
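The idea of spreading each examinee's response across θ levels by posterior probability, instead of assigning the examinee to a single θ group by point estimate, can be sketched as follows. This is a minimal illustration, not the procedure evaluated in the paper: the 2PL model, the standard normal prior, the five-point θ grid, the item parameters, and the toy response data are all assumptions made for the example, and no fit statistic or null distribution is computed.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct (or endorsed) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior_over_grid(responses, items, grid):
    """Posterior weight of each grid point for one response pattern,
    using an (assumed) standard normal prior on theta."""
    weights = []
    for theta in grid:
        prior = math.exp(-0.5 * theta * theta)
        likelihood = 1.0
        for x, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            likelihood *= p if x == 1 else (1.0 - p)
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

def pseudo_observed_counts(data, items, grid, item_index):
    """Posterior-weighted number of examinees, and of correct responses
    to one item, at each theta grid point."""
    n_at = [0.0] * len(grid)
    correct_at = [0.0] * len(grid)
    for responses in data:
        post = posterior_over_grid(responses, items, grid)
        for q, w in enumerate(post):
            n_at[q] += w
            correct_at[q] += w * responses[item_index]
    return n_at, correct_at

# Hypothetical (discrimination, difficulty) pairs and toy response patterns.
items = [(1.0, -1.0), (1.2, 0.0), (0.9, 1.0)]
data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

n_at, correct_at = pseudo_observed_counts(data, items, grid, item_index=1)
for theta, n, c in zip(grid, n_at, correct_at):
    print(f"theta={theta:+.1f}  weighted n={n:.3f}  weighted correct={c:.3f}")
```

The weighted counts at each grid point play the role of "observed" frequencies, which could then be compared with the frequencies the model expects at the same θ values.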


Comparison of the Performance of Varimax and Promax Rotations: Factor Structure Recovery for Dichotomous Items

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 1 2006
Holmes Finch
Nonlinear factor analysis is a tool commonly used by measurement specialists to identify both the presence and nature of multidimensionality in a set of test items, an important issue given that standard Item Response Theory models assume a unidimensional latent structure. Results from most factor-analytic algorithms include loading matrices, which are used to link items with factors. Interpretation of the loadings typically occurs after they have been rotated in order to amplify the presence of simple structure. The purpose of this simulation study is to compare two commonly used methods of rotation, Varimax and Promax, in terms of their ability to correctly link items to factors and to identify the presence of simple structure. Results suggest that the two approaches are equally able to recover the underlying factor structure, regardless of the correlations among the factors, though the oblique method is better able to identify the presence of a "simple structure." These results suggest that for identifying which items are associated with which factors, either approach is effective, but that for identifying simple structure when it is present, the oblique method is preferable.


A primer on classical test theory and item response theory for assessments in medical education

MEDICAL EDUCATION, Issue 1 2010
André F De Champlain
Context: A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. Objectives: The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments. Methods: The tenets of CTT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Discussion: Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
Medical Education 2010: 44: 109–117
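For orientation, the contrast between the two frameworks can be written compactly. The abstract does not say which IRT model the paper uses; the two-parameter logistic (2PL) form below is shown only as one common example, and the symbols (X, T, E, θ, a_j, b_j) are the conventional ones rather than notation taken from the paper.

```latex
% Classical test theory: observed score = true score + error
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

% Item response theory (2PL, one common form): probability that
% candidate i with proficiency \theta_i answers item j correctly
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\bigl[-a_j(\theta_i - b_j)\bigr]}
```

In CTT the model is stated at the level of the total score, whereas in IRT it is stated at the level of each item, with a_j (discrimination) and b_j (difficulty) describing the item separately from the candidate's proficiency θ_i.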


Applied psychometrics in clinical psychiatry: the pharmacopsychometric triangle

ACTA PSYCHIATRICA SCANDINAVICA, Issue 5 2009
P. Bech
Objective: To consider applied psychometrics in psychiatry as a discipline focusing on pharmacopsychology rather than psychopharmacology as illustrated by the pharmacopsychometric triangle. Method: The pharmacopsychological dimensions of clinically valid effects of drugs (antianxiety, antidepressive, antimanic, and antipsychotic), of clinically unwanted effects of these drugs, and the patients' own subjective perception of the balance between wanted and unwanted effects are analysed using rating scales assessed by modern psychometric tests (item response theory models). Results: Symptom rating scales fulfilling the item response theory models have been shown to be psychometrically valid outcome scales as their total scores are sufficient statistics for demonstrating a dose-response relationship within the various classes of antianxiety, antidepressive, antimanic or antipsychotic drugs. The total scores of side-effect rating scales are, however, not sufficient statistics, implying that each symptom has to be analysed individually. Self-rating scales with very few items appear to be sufficient statistics when measuring the patients' own perception of quality of life. Conclusion: Applied psychometrics in psychiatry have been found to cover a pharmacopsychometric triangle illustrating the measurements of wanted and unwanted effects of pharmacotherapeutic drugs as well as health-related quality of life.


The Dimensionality of DSM-IV Alcohol Use Disorders Among Adolescent and Adult Drinkers and Symptom Patterns by Age, Gender, and Race/Ethnicity

ALCOHOLISM, Issue 5 2009
Thomas C. Harford
Background: There is limited information on the validity of Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) alcohol use disorders (AUD) symptom criteria among adolescents in the general population. The purpose of this study is to assess the DSM-IV AUD symptom criteria as reported by adolescent and adult drinkers in a single representative sample of the U.S. population aged 12 years and older. This design avoids potential confounding due to differences in survey methodology when comparing adolescents and adults from different surveys. Methods: A total of 133,231 current drinkers (had at least 1 drink in the past year) aged 12 years and older were drawn from respondents to the 2002 to 2005 National Surveys on Drug Use and Health. DSM-IV AUD criteria were assessed by questions related to specific symptoms occurring during the past 12 months. Factor analytic and item response theory models were applied to the 11 AUD symptom criteria to assess the probabilities of symptom item endorsements across different values of the underlying trait. Results: A 1-factor model provided an adequate and parsimonious interpretation for the 11 AUD criteria for the total sample and for each of the gender-age groups. The MIMIC model showed significant evidence of item bias among some criteria by gender, age, and race/ethnicity. Symptom criteria for "tolerance," "time spent," and "hazardous use" had lower item thresholds (i.e., lower severity) and low item discrimination, and they were well separated from the other symptoms, especially in the 2 younger age groups (12 to 17 and 18 to 25). "Larger amounts," "cut down," "withdrawal," and "legal problems" had higher item thresholds but generally lower item discrimination, and they tended to exhibit greater dispersion at higher AUD severity, particularly in the youngest age group (12 to 17).
Conclusions: Findings from the present study do not provide support for the 2 separate DSM-IV diagnoses of alcohol abuse and dependence among either adolescents or adults. Variations in criteria severity for both abuse and dependence offer support for a dimensional approach to diagnosis which should be considered in the ongoing development of DSM-V.
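To make "item threshold" and "item discrimination" concrete, the sketch below evaluates endorsement probabilities across values of the underlying trait for two items. It assumes a two-parameter logistic model, which the abstract does not confirm as the model fitted in the study, and the parameter values are hypothetical, chosen only to show how the two parameters shape the curve; they are not estimates from the paper.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of endorsing an item at trait level theta under a 2PL
    model with discrimination a and threshold (severity) b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical (discrimination, threshold) values for illustration only.
items = {
    "item A (lower threshold)": (0.8, -0.5),
    "item B (higher threshold)": (2.0, 1.5),
}

def endorsement_table(items, thetas):
    """Endorsement probability of each item at each trait level."""
    return {
        name: [icc_2pl(t, a, b) for t in thetas]
        for name, (a, b) in items.items()
    }

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
table = endorsement_table(items, thetas)
for name, probs in table.items():
    print(name, [round(p, 3) for p in probs])
```

An item's threshold is the trait level at which endorsement probability reaches 0.5, so a lower threshold means the symptom is endorsed at lower severity; discrimination controls how steeply the probability rises around that point.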