Score Variance


Selected Abstracts


Cut-off scores in MMSE: a moving target?

EUROPEAN JOURNAL OF NEUROLOGY, Issue 5 2010
J. Morgado
Background: Cognitive tests are known to be influenced by language, culture and education. In addition, there may be an impact of 'epoch' on cognition, because there is a secular increase in the scores of IQ tests in children. If we assume this is a long-lasting process, then it should persist later in life. Methods: To test this hypothesis, we compared the performance of two cohorts of individuals (≥50 years of age), evaluated 20 years apart using the Mini-Mental State Examination (MMSE). Results: The study population included 135 participants in 1988 and 411 in 2008. MMSE scores were higher in 2008 than in 1988 for literacy × age-matched subgroups, the difference being significant for participants with lower literacy. Score variance was explained by literacy (β = 0.479, t = 14.598, P = 0.00), epoch (β = 0.34, t = 10.33, P = 0.00) and age (β = −0.142, t = −4.184, P = 0.00). Conclusion: The present results are in accordance with a lifelong secular improvement in cognitive performance. The operational cut-off values may change with time, which may have a clinical impact on the diagnosis of disorders such as mild cognitive impairment or dementia. [source]
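
Read as standardized regression weights, the coefficients reported above correspond to a model of roughly the following form (a reconstruction from the abstract's figures, not the authors' published equation; z denotes a standardized variable and "epoch" the 1988 vs. 2008 cohort indicator):

```latex
% Sketch reconstructed from the reported beta weights
\hat{z}_{\mathrm{MMSE}} \approx 0.479\, z_{\mathrm{literacy}} + 0.34\, z_{\mathrm{epoch}} - 0.142\, z_{\mathrm{age}}
```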


Do maternal stress and home environment mediate the relation between early income-to-need and 54-months attentional abilities?

INFANT AND CHILD DEVELOPMENT, Issue 5 2007
Janean E. Dilworth-Bart
Abstract Using Ecological Systems Theory and stage-sequential modelling procedures for detecting mediation, this study examined how early developmental contexts affect preschoolers' performance on a measure of sustained attention and impulse control. Data from 1273 European-American and African-American participants in the NICHD Study of Early Child Care were used to identify potential mediators of the relation between early household income-to-need (INR) and 54-month impulsivity and inattention. Exploratory analyses were also conducted to determine whether the relationships between early income, home environment, parenting stress, and the outcome variables differ for African-American versus European-American children. We found modest support for the study hypothesis that 36-month home environment quality mediated the INR/attention relationship. INR accounted for more home environment score variance, and home environment accounted for more Impulsivity score variance, for African-American children. Home environments were related to inattention in the European-American, but not the African-American, group. Copyright © 2007 John Wiley & Sons, Ltd. [source]
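
The stage-sequential mediation logic summarised above can be illustrated with a minimal regression sketch. The column names (inr, home_env, attention) and the statsmodels workflow are illustrative assumptions, not the authors' analysis code:

```python
# Minimal Baron-Kenny-style mediation sketch (illustrative only; variable
# names are hypothetical stand-ins for income-to-need, home environment
# quality and 54-month attention scores).
import pandas as pd
import statsmodels.formula.api as smf

def mediation_paths(df: pd.DataFrame) -> dict:
    # Step 1: predictor -> outcome (total effect, path c)
    total = smf.ols("attention ~ inr", data=df).fit()
    # Step 2: predictor -> mediator (path a)
    a_path = smf.ols("home_env ~ inr", data=df).fit()
    # Step 3: mediator and predictor -> outcome (paths b and c')
    full = smf.ols("attention ~ inr + home_env", data=df).fit()
    return {
        "c (total effect)": total.params["inr"],
        "a (INR -> home environment)": a_path.params["inr"],
        "b (home environment -> attention)": full.params["home_env"],
        "c' (direct effect)": full.params["inr"],
    }
```

Mediation is suggested when paths a and b are reliable and c' shrinks relative to c; formal tests of the indirect effect (e.g. bootstrapping) would replace this eyeball check in practice.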


Reliability and Attribute-Based Scoring in Cognitive Diagnostic Assessment

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 3 2009
Mark J. Gierl
The attribute hierarchy method (AHM) is a psychometric procedure for classifying examinees' test item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. Results from an AHM analysis yield information on examinees' cognitive strengths and weaknesses. Hence, the AHM can be used for cognitive diagnostic assessment. The purpose of this study is to introduce and evaluate a new concept for assessing attribute reliability using the ratio of true score variance to observed score variance on items that probe specific cognitive attributes. This reliability procedure is evaluated and illustrated using both simulated data and student response data from a sample of algebra items taken from the March 2005 administration of the SAT. The reliability of diagnostic scores and the implications for practice are also discussed. [source]
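
The attribute-reliability index described above follows the classical definition of reliability as the proportion of observed score variance that is true score variance. In generic notation (ours, not necessarily the paper's), for the set of items probing attribute k:

```latex
\rho_k \;=\; \frac{\sigma^2_{\mathrm{true},k}}{\sigma^2_{\mathrm{observed},k}}
       \;=\; \frac{\sigma^2_{\mathrm{true},k}}{\sigma^2_{\mathrm{true},k} + \sigma^2_{\mathrm{error},k}}
```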


Generalisability in unbalanced, uncrossed and fully nested studies

MEDICAL EDUCATION, Issue 4 2010
Ajit Narayanan
Medical Education 2010: 44: 367–378 Objectives: There is growing interest in multi-source, multi-level feedback for measuring the performance of health care professionals. However, data are often unbalanced (e.g. there are different numbers of raters for each doctor), uncrossed (e.g. raters rate the doctor on only one occasion) and fully nested (e.g. raters for a doctor are unique to that doctor). Estimating the true score variance among doctors under these circumstances is proving a challenge. Methods: Extensions to reliability and generalisability (G) formulae are introduced to handle unbalanced, uncrossed and fully nested data to produce coefficients that take into account variances among raters, ratees and questionnaire items at different levels of analysis. Decision (D) formulae are developed to handle predictions of minimum numbers of raters for unbalanced studies. An artificial dataset and two real-world datasets consisting of colleague and patient evaluations of doctors are analysed to demonstrate the feasibility and relevance of the formulae. Another independent dataset is used for validating D predictions of G coefficients for varying numbers of raters against actual G coefficients. A combined G coefficient formula is introduced for estimating multi-sourced reliability. Results: The results from the formulae indicate that it is possible to estimate reliability and generalisability in unbalanced, fully nested and uncrossed studies, and to identify extraneous variance that can be removed to estimate true score variance among doctors. The validation results show that it is possible to predict the minimum numbers of raters even if the study is unbalanced. Discussion: Calculating G and D coefficients for psychometric data based on feedback on doctor performance is possible even when the data are unbalanced, uncrossed and fully nested, provided that: (i) variances are separated at the rater and ratee levels, and (ii) the average number of raters per ratee is used in calculations for deriving these coefficients. [source]
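
For a fully nested raters-within-ratee design of the kind described above, a common form of the G coefficient divides the ratee variance by itself plus the within-ratee (rater-plus-error) variance scaled by the number of raters; following the abstract, the average number of raters per ratee stands in for a fixed n when the data are unbalanced. The sketch below uses made-up variance components and is not the authors' formula set:

```python
# Hedged sketch of G and D calculations for an unbalanced, fully nested
# ratee:(rater) design, using the average number of raters per ratee.

def g_coefficient(var_ratee: float, var_within: float, mean_raters: float) -> float:
    """Generalisability coefficient with mean_raters raters per ratee."""
    return var_ratee / (var_ratee + var_within / mean_raters)

def min_raters(var_ratee: float, var_within: float, target_g: float) -> int:
    """D-study: smallest (average) number of raters reaching target_g."""
    n = 1
    while g_coefficient(var_ratee, var_within, n) < target_g:
        n += 1
    return n

# Illustrative numbers only (not taken from the paper):
print(g_coefficient(0.30, 1.20, mean_raters=8))   # ~0.67
print(min_raters(0.30, 1.20, target_g=0.80))      # 16
```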


The reliability of summative judgements based on objective structured clinical examination cases distributed across the clinical year

MEDICAL EDUCATION, Issue 7 2007
George R Bergus
Context: Objective structured clinical examinations (OSCEs) can be used for formative and summative evaluation. We sought to determine the generalisability of students' summary scores aggregated from formative OSCE cases distributed across 5 clerkships during Year 3 of medical school. Methods: Five major clerkships held OSCEs with 2–4 cases each during their rotations. All cases used 15-minute student–standardised patient encounters and performance was assessed using clinical and communication skills checklists. As not all students completed every clerkship or OSCE case, the generalisability (G) study was an unbalanced student × (case : clerkship) design. After completion of the G study, a decision (D) study was undertaken and phi (φ) values for different cut-points were calculated. Results: The data for this report were collected over 2 academic years involving 262 Year 3 students. The G study found that 9.7% of the score variance originated from the student, 3.1% from the student–clerkship interaction, and 87.2% from the student–case nested within clerkship effect. Using the variance components from the G study, the D study suggested that if students completed 3 OSCE cases in each of the 5 different clerkships, the reliability of the aggregated scores would be 0.63. The φ, calculated at a cut-point 1 standard deviation below the mean, would be approximately 0.85. Conclusions: Aggregating case scores from low-stakes OSCEs within clerkships results in a score set that allows for very reliable decisions about which students are performing poorly. Medical schools can use OSCE case scores collected over a clinical year for summative evaluation. [source]
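
As a rough check on the D-study figure, the projected coefficient for 5 clerkships with 3 cases each can be written in the usual variance-components form. Plugging in the rounded percentages above lands in the neighbourhood of the reported 0.63 (the published value presumably rests on the unrounded components):

```latex
\hat{\rho}^2 \;=\; \frac{\sigma^2_{s}}
  {\sigma^2_{s} + \sigma^2_{s \times clerk}/n_{clerk} + \sigma^2_{s \times case:clerk}/(n_{clerk}\, n_{case})}
 \;\approx\; \frac{9.7}{9.7 + 3.1/5 + 87.2/(5 \times 3)} \;\approx\; 0.60
```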


Achieving acceptable reliability in oral examinations: an analysis of the Royal College of General Practitioners membership examination's oral component

MEDICAL EDUCATION, Issue 2 2003
Val Wass
Background: The membership examination of the Royal College of General Practitioners (RCGP) uses structured oral examinations to assess candidates' decision-making skills and professional values. Aim: To estimate three indices of reliability for these oral examinations. Methods: In summer 1998, a revised system was introduced for the oral examinations. Candidates took two 20-minute (five-topic) oral examinations with two examiner pairs. Areas for oral topics had been identified. Examiners set their own topics in three competency areas (communication, professional values and personal development) and four contexts (patient, teamwork, personal, society). They worked in two pairs (a quartet) to preplan questions on 10 topics. The results were analysed in detail. Generalisability theory was used to estimate three indices of reliability: (A) intercase reliability, (B) pass/fail decision reliability and (C) the standard error of measurement (SEM). For each index, a benchmark requirement was preset at (A) 0·8, (B) 0·9 and (C) 0·5. Results: There were 896 candidates in total. Of these, 87 candidates (9·7%) failed. Total score variance was attributed to: 41% candidates, 32% oral content, and 27% examiners and general error. Reliability coefficients were: (A) intercase 0·65; (B) pass/fail 0·85. The SEM was 0·52 (i.e. precise enough to distinguish within one unit on the rating scale). Extending testing time to four 20-minute oral examinations, each with two examiners, or five orals, each with one examiner, would improve the intercase and pass/fail reliabilities to 0·78 and 0·94, respectively. Conclusion: Structured oral examinations can achieve reliabilities appropriate to high-stakes examinations if sufficient resources are available. [source]
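
The projected gain from extra testing time is consistent with a simple Spearman–Brown check (a shortcut, not the authors' full D-study): doubling the amount of oral testing takes the intercase coefficient from 0·65 to roughly

```latex
\rho' \;=\; \frac{2 \times 0.65}{1 + (2 - 1) \times 0.65} \;\approx\; 0.79
```

which is close to the reported 0·78; the full D-study, which also varies the number of examiners per oral, yields the 0·78 and 0·94 figures quoted above.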


A Bayesian predictive analysis of test scores

JAPANESE PSYCHOLOGICAL RESEARCH, Issue 1 2001
Hidetoki Ishii
In classical test theory, a high-reliability test always leads to precise measurement. However, when it comes to predicting test scores, this is not necessarily so. Based on a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the means, variances, and covariances of the predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. For a new subject taking a new test, higher test reliability in this study led to a larger variance when the sample size was smaller than half the number of tests. Classical test theory is reanalyzed from the viewpoint of prediction, and some suggestions are made. [source]
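
One classical-test-theory way to see why higher reliability can enlarge the predicted covariances for a new subject (a simplified reading, not the paper's full Bayesian derivation): for two parallel tests i and k with observed-score variance sigma^2_X and reliability rho, the covariance between a new subject's scores equals the true-score variance, which scales with reliability:

```latex
\operatorname{Cov}(X_i, X_k) \;=\; \sigma^2_T \;=\; \rho\, \sigma^2_X
```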