Score Distribution (score + distribution)


Selected Abstracts


Recentering and Realigning the SAT Score Distributions: How and Why

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 1 2002
Neil J. Dorans
The process employed to produce the conversions that take scores from the original SAT scales to recentered scales, in which reference group scores are centered near the midpoint of the score-reporting range, is laid out. For the purposes of this article, SAT Verbal and SAT Mathematical scores were placed on recentered scales, which have reporting ranges of 920 to 980, means of 950, and standard deviations of 11. (The 920-to-980 scale is used in this article to highlight the distinction between it and the old 200-to-800 scale. In actuality, recentered scores were reported on a 200-to-800 scale.) Recentering was accomplished via a linear transformation of normally distributed scores that were obtained from a continuized, smoothed frequency distribution of original SAT scores that were originally on augmented two-digit scales (i.e., discrete scores rounded to either 0 or 5 in the third decimal place). These discrete scores were obtained for all students in the 1990 Reference Group using 35 different editions of the SAT spanning October 1988 to June 1990. The performance of this 1990 Reference Group on the original and recentered scales is described. The effects of recentering on scores of individuals and the 1990 Reference Group are also examined. Finally, recentering did not occur solely on the basis of its technical merit. Issues associated with converting recentering from a possibility into a reality are discussed.
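The linear step described in this abstract can be illustrated with a short Python sketch. It covers only the final rescaling to a target mean and standard deviation; the continuization and smoothing that precede it in the article are not reproduced. The function name `recenter`, the rounding to whole numbers, the clipping to the reporting range, and the sample scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def recenter(scores, target_mean=950.0, target_sd=11.0, lo=920, hi=980):
    """Linearly rescale scores to a target mean and SD, then round and clip.

    Hypothetical helper: the defaults mirror the 920-to-980 scale quoted in
    the abstract, but the rounding and clipping rules are assumptions.
    """
    m = mean(scores)
    s = pstdev(scores)  # population SD of the reference group
    if s == 0:
        raise ValueError("scores must not all be identical")
    slope = target_sd / s
    intercept = target_mean - slope * m
    return [min(hi, max(lo, round(slope * x + intercept))) for x in scores]


# Made-up reference-group scores on a 200-to-800 style scale.
original = [300, 400, 500, 600, 700]
recentered = recenter(original)
```

Because the transformation is linear, the rank order of the scores is preserved and the reference-group mean lands on the midpoint of the reporting range.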


A view from the bridge: agreement between the SF-6D utility algorithm and the Health Utilities Index

HEALTH ECONOMICS, Issue 11 2003
Bernie J. O'Brien
Background: The SF-6D is a new health state classification and utility scoring system based on 6 dimensions ('6D') of the Short Form 36, and permits a "bridging" transformation between SF-36 responses and utilities. The Health Utilities Index, mark 3 (HUI3) is a valid and reliable multi-attribute health utility scale that is widely used. We assessed within-subject agreement between SF-6D utilities and those from HUI3. Methods: Patients at increased risk of sudden cardiac death and participating in a randomized trial of implantable defibrillator therapy completed both instruments at baseline. Score distributions were inspected by scatterplot and histogram and mean score differences compared by paired t-test. Pearson correlation was computed between instrument scores and also between dimension scores within instruments. Between-instrument agreement was assessed by intra-class correlation coefficient (ICC). Results: SF-6D and HUI3 forms were available from 246 patients. Mean scores for HUI3 and SF-6D were 0.61 (95% CI 0.60 to 0.63) and 0.58 (95% CI 0.54 to 0.62) respectively; a difference of 0.03 (p<0.03). Score intervals for HUI3 and SF-6D were (-0.21 to 1.0) and (0.30 to 0.95). Correlation between the instrument scores was 0.58 (95% CI 0.48 to 0.68) and agreement by ICC was 0.42 (95% CI 0.31 to 0.52). Correlations between dimensions of SF-6D were higher than for HUI3. Conclusions: Our study casts doubt on whether utilities and QALYs estimated via SF-6D are comparable with those from HUI3. Utility differences may be due to differences in underlying concepts of health being measured, or different measurement approaches, or both. No gold standard exists for utility measurement and the SF-6D is a valuable addition that permits SF-36 data to be transformed into utilities to estimate QALYs. The challenge is developing a better understanding as to why these classification-based utility instruments differ so markedly in their distributions and point estimates of derived utilities.
Copyright © 2003 John Wiley & Sons, Ltd.


Reliability of Computerized Emergency Triage

ACADEMIC EMERGENCY MEDICINE, Issue 3 2006
Sandy L. Dong MD
Objectives: Emergency department (ED) triage prioritizes patients based on urgency of care. This study compared agreement between two blinded, independent users of a Web-based triage tool (eTRIAGE) and examined the effects of ED crowding on triage reliability. Methods: Consecutive patients presenting to a large, urban, tertiary care ED were assessed by the duty triage nurse and an independent study nurse, both using eTRIAGE. Triage score distribution and agreement are reported. The study nurse collected data on ED activity, and agreement during different levels of ED crowding is reported. Two methods of interrater agreement were used: the linear-weighted κ and quadratic-weighted κ. Results: A total of 575 patients were assessed over nine weeks, and complete data were available for 569 patients (99.0%). Agreement between the two nurses was moderate if using linear κ (weighted κ = 0.52; 95% confidence interval = 0.46 to 0.57) and good if using quadratic κ (weighted κ = 0.66; 95% confidence interval = 0.60 to 0.71). ED overcrowding data were available for 353 patients (62.0%). Agreement did not significantly differ with respect to periods of ambulance diversion, number of admitted inpatients occupying stretchers, number of patients in the waiting room, number of patients registered in two hours, or nurse perception of busyness. Conclusions: This study demonstrated different agreement depending on the method used to calculate interrater reliability. Using the standard methods, it found good agreement between two independent users of a computerized triage tool. The level of agreement was not affected by various measures of ED crowding.
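To show why the two weighting schemes named in this abstract can give different values on the same ratings, here is a minimal Python sketch of weighted kappa for ordinal categories. The function name `weighted_kappa`, the three-level scale, and the ratings are invented for illustration and are not the study's data; the weights are the usual disagreement weights |i - j| / (k - 1), raised to the power 1 (linear) or 2 (quadratic).

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Weighted kappa for two raters on an ordered list of categories."""
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length rating lists and >= 2 categories")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions for each (rater A level, rater B level) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


# Made-up triage levels from two hypothetical nurses (1 = most urgent).
nurse_1 = [1, 1, 2, 2, 3, 3]
nurse_2 = [1, 2, 2, 3, 3, 3]
linear = weighted_kappa(nurse_1, nurse_2, [1, 2, 3], "linear")
quadratic = weighted_kappa(nurse_1, nurse_2, [1, 2, 3], "quadratic")
```

In this toy example every disagreement is a single level apart, so the quadratic weights penalize it less and the quadratic value comes out higher, the same direction of difference the abstract reports.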


An exploration of anger phenomenology in multiple sclerosis

EUROPEAN JOURNAL OF NEUROLOGY, Issue 12 2009
U. Nocentini
Background and purpose: Multiple sclerosis (MS) patients are often emotionally disturbed. We investigated anger in these patients in relation to demographic, clinical, and mood characteristics. Patients and methods: About 195 cognitively unimpaired MS patients (150 relapsing-remitting and 45 progressive) were evaluated with the State Trait Anger Expression Inventory, the Chicago Multiscale Depression Inventory, and the State Trait Anxiety Inventory. The patients' anger score distribution was compared with that of the normal Italian population. Correlation coefficients among scale scores were calculated and mean anger scores were compared across different groups of patients by analysis of variance. Results: Of the five different aspects of anger, levels of withheld and controlled anger were respectively higher and lower than what is expected in the normal population. Although anger was correlated with anxiety and depression, it was largely independent from these mood conditions. Mean anger severity scores were not strongly influenced by individual demographic characteristics and were not higher in more severe patients. Conclusions: The presence of an altered pattern of anger, unrelated to the clinical severity of MS, suggests that anger is not an emotional reaction to disease stress. An alteration of anger mechanisms might be a direct consequence of the demyelination of the connections among the amygdala, the basal ganglia and the medial prefrontal cortex.


Development and preliminary testing of a Paediatric Version of the Haemophilia Activities List (pedhal)

HAEMOPHILIA, Issue 2 2010
W. G. GROEN
Summary. Worldwide, children with haemophilia suffer from limitations in performing activities of daily living. To measure such limitations in adults a disease-specific instrument, the Haemophilia Activities List (HAL), was created in 2004. The aim of this study was to adapt the HAL for children with haemophilia and to assess its psychometric properties. The structure and the main content were derived from the HAL. Additionally, items of the Childhood Health Assessment Questionnaire and the Activity Scale for Kids were considered for inclusion. This version was evaluated by health professionals (n = 6), patients (n = 4), and parents (n = 3). A pilot test in a sample of 32 Dutch children was performed to assess score distribution, construct validity (Spearman's rho) and reproducibility. Administration of the pedhal was feasible for children from the age of 4 years onwards. The pedhal scores of the Dutch children were in the high end of the scale, reflecting a good functional status. Most subscales showed moderate associations with the joint examination (rho = 0.42 to 0.63) and moderate-to-good associations with the physical function subscale of the CHQ-50 (rho = 0.48 to 0.74). No significant associations were found for the pedhal and the subscales mental health and behaviour, except for the subscales leisure and sport and mental health (rho = 0.47). Test-retest agreement was good. The pedhal is a promising tool, but further testing in populations with a higher level of disability is warranted to study the full range of its psychometric properties.


Random-Groups Equating with Samples of 50 to 400 Test Takers

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2010
Samuel A. Livingston
Five methods for equating in a random groups design were investigated in a series of resampling studies with samples of 400, 200, 100, and 50 test takers. Six operational test forms, each taken by 9,000 or more test takers, were used as item pools to construct pairs of forms to be equated. The criterion equating was the direct equipercentile equating in the group of all test takers. Equating accuracy was indicated by the root-mean-squared deviation, over 1,000 replications, of the sample equatings from the criterion equating. The methods investigated were equipercentile equating of smoothed distributions, linear equating, mean equating, symmetric circle-arc equating, and simplified circle-arc equating. The circle-arc methods produced the most accurate results for all sample sizes investigated, particularly in the upper half of the score distribution. The difference in equating accuracy between the two circle-arc methods was negligible.
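Two of the simpler methods listed in this abstract, mean equating and linear equating, can be sketched in a few lines of Python. This is an illustration of the textbook definitions under a random groups design, not the study's code; the function names and the small score samples are hypothetical, and the circle-arc and equipercentile methods are not shown.

```python
from statistics import mean, pstdev


def mean_equate(x, form_x_scores, form_y_scores):
    """Shift a Form X score by the difference in form means."""
    return x - mean(form_x_scores) + mean(form_y_scores)


def linear_equate(x, form_x_scores, form_y_scores):
    """Match both the mean and the SD of Form Y."""
    sd_x = pstdev(form_x_scores)
    sd_y = pstdev(form_y_scores)
    if sd_x == 0:
        raise ValueError("Form X scores must vary")
    return mean(form_y_scores) + (sd_y / sd_x) * (x - mean(form_x_scores))


# Made-up scores from two randomly equivalent groups.
form_x = [10, 20, 30]
form_y = [12, 24, 36]
shifted = mean_equate(30, form_x, form_y)
scaled = linear_equate(30, form_x, form_y)
```

Mean equating estimates one quantity from the sample and linear equating estimates two, which is one reason the simpler methods are of interest when only 50 to 400 test takers are available.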


Generating Dichotomous Item Scores with the Four-Parameter Beta Compound Binomial Model

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 3 2007
Patrick O. Monahan
A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta compound-binomial (4PBCB) strong true score model (with two-term approximation to the compound binomial) is used to estimate and generate the true score distribution. The nonparametric item-true score step functions are estimated by classical item difficulties conditional on proportion-correct total score. The technique performed very well in replicating inter-item correlations, item statistics (point-biserial correlation coefficients and item proportion-correct difficulties), first four moments of total score distribution, and coefficient alpha of three real data sets consisting of educational achievement test scores. The technique replicated real data (including subsamples of differing proficiency) as well as the three-parameter logistic (3PL) IRT model (and much better than the 1PL model) and is therefore a promising alternative simulation technique. This 4PBCB technique may be particularly useful as a more neutral simulation procedure for comparing methods that use different IRT models.


MELD: Moving steadily towards equality, equity, and fairness

LIVER TRANSPLANTATION, Issue 5 2005
James Neuberger
Background and aims: A consensus has been reached that liver donor allocation should be based primarily on liver disease severity and that waiting time should not be a major determining factor. Our aim was to assess the capability of the Model for End-Stage Liver Disease (MELD) score to correctly rank potential liver recipients according to their severity of liver disease and mortality risk on the OPTN liver waiting list. Methods: The MELD model predicts liver disease severity based on serum creatinine, serum total bilirubin, and INR and has been shown to be useful in predicting mortality in patients with compensated and decompensated cirrhosis. In this study, we prospectively applied the MELD score to estimate 3-month mortality to 3437 adult liver transplant candidates with chronic liver disease who were added to the OPTN waiting list at 2A or 2B status between November, 1999, and December, 2001. Results: In this study cohort with chronic liver disease, 412 (12%) died during the 3-month follow-up period. Waiting list mortality increased directly in proportion to the listing MELD score. Patients having a MELD score <9 experienced a 1.9% mortality, whereas patients having a MELD score ≥40 had a mortality rate of 71.3%. Using the c-statistic with 3-month mortality as the end point, the area under the receiver operating characteristic (ROC) curve for the MELD score was 0.83 compared with 0.76 for the Child-Turcotte-Pugh (CTP) score (P < 0.001). Conclusions: These data suggest that the MELD score is able to accurately predict 3-month mortality among patients with chronic liver disease on the liver waiting list and can be applied for allocation of donor livers. (Gastroenterology 2003;124:91-96.)
Context: The Model for Endstage Liver Disease (MELD) score serves as the basis for the distribution of deceased-donor (DD) livers and was developed in response to "the final rule" mandate, whose stated principle is to allocate livers according to a patient's medical need, with less emphasis on keeping organs in the local procurement area. However, in selected areas of the United States, organs are kept in organ procurement organizations (OPOs) with small waiting lists and transplanted into less-sick patients instead of being allocated to sicker patients in nearby transplant centers in OPOs with large waiting lists. Objective: To determine whether there is a difference in MELD scores for liver transplant recipients receiving transplants in small vs large OPOs. Design and setting: Retrospective review of the US Scientific Registry of Transplant Recipients between February 28, 2002, and March 31, 2003. Transplant recipients (N = 4798) had end-stage liver disease and received DD livers. Main outcome measures: MELD score distribution (range, 6-40), graft survival, and patient survival for liver transplant recipients in small (<100) and large (≥100 on the waiting list) OPOs. Results: The distribution of MELD scores was the same in large and small OPOs; 92% had a MELD score of 18 or less, 7% had a MELD score between 19 and 24, and only 2% of listed patients had a MELD score higher than 24 (P = .85). The proportion of patients receiving transplants in small OPOs and with a MELD score higher than 24 was significantly lower than that in large OPOs (19% vs 49%; P<.001). Patient survival rates at 1 year after transplantation for small OPOs (86.4%) and large OPOs (86.6%) were not statistically different (P = .59), and neither were graft survival rates in small OPOs (80.1%) and large OPOs (81.3%) (P = .80).
Conclusions: There is a significant disparity in MELD scores in liver transplant recipients in small vs large OPOs; fewer transplant recipients in small OPOs have severe liver disease (MELD score >24). This disparity does not reflect the stated goals of the current allocation policy, which is to distribute livers according to a patient's medical need, with less emphasis on keeping organs in the local procurement area. (JAMA 2004;291:1871-1874.)
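The c-statistic quoted in the first of the two abstracts above is the area under the ROC curve, which can be read as the probability that a randomly chosen patient who died had a higher score than a randomly chosen patient who survived. The Python sketch below computes it by direct pair counting. The function name `c_statistic` and the five scores are hypothetical and are not taken from the study.

```python
def c_statistic(scores, died):
    """Concordance between a risk score and a binary outcome.

    Counts every (died, survived) pair; a pair is concordant when the
    patient who died has the higher score. Ties count as half.
    """
    events = [s for s, d in zip(scores, died) if d]
    non_events = [s for s, d in zip(scores, died) if not d]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Made-up listing scores and 3-month outcomes (1 = died).
listing_scores = [8, 15, 22, 30, 41]
outcome = [0, 0, 1, 0, 1]
auc = c_statistic(listing_scores, outcome)
```

A value of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to a score that ranks every patient who died above every survivor.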


Poorly performing physicians: Does the script concordance test detect bad clinical reasoning?

THE JOURNAL OF CONTINUING EDUCATION IN THE HEALTH PROFESSIONS, Issue 3 2010
François Goulet MD
Introduction: Evaluation of poorly performing physicians is a worldwide concern for licensing bodies. The Collège des Médecins du Québec currently assesses the clinical competence of physicians previously identified with potential clinical competence difficulties through a day-long procedure called the Structured Oral Interview (SOI). Two peer physicians produce a qualitative report. In view of remediation activities and the potential for legal consequences, more information on the clinical reasoning process (CRP) and quantitative data on the quality of that process are needed. This study examines the Script Concordance Test (SCT), a tool that provides a standardized and objective measure of a specific dimension of CRP, clinical data interpretation (CDI), to determine whether it could be useful in that endeavor. Methods: Over a 2-year period, 20 family physicians took, in addition to the SOI, a 1-hour paper-and-pencil SCT. Three evaluators, blind as to the purpose of the experiment, retrospectively reviewed SOI reports and were asked to estimate clinical reasoning quality. Subjects were classified into 2 groups (below and above median of the score distribution) for the 2 assessment methods. Agreement between classifications is estimated with the use of the Kappa coefficient. Results: Intraclass correlation for SOI was 0.89. Cronbach alpha coefficient for the SCT was 0.90. Agreement between methods was found for 13 participants (Kappa: 0.30, P = 0.18), but 7 out of 20 participants were classified differently by the two methods. All participants but 1 had SCT scores more than 2 SD below the panel mean, thus indicating serious deficiencies in CDI. Discussion: The finding that the majority of the referred group did so poorly on CDI tasks has great interest for assessment as well as for remediation. In remediation of prescribing skills, adding SCT to SOI is useful for assessment of cognitive reasoning in poorly performing physicians.
The structured oral interview should be improved with more precise reporting by those who assess the clinical reasoning process of examinees, and caution is recommended in interpreting SCT scores; they reflect only a part of the reasoning process.
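The reported Kappa can be checked against the agreement count with a short worked calculation. The abstract does not give the 2 x 2 cross-classification, so the chance-agreement term below rests on an assumption: that at least one of the two methods splits the 20 physicians evenly at the median, which makes the expected agreement 0.5 regardless of the other method's split.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{13}{20} = 0.65,
\qquad
p_e = 0.5 \ \text{(assumed)}
\quad\Longrightarrow\quad
\kappa = \frac{0.65 - 0.50}{1 - 0.50} = 0.30
```

Under that assumption the result matches the Kappa of 0.30 quoted in the abstract, and it shows how 65% raw agreement shrinks to a modest chance-corrected value.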


Psychometric Properties of IRT Proficiency Estimates

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 3 2010
Michael J. Kolen
Psychometric properties of item response theory proficiency estimates are considered in this paper. Proficiency estimators based on summed scores and pattern scores include non-Bayes maximum likelihood and test characteristic curve estimators and Bayesian estimators. The psychometric properties investigated include reliability, conditional standard errors of measurement, and score distributions. Four real-data examples include (a) effects of choice of estimator on score distributions and percent proficient, (b) effects of the prior distribution on score distributions and percent proficient, (c) effects of test length on score distributions and percent proficient, and (d) effects of proficiency estimator on growth-related statistics for a vertical scale. The examples illustrate that the choice of estimator influences score distributions and the assignment of examinees to proficiency levels. In particular, for the examples studied, the choice of Bayes versus non-Bayes estimators had a more serious practical effect than the choice of summed versus pattern scoring.


Selection Strategies for Univariate Loglinear Smoothing Models and Their Effect on Equating Function Accuracy

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2009
Tim Moses
In this study, we compared 12 statistical strategies proposed for selecting loglinear models for smoothing univariate test score distributions and for enhancing the stability of equipercentile equating functions. The major focus was on evaluating the effects of the selection strategies on equating function accuracy. Selection strategies' influence on the estimation of cumulative test score distributions was also assessed. The results of this simulation study differentiate the selection strategies and define the situations where their use has the most important implications for equating function accuracy. The recommended strategy for estimating test score distributions and for equating is AIC minimization.


Using Kernel Equating to Assess Item Order Effects on Test Scores

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2007
Tim Moses
This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test scores, in overall score distributions and also at specific test scores. In addition to detecting item order effects, the integrated procedures also suggest the equating function that most adequately adjusts the scores to mitigate the effects. To demonstrate, the statistical equivalences of alternate versions of two large-volume advanced placement exams were assessed.


Estimating false discovery rates for peptide and protein identification using randomized databases

PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 12 2010
Gregory Hather
MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed that, for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
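The general idea of comparing randomized and nonrandomized score distributions can be illustrated with a simple counting estimate. The Python sketch below is not the isotonic regression method introduced in the paper; it is a basic threshold-counting estimator, and it assumes that higher scores mean better matches and that randomized-database hits above a threshold approximate the number of false hits among the real-database matches. The function name `decoy_fdr` and all scores are hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the cumulative FDR among target matches scoring >= threshold.

    Assumption: the count of randomized (decoy) matches above the threshold
    stands in for the count of false target matches above it.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing falsely accepted
    return min(1.0, n_decoy / n_target)


# Made-up match scores against the real and the randomized database.
targets = [9.1, 8.4, 7.7, 6.0, 5.2, 4.9]
decoys = [6.5, 5.0, 3.1]
strict = decoy_fdr(targets, decoys, 7.0)
loose = decoy_fdr(targets, decoys, 5.0)
```

A raw counting estimate like this one is not guaranteed to change monotonically as the threshold moves, which is one of the properties the abstract lists as an advantage of the isotonic approach.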