Test Scores (test + score)

Kinds of Test Scores

  • achievement test score
  • cognitive test score
  • neuropsychological test score

Selected Abstracts


    ECONOMIC INQUIRY, Issue 1 2008
    This article examines whether noncognitive skills, measured both by personality traits and by economic preference parameters, influence performance on cognitive tests. The basic idea is that noncognitive skills might affect the effort people put into a test to obtain good results. We experimentally varied the rewards for questions in a cognitive test to measure the extent to which people are sensitive to financial incentives. To distinguish increased mental effort from extra time investments, we also varied the questions' time constraints. Subjects with favorable personality traits, such as high performance motivation and an internal locus of control, perform relatively well in the absence of rewards, consistent with a model in which trying as hard as you can is the best strategy. In contrast, favorable economic preference parameters (low discount rate, low risk aversion) are associated with increases in time investments when incentives are introduced, consistent with a rational economic model in which people only invest when there are monetary returns. The main conclusion is that individual behavior on cognitive tests depends on noncognitive skills. (JEL J20, J24)

    The Roles of Gender and Affirmative Action Attitude in Reactions to Test Score Use Methods

    Donald M. Truxillo
    The present study explored the effects of 2 variables, affirmative action (AA) attitude and gender, on reactions to 3 test score use (TSU) methods: top-down selection, banding with random selection, and banding with preferences. In a study of 94 upper-division and graduate business students, AA attitude was associated with different reactions to TSU methods in terms of fairness and organizational attractiveness. Moreover, women with negative AA attitudes and men rated banding with preferences lower than the other two methods, but women with positive AA attitudes did not. Results are discussed in terms of applicant reactions models, implications for organizations, and future research.

    Black and White Differences in Cognitive Function Test Scores: What Explains the Difference?

    Kala M. Mehta DSc
    Several studies have reported that older black and Latino adults have lower cognitive function test scores than older white adults, but few have comprehensively examined reasons for score differences. This study evaluates whether differences in health and socioeconomic indicators, including literacy level, can explain differences in cognitive function test scores between older black and white adults.

    Using Kernel Equating to Assess Item Order Effects on Test Scores

    Tim Moses
    This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test scores, in overall score distributions and also at specific test scores. In addition to detecting item order effects, the integrated procedures also suggest the equating function that most adequately adjusts the scores to mitigate the effects. To demonstrate, the statistical equivalences of alternate versions of two large-volume advanced placement exams were assessed.
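Kernel equating maps a score x on one form to the scale of another form through continuized score distributions, e_Y(x) = G^-1(F(x)), where F and G are smoothed cumulative distributions of the two forms. The following is a minimal sketch of that idea only, assuming an equivalent-groups design, known score probabilities for each form, and fixed illustrative bandwidths (`hx`, `hy` = 0.6; operational kernel equating typically selects bandwidths by a penalty criterion). It is not the author's item-order procedure, and all function names are hypothetical.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def continuize(scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    The shrink factor a keeps the continuous CDF's mean and variance equal
    to those of the discrete distribution.
    """
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))

    def cdf(x: float) -> float:
        return sum(p * norm_cdf((x - a * xj - (1.0 - a) * mu) / (a * h))
                   for xj, p in zip(scores, probs))

    return cdf


def inverse_cdf(cdf, u, lo, hi, tol=1e-10):
    """Invert a continuous, increasing CDF by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_equate(x, x_scores, x_probs, y_scores, y_probs, hx=0.6, hy=0.6):
    """Map score x on form X to the form-Y scale: e_Y(x) = G^-1(F(x))."""
    F = continuize(x_scores, x_probs, hx)
    G = continuize(y_scores, y_probs, hy)
    u = F(x)
    lo = min(y_scores) - 50.0 * hy  # far enough out that G(lo) is ~0
    hi = max(y_scores) + 50.0 * hy  # and G(hi) is ~1
    return inverse_cdf(G, u, lo, hi)


# Hypothetical forms: form Y has the same score distribution as form X,
# shifted one point higher, so x = 5 should equate to about 6.
scores = list(range(11))
probs = [math.comb(10, k) / 2 ** 10 for k in scores]
shifted = [s + 1 for s in scores]
print(round(kernel_equate(5, scores, probs, shifted, probs), 6))  # 6.0
```

Because the shifted form has the same shape as form X, the equating function here reduces to adding one point; with real forms, F and G differ in shape and the resulting function is nonlinear.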

    Effects of Differentially Time-Consuming Tests on Computer-Adaptive Test Scores

    Brent Bridgeman
    Time limits on some computer-adaptive tests (CATs) are such that many examinees have difficulty finishing, and some examinees may be administered tests with more time-consuming items than others. Results from over 100,000 examinees suggested that about half of the examinees would have to guess on the final six questions of the analytical section of the Graduate Record Examination to finish before time expired. At the higher ability levels, even more guessing was required because the questions administered to higher-ability examinees were typically more time-consuming. Because the scoring model is not designed to cope with extended strings of guesses, substantial errors in ability estimates can be introduced when CATs have strict time limits. Furthermore, examinees who are administered tests with a disproportionate number of time-consuming items appear to get lower scores than examinees of comparable ability who are administered tests containing items that can be answered more quickly, although the issue is very complex because of the relationship between time and difficulty and the multidimensionality of the test.

    Previously undiagnosed aortic stenosis revealed by auscultation in the hip fracture population: echocardiographic findings, management and outcome

    ANAESTHESIA, Issue 8 2009
    M. E. McBrien
    Summary The 2001 Report of the National Confidential Enquiry into Perioperative Deaths recommended that an echocardiogram should be performed on patients with aortic stenosis prior to anaesthesia. In this study we present the patient details, management and outcome of the 272 hip fracture patients with a previously undiagnosed murmur and echocardiographically proven aortic stenosis admitted to our hospital from 2001 to 2005. The patients with aortic stenosis were significantly older, and had significantly lower Abbreviated Mental Test Scores, than the control group of 3698 hip fracture patients without aortic stenosis. There were significant trends toward general anaesthesia over spinal anaesthesia, and toward the use of invasive blood pressure monitoring, as the severity of the aortic stenosis increased. There were no significant trends towards higher 30-day or 1-year mortality rates as the severity of the aortic stenosis increased. Resources for rapid pre-operative echocardiograms should be made available for hip fracture patients, as the results have significant implications for their subsequent anaesthetic management.

    Who benefits from learning with 3D models? The case of spatial ability

    T. Huk
    Abstract Empirical studies that focus on the impact of three-dimensional (3D) visualizations on learning are to date rare and inconsistent. According to the ability-as-enhancer hypothesis, high spatial ability learners should benefit particularly as they have enough cognitive capacity left for mental model construction. In contrast, the ability-as-compensator hypothesis proposes that low spatial ability learners should gain particular benefit from explicit graphical representations as they have difficulty mentally constructing their own visualizations. This study examines the impact that interactive 3D models implemented within a hypermedia-learning environment have on understanding of cell biology. Test scores in a subsequent knowledge acquisition test demonstrated a significant interaction term between students' spatial ability and presence/absence of 3D models. Only students with high spatial ability benefited from the presence of 3D models, while low spatial ability students got fewer points when learning this way. When using 3D models, high spatial ability students perceived their cognitive load to be low whereas the opposite was true for low spatial ability students. The data suggest that students with low spatial ability became cognitively overloaded by the presence of 3D models, while high spatial ability students benefited from them as their total cognitive load remained within working memory limits.

    War-related posttraumatic stress disorder in Black, Hispanic, and majority White Vietnam veterans: The roles of exposure and vulnerability

    Bruce P. Dohrenwend
    Elevated prevalence rates of chronic posttraumatic stress disorder (PTSD) have been reported for Black and Hispanic Vietnam veterans. There has been no comprehensive explanation of these group differences. Moreover, previous research has relied on retrospective reports of war-zone stress and on PTSD assessments that fail to distinguish between prevalence and incidence. These limitations are addressed by use of record-based exposure measures and clinical diagnoses of a subsample of veterans from the National Vietnam Veterans Readjustment Study (NVVRS). Compared with Majority White, the Black elevation is explained by Blacks' greater exposure; the Hispanic elevation, by Hispanics' greater exposure, younger age, lesser education, and lower Armed Forces Qualification Test scores. The PTSD elevation in Hispanics versus Blacks is accounted for mainly by Hispanics' younger age.

    System-perpetuating asymmetries between explicit and implicit intergroup attitudes among indigenous and non-indigenous Chileans

    Andrés Haye
    The present research demonstrates a dissociation between explicit and implicit intergroup evaluation in the reciprocal attitudes between indigenous (Mapuche) and non-indigenous Chileans. In both social groups, the explicit measures of attitudes towards the respective in-group and out-group were compared with the Implicit Association Test scores. The results indicate that the members of the low-status minority might explicitly express a moderate evaluative preference for their in-group but might implicitly devalue it. Conversely, the members of the high-status majority might implicitly devalue their out-group but might explicitly express no bias. These results are theoretically framed in terms of system justification, conventional stereotypes and motivated correction processes.

    Paranoid beliefs and self-criticism in students

    A. Mills
    Paranoid beliefs are associated with negative and malevolent views of others. This study, however, explored hostile and compassionate self-to-self relating in regard to paranoid beliefs. A total of 131 students were given a series of scales measuring paranoid ideation, forms and functions of self-criticism, self-reassurance, self-compassion and depression. Test scores were subjected to correlation and hierarchical regression analyses to explore the relative contribution of study variables to paranoid beliefs. In this population, paranoid beliefs were associated with forms and functions of self-criticism, especially self-hating and self-persecution. Paranoid beliefs were negatively correlated with self-kindness and abilities to be self-reassuring. These variables were also associated with depression (as were paranoid beliefs). A hierarchical regression found that self-hatred remained a predictor of paranoid ideation even after controlling for depression and self-reassurance. Paranoid beliefs seem to be associated with a critical and even hating experience of self. These inner experiences of self may be profitable targets for therapeutic interventions. Copyright © 2007 John Wiley & Sons, Ltd.

    Effects of back care education in elementary schoolchildren

    ACTA PAEDIATRICA, Issue 8 2000
    G Cardon
    The purpose of this study was to investigate the effects of a back care education programme, consisting of six sessions of 1 h each, in fourth- and fifth-grade elementary schoolchildren. Testing consisted of a practical performance and a back care knowledge test. Forty-two subjects and 36 controls performed a pre-test and were tested within 1 wk after the programme. To monitor effects and follow-up effects on a larger sample, 82 different pupils were tested within 1 wk after the programme and 116 other children 3 mo after. Both larger samples were compared with one group of 129 controls. Interrater reliability for the test items of the practical assessment was high; intraclass correlation coefficients varied from 0.785 to 0.980. In the pre/post design study, interaction between time and condition was significant for the sum score of the practical assessment and for the knowledge test (p < 0.001), with higher scores for the intervention group (15% improvement for the knowledge test score, 31.6% for the practical sum score). Significantly higher sum scores for the knowledge test and for all practical assessment items were found in the intervention groups, tested within 1 wk and 3 mo after the programme, in comparison with the control group (p <0.001). Conclusion: The effectiveness of a primary educational prevention programme on back care principles was demonstrated in this study. Effectiveness, long-term outcomes and behavioural changes need further evaluation to optimize back care prevention programmes for elementary schoolchildren.

    Measurement error: implications for diagnosis and discrepancy models of developmental dyslexia

    DYSLEXIA, Issue 3 2005
    Sue M. Cotton
    Abstract The diagnosis of developmental dyslexia (DD) relies on a discrepancy between intellectual functioning and reading achievement. Discrepancy-based formulae have frequently been employed to establish the significance of the difference between 'intelligence' and 'actual' reading achievement. These formulae, however, often fail to take into consideration test reliability and the error associated with a single test score. This paper provides an illustration of the potential effects that test reliability and measurement error can have on the diagnosis of dyslexia, with particular reference to discrepancy models. The roles of reliability and standard error of measurement (SEM) in classical test theory are also briefly reviewed. This is followed by illustrations of how SEM and test reliability can aid with the interpretation of a simple discrepancy-based formula of DD. It is proposed that a lack of consideration of test theory in the use of discrepancy-based models of DD can lead to misdiagnosis (both false positives and false negatives). Further, misdiagnosis in research samples affects reproducibility and generalizability of findings. This, in turn, may explain current inconsistencies in research on the perceptual, sensory, and motor correlates of dyslexia. Copyright © 2005 John Wiley & Sons, Ltd.
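The role of SEM in a simple discrepancy formula can be made concrete with the standard classical test theory quantities: SEM = SD * sqrt(1 - r_xx), and, for two scores on the same scale with independent errors, the standard error of their difference is sqrt(SEM_x^2 + SEM_y^2). The sketch below assumes a simple-difference model (not a regression-based one), both tests on a mean-100, SD-15 scale, and hypothetical reliabilities of 0.95 and 0.85; the numbers and function names are illustrative, not taken from the paper.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical test theory standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)


def se_difference(sd: float, r_xx: float, r_yy: float) -> float:
    """Standard error of the difference between two scores reported on the same
    scale (same SD), assuming the two measurement errors are independent."""
    return math.sqrt(sem(sd, r_xx) ** 2 + sem(sd, r_yy) ** 2)


def discrepancy_band(ability: float, reading: float, sd: float,
                     r_ability: float, r_reading: float, z: float = 1.96):
    """Return the observed ability-reading discrepancy and a ~95% band around it."""
    diff = ability - reading
    half_width = z * se_difference(sd, r_ability, r_reading)
    return diff, (diff - half_width, diff + half_width)


# Hypothetical examinee with an ability score of 105 and two possible reading scores.
for reading in (90, 93):
    diff, (low, high) = discrepancy_band(105, reading, 15, 0.95, 0.85)
    print(diff, round(low, 2), round(high, 2))
```

Under these assumptions the standard error of the difference is about 6.7 points, so a 15-point discrepancy has a band that excludes zero while a 12-point discrepancy does not; a fixed cut-off that ignores this band can misclassify examinees near the threshold in either direction.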

    Consequences of Test Score Use as Validity Evidence: Roles and Responsibilities

    Paul D. Nichols
    This article has three goals. The first goal is to clarify the role that the consequences of test score use play in validity judgments by reviewing the role that modern writers on validity have ascribed for consequences in supporting validity judgments. The second goal is to summarize current views on who is responsible for collecting evidence of test score use consequences by attempting to separate the responsibilities of the test developer and the test user. The last goal is to offer a framework that attempts to prescribe the conditions under which the responsibility for collecting evidence of consequences falls to the test developer or to the test user.

    Fall-related brain injuries and the risk of dementia in elderly people: a population-based study

    H. Luukinen
    Severe head injury in early adulthood may increase the risk of dementia in older age, but it is not known whether head injury in later life also increases the risk of dementia. A representative sample (82%) of persons aged 70 years or older with a Mini-Mental State Examination (MMSE) score of ≥26 (n = 325) was followed up for 9 years to record all fall-related head injuries resulting in traumatic brain injury (TBI). At the end of the follow-up period, 152 persons (81% of the surviving population) were examined for clinical dementia, according to DSM-IV criteria. Eight persons sustained a TBI and 34 developed dementia. Brain injury was associated with younger age at detection of dementia even when adjusted for sex and educational status (low educational status was significantly associated with dementia); age-specific hazard ratio (95% confidence interval) 2.80 (1.35–5.81). In a population scoring ≥28 points on the baseline MMSE, an apolipoprotein E (ApoE) ε4 phenotype was also associated with younger age at the time of detecting dementia; 3.56 (1.35–9.34), and the effect of brain injury and ApoE ε4 phenotype was synergistic; 7.68 (2.32–25.3). We conclude that fall-related TBI predicts earlier onset of dementia and that the effect is especially high among subjects who carry the ApoE ε4 allele.

    Comparison of the MMSE and RUDAS cognitive screening tools in an elderly inpatient population in everyday clinical use

    J. Pang
    Abstract We compared test score and performance times of Folstein's Mini Mental State Examination (MMSE) and the Rowland Universal Dementia Assessment Scale (RUDAS). Forty-six patients were recruited. The mean score was 20.6 for the MMSE and 20.5 for the RUDAS. Linear regression analysis revealed an r value of 0.83 (P < 0.05). The mean performance time was 9.4 min for both the MMSE and the RUDAS. Patient satisfaction was similar for both tests. Surveyed clinicians preferred the MMSE because of greater familiarity. We concluded that the RUDAS correlates well with the MMSE and is no more time-consuming to perform. It has good clinical utility as a cognitive screening tool.

    Testing Features of Graphical DIF: Application of a Regression Correction to Three Nonparametric Statistical Tests

    Daniel M. Bolt
    Inspection of differential item functioning (DIF) in translated test items can be informed by graphical comparisons of item response functions (IRFs) across translated forms. Because of the many forms of DIF that can emerge in such analyses, it is important to develop statistical tests that can confirm various characteristics of DIF when present. Traditional nonparametric tests of DIF (Mantel-Haenszel, SIBTEST) are not designed to test for the presence of nonuniform or local DIF, while common probability difference (P-DIF) tests (e.g., SIBTEST) do not optimize power in testing for uniform DIF, and thus may be less useful in the context of graphical DIF analyses. In this article, modifications of three alternative nonparametric statistical tests for DIF, Fisher's χ² test, Cochran's Z test, and Goodman's U test (Marascuilo & Slaughter, 1981), are investigated for these purposes. A simulation study demonstrates the effectiveness of a regression correction procedure in improving the statistical performance of the tests when using an internal test score as the matching criterion. Simulation power and real data analyses demonstrate the unique information provided by these alternative methods compared to SIBTEST and Mantel-Haenszel in confirming various forms of DIF in translated tests.
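For reference, the Mantel-Haenszel procedure named above as a traditional DIF test stratifies examinees by a matching score (here an internal test score), builds a 2x2 table per stratum (reference/focal by right/wrong), and pools them into a common odds ratio, alpha_MH = sum(A*D/N) / sum(B*C/N). The sketch below computes that statistic and the ETS delta-scale index, -2.35 ln(alpha_MH). It illustrates only the baseline MH statistic, not the regression-corrected Fisher χ², Cochran Z, or Goodman U tests studied in the article; the record layout and function names are hypothetical.

```python
import math
from collections import defaultdict


def build_tables(records):
    """Group item responses into one 2x2 table per matching-score level.

    records: iterable of (group, matching_score, correct), group in {"ref", "focal"}.
    Each table is [A, B, C, D]: reference right/wrong, focal right/wrong.
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        if group == "ref":
            idx = 0 if correct else 1
        else:
            idx = 2 if correct else 3
        tables[score][idx] += 1
    return dict(tables)


def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio: sum(A*D/N) / sum(B*C/N) over strata."""
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den  # assumes at least one stratum with B*C > 0


def mh_d_dif(alpha_mh):
    """ETS delta-scale index; negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha_mh)


# Hypothetical item: in both score strata the reference group's odds of
# answering correctly are twice the focal group's.
tables = {1: [20, 10, 10, 10], 2: [40, 10, 20, 10]}
alpha = mh_odds_ratio(tables)
print(alpha, round(mh_d_dif(alpha), 3))  # 2.0 -1.629
```

Because each stratum compares examinees with the same matching score, alpha_MH is sensitive to uniform DIF but, as the abstract notes, not designed to detect nonuniform DIF, where the odds ratio changes direction across strata and the pooled value can approach 1.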

    An Empirical Investigation Demonstrating the Multidimensional DIF Paradigm: A Cognitive Explanation for DIF

    Cindy M. Walker
    Differential Item Functioning (DIF) is traditionally used to identify different item performance patterns between intact groups, most commonly involving race or sex comparisons. This study advocates expanding the utility of DIF as a step in construct validation. Rather than grouping examinees based on cultural differences, the reference and focal groups are chosen from two extremes along a distinct cognitive dimension that is hypothesized to supplement the dominant latent trait being measured. Specifically, this study investigates DIF between proficient and non-proficient fourth- and seventh-grade writers on open-ended mathematics test items that require students to communicate about mathematics. It is suggested that the occurrence of DIF in this situation actually enhances, rather than detracts from, the construct validity of the test because, according to the National Council of Teachers of Mathematics (NCTM), mathematical communication is an important component of mathematical ability, the dominant construct being assessed. However, the presence of DIF influences the validity of inferences that can be made from test scores and suggests that two scores should be reported, one for general mathematical ability and one for mathematical communication. The fact that currently only one test score is reported, a simple composite of scores on multiple-choice and open-ended items, may lead to incorrect decisions being made about examinees.

    Quality of cocoa beans dried using a direct solar dryer at different loadings

    Ching L Hii
    Abstract In this study fermented cocoa beans were dried in a direct solar dryer at three levels of loading (20, 30 and 60 kg). Surface mouldiness was found to be heavy in the 60 kg treatment, with beans appearing blackish. All the dried beans were reasonably acceptable in terms of vinegary odour and weak in alcohol odour. Weak odour was also detected for the faecal, rancid and cheesy odours. The 60 kg treatment was rated strong for wet sock odour due to poor drying conditions. A significant difference (P < 0.05) was found between the 60 kg treatment and the lower loading treatments for pH and titratable acidity. A cut test showed that the lower loading treatments resulted in a higher percentage of brown beans. The 20 kg treatment showed the highest cut test score, which is significantly different (P < 0.05) from the 60 kg treatment. Fermentation index also showed a tendency for lower loading treatments to have a higher index. No significant difference (P > 0.05) was found among the treatments in terms of cocoa, astringency, bitterness and sourness flavour notes. However, better flavour was observed for beans from the 20 kg treatment. No mouldy off-flavour was found in any of the dried beans. Overall quality assessment showed that the 20 kg treatment was able to produce reasonably good-quality beans as compared to other loadings and is therefore recommended for the direct solar dryer. Copyright © 2006 Society of Chemical Industry

    A primer on classical test theory and item response theory for assessments in medical education

    MEDICAL EDUCATION, Issue 1 2010
    André F De Champlain
    Context: A test score is a number that purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. Objectives: The objective of this paper is to provide an overview of both CTT and IRT for the practitioner involved in the development and scoring of medical education assessments. Methods: The tenets of CTT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Discussion: Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
Medical Education 2010: 44: 109–117
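To make the CTT/IRT contrast concrete, the sketch below computes two common CTT statistics on a small 0/1 response matrix (item difficulty as proportion correct, and Cronbach's alpha as an internal-consistency reliability) and evaluates a two-parameter logistic (2PL) IRT item response function. It is a minimal illustration under assumed data; the response matrix and item parameters are hypothetical, the 2PL is written without the optional 1.7 scaling constant, and estimating IRT parameters (which requires specialised software) is outside its scope.

```python
import math


def item_p_values(responses):
    """CTT item difficulty: proportion correct per item.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]


def _pvariance(xs):
    """Population variance; used consistently for items and totals."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def cronbach_alpha(responses):
    """CTT internal-consistency reliability: k/(k-1) * (1 - sum(item var) / total var)."""
    k = len(responses[0])
    item_vars = [_pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = _pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def p_correct_2pl(theta, a, b):
    """2PL IRT item response function (logistic metric, no 1.7 scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical 4-examinee, 3-item data set.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_p_values(data))                     # [0.75, 0.5, 0.25]
print(round(cronbach_alpha(data), 3))          # 0.75
print(round(p_correct_2pl(0.5, 1.2, 0.0), 3))  # 0.646
```

The CTT statistics depend on the particular group tested, whereas the 2PL function expresses the probability of a correct response as a function of the examinee's latent proficiency theta and the item's discrimination a and difficulty b; that extra structure is what the stronger IRT assumptions buy.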

    Expression of wild-type estrogen receptor β protein in human breast cancer: Specific correlation with HER2/neu overexpression

    Yoshihisa Umekita
    Expression of estrogen receptor β (ERβ) protein in human breast cancer and its correlation with clinicopathological factors have been reported by many investigators, but many of them used ERβ antibodies that react with both wild-type ERβ (ERβwt) and splicing variant isoforms. Therefore, the frequency of ERβwt expression and its correlation with clinicopathological factors remain to be established. In the present study, a monoclonal antibody, EMR02, specific for ERβwt, was used on formalin-fixed paraffin-embedded sections from 225 female primary breast cancer patients diagnosed with invasive ductal carcinoma. Expression of ERα, progesterone receptor (PgR) and HER2/neu was also investigated by immunohistochemistry. For ERβwt, ERα and PgR, positivity was defined as nuclear staining in >10% of the cancer cells. HER2/neu overexpression was defined as a HercepTest score of 3+. Positivity for ERβwt, ERα, PgR and HER2/neu overexpression was 55%, 74%, 61% and 25%, respectively. The expression of ERβwt had a positive correlation with ERα (P = 0.018) and PgR (P = 0.02). There was a significant positive correlation between ERβwt expression and HER2/neu overexpression (P < 0.0001). According to multivariate logistic regression analysis, the most significant association was between ERβwt expression and HER2/neu overexpression (P < 0.0001). These results suggest that the clinical significance of ERβwt expression in human breast cancer patients may be more complex.

    Effectiveness of software training using simulations: An exploratory study

    Arnold D. McElroy Jr.
    This study was designed to explore the effectiveness in student performance and confidence of limited and full device simulators. The 30 employees from an information technology company who participated in this study were assigned to one of three groups. Each group received practice for learning a complex software procedure using traditional interactions, a limited device simulator, or a full device simulator. A training portal was created for each practice method. Measurements of performance included the number of times the participants repeated the assigned practice activity, the total time required to complete the procedure, a test score representing the number of mistakes made in the 20-step procedure, and the average time between mouse clicks as they selected items from menus or clicked buttons. Preliminary results indicated that a limited device simulator appears to be as effective as a full device simulator. Recommendations for further research and limitations of the study are also addressed.

    Effect of informational internet web pages on patients' decision-making: randomised controlled trial regarding choice of spinal or general anaesthesia for orthopaedic surgery

    ANAESTHESIA, Issue 3 2010
    N. D. Groves
    Summary This study explored whether patients' preferences for particular types of anaesthesia could be influenced pre-operatively by giving them the addresses of various relevant websites. Patients at an orthopaedic pre-assessment education clinic completed a questionnaire that included a short multiple-choice general knowledge quiz about anaesthesia and also asked them about their choice of anaesthesia (general or neuraxial). Patients were randomly assigned to intervention or control groups. Intervention group members were given the addresses of three relevant anaesthesia and health-related websites to access at home. All patients were asked to complete the questionnaires on a second occasion, before surgery. Initially, most patients stated a preference for general anaesthesia. Subsequently, the intervention group altered their preference towards neuraxial anaesthesia compared to the control group (p < 0.0001). The increase in median (IQR [range]) anaesthesia knowledge test score was greater in the intervention group (from 10.0 (9.0–12.0 [5.0–14.0]) to 13.0 (11.0–14.0 [6.0–14.0])) than in the control group (from 10.0 (9.0–11.5 [3.0–13.0]) to 11.0 (9.0–12.0 [4.0–14.0]); p = 0.0068).

    Framing French Success in Elementary Mathematics: Policy, Curriculum, and Pedagogy

    CURRICULUM INQUIRY, Issue 3 2004
    ABSTRACT For many decades Americans have been concerned about the effective teaching of mathematics, and educational and political leaders have often advocated reforms such as a return to the basics and strict accountability systems as the way to improve mathematical achievement. International studies, however, suggest that such reforms may not be the best path to successful mathematics education. Through this qualitative case study, the authors explore in depth the French approach to teaching elementary mathematics, using interviews, classroom observations, and documents as their data sets. They apply three theoretical frameworks to their data and find that the French use large-group instruction and a visible pedagogy, focusing on the discussion of mathematical concepts rather than on the completion of practice exercises. The national curriculum is relatively nonprescriptive, and teachers are somewhat empowered through site-based management. The authors conclude that the keys to French success with mathematics education are ongoing formative assessment, mathematically competent teachers, policies and practices that help disadvantaged children, and the use of constructivist methods. They urge comparative education researchers to look beyond international test scores to deeper issues of policy and practice.

    Cognitive performance of male adolescents is lower than controls across psychiatric disorders: a population-based study

    M. Weiser
    Objective: Psychiatric patients, as well as humans or experimental animals with brain lesions, often concurrently manifest behavioral deviations and subtle cognitive impairments. This study tested the hypothesis that, as a group, adolescents suffering from psychiatric disorders score worse on cognitive tests than controls. Method: As part of the assessment for eligibility to serve in the military, the entire, unselected population of 16–17-year-old male Israelis undergoes cognitive testing and screening for psychopathology by the Draft Board. We retrieved the cognitive test scores of 19 075 adolescents who were assigned any psychiatric diagnosis and compared them with the scores of 243 507 adolescents without psychiatric diagnoses. Results: Mean test scores of cases were significantly poorer than those of controls for all diagnostic groups except eating disorders. Effect sizes ranged from 0.3 to 1.6. Conclusion: As a group, adolescent males with psychiatric disorders manifest at least subtle impairments in cognitive functioning.

    Estimating the Technology of Cognitive and Noncognitive Skill Formation

    ECONOMETRICA, Issue 3 2010
    Flavio Cunha
    This paper formulates and estimates multistage production functions for children's cognitive and noncognitive skills. Skills are determined by parental environments and investments at different stages of childhood. We estimate the elasticity of substitution between investments in one period and stocks of skills in that period to assess the benefits of early investment in children compared to later remediation. We establish nonparametric identification of a general class of production technologies based on nonlinear factor models with endogenous inputs. A by-product of our approach is a framework for evaluating childhood and schooling interventions that does not rely on arbitrarily scaled test scores as outputs and recognizes the differential effects of the same bundle of skills in different tasks. Using the estimated technology, we determine optimal targeting of interventions to children with different parental and personal birth endowments. Substitutability decreases in later stages of the life cycle in the production of cognitive skills. It is roughly constant across stages of the life cycle in the production of noncognitive skills. This finding has important implications for the design of policies that target the disadvantaged. For most configurations of disadvantage it is optimal to invest relatively more in the early stages of childhood than in later stages.

    Cognitive test scores in male adolescent cigarette smokers compared to non-smokers: a population-based study

    ADDICTION, Issue 2 2010
    Mark Weiser
    ABSTRACT Background Although previous studies indicate that people with lower intelligence quotient (IQ) scores are more likely to become cigarette smokers, the IQ scores of siblings discordant for smoking and of adolescents who began smoking between ages 18 and 21 years have not been studied systematically. Methods Each year a random sample of Israeli military recruits completes a smoking questionnaire. Cognitive functioning is assessed by the military using standardized tests equivalent to IQ. Results Of 20 221 18-year-old males, 28.5% reported smoking at least one cigarette a day (smokers). An unadjusted comparison found that smokers scored 0.41 effect sizes (ES, P < 0.001) lower than non-smokers; adjusted analyses remained significant (adjusted ES = 0.27, P < 0.001). Adolescents smoking 1–5, 6–10, 11–20 and 21+ cigarettes/day had cognitive test scores 0.14, 0.22, 0.33 and 0.5 adjusted ES poorer than those of non-smokers (P < 0.001). Adolescents who did not smoke by age 18 and then began to smoke between ages 18 and 21 had lower cognitive test scores than never-smokers (adjusted ES = 0.14, P < 0.001). An analysis of brothers discordant for smoking found that smoking brothers had lower cognitive scores than non-smoking brothers (adjusted ES = 0.27; P = 0.014). Conclusion Controlled analyses from this large population-based cohort of male adolescents indicate that IQ scores are lower in male adolescents who smoke than in non-smokers, and in brothers who smoke than in their non-smoking brothers. The IQs of adolescents who began smoking between ages 18 and 21 are lower than those of non-smokers. Adolescents with poorer IQ scores might be targeted by programmes designed to prevent smoking. [source]

    The Impact of Vertical Scaling Decisions on Growth Interpretations

    Derek C. Briggs
    Most growth models implicitly assume that test scores have been vertically scaled. What may not be widely appreciated are the different choices that must be made when creating a vertical score scale. In this paper empirical patterns of growth in student achievement are compared as a function of different approaches to creating a vertical scale. Longitudinal item-level data from a standardized reading test are analyzed for two cohorts of students between Grades 3 and 6 and Grades 4 and 7 for the entire state of Colorado from 2003 to 2006. Eight different vertical scales were established on the basis of choices made for three key variables: Item Response Theory modeling approach, linking approach, and ability estimation approach. It is shown that interpretations of empirical growth patterns appear to depend upon the extent to which a vertical scale has been effectively "stretched" or "compressed" by the psychometric decisions made to establish it. While all of the vertical scales considered show patterns of decelerating growth across grade levels, there is little evidence of scale shrinkage. [source]
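    The abstract does not name the linking methods compared. To show how a linking decision can "stretch" or "compress" a scale, the sketch below uses mean-sigma linking, one common IRT linking approach, as an illustration. It places source-form ability estimates on the target scale through theta* = A * theta + B, with A and B computed from the difficulty (b) parameters of common items. Function names and item values are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_constants(b_source, b_target):
    """Mean-sigma linking constants from common-item difficulties.

    b_source and b_target are the same anchor items' b-parameters as
    estimated on the source and target scales (illustrative method choice).
    """
    A = pstdev(b_target) / pstdev(b_source)  # slope: A > 1 stretches, A < 1 compresses
    B = mean(b_target) - A * mean(b_source)  # intercept: shifts the scale origin
    return A, B


def to_target_scale(theta, A, B):
    """Transform a source-scale ability estimate onto the target scale."""
    return A * theta + B


# Hypothetical anchor items whose difficulties spread twice as wide on the target scale.
A, B = mean_sigma_constants([-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0])
print(A, B, to_target_scale(0.5, A, B))
```

    In this example the anchor items imply A = 2, so every between-grade difference on the source scale doubles after linking. Other linking choices could yield a different A from the same data, which is the sensitivity the study examines.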

    Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research

    Michael C. Rodriguez
    Multiple-choice items are a mainstay of achievement testing. Covering the content domain adequately, so that meaningful and precise scores can certify achievement proficiency, requires many high-quality items. More 3-option items than 4- or 5-option items can be administered in a given testing time, improving content coverage without detrimental effects on the psychometric quality of test scores. Researchers have endorsed 3-option items for over 80 years on the basis of empirical evidence; the results of that research are synthesized here in an effort to unify this endorsement and encourage its adoption. [source]
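    The reasoning that fitting more items into the same testing time can help precision can be illustrated with the Spearman-Brown prophecy formula. The formula predicts the reliability of a test lengthened by a factor k, assuming the added items are parallel to the existing ones. The meta-analysis abstract does not cite this formula. The 0.80 baseline reliability and the 25% lengthening below are hypothetical figures chosen only for illustration.

```python
def spearman_brown(reliability, k):
    """Predicted reliability after changing test length by factor k.

    Assumes the added (or removed) items are parallel to the original ones.
    """
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical: switching to 3-option items lets 25% more items fit in the same time.
print(round(spearman_brown(0.80, 1.25), 4))
```

    Under these assumptions, reliability rises from 0.80 to about 0.83. The gain holds only if the shorter items remain as discriminating as the originals, which is the empirical question the meta-analysis addresses.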

    Construct-Irrelevant Variance in High-Stakes Testing

    Thomas M. Haladyna
    There are many threats to validity in high-stakes achievement testing. One major threat is construct-irrelevant variance (CIV). This article defines CIV in the context of the contemporary, unitary view of validity and presents logical arguments, hypotheses, and documentation for a variety of CIV sources that commonly threaten interpretations of test scores. A more thorough study of CIV is recommended. [source]

    Bayes' Theorem to estimate population prevalence from Alcohol Use Disorders Identification Test (AUDIT) scores

    ADDICTION, Issue 7 2009
    David R. Foxcroft
    ABSTRACT Aim The aim in this methodological paper is to demonstrate, using Bayes' Theorem, an approach to estimating the difference in prevalence of a disorder in two groups whose test scores are obtained, illustrated with data from a college student trial where 12-month outcomes are reported for the Alcohol Use Disorders Identification Test (AUDIT). Method Using known population prevalence as a background probability and diagnostic accuracy information for the AUDIT scale, we calculated the post-test probability of alcohol abuse or dependence for study participants. The difference in post-test probability between the study intervention and control groups indicates the effectiveness of the intervention in reducing alcohol use disorder rates. Findings In the illustrative analysis, at 12-month follow-up there was a mean AUDIT score difference of 2.2 points between the intervention and control groups: an effect size of unclear policy relevance. Using Bayes' Theorem, the post-test probability mean difference between the two groups was 9% (95% confidence interval 3–14%). Interpreted as a prevalence reduction, this is evaluated more easily by policy makers and clinicians. Conclusion Important information on the probable differences in real-world prevalence and impact of prevention and treatment programmes can be produced by applying Bayes' Theorem to studies where diagnostic outcome measures are used. However, the usefulness of this approach relies upon good information on the accuracy of such diagnostic measures for target conditions. [source]
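    The core step described above combines a background prevalence with a test's sensitivity and specificity to obtain each participant's post-test probability. It then averages those probabilities within a group to estimate that group's prevalence. The sketch assumes a single AUDIT cut-off summarized by one sensitivity and one specificity; the paper may instead use accuracy figures for several score bands. All numbers and function names below are hypothetical.

```python
def post_test_probability(prevalence, sensitivity, specificity, positive):
    """P(disorder | test result) by Bayes' Theorem for a single cut-off."""
    if positive:
        true_pos = sensitivity * prevalence
        return true_pos / (true_pos + (1 - specificity) * (1 - prevalence))
    false_neg = (1 - sensitivity) * prevalence
    return false_neg / (false_neg + specificity * (1 - prevalence))


def expected_prevalence(results, prevalence, sensitivity, specificity):
    """Mean post-test probability over a group's screening results (True = screen-positive)."""
    probs = [post_test_probability(prevalence, sensitivity, specificity, r) for r in results]
    return sum(probs) / len(probs)


# Hypothetical accuracy figures and screening results for two small groups.
prev, sens, spec = 0.20, 0.90, 0.80
control = expected_prevalence([True, True, False, False], prev, sens, spec)
intervention = expected_prevalence([True, False, False, False], prev, sens, spec)
print(round(control - intervention, 3))  # estimated prevalence reduction
```

    Expressing the group difference as a difference in probable prevalence, rather than in raw AUDIT points, is what the authors argue makes the result easier for policy makers and clinicians to interpret.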