Test Theory (test + theory)

Kinds of Test Theory

  • classical test theory


  • Selected Abstracts


    A Closed Form Slug Test Theory for High Permeability Aquifers

    GROUND WATER, Issue 1 2005
    David W. Ostendorf
We incorporate a linear estimate of casing friction into the analytical slug test theory of Springer and Gelhar (1991) for high permeability aquifers. The modified theory elucidates the influence of inertia and casing friction on consistent, closed form equations for the free surface, pressure, and velocity fluctuations for overdamped and underdamped conditions. A consistent, but small, correction for kinetic energy is included as well. A characteristic velocity linearizes the turbulent casing shear stress so that an analytical solution for attenuated, phase shifted pressure fluctuations fits a single parameter (damping frequency) to transducer data from any depth in the casing. Underdamped slug tests of 0.3, 0.6, and 1 m amplitudes at five transducer depths in a 5.1 cm diameter PVC well 21 m deep in the Plymouth-Carver Aquifer yield a consistent hydraulic conductivity of 1.5 × 10⁻³ m/s. The Springer and Gelhar (1991) model underestimates the hydraulic conductivity for these tests by as much as 25% by improperly ascribing smooth turbulent casing friction to the aquifer. The match point normalization of Butler (1998) agrees with our fitted hydraulic conductivity, however, when friction is included in the damping frequency. Zurbuchen et al. (2002) use a numerical model to establish a similar sensitivity of hydraulic conductivity to nonlinear casing friction. [source]


    Statistical Test Theory for the Behavioral Sciences by Dato N. M. de Gruijter, Leo J. Th. van der Kamp

    INTERNATIONAL STATISTICAL REVIEW, Issue 1 2008
    Kimmo Vehkalahti
    No abstract is available for this article. [source]


    Siting the Death Penalty Internationally

    LAW & SOCIAL INQUIRY, Issue 2 2008
    David F. Greenberg
    We examine sources of variation in possession and use of the death penalty using data drawn from 193 nations in order to test theories of punishment. We find the death penalty to be rooted in a country's legal and political systems, and to be influenced by its religious traditions. A country's level of economic development, its educational attainment, and its religious composition shape its political institutions and practices, indirectly affecting its use of the death penalty. The article concludes by discussing likely future trends. [source]


    Measurement error: implications for diagnosis and discrepancy models of developmental dyslexia

    DYSLEXIA, Issue 3 2005
    Sue M. Cotton
Abstract The diagnosis of developmental dyslexia (DD) is reliant on a discrepancy between intellectual functioning and reading achievement. Discrepancy-based formulae have frequently been employed to establish the significance of the difference between 'intelligence' and 'actual' reading achievement. These formulae, however, often fail to take into consideration test reliability and the error associated with a single test score. This paper provides an illustration of the potential effects that test reliability and measurement error can have on the diagnosis of dyslexia, with particular reference to discrepancy models. The roles of reliability and standard error of measurement (SEM) in classical test theory are also briefly reviewed. This is followed by illustrations of how SEM and test reliability can aid with the interpretation of a simple discrepancy-based formula of DD. It is proposed that a lack of consideration of test theory in the use of discrepancy-based models of DD can lead to misdiagnosis (both false positives and false negatives). Further, misdiagnosis in research samples affects reproducibility and generalizability of findings. This, in turn, may explain current inconsistencies in research on the perceptual, sensory, and motor correlates of dyslexia. Copyright © 2005 John Wiley & Sons, Ltd. [source]
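The SEM and discrepancy-score logic this abstract refers to can be sketched numerically. The sketch below is not from the article; it uses the standard classical-test-theory relation SEM = SD·√(1 − reliability) and illustrative reliabilities and scores (the values 0.95, 0.85, and the 15-point discrepancy are assumptions, not data from the paper):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def discrepancy_ci(iq_score, read_score, sd=15.0, r_iq=0.95, r_read=0.85, z=1.96):
    """95% confidence interval for an IQ-reading discrepancy.

    The standard error of a difference between two independent scores
    combines the two SEMs in quadrature.
    """
    se_diff = math.sqrt(sem(sd, r_iq) ** 2 + sem(sd, r_read) ** 2)
    diff = iq_score - read_score
    return diff - z * se_diff, diff + z * se_diff

# A nominal 15-point discrepancy, once measurement error is respected,
# is compatible with true discrepancies from roughly 2 to 28 points:
lo, hi = discrepancy_ci(100, 85)
```

This is the paper's point in miniature: a cut-off applied to the observed difference alone ignores an error band wide enough to produce both false positives and false negatives.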


    (Mis)Conceptions About Generalizability Theory

    EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 1 2000
    Robert L. Brennan
    In what sense is generalizability theory an extension of classical test theory? In what sense is generalizability theory an application of analysis of variance? [source]


    A Bayesian predictive analysis of test scores

    JAPANESE PSYCHOLOGICAL RESEARCH, Issue 1 2001
    Hidetoki Ishii
In classical test theory, a high-reliability test always leads to a precise measurement. However, when it comes to the prediction of test scores, this is not necessarily so. Based on a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the predicted means, variances, and covariances of predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. Regarding a new subject taking a new test, in this study, higher test reliability led to a larger variance when the sample size was smaller than half the number of tests. Classical test theory is reanalyzed from the viewpoint of prediction, and some suggestions are made. [source]
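The seemingly paradoxical finding (higher reliability, wider predictive distribution for a new subject) can be made concrete with a toy true-score model. This is a minimal sketch, not the authors' Bayesian machinery; all parameter values are assumptions. Holding error variance fixed, raising reliability means raising true-score variance, which widens the predictive spread for an as-yet-unseen subject:

```python
import random
import statistics

def simulate_new_subject_scores(mu=50.0, var_true=80.0, var_err=20.0,
                                n=100_000, seed=0):
    """Predictive draws for a brand-new subject under a simple true-score
    model: tau ~ N(mu, var_true), observed = tau + e with e ~ N(0, var_err).
    Reliability = var_true / (var_true + var_err)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        tau = rng.gauss(mu, var_true ** 0.5)
        scores.append(tau + rng.gauss(0.0, var_err ** 0.5))
    return scores

scores = simulate_new_subject_scores()
reliability = 80.0 / (80.0 + 20.0)            # 0.8
predictive_var = statistics.variance(scores)  # close to var_true + var_err
```

A precise measurement of a known subject and a narrow prediction for a new one are different goals, which is the distinction the abstract draws.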


    Refining the measurement of exposure to violence (ETV) in urban youth

    JOURNAL OF COMMUNITY PSYCHOLOGY, Issue 5 2007
    Robert T. Brennan
    Correlational analysis, classical test theory, confirmatory factor analysis, and multilevel Rasch modeling were used to refine a measure of adolescents' exposure to violence (ETV). Interpersonal violence could be distinguished from other potentially traumatic events; it was also possible to distinguish three routes of exposure (victimization, witnessing, and learning of). Correlations confirmed that ETV subscales are related to measures of aggression, delinquency, and depression/anxiety. Reliability was improved by combining ETV subscales and/or caregiver and youth reports. Valid and reliable measures of ETV are critical to future research in associating violence exposure with common mental health and behavioral outcomes and disorders, and tracking how early violence exposure may affect future outcomes for adolescents. © 2007 Wiley Periodicals, Inc. J Comm Psychol 35: 603–618, 2007. [source]
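The reliability gain from combining subscales or informants that the abstract reports is the effect the classical Spearman-Brown prophecy formula anticipates. The sketch below is illustrative only (it assumes parallel components, which real subscales and caregiver/youth reports only approximate):

```python
def spearman_brown(r: float, k: float) -> float:
    """Spearman-Brown prophecy: reliability of a measure lengthened
    k-fold, given reliability r of the current form."""
    return (k * r) / (1.0 + (k - 1.0) * r)

# Pooling two parallel components (k = 2), each with reliability 0.70,
# yields a composite reliability of about 0.82:
combined = spearman_brown(0.70, 2)
```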


    How Often Do Subscores Have Added Value? Results from Operational and Simulated Data

    JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2010
    Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman suggested a method based on classical test theory to determine whether subscores have added value over total scores. In this article I first provide a rich collection of results regarding when subscores were found to have added value for several operational data sets. Following that I provide results from a detailed simulation study that examines what properties subscores should possess in order to have added value. The results indicate that subscores have to satisfy strict standards of reliability and correlation to have added value. A weighted average of the subscore and the total score was found to have added value more often. [source]
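Haberman's criterion mentioned above compares how well the observed subscore and the observed total score each predict the *true* subscore, via proportional reduction in mean squared error (PRMSE). The sketch below is a simplified version under the textbook assumption of uncorrelated errors (it ignores the error correlation that arises when the subscore is part of the total), and the numeric inputs are invented for illustration:

```python
def subscore_added_value(rel_sub: float, corr_sub_total: float):
    """Simplified Haberman-style check (assumes uncorrelated errors).

    The PRMSE of the observed subscore as a predictor of the true
    subscore equals the subscore reliability; the PRMSE of the total
    score is corr(sub, total)**2 / rel_sub after disattenuation. The
    subscore adds value when its own PRMSE is the larger of the two.
    """
    prmse_sub = rel_sub
    prmse_total = corr_sub_total ** 2 / rel_sub
    return prmse_sub > prmse_total, prmse_sub, prmse_total

# A reliable subscore that is distinct from the total adds value:
added, _, _ = subscore_added_value(rel_sub=0.85, corr_sub_total=0.75)
# A weak subscore that tracks the total closely does not:
not_added, _, _ = subscore_added_value(rel_sub=0.60, corr_sub_total=0.70)
```

This mirrors the article's conclusion: a subscore needs both high reliability and a modest correlation with the total to have added value.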


    A family of measures to evaluate scale reliability in a longitudinal setting

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 1 2009
    Annouschka Laenen
    Summary., The concept of reliability denotes one of the most important psychometric properties of a measurement scale. Reliability refers to the capacity of the scale to discriminate between subjects in a given population. In classical test theory, it is often estimated by using the intraclass correlation coefficient based on two replicate measurements. However, the modelling framework that is used in this theory is often too narrow when applied in practical situations. Generalizability theory has extended reliability theory to a much broader framework but is confronted with some limitations when applied in a longitudinal setting. We explore how the definition of reliability can be generalized to a setting where subjects are measured repeatedly over time. On the basis of four defining properties for the concept of reliability, we propose a family of reliability measures which circumscribes the area in which reliability measures should be sought. It is shown how different members assess different aspects of the problem and that the reliability of the instrument can depend on the way that it is used. The methodology is motivated by and illustrated on data from a clinical study on schizophrenia. On the basis of this study, we estimate and compare the reliabilities of two different rating scales to evaluate the severity of the disorder. [source]
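The intraclass correlation coefficient from two replicate measurements, which the abstract names as the classical estimator, can be computed from a one-way ANOVA decomposition. A minimal sketch for k = 2 replicates (standard formula, not taken from the paper):

```python
import statistics

def icc_oneway(rep1, rep2):
    """One-way random-effects ICC from two replicate measurements per
    subject: ICC = (MSB - MSW) / (MSB + MSW) for k = 2, where MSB is
    the between-subject and MSW the within-subject mean square."""
    n = len(rep1)
    means = [(a + b) / 2 for a, b in zip(rep1, rep2)]
    grand = statistics.mean(means)
    msb = 2 * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - b) ** 2 / 2 for a, b in zip(rep1, rep2)) / n
    return (msb - msw) / (msb + msw)

# Perfect replicate agreement gives ICC = 1; noisy replicates give less:
icc_perfect = icc_oneway([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
icc_noisy = icc_oneway([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

The limitation the authors address is visible here: the estimator assumes exchangeable replicates, an assumption that breaks down when the "replicates" are repeated measurements over time.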


    A primer on classical test theory and item response theory for assessments in medical education

    MEDICAL EDUCATION, Issue 1 2010
    André F De Champlain
    Context: A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. Objectives: The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments. Methods: The tenets of CTT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Discussion: Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions. 
Medical Education 2010: 44: 109–117 [source]


    Item response theory: applications of modern test theory in medical education

    MEDICAL EDUCATION, Issue 8 2003
    Steven M Downing
    Context: Item response theory (IRT) measurement models are discussed in the context of their potential usefulness in various medical education settings such as assessment of achievement and evaluation of clinical performance. Purpose: The purpose of this article is to compare and contrast IRT measurement with the more familiar classical measurement theory (CMT) and to explore the benefits of IRT applications in typical medical education settings. Summary: CMT, the more common measurement model used in medical education, is straightforward and intuitive. Its limitation is that it is sample-dependent, in that all statistics are confounded with the particular sample of examinees who completed the assessment. Examinee scores from IRT are independent of the particular sample of test questions or assessment stimuli. Also, item characteristics, such as item difficulty, are independent of the particular sample of examinees. The IRT characteristic of invariance permits easy equating of examination scores, which places scores on a constant measurement scale and permits the legitimate comparison of student ability change over time. Three common IRT models and their statistical assumptions are discussed. IRT applications in computer-adaptive testing and as a method useful for adjusting rater error in clinical performance assessments are overviewed. Conclusions: IRT measurement is a powerful tool used to solve a major problem of CMT, that is, the confounding of examinee ability with item characteristics. IRT measurement addresses important issues in medical education, such as eliminating rater error from performance assessments. [source]
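One of the common IRT models the article refers to is the two-parameter logistic (2PL), in which the probability of a correct response depends on examinee ability and on item discrimination and difficulty. A minimal sketch of the standard 2PL response function (the particular parameter values are invented for illustration):

```python
import math

def p_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic (2PL) IRT model: probability that an
    examinee of ability theta answers correctly an item with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The difficulty parameter b is the ability at which P = 0.5,
# regardless of discrimination:
p_at_b = p_2pl(theta=0.0, a=1.2, b=0.0)
# Higher ability raises the probability of success:
p_high = p_2pl(theta=2.0, a=1.2, b=0.0)
```

Because theta and (a, b) enter the model separately, estimated item parameters do not depend on any particular examinee sample, which is the invariance property the abstract credits for easy score equating.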