Achievement Test Scores (achievement + test_score)


Selected Abstracts

Generating Dichotomous Item Scores with the Four-Parameter Beta Compound Binomial Model

Patrick O. Monahan
A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta compound-binomial (4PBCB) strong true score model (with two-term approximation to the compound binomial) is used to estimate and generate the true score distribution. The nonparametric item-true score step functions are estimated by classical item difficulties conditional on proportion-correct total score. The technique performed very well in replicating inter-item correlations, item statistics (point-biserial correlation coefficients and item proportion-correct difficulties), first four moments of total score distribution, and coefficient alpha of three real data sets consisting of educational achievement test scores. The technique replicated real data (including subsamples of differing proficiency) as well as the three-parameter logistic (3PL) IRT model (and much better than the 1PL model) and is therefore a promising alternative simulation technique. This 4PBCB technique may be particularly useful as a more neutral simulation procedure for comparing methods that use different IRT models.
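The generation step described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the beta shape parameters, the score bounds, and the item step functions have already been estimated, it omits the two-term compound binomial approximation entirely, and every name in it (`four_param_beta`, `generate_scores`, `step_functions`) is hypothetical.

```python
import random

def four_param_beta(rng, a, b, lower, upper):
    """Draw a true proportion-correct score from a beta(a, b)
    distribution rescaled to the interval [lower, upper]."""
    return lower + (upper - lower) * rng.betavariate(a, b)

def generate_scores(n_examinees, step_functions, a, b, lower, upper, seed=0):
    """Generate a 0/1 item-score matrix.

    step_functions[i][k] is an assumed, pre-estimated probability that
    item i is answered correctly by an examinee whose true score falls
    in score bin k (bins are equal-width slices of [0, 1]).
    """
    rng = random.Random(seed)
    n_bins = len(step_functions[0])
    data = []
    for _ in range(n_examinees):
        tau = four_param_beta(rng, a, b, lower, upper)
        # Map the true score to a bin; clamp so tau == 1.0 stays in range.
        k = min(int(tau * n_bins), n_bins - 1)
        # One Bernoulli draw per item, using that item's step function.
        data.append([1 if rng.random() < steps[k] else 0
                     for steps in step_functions])
    return data

# Hypothetical step functions for three items over four score bins.
steps = [
    [0.20, 0.45, 0.70, 0.90],
    [0.10, 0.30, 0.60, 0.85],
    [0.25, 0.35, 0.55, 0.80],
]
scores = generate_scores(1000, steps, a=2.0, b=3.0, lower=0.1, upper=0.95)
```

Because the step functions are plain lookup tables, the item characteristic curves are not forced into a logistic or any other mathematical form, which is the property the abstract emphasizes.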

An Application of Item Response Time: The Effort-Moderated IRT Model

Steven L. Wise
The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article introduces the effort-moderated IRT model, which incorporates item response time into proficiency estimation and item parameter estimation. In two studies of the effort-moderated model when rapid guessing (i.e., reflecting low examinee effort) was present, one based on real data and the other on simulated data, the effort-moderated model performed better than the standard 3PL model. Specifically, it was found that the effort-moderated model (a) showed better model fit, (b) yielded more accurate item parameter estimates, (c) more accurately estimated test information, and (d) yielded proficiency estimates with higher convergent validity.
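One way to picture how response time can moderate an IRT model is the sketch below. The abstract does not give the model's equations, so the form used here is an assumption: a response slower than an item-specific time threshold is treated as solution behavior and follows the 3PL curve, while a faster response is treated as a rapid guess with a chance-level probability of 1 divided by the number of response options. The scaling constant `D = 1.7`, the threshold, and all function names are likewise illustrative.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Standard 3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def p_effort_moderated(theta, a, b, c, n_options, response_time, threshold):
    """Assumed effort-moderated probability of a correct response.

    response_time >= threshold -> solution behavior, use the 3PL curve.
    response_time <  threshold -> rapid guess, use chance level.
    """
    if response_time >= threshold:
        return p_3pl(theta, a, b, c)
    return 1.0 / n_options

# Same examinee and item, two different response times (in seconds).
effortful = p_effort_moderated(0.0, 1.0, 0.0, 0.2, 4, response_time=12.0, threshold=3.0)
rapid = p_effort_moderated(0.0, 1.0, 0.0, 0.2, 4, response_time=1.0, threshold=3.0)
```

Under this assumed form, a rapid guess carries no information about proficiency, because its probability does not depend on `theta`.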

How close is close enough? Evaluating propensity score matching using data from a class size reduction experiment
In recent years, propensity score matching (PSM) has gained attention as a potential method for estimating the impact of public policy programs in the absence of experimental evaluations. In this study, we evaluate the usefulness of PSM for estimating the impact of a program change in an educational context (Tennessee's Student Teacher Achievement Ratio Project [Project STAR]). Because Tennessee's Project STAR experiment involved an effective random assignment procedure, the experimental results from this policy intervention can be used as a benchmark, to which we compare the impact estimates produced using propensity score matching methods. We use several different methods to assess these nonexperimental estimates of the impact of the program. We try to determine "how close is close enough," putting greatest emphasis on the question: Would the nonexperimental estimate have led to the wrong decision when compared to the experimental estimate of the program? We find that propensity score methods perform poorly with respect to measuring the impact of a reduction in class size on achievement test scores. We conclude that further research is needed before policymakers rely on PSM as an evaluation tool. © 2007 by the Association for Public Policy Analysis and Management
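For readers unfamiliar with the matching step, the sketch below shows one common variant: nearest-neighbor matching with replacement and a caliper, followed by an average effect for the treated group. It is not the procedure used in the study, which the abstract does not specify. It assumes propensity scores have already been estimated, and the function name `match_att`, the caliper value, and the data are all made up for illustration.

```python
def match_att(treated, controls, caliper=0.05):
    """Estimate the average treatment effect on the treated (ATT).

    treated, controls: lists of (propensity_score, outcome) pairs.
    Each treated unit is paired with the control whose propensity score
    is closest; pairs farther apart than the caliper are dropped.
    Returns None when no treated unit finds a match.
    """
    diffs = []
    for score_t, y_t in treated:
        score_c, y_c = min(controls, key=lambda c: abs(c[0] - score_t))
        if abs(score_c - score_t) <= caliper:
            diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs) if diffs else None

# Illustrative data only: (propensity score, achievement test score).
small_class = [(0.30, 52.0), (0.60, 58.0)]
regular_class = [(0.31, 50.0), (0.58, 55.0), (0.90, 70.0)]
att = match_att(small_class, regular_class)
```

An estimate like `att` is the nonexperimental quantity that a study of this kind would set against the benchmark from random assignment.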

A case study of one school system's adoption and implementation of an elementary science program

Michael P. Kelly
In this investigation we employed a case study approach with qualitative and quantitative data sources to examine and discover the characteristics of the processes used by a midwestern U.S. school system to adopt and implement a new K-6 science curriculum. Analysis of data yielded several results. Elementary teachers received what they requested, a hands-on science program with texts and kits. Teachers as a group remained in the early stages of the Concerns-Based Adoption Model profile of concerns. Many K-6 teachers remained uncomfortable with teaching science. Teachers' attitudes regarding the new program were positive, and they taught more science. Teachers struggled with science-as-inquiry, with a science program they believed contained too many concepts and too much vocabulary, and with their beliefs that students learned more and loved the new hands-on program. Traditional science teaching remained the norm. Administrative support was positive but insufficient to facilitate full implementation of the new program and more substantial change in teaching. Standardized science achievement test scores did not show an observable pattern of growth. It is concluded that a systematic, ongoing program of professional development is necessary to address teachers' concerns and help the district realize its goal of standards-based K-6 science instruction. © 2004 Wiley Periodicals, Inc. J Res Sci Teach 42: 25-52, 2005

How Should Reading Disabilities be Operationalized? A Survey of Experts
In the face of accumulating research and logic, the use of a discrepancy between intelligence and reading achievement test scores is becoming increasingly untenable as a marker of reading disabilities. However, it is not clear what criteria might replace the discrepancy requirement. We surveyed 218 members of journal editorial boards to solicit their opinions on current and proposed definitional components and exclusion criteria. Three components were selected by over two-thirds of the respondents: reading achievement, phonemic awareness, and treatment validity. However, only 30 percent believed IQ-reading achievement discrepancy should be a marker. More than 75 percent of the respondents believed exclusion criteria should remain part of the definition. Mental retardation was the most frequently selected exclusion criterion despite rejection of intelligence test scores as a definitional component. Although the findings reflect uncertainty among experts on what elements should comprise a definition, they do signal a willingness to consider new approaches to the conceptually difficult task of defining reading disabilities.

Numerical Magnitude Representations Influence Arithmetic Learning

Julie L. Booth
This study examined whether the quality of first graders' (mean age = 7.2 years) numerical magnitude representations is correlated with, predictive of, and causally related to their arithmetic learning. The children's pretest numerical magnitude representations were found to be correlated with their pretest arithmetic knowledge and to be predictive of their learning of answers to unfamiliar arithmetic problems. The relation to learning of unfamiliar problems remained after controlling for prior arithmetic knowledge, short-term memory for numbers, and math achievement test scores. Moreover, presenting randomly chosen children with accurate visual representations of the magnitudes of addends and sums improved their learning of the answers to the problems. Thus, representations of numerical magnitude are both correlationally and causally related to arithmetic learning.