Mathematics Items

Selected Abstracts


A Sex Difference by Item Difficulty Interaction in Multiple-Choice Mathematics Items Administered to National Probability Samples

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 1 2001
John Bielinski
A 1998 study by Bielinski and Davison reported a sex difference by item difficulty interaction in which easy items tended to be easier for females than males, and hard items tended to be harder for females than males. To extend their research to nationally representative samples of students, this study used math achievement data from the 1992 NAEP, the TIMSS, and the NELS:88. The data included students in grades 4, 8, 10, and 12. The interaction was assessed by correlating the item difficulty difference (b_male − b_female) with item difficulty computed on the combined male/female sample. Using only the multiple-choice mathematics items, the predicted negative correlation was found for all eight populations and was significant in five. An argument is made that this phenomenon may help explain the greater variability in math achievement among males as compared to females and the emergence of higher performance of males in late adolescence. [source]
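
The interaction test described in this abstract reduces to a single correlation once item difficulties have been calibrated separately by sex and on the combined sample. The following is a minimal sketch of that computation; the simulated difficulty values, array names, and the built-in relationship are illustrative assumptions, not the study's NAEP/TIMSS/NELS data.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical IRT difficulty estimates for 40 multiple-choice items,
# calibrated on the combined male/female sample.
rng = np.random.default_rng(0)
b_combined = rng.normal(0.0, 1.0, size=40)

# Simulate the reported pattern: the male-female difficulty gap
# (b_male - b_female) shrinks as items get harder, so easy items
# favor females and hard items favor males.
b_diff = 0.15 - 0.10 * b_combined + rng.normal(0.0, 0.10, size=40)

# The study's test: correlate the sex difference in difficulty with
# overall difficulty. A negative r reproduces the reported interaction.
r, p = pearsonr(b_diff, b_combined)
print(f"r = {r:.3f}, p = {p:.4f}")
```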


Generalizability of Cognitive Interview-Based Measures Across Cultural Groups

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 2 2009
Guillermo Solano-Flores
We addressed the challenge of scoring cognitive interviews in research involving multiple cultural groups. We interviewed 123 fourth- and fifth-grade students from three cultural groups to probe how they related a mathematics item to their personal lives. Item meaningfulness, the tendency of students to relate the content and/or context of an item to activities in which they are actors, was scored from interview transcriptions with a procedure similar to the scoring of constructed-response tasks. Generalizability theory analyses revealed a small amount of score variation due to the main and interaction effects of rater but a sizeable amount of measurement error due to the interaction of person and question (context). Students from different groups tended to draw on different sets of contexts from their personal lives to make sense of the item. In spite of individual, and potentially cultural, differences in communication style, cognitive interviews can be reliably scored by well-trained raters with the same kind of rigor used in the scoring of constructed-response tasks. However, to make valid generalizations of cognitive interview-based measures, a considerable number of interview questions may be needed. Information obtained with cognitive interviews for a given cultural group may not be generalizable to other groups. [source]
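
The rater and person-by-question components reported here come from a generalizability-theory variance decomposition. Below is a minimal sketch of the expected-mean-squares arithmetic for the simplest fully crossed person × question design, omitting the rater facet for brevity; the function name and the example data are assumptions for illustration, not the study's analysis code.

```python
import numpy as np

def g_study_pxq(scores: np.ndarray):
    """Variance components for a crossed person x question G study.

    scores: (n_persons, n_questions) matrix of interview scores.
    Returns estimated variance components (person, question,
    person-x-question confounded with error) from the expected
    mean squares of a two-way random-effects design.
    """
    n_p, n_q = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    question_means = scores.mean(axis=0)

    ms_p = n_q * np.sum((person_means - grand) ** 2) / (n_p - 1)
    ms_q = n_p * np.sum((question_means - grand) ** 2) / (n_q - 1)
    resid = scores - person_means[:, None] - question_means[None, :] + grand
    ms_pq = np.sum(resid ** 2) / ((n_p - 1) * (n_q - 1))

    var_pq_error = ms_pq                         # interaction + error
    var_person = max((ms_p - ms_pq) / n_q, 0.0)  # negative estimates set to 0
    var_question = max((ms_q - ms_pq) / n_p, 0.0)
    return var_person, var_question, var_pq_error

# Illustrative call on fabricated scores: 30 students, 6 interview questions.
rng = np.random.default_rng(1)
scores = rng.integers(0, 4, size=(30, 6)).astype(float)
print(g_study_pxq(scores))
```

A large person × question component relative to the rater components is exactly the pattern the abstract describes: raters agree, but which questions a given student finds meaningful varies idiosyncratically.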


The Use of Generalizability (G) Theory in the Testing of Linguistic Minorities

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 1 2006
Guillermo Solano-Flores
We contend that generalizability (G) theory allows the design of psychometric approaches to testing English-language learners (ELLs) that are consistent with current thinking in linguistics. We used G theory to estimate the amount of measurement error due to code (language or dialect). Fourth- and fifth-grade ELLs, native speakers of Haitian-Creole from two speech communities, were given the same set of mathematics items in the standard English and standard Haitian-Creole dialects (Sample 1) or in the standard and local dialects of Haitian-Creole (Samples 2 and 3). The largest measurement error observed was produced by the interaction of student, item, and code. Our results indicate that the reliability and dependability of ELL achievement measures are affected by two facts that operate in combination: each test item poses a unique set of linguistic challenges, and each student has a unique set of linguistic strengths and weaknesses. This sensitivity to language appears to operate at the level of dialect. Also, students from different speech communities within the same broad linguistic group may differ considerably in the number of items needed to obtain dependable measures of their academic achievement. Whether students are tested in English or in their first language, dialect variation needs to be considered if language as a source of measurement error is to be effectively addressed. [source]
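
The point that different speech communities may need different numbers of items follows from a decision (D) study: once variance components are estimated for a group, dependability for a given test length is a one-line formula. A minimal sketch with made-up component values (not estimates from the paper):

```python
def phi_coefficient(var_person: float, var_item: float,
                    var_pi_error: float, n_items: int) -> float:
    """Dependability (phi) for absolute decisions when averaging over
    n_items items in a crossed person x item design."""
    absolute_error = (var_item + var_pi_error) / n_items
    return var_person / (var_person + absolute_error)

# A group whose scores carry more item-related error needs more items
# to reach the same dependability threshold (e.g., 0.90).
for n in (10, 20, 40):
    print(n, round(phi_coefficient(0.5, 0.1, 0.9, n), 3))
```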

