Differential Item Functioning

Selected Abstracts


Assessing Differential Item Functioning in Performance Assessment: Review and Recommendations

EDUCATIONAL MEASUREMENT: ISSUES AND PRACTICE, Issue 3 2000
Randall D. Penfield
How can we best extend DIF research to performance assessment? What are the issues and problems surrounding studies of DIF on complex tasks? What appear to be the best approaches at this time? [source]


A Mixture Model Analysis of Differential Item Functioning

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2005
Allan S. Cohen
Once a differential item functioning (DIF) item has been identified, little is known about the examinees for whom the item functions differentially. This is because DIF analyses focus on manifest group characteristics that are associated with DIF but do not explain why examinees respond differentially to items. We first analyze item response patterns for gender DIF and then illustrate, through the use of a mixture item response theory (IRT) model, how the manifest characteristic associated with DIF often has a very weak relationship with the latent groups actually being advantaged or disadvantaged by the item(s). Next, we propose an alternative approach to DIF assessment that first uses an exploratory mixture model analysis to define the primary dimension(s) that contribute to DIF, and then studies examinee characteristics associated with those dimensions in order to understand the cause(s) of DIF. Comparison of academic characteristics of these examinees across classes reveals some clear differences in manifest characteristics between groups. [source]
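
A rough illustration of the comparison step in the argument above (not the authors' analysis): given latent-class assignments assumed to come from a previously fitted mixture IRT model, one can cross-tabulate the manifest grouping variable against the estimated class memberships and summarize the association. All names and data below are hypothetical.

```python
# Minimal sketch: how strongly does a manifest group variable (e.g., gender)
# track the latent classes recovered by a mixture IRT model? Class labels are
# assumed to come from some prior mixture-model fit; names are illustrative.
import numpy as np
from scipy.stats import chi2_contingency

def manifest_latent_association(manifest_group, latent_class):
    """Cross-tabulate manifest group membership against estimated latent
    class membership; return the table, Cramer's V, and a chi-square p-value."""
    groups = np.unique(manifest_group)
    classes = np.unique(latent_class)
    table = np.array([[np.sum((manifest_group == g) & (latent_class == c))
                       for c in classes] for g in groups])
    chi2, p, _, _ = chi2_contingency(table)
    n = table.sum()
    v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))  # Cramer's V in [0, 1]
    return table, v, p

# Illustrative data: 1000 examinees, two manifest groups, two latent classes
rng = np.random.default_rng(0)
gender = rng.integers(0, 2, size=1000)
latent = rng.integers(0, 2, size=1000)   # nearly independent of gender here
table, v, p = manifest_latent_association(gender, latent)
print(table)
print(f"Cramer's V = {v:.3f} (values near 0 mean the manifest variable "
      f"tells us little about the latent DIF groups), p = {p:.3f}")
```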


An Empirical Investigation Demonstrating the Multidimensional DIF Paradigm: A Cognitive Explanation for DIF

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2001
Cindy M. Walker
Differential Item Functioning (DIF) is traditionally used to identify different item performance patterns between intact groups, most commonly involving race or sex comparisons. This study advocates expanding the utility of DIF as a step in construct validation. Rather than grouping examinees based on cultural differences, the reference and focal groups are chosen from two extremes along a distinct cognitive dimension that is hypothesized to supplement the dominant latent trait being measured. Specifically, this study investigates DIF between proficient and non-proficient fourth- and seventh-grade writers on open-ended mathematics test items that require students to communicate about mathematics. It is suggested that the occurrence of DIF in this situation actually enhances, rather than detracts from, the construct validity of the test because, according to the National Council of Teachers of Mathematics (NCTM), mathematical communication is an important component of mathematical ability, the dominant construct being assessed. However, the presence of DIF influences the validity of inferences that can be made from test scores and suggests that two scores should be reported, one for general mathematical ability and one for mathematical communication. The fact that currently only one test score is reported, a simple composite of scores on multiple-choice and open-ended items, may lead to incorrect decisions being made about examinees. [source]


Identifying Sources of Differential Item and Bundle Functioning on Translated Achievement Tests: A Confirmatory Analysis

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 2 2001
Mark J. Gierl
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests. [source]


Examining item bias in the anxiety subscale of the Hospital Anxiety and Depression Scale in patients with chronic obstructive pulmonary disease

INTERNATIONAL JOURNAL OF METHODS IN PSYCHIATRIC RESEARCH, Issue 2 2008
Wai-Kwong Tang
Abstract The Hospital Anxiety and Depression Scale (HADS) is a widely used screening instrument for depression and anxiety in medically compromised patients. The purpose of this study was to examine the differential item functioning (DIF) of the anxiety subscale of the HADS (HADS-A). A research assistant administered the HADS-A to 166 Chinese patients with chronic obstructive pulmonary disease (COPD) who were consecutively admitted to a rehabilitation hospital. Although the HADS-A was unidimensional overall, there was one misfitting item and two items with borderline misfit. Only one item showed DIF with respect to arterial oxygen saturation; no item showed DIF for other indicators of COPD severity. In conclusion, this study found significant item bias related to disease severity in one item of the HADS-A in patients with COPD. Copyright © 2008 John Wiley & Sons, Ltd. [source]


Hierarchical Logistic Regression: Accounting for Multilevel Data in DIF Detection

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 3 2010
Brian F. French
The purpose of this study was to examine the performance of differential item functioning (DIF) assessment in the presence of a multilevel structure that often underlies data from large-scale testing programs. Analyses were conducted using logistic regression (LR), a popular, flexible, and effective tool for DIF detection. Data were simulated using a hierarchical framework, such as might be seen when examinees are clustered within schools. Both standard LR and hierarchical LR (HLR, which accounts for the multilevel data) approaches to DIF detection were employed. Results highlight the differences in DIF detection rates when the analytic strategy matches the data structure. Specifically, when the grouping variable was within clusters, LR and HLR performed similarly in terms of Type I error control and power. However, when the grouping variable was between clusters, LR failed to maintain the nominal Type I error rate of .05, whereas HLR was able to maintain it. Power for HLR, however, tended to be low under many conditions in the between-cluster variable case. [source]
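
For orientation, the standard logistic-regression DIF test that serves as the study's baseline can be sketched as a sequence of nested model comparisons. The sketch below uses illustrative names and simulated data; the hierarchical (HLR) variant, which additionally models school-level clustering, is not reproduced here.

```python
# Minimal sketch of logistic-regression DIF detection: likelihood-ratio tests
# comparing models with and without group and score-by-group terms.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

def lr_dif_test(item, total, group):
    """item: 0/1 responses; total: matching criterion (e.g., rest score);
    group: 0 = reference, 1 = focal."""
    d = pd.DataFrame({"y": item, "score": total, "g": group})
    m0 = smf.logit("y ~ score", d).fit(disp=0)                # no DIF
    m1 = smf.logit("y ~ score + g", d).fit(disp=0)            # + uniform DIF
    m2 = smf.logit("y ~ score + g + score:g", d).fit(disp=0)  # + nonuniform DIF
    lr_uniform = 2 * (m1.llf - m0.llf)
    lr_nonuniform = 2 * (m2.llf - m1.llf)
    return {"uniform_p": chi2.sf(lr_uniform, df=1),
            "nonuniform_p": chi2.sf(lr_nonuniform, df=1)}

# Illustrative use with simulated data containing uniform DIF
rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)
theta = rng.normal(size=n)
total = np.round(20 + 5 * theta + rng.normal(scale=2, size=n))
p = 1 / (1 + np.exp(-(theta - 0.5 * group)))   # focal group disadvantaged
item = rng.binomial(1, p)
print(lr_dif_test(item, total, group))
```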


Testing Features of Graphical DIF: Application of a Regression Correction to Three Nonparametric Statistical Tests

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 4 2006
Daniel M. Bolt
Inspection of differential item functioning (DIF) in translated test items can be informed by graphical comparisons of item response functions (IRFs) across translated forms. Due to the many forms of DIF that can emerge in such analyses, it is important to develop statistical tests that can confirm various characteristics of DIF when present. Traditional nonparametric tests of DIF (Mantel-Haenszel, SIBTEST) are not designed to test for the presence of nonuniform or local DIF, while common probability difference (P-DIF) tests (e.g., SIBTEST) do not optimize power in testing for uniform DIF, and thus may be less useful in the context of graphical DIF analyses. In this article, modifications of three alternative nonparametric statistical tests for DIF, Fisher's χ2 test, Cochran's Z test, and Goodman's U test (Marascuilo & Slaughter, 1981), are investigated for these purposes. A simulation study demonstrates the effectiveness of a regression correction procedure in improving the statistical performance of the tests when using an internal test score as the matching criterion. Simulation power and real data analyses demonstrate the unique information provided by these alternative methods compared to SIBTEST and Mantel-Haenszel in confirming various forms of DIF in translated tests. [source]
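
The graphical starting point described above amounts to plotting empirical item response functions, that is, the proportion correct at each matched total-score level, separately for the reference and focal groups. A minimal sketch with simulated data and illustrative names follows; it does not include the article's regression correction.

```python
# Minimal sketch: empirical IRFs by matched total score for two groups.
import numpy as np
import matplotlib.pyplot as plt

def empirical_irf(item, total, group_mask):
    """Proportion correct on one item at each observed total-score level,
    restricted to examinees selected by group_mask."""
    levels = np.unique(total)
    props = [item[(total == s) & group_mask].mean()
             if np.any((total == s) & group_mask) else np.nan
             for s in levels]
    return levels, np.array(props)

# Illustrative data with uniform DIF against the focal group
rng = np.random.default_rng(2)
n = 5000
focal = rng.integers(0, 2, n).astype(bool)
theta = rng.normal(size=n)
total = np.clip(np.round(20 + 5 * theta), 0, 40).astype(int)
item = rng.binomial(1, 1 / (1 + np.exp(-(theta - 0.6 * focal))))

for mask, label in [(~focal, "reference"), (focal, "focal")]:
    x, y = empirical_irf(item, total, mask)
    plt.plot(x, y, marker="o", label=label)
plt.xlabel("Matched total score")
plt.ylabel("Proportion correct")
plt.legend()
# Vertically separated curves suggest uniform DIF; crossing curves, nonuniform DIF.
plt.show()
```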


Exact Small-Sample Differential Item Functioning Methods for Polytomous Items With Illustration Based on an Attitude Survey

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 4 2004
J. Patrick Meyer
Exact nonparametric procedures have been used to identify the level of differential item functioning (DIF) in binary items. This study explored the use of exact DIF procedures with items scored on a Likert scale. The results from an attitude survey suggest that the large-sample Cochran-Mantel-Haenszel (CMH) procedure identifies more items as statistically significant than two comparable exact nonparametric methods. This finding is consistent with previous findings; however, when items are classified in National Assessment of Educational Progress DIF categories, the results show that the CMH and its exact nonparametric counterparts produce almost identical classifications. Since DIF is often evaluated in terms of statistical and practical significance, this study provides evidence that the large-sample CMH procedure may be safely used even when the focal group has as few as 76 cases. [source]
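
For context, the large-sample CMH approach for Likert items builds on the Mantel mean-score chi-square computed over matched-score strata. The sketch below is my own illustration of that statistic with simulated data and hypothetical names, not the study's code or its exact counterparts.

```python
# Minimal sketch of the Mantel mean-score chi-square for an ordinal item:
# within each matching-score stratum, compare the focal group's observed
# category-score total with its expectation under no DIF.
import numpy as np
from scipy.stats import chi2

def mantel_ordinal_dif(item, total, focal, scores=None):
    """item: ordinal responses (e.g., 0..J-1); total: matching score;
    focal: boolean, True for focal group; scores: optional category scores."""
    cats = np.unique(item)
    a = np.asarray(scores, dtype=float) if scores is not None else cats.astype(float)
    T = E = V = 0.0
    for s in np.unique(total):
        in_s = total == s
        nF, nR, n = focal[in_s].sum(), (~focal[in_s]).sum(), in_s.sum()
        if nF == 0 or nR == 0 or n < 2:
            continue  # stratum contributes nothing
        col = np.array([(item[in_s] == c).sum() for c in cats])  # category totals
        T += a[np.searchsorted(cats, item[in_s & focal])].sum()  # focal observed sum
        E += nF * (a * col).sum() / n                            # expectation under H0
        V += nF * nR * (n * (a**2 * col).sum() - (a * col).sum()**2) / (n**2 * (n - 1))
    stat = (T - E) ** 2 / V
    return stat, chi2.sf(stat, df=1)

# Illustrative use with simulated 5-category Likert responses
rng = np.random.default_rng(3)
n = 3000
focal = rng.integers(0, 2, n).astype(bool)
theta = rng.normal(size=n)
total = np.clip(np.round(10 + 3 * theta), 0, 20).astype(int)
item = np.clip(np.round(2 + theta - 0.4 * focal + rng.normal(scale=0.8, size=n)), 0, 4).astype(int)
print(mantel_ordinal_dif(item, total, focal))
```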


Applying the Liu-Agresti Estimator of the Cumulative Common Odds Ratio to DIF Detection in Polytomous Items

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 4 2003
Randall D. Penfield
Liu and Agresti (1996) proposed a Mantel-Haenszel-type (Mantel & Haenszel, 1959) estimator of a common odds ratio for several 2 × J tables, where the J columns are ordinal levels of a response variable. This article applies the Liu-Agresti estimator to the case of assessing differential item functioning (DIF) in items having an ordinal response variable. A simulation study was conducted to investigate the accuracy of the Liu-Agresti estimator in relation to other statistical DIF detection procedures. The results of the simulation study indicate that the Liu-Agresti estimator is a viable alternative to other DIF detection statistics. [source]
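
One common reading of the Liu-Agresti estimator is as a Mantel-Haenszel-style pooling applied at every cumulative cutpoint of the ordinal response within each matched-score stratum. The sketch below follows that reading under stated assumptions (illustrative names, simulated data) and is not the article's implementation.

```python
# Minimal sketch: collapse the 2 x J table at each cumulative cutpoint into a
# 2 x 2 table and pool Mantel-Haenszel numerator/denominator terms across
# cutpoints and matched-score strata into one cumulative common odds ratio.
import numpy as np

def liu_agresti_cumulative_or(item, total, focal):
    """item: ordinal responses (integer categories); total: matching score;
    focal: boolean, True for focal-group examinees."""
    cats = np.sort(np.unique(item))
    num = den = 0.0
    for s in np.unique(total):
        in_s = total == s
        n = in_s.sum()
        if n == 0:
            continue
        ref = ~focal[in_s]
        for cut in cats[:-1]:                 # all J-1 cumulative cutpoints
            low = item[in_s] <= cut
            a = np.sum(ref & ~low)            # reference, above cutpoint
            b = np.sum(ref & low)             # reference, at/below cutpoint
            c = np.sum(~ref & ~low)           # focal, above cutpoint
            d = np.sum(~ref & low)            # focal, at/below cutpoint
            num += a * d / n
            den += b * c / n
    return num / den   # values > 1 suggest the item favors the reference group

# Illustrative use with simulated Likert data
rng = np.random.default_rng(4)
n = 3000
focal = rng.integers(0, 2, n).astype(bool)
theta = rng.normal(size=n)
total = np.clip(np.round(10 + 3 * theta), 0, 20).astype(int)
item = np.clip(np.round(2 + theta - 0.4 * focal + rng.normal(scale=0.8, size=n)), 0, 4).astype(int)
print(f"estimated cumulative common odds ratio: {liu_agresti_cumulative_or(item, total, focal):.2f}")
```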


A SIBTEST Approach to Testing DIF Hypotheses Using Experimentally Designed Test Items

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 4 2000
Daniel M. Bolt
This paper considers a modification of the DIF procedure SIBTEST for investigating the causes of differential item functioning (DIF). One way in which factors believed to be responsible for DIF can be investigated is by systematically manipulating them across multiple versions of an item using a randomized DIF study (Schmitt, Holland, & Dorans, 1993). In this paper, it is shown that the additivity of the index used for testing DIF in SIBTEST motivates a new extension of the method for statistically testing the effects of DIF factors. Because an important consideration is whether or not a studied DIF factor is consistent in its effects across items, a methodology for testing item × factor interactions is also presented. Using data from the mathematical sections of the Scholastic Assessment Test (SAT), the effects of two potential DIF factors, item format (multiple-choice versus open-ended) and problem type (abstract versus concrete), are investigated for gender DIF. Results suggest a small but statistically significant and consistent effect of item format (favoring males for multiple-choice items) across items, and a larger but less consistent effect due to problem type. [source]
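
The additive index the paper builds on can be illustrated in stripped-down form as a weighted sum, over matched-score strata, of reference-minus-focal differences in mean score on the studied item or bundle. The sketch below omits the regression correction that SIBTEST applies to the matching criterion and uses simulated data with hypothetical names.

```python
# Minimal sketch of an uncorrected, beta-hat-style weighted mean difference:
# sum over matching-score levels of p_k * (mean_ref_k - mean_focal_k).
import numpy as np

def weighted_mean_difference(item, match, focal):
    """item: studied item or bundle score; match: matching criterion;
    focal: boolean, True for focal-group examinees. p_k is the focal-group
    proportion at matching level k. Positive values favor the reference group."""
    beta = 0.0
    n_focal = focal.sum()
    for s in np.unique(match):
        in_s = match == s
        ref_k, foc_k = item[in_s & ~focal], item[in_s & focal]
        if len(ref_k) == 0 or len(foc_k) == 0:
            continue
        p_k = len(foc_k) / n_focal
        beta += p_k * (ref_k.mean() - foc_k.mean())
    return beta

# Illustrative use: compare the index across two experimentally manipulated
# versions of an item (e.g., multiple-choice vs. open-ended format)
rng = np.random.default_rng(5)
n = 4000
focal = rng.integers(0, 2, n).astype(bool)
theta = rng.normal(size=n)
match = np.clip(np.round(20 + 5 * theta), 0, 40).astype(int)
item_mc = rng.binomial(1, 1 / (1 + np.exp(-(theta - 0.3 * focal))))   # DIF present
item_oe = rng.binomial(1, 1 / (1 + np.exp(-theta)))                   # no DIF
print(weighted_mean_difference(item_mc, match, focal),
      weighted_mean_difference(item_oe, match, focal))
```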