Type I Error Control


Selected Abstracts


Hierarchical Logistic Regression: Accounting for Multilevel Data in DIF Detection

JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 3 2010
Brian F. French
The purpose of this study was to examine the performance of differential item functioning (DIF) assessment in the presence of a multilevel structure that often underlies data from large-scale testing programs. Analyses were conducted using logistic regression (LR), a popular, flexible, and effective tool for DIF detection. Data were simulated using a hierarchical framework, such as might be seen when examinees are clustered in schools. Both standard LR and hierarchical LR (HLR, which accounts for multilevel data) approaches to DIF detection were employed. Results highlight the differences in DIF detection rates when the analytic strategy matches the data structure. Specifically, when the grouping variable was within clusters, LR and HLR performed similarly in terms of Type I error control and power. However, when the grouping variable was between clusters, LR failed to maintain the nominal Type I error rate of .05, whereas HLR maintained this rate. Power for HLR, however, tended to be low under many conditions in the between-cluster case.
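The between-cluster result can be illustrated with a small simulation. The sketch below is not the study's method: it swaps in a continuous outcome and a two-sample t statistic in place of logistic regression on item responses, and the cluster counts, cluster size, and variance components are assumptions chosen for illustration. It compares a naive test that treats examinees as independent with a test on cluster means when the grouping variable is assigned at the cluster level and there is no true group effect.

```python
import random
import statistics


def t_stat(a, b):
    """Two-sample t statistic; with equal group sizes the pooled and Welch forms coincide."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / (va / len(a) + vb / len(b)) ** 0.5


def simulate(n_reps=500, clusters_per_group=10, cluster_size=30, seed=1):
    """Return (naive, cluster-aware) rejection rates under a true null group effect."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)  # about 1.96, large-sample cutoff
    t_crit_18 = 2.101  # two-sided .05 cutoff for t with 18 df (10 + 10 - 2 clusters)
    naive_hits = 0
    cluster_hits = 0
    for _ in range(n_reps):
        scores = {0: [], 1: []}  # individual-level outcomes per group
        means = {0: [], 1: []}   # one mean per cluster, per group
        for group in (0, 1):
            for _ in range(clusters_per_group):
                u = rng.gauss(0, 1)  # cluster effect shared by all members (assumed SD 1)
                y = [u + rng.gauss(0, 1) for _ in range(cluster_size)]
                scores[group].extend(y)
                means[group].append(statistics.fmean(y))
        # Naive analysis: ignores clustering, so the standard error is too small.
        naive_hits += abs(t_stat(scores[0], scores[1])) > z_crit
        # Cluster-aware analysis: the cluster is the unit of analysis.
        cluster_hits += abs(t_stat(means[0], means[1])) > t_crit_18
    return naive_hits / n_reps, cluster_hits / n_reps


if __name__ == "__main__":
    naive_rate, cluster_rate = simulate()
    print(f"naive Type I error rate:         {naive_rate:.3f}")
    print(f"cluster-aware Type I error rate: {cluster_rate:.3f}")
```

Under these assumed settings the naive rate should land far above .05, while the cluster-mean test should stay near the nominal level. The hard-coded critical value applies only to the default of 10 clusters per group.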


Resampling-Based Empirical Bayes Multiple Testing Procedures for Controlling Generalized Tail Probability and Expected Value Error Rates: Focus on the False Discovery Rate and Simulation Study

BIOMETRICAL JOURNAL, Issue 5 2008
Sandrine Dudoit
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP(q, g) = Pr(g(V_n, S_n) > q), and generalized expected value (gEV) error rates, gEV(g) = E[g(V_n, S_n)], for arbitrary functions g(V_n, S_n) of the numbers of false positives V_n and true positives S_n. Of particular interest are error rates based on the proportion g(V_n, S_n) = V_n / (V_n + S_n) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E[V_n / (V_n + S_n)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely-used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
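For orientation, the classical Benjamini and Hochberg linear step-up procedure that serves as the comparison baseline can be sketched in a few lines. This is the baseline only, not the resampling-based empirical Bayes procedure proposed in the article; the function names are illustrative. The helper `false_discovery_proportion` computes g(V_n, S_n) = V_n / (V_n + S_n), with the common convention (an assumption here) that the proportion is 0 when nothing is rejected.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def false_discovery_proportion(n_false_pos, n_true_pos):
    """g(V, S) = V / (V + S), taken to be 0 when there are no rejections."""
    total = n_false_pos + n_true_pos
    return n_false_pos / total if total > 0 else 0.0


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p))
```

Because the rule searches for the largest qualifying rank, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, provided a larger p-value further down the ordering meets its threshold.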


Methods to account for spatial autocorrelation in the analysis of species distributional data: a review

ECOGRAPHY, Issue 5 2007
Carsten F. Dormann
Species distributional or trait data based on range map (extent-of-occurrence) or atlas survey data often display spatial autocorrelation, i.e. locations close to each other exhibit more similar values than those further apart. If this pattern remains present in the residuals of a statistical model based on such data, one of the key assumptions of standard statistical analyses, that residuals are independent and identically distributed (i.i.d.), is violated. The violation of the assumption of i.i.d. residuals may bias parameter estimates and can increase type I error rates (falsely rejecting the null hypothesis of no effect). While this is increasingly recognised by researchers analysing species distribution data, there is, to our knowledge, no comprehensive overview of the many available spatial statistical methods to take spatial autocorrelation into account in tests of statistical significance. Here, we describe six different statistical approaches to infer correlates of species' distributions, for both presence/absence (binary response) and species abundance data (Poisson or normally distributed response), while accounting for spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised least squares; (conditional and simultaneous) autoregressive models and generalised estimating equations. A comprehensive comparison of the relative merits of these methods is beyond the scope of this paper. To demonstrate each method's implementation, however, we undertook preliminary tests based on simulated data. These preliminary tests verified that most of the spatial modelling techniques we examined showed good type I error control and precise parameter estimates, at least when confronted with simplistic simulated data containing spatial autocorrelation in the errors. However, we found that for presence/absence data the results and conclusions were very variable between the different methods. This is likely due to the low information content of binary maps. Also, in contrast with previous studies, we found that autocovariate methods consistently underestimated the effects of environmental controls of species distributions. Given their widespread use, in particular for the modelling of species presence/absence data (e.g. climate envelope models), we argue that this warrants further study and caution in their use. To aid other ecologists in making use of the methods described, code to implement them in freely available software is provided in an electronic appendix.
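Whether spatial autocorrelation remains in a set of residuals is often checked with Moran's I. That statistic is not one of the six methods the review lists; the sketch below is only an illustration of the underlying idea, and the neighbour weights (adjacent cells on a one-dimensional transect) are an assumption made to keep the example small. Values near 0 suggest no autocorrelation, positive values mean neighbours are similar, and negative values mean neighbours are dissimilar.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a dict {(i, j): w_ij} of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Weighted cross-products of deviations over all neighbour pairs.
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    total_weight = sum(weights.values())
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)


def transect_weights(n):
    """Symmetric binary weights linking each cell to its immediate neighbours."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights


if __name__ == "__main__":
    smooth = [1, 2, 3, 4, 5]         # neighbouring cells have similar values
    alternating = [1, -1, 1, -1]     # neighbouring cells have opposite values
    print(morans_i(smooth, transect_weights(len(smooth))))
    print(morans_i(alternating, transect_weights(len(alternating))))
```

In practice the values would be model residuals and the weights would come from the study's distance or adjacency structure.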


Using evidence for population stratification bias in combined individual- and family-level genetic association analyses of quantitative traits

GENETIC EPIDEMIOLOGY, Issue 5 2010
Lucia Mirea
Genetic association studies are generally performed either by examining differences in the genotype distribution between individuals or by testing for preferential allele transmission within families. In the absence of population stratification bias (PSB), integrated analyses of individual and family data can increase power to identify susceptibility loci [Abecasis et al., 2000. Am. J. Hum. Genet. 66:279-292; Chen and Lin, 2008. Genet. Epidemiol. 32:520-527; Epstein et al., 2005. Am. J. Hum. Genet. 76:592-608]. In existing methods, the presence of PSB is initially assessed by comparing results from between-individual and within-family analyses, and then combined analyses are performed only if no significant PSB is detected. However, this strategy requires specification of an arbitrary testing level α_PSB, typically 5%, to declare PSB significance. As a novel alternative, we propose to directly use the PSB evidence in weights that combine results from between-individual and within-family analyses. The weighted approach generalizes previous methods by using a continuous weighting function that depends only on the observed P-value instead of a binary weight that depends on α_PSB. Using simulations, we demonstrate that for quantitative trait analysis, the weighted approach provides a good compromise between type I error control and power to detect association in studies with few genotyped markers and limited information regarding population structure. Genet. Epidemiol. 34:502-511, 2010. © 2010 Wiley-Liss, Inc.
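The contrast between a binary and a continuous weight can be sketched as follows. The abstract does not give the authors' weighting function or test statistics, so everything specific here is hypothetical: the z-test comparing the two estimates, the assumption that the two estimates are independent, the inverse-variance pooling, and the default weight w(p) = p are all placeholders chosen for illustration.

```python
import math
from statistics import NormalDist


def psb_p_value(beta_b, se_b, beta_w, se_w):
    """Two-sided P-value for a difference between the between- and within-family estimates.

    Assumes (hypothetically) that the two estimates are independent and roughly normal.
    """
    z = (beta_b - beta_w) / math.sqrt(se_b ** 2 + se_w ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def pooled_estimate(beta_b, se_b, beta_w, se_w):
    """Inverse-variance weighted combination of the two estimates."""
    wb, ww = 1 / se_b ** 2, 1 / se_w ** 2
    return (wb * beta_b + ww * beta_w) / (wb + ww)


def binary_estimate(beta_b, se_b, beta_w, se_w, alpha_psb=0.05):
    """Two-stage rule: pool only if PSB is not significant at level alpha_psb."""
    p = psb_p_value(beta_b, se_b, beta_w, se_w)
    if p > alpha_psb:
        return pooled_estimate(beta_b, se_b, beta_w, se_w)
    return beta_w  # fall back to the within-family estimate, which is robust to PSB


def weighted_estimate(beta_b, se_b, beta_w, se_w, weight=lambda p: p):
    """Continuous rule: blend pooled and within-family estimates by a function of P.

    The default weight w(p) = p is a placeholder, not the published function.
    """
    p = psb_p_value(beta_b, se_b, beta_w, se_w)
    w = weight(p)
    return w * pooled_estimate(beta_b, se_b, beta_w, se_w) + (1 - w) * beta_w


if __name__ == "__main__":
    # Estimates agree: both rules return the pooled value.
    print(binary_estimate(0.5, 0.1, 0.5, 0.2), weighted_estimate(0.5, 0.1, 0.5, 0.2))
    # Estimates disagree strongly: both rules lean on the within-family value.
    print(binary_estimate(1.0, 0.1, 0.0, 0.1), weighted_estimate(1.0, 0.1, 0.0, 0.1))
```

The binary rule jumps between two answers at the threshold, while the continuous rule moves smoothly toward the within-family estimate as the evidence for PSB grows, with no testing level to choose.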


Pleiotropy and principal components of heritability combine to increase power for association analysis

GENETIC EPIDEMIOLOGY, Issue 1 2008
Lambertus Klei
When many correlated traits are measured, the potential exists to discover the coordinated control of these traits via genotyped polymorphisms. A common statistical approach to this problem involves assessing the relationship between each phenotype and each single nucleotide polymorphism (SNP) individually (PHN) and applying a Bonferroni correction for the effective number of independent tests conducted. Alternatively, one can apply a dimension reduction technique, such as estimation of principal components, and test for an association with the principal components of the phenotypes (PCP) rather than the individual phenotypes. Building on the work of Lange and colleagues, we develop an alternative method based on the principal component of heritability (PCH). For each SNP, the PCH approach reduces the phenotypes to a single trait that has a higher heritability than any other linear combination of the phenotypes. As a result, the association between a SNP and the derived trait is often easier to detect than an association with any of the individual phenotypes or the PCP. When applied to unrelated subjects, PCH has a drawback: for each SNP it is necessary to estimate the vector of loadings that maximizes the heritability over all phenotypes. We develop a method of iterated sample splitting that uses one portion of the data for training and the remainder for testing. This cross-validation approach maintains type I error control and yet utilizes the data efficiently, resulting in a powerful test for association. Genet. Epidemiol. 2007. © 2007 Wiley-Liss, Inc.