Case-control Data (case-control + data)

Selected Abstracts


Estimation of Spatial Variation in Risk Using Matched Case-control Data

BIOMETRICAL JOURNAL, Issue 8 2002
Mikala F. Jarner
Abstract A common problem in environmental epidemiology is to estimate spatial variation in disease risk after accounting for known risk factors. In this paper we consider this problem in the context of matched case-control studies. We extend the generalised additive model approach of Kelsall and Diggle (1998) to studies in which each case has been individually matched to a set of controls. We discuss a method for fitting this model to data, apply the method to a matched study on perinatal death in the North West Thames region of England and explain why, if spatial variation is of particular scientific interest, matching is undesirable.


Multistage analysis strategies for genome-wide association studies: summary of group 3 contributions to Genetic Analysis Workshop 16

GENETIC EPIDEMIOLOGY, Issue S1 2009
Rosalind J. Neuman
Abstract This contribution summarizes the work done by six independent teams of investigators to identify the genetic and non-genetic variants that work together or independently to predispose to disease. The theme addressed in these studies is multistage strategies in the context of genome-wide association studies (GWAS). The work performed comes from Group 3 of the Genetic Analysis Workshop 16 held in St. Louis, Missouri in September 2008. These six studies represent a diversity of multistage methods, of which five are applied to the North American Rheumatoid Arthritis Consortium rheumatoid arthritis case-control data, and one method is applied to the low-density lipoprotein phenotype in the Framingham Heart Study simulated data. In the first stage of analyses, the majority of studies used a variety of screening techniques to reduce the noise of single-nucleotide polymorphisms purportedly not involved in the phenotype of interest. Three studies analyzed the data using penalized regression models, either LASSO or the elastic net. The main result was a reconfirmation of the involvement of variants in the HLA region on chromosome 6 with rheumatoid arthritis. The hope is that the intense computational methods highlighted in this group of papers will become useful tools in future GWAS. Genet. Epidemiol. 33 (Suppl. 1):S19–S23, 2009. © 2009 Wiley-Liss, Inc.


A partial least-square approach for modeling gene-gene and gene-environment interactions when multiple markers are genotyped

GENETIC EPIDEMIOLOGY, Issue 1 2009
Tao Wang
Abstract Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense single nucleotide polymorphisms (SNPs) in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle the large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with four other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey's one-degree-of-freedom model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women's Health Initiative, this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with body mass index. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc.


Simple estimates of haplotype relative risks in case-control data

GENETIC EPIDEMIOLOGY, Issue 6 2006
Benjamin French
Abstract Methods of varying complexity have been proposed to efficiently estimate haplotype relative risks in case-control data. Our goal was to compare methods that estimate associations between disease conditions and common haplotypes in large case-control studies such that haplotype imputation is done once as a simple data-processing step. We performed a simulation study based on haplotype frequencies for two renin-angiotensin system genes. The iterative and noniterative methods we compared involved fitting a weighted logistic regression, but differed in how the probability weights were specified. We also quantified the amount of ambiguity in the simulated genes. For one gene, there was essentially no uncertainty in the imputed diplotypes and every method performed well. For the other, ~60% of individuals had an unambiguous diplotype, and ~90% had a highest posterior probability greater than 0.75. For this gene, all methods performed well under no genetic effects, moderate effects, and strong effects tagged by a single nucleotide polymorphism (SNP). Noniterative methods produced biased estimates under strong effects not tagged by an SNP. For the most likely diplotype, median bias of the log-relative risks ranged between −0.49 and 0.22 over all haplotypes. For all possible diplotypes, median bias ranged between −0.73 and 0.08. Results were similar under interaction with a binary covariate. Noniterative weighted logistic regression provides valid tests for genetic associations and reliable estimates of modest effects of common haplotypes, and can be implemented in standard software. The potential for phase ambiguity does not necessarily imply uncertainty in imputed diplotypes, especially in large studies of common haplotypes. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.


Haplotype association analysis for late onset diseases using nuclear family data

GENETIC EPIDEMIOLOGY, Issue 3 2006
Chun Li
Abstract In haplotype-based association studies for late onset diseases, one attractive design is to use available unaffected spouses as controls (Valle et al. [1998] Diab. Care 21:949–958). Given cases and spouses only, the standard expectation-maximization (EM) algorithm (Dempster et al. [1977] J. R. Stat. Soc. B 39:1–38) for case-control data can be used to estimate haplotype frequencies. But often we will have offspring for at least some of the spouse pairs, and offspring genotypes provide additional information about the haplotypes of the parents. Existing methods may either ignore the offspring information, or reconstruct haplotypes for the subjects using offspring information and discard data from those whose haplotypes cannot be reconstructed with high confidence. Neither of these approaches is efficient, and the latter approach may also be biased. For case-control data with some subjects forming spouse pairs and offspring genotypes available for some spouse pairs or individuals, we propose a unified, likelihood-based method of haplotype inference. The method makes use of available offspring genotype information to apportion ambiguous haplotypes for the subjects. For subjects without offspring genotype information, haplotypes are apportioned as in the standard EM algorithm for case-control data. Our method enables efficient haplotype frequency estimation using an EM algorithm and supports probabilistic haplotype reconstruction with the probability calculated based on the whole sample. We describe likelihood ratio and permutation tests to test for disease-haplotype association, and describe three test statistics that are potentially useful for detecting such an association. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.
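
For readers unfamiliar with the "standard EM algorithm" mentioned in the abstract above, the following Python sketch shows how ambiguous haplotypes can be apportioned for unrelated subjects typed at two biallelic SNPs. It is a minimal illustration under stated assumptions, not the authors' method: it ignores offspring information entirely, and the function name `em_haplotype_freqs` and the genotype coding (count of allele 1 at each SNP) are hypothetical choices made here.

```python
def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1 or 2) of allele 1
    at SNP 1 and SNP 2. Only double heterozygotes (1, 1) are phase-ambiguous.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}  # uniform starting values (an assumption)

    def alleles(g):
        # The two alleles carried at one SNP for genotype count g.
        return (0, 0) if g == 0 else (1, 1) if g == 2 else (0, 1)

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the subject between the two possible phases
                # in proportion to the current haplotype-pair probabilities.
                w_cis = freq[(0, 0)] * freq[(1, 1)]
                w_trans = freq[(0, 1)] * freq[(1, 0)]
                total = w_cis + w_trans
                p_cis = w_cis / total if total > 0 else 0.5
                counts[(0, 0)] += p_cis
                counts[(1, 1)] += p_cis
                counts[(0, 1)] += 1 - p_cis
                counts[(1, 0)] += 1 - p_cis
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                a1, a2 = alleles(g1), alleles(g2)
                counts[(a1[0], a2[0])] += 1
                counts[(a1[1], a2[1])] += 1
        # M-step: expected haplotype counts divided by the number of chromosomes.
        n_chrom = 2 * len(genotypes)
        freq = {h: c / n_chrom for h, c in counts.items()}
    return freq
```

In the method the abstract proposes, offspring genotypes would further constrain the E-step weights for parents; that extension is not attempted in this sketch.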


A novel method to identify gene–gene effects in nuclear families: the MDR-PDT

GENETIC EPIDEMIOLOGY, Issue 2 2006
E.R. Martin
Abstract It is now well recognized that gene–gene and gene–environment interactions are important in complex diseases, and statistical methods to detect interactions are becoming widespread. Traditional parametric approaches are limited in their ability to detect high-order interactions and handle sparse data, and standard stepwise procedures may miss interactions that occur in the absence of detectable main effects. To address these limitations, the multifactor dimensionality reduction (MDR) method [Ritchie et al., 2001: Am J Hum Genet 69:138–147] was developed. The MDR is well suited for examining high-order interactions and detecting interactions without main effects. The MDR was originally designed to analyze balanced case-control data. The analysis can use family data, but requires that a single matched pair be selected from each family. This may be a discordant sib pair, or may be constructed from triad data when parents are available. To take advantage of additional affected and unaffected siblings requires a test statistic that measures the association of genotype with disease in general nuclear families. We have developed a novel test, the MDR-PDT, by merging the MDR method with the genotype-Pedigree Disequilibrium Test (geno-PDT) [Martin et al., 2003: Genet Epidemiol 25:203–213]. MDR-PDT allows identification of single-locus effects or joint effects of multiple loci in families of diverse structure. We present simulations to demonstrate the validity of the test and evaluate its power. To examine its applicability to real data, we applied the MDR-PDT to data from candidate genes for Alzheimer disease (AD) in a large family dataset. These results show the utility of the MDR-PDT for understanding the genetics of complex diseases. Genet. Epidemiol. 2006. © 2005 Wiley-Liss, Inc.


Haplotype interaction analysis of unlinked regions

GENETIC EPIDEMIOLOGY, Issue 4 2005
Tim Becker
Abstract Genetically complex diseases are caused by interacting environmental factors and genes. As a consequence, statistical methods that consider multiple unlinked genomic regions simultaneously are desirable. Such consideration, however, may lead to a vast number of different high-dimensional tests whose appropriate analysis poses a problem. Here, we present a method to analyze case-control studies with multiple SNP data without phase information that considers gene-gene interaction effects while correcting appropriately for multiple testing. In particular, we allow for interactions of haplotypes that belong to different unlinked regions, as haplotype analysis often proves to be more powerful than single marker analysis. In addition, we consider different marker combinations at each unlinked region. The multiple testing issue is settled via the minP approach; the P value of the "best" marker/region configuration is corrected via Monte-Carlo simulations. Thus, we do not explicitly test for a specific pre-defined interaction model, but test for the global hypothesis that none of the considered haplotype interactions shows association with the disease. We carry out a simulation study for case-control data that confirms the validity of our approach. When simulating two-locus disease models, our test proves to be more powerful than association methods that analyze each linked region separately. In addition, when one of the tested regions is not involved in the etiology of the disease, only a small amount of power is lost with interaction analysis as compared to analysis without interaction. We successfully applied our method to a real case-control data set with markers from two genes controlling a common pathway. While classical analysis failed to reach significance, we obtained a significant result even after correction for multiple testing with our proposed haplotype interaction analysis. The method described here has been implemented in FAMHAP. Genet. Epidemiol. 2005. © 2005 Wiley-Liss, Inc.
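
The minP idea in the abstract above (take the smallest P value over all configurations, then correct it by Monte-Carlo simulation) can be sketched generically in Python. This is only an illustration under simplifying assumptions: it uses binary markers and a 2x2 chi-square test in place of the haplotype interaction tests of the paper, it is not the FAMHAP implementation, and the names `chi2_pvalue_2x2` and `minp_corrected` are hypothetical.

```python
import math
import random


def chi2_pvalue_2x2(marker, labels):
    """P value of the 1-df chi-square test for a binary marker vs. case status."""
    a = sum(1 for m, y in zip(marker, labels) if m and y)          # carrier cases
    b = sum(1 for m, y in zip(marker, labels) if m and not y)      # carrier controls
    c = sum(1 for m, y in zip(marker, labels) if not m and y)      # non-carrier cases
    d = len(labels) - a - b - c                                    # non-carrier controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = len(labels) * (a * d - b * c) ** 2 / denom
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def minp_corrected(markers, labels, n_perm=1000, seed=0):
    """Return (observed minimum P, permutation-corrected global P)."""
    rng = random.Random(seed)
    observed = min(chi2_pvalue_2x2(m, labels) for m in markers)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Permuting case/control labels simulates the global null hypothesis
        # while preserving the correlation between markers.
        rng.shuffle(shuffled)
        if min(chi2_pvalue_2x2(m, shuffled) for m in markers) <= observed:
            hits += 1
    # Add-one estimate so the corrected P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the whole set of tests is recomputed on each permuted data set, the corrected P value refers to the global hypothesis that no configuration is associated, which matches the interpretation given in the abstract.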


Stratified case sampling and the use of family controls

GENETIC EPIDEMIOLOGY, Issue 3 2001
Kimberly D. Siegmund
Abstract We compare the asymptotic relative efficiency (ARE) of different study designs for estimating gene and gene-environment interaction effects using matched case-control data. In the sampling schemes considered, cases are selected differentially based on their family history of disease. Controls are selected either from unrelated subjects or from among the case's unaffected siblings and cousins. Parameters are estimated using weighted conditional logistic regression, where the likelihood contributions for each subject are weighted by the fraction of cases sampled sharing the same family history. Results showed that compared to random sampling, over-sampling cases with a positive family history increased the efficiency for estimating the main effect of a gene for sib-control designs (103–254% ARE) and decreased efficiency for cousin-control and population-control designs (68–94% ARE and 67–84% ARE, respectively). Population controls and random sampling of cases were most efficient for a recessive gene or a dominant gene with a relative risk less than 9. For estimating gene-environment interactions, over-sampling positive-family-history cases again led to increased efficiency using sib controls (111–180% ARE) and decreased efficiency using population controls (68–87% ARE). Using case-cousin pairs, the results differed based on the genetic model and the size of the interaction effect; biased sampling was only slightly more efficient than random sampling for large interaction effects under a dominant gene model (relative risk ratio = 8, 106% ARE). Overall, the most efficient study design for studying gene-environment interaction was the case-sib-control design with over-sampling of positive-family-history cases. Genet. Epidemiol. 20:316–327, 2001. © 2001 Wiley-Liss, Inc.


Pearson's Test, Trend Test, and MAX Are All Trend Tests with Different Types of Scores

ANNALS OF HUMAN GENETICS, Issue 2 2009
Gang Zheng
Summary Pearson's test is one of the most commonly used statistics for testing genetic association of case-control data. The trend test is another one which assumes a dose-response model between the risk of the disease and genotypes. To apply the trend test, a set of ordered scores is assigned a priori based on the underlying genetic model. Pearson's test is model-free and robust, but is less powerful for common genetic models. MAX is another robust test statistic, which takes the maximum of the trend tests over a family of scientifically plausible genetic models. We show that the three test statistics are all trend tests but with different types of scores; whether the scores are prespecified or data-driven, or whether the scores are ordered (restricted) or not ordered (unrestricted). We then provide insights into power performance of the three tests when the underlying genetic model is unknown and discuss which test to use for the analyses of case-control genetic association studies.
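
The three statistics compared in the summary above can be computed side by side for a 2x3 genotype table. The Python sketch below is illustrative only: it assumes the Cochran-Armitage form of the trend test, takes MAX over recessive, additive and dominant scores (one common choice of model family, assumed here), and uses hypothetical function names and made-up counts. It does not compute a P value for MAX, whose null distribution is not standard normal.

```python
import math


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic Z for genotype counts and given scores."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    t = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    sx = sum(x * m for x, m in zip(scores, n))
    sxx = sum(x * x * m for x, m in zip(scores, n))
    var = R * S / N * (N * sxx - sx * sx)  # null variance of t
    return t / math.sqrt(var)


def pearson_chi2(cases, controls):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    R, S = sum(cases), sum(controls)
    N = R + S
    chi2 = 0.0
    for r, s in zip(cases, controls):
        for obs, row_total in ((r, R), (s, S)):
            expected = (r + s) * row_total / N
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Scores for genotypes carrying 0, 1 and 2 copies of the risk allele.
GENETIC_MODELS = {
    "recessive": (0, 0, 1),
    "additive": (0, 0.5, 1),
    "dominant": (0, 1, 1),
}


def max_stat(cases, controls):
    """MAX: largest absolute trend statistic over the assumed model family."""
    return max(abs(trend_z(cases, controls, x)) for x in GENETIC_MODELS.values())


# Made-up counts for illustration only (genotypes with 0, 1, 2 risk alleles).
example_cases = [30, 50, 20]
example_controls = [50, 40, 10]
```

With these definitions, the squared trend statistic for any fixed scores never exceeds Pearson's chi-square on the same table, which is one way to see Pearson's test as a trend test with unrestricted, data-driven scores.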


U-Statistics-based Tests for Multiple Genes in Genetic Association Studies

ANNALS OF HUMAN GENETICS, Issue 6 2008
Zhi Wei
Summary As our understanding of biological pathways and the genes that regulate these pathways increases, consideration of these biological pathways has become an increasingly important part of genetic and molecular epidemiology. Pathway-based genetic association studies often involve genotyping of variants in genes acting in certain biological pathways. Such pathway-based genetic association studies can potentially capture the highly heterogeneous nature of many complex traits, with multiple causative loci and multiple alleles at some of the causative loci. In this paper, we develop two nonparametric test statistics that consider simultaneously the effects of multiple markers. Our approach, which is based on data-adaptive U-statistics, can handle both qualitative data such as case-control data and quantitative continuous phenotype data. Simulations demonstrate that our proposed methods are more powerful than standard methods, especially when there are multiple risk loci each with small genetic effects. When the number of disease-predisposing genes is small, the data-adaptive weighting of the U-statistics over all the markers produces similar power to commonly used single marker tests. We further illustrate the potential merits of our proposed tests in the analysis of a data set from a pathway-based candidate gene association study of breast cancer and hormone metabolism pathways. Finally, potential applications of the proposed tests to genome-wide association studies are also discussed.