Analysis Workshop (analysis + workshop)

Kinds of Analysis Workshop

  • genetic analysis workshop


  • Selected Abstracts


    Genome-wide association studies for discrete traits

    GENETIC EPIDEMIOLOGY, Issue S1 2009
    Duncan C. Thomas
    Abstract Genome-wide association studies of discrete traits generally use simple methods of analysis based on χ² tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained by using various methods that analyze multiple markers in combination. Methods based on sliding windows, wavelets, Bayesian shrinkage, or penalized likelihood methods, among others, were explored by various participants of Genetic Analysis Workshop 16 Group 1 to combine information across multiple markers within a region, while others used Bayesian variable selection methods for genome-wide multivariate analyses of all markers simultaneously. Imputation can be used to fill in missing markers on individual subjects within a study or in a meta-analysis of studies using different panels. Although multiple imputation theoretically should give more robust tests of association, one participant contribution found little difference between results of single and multiple imputation. Careful control of population stratification is essential, and two contributions found that previously reported associations with two genes disappeared after more precise control. Other issues considered by this group included subgroup analysis, gene-gene interactions, and the use of biomarkers. Genet. Epidemiol. 33 (Suppl. 1):S8–S12, 2009. © 2009 Wiley-Liss, Inc. [source]
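
    As an illustration of the single-marker χ² test mentioned in the abstract above, the following is a minimal sketch, not the method of any workshop contribution. It assumes a 2×2 table of allele counts (cases and controls by two alleles); the function name chi_square_2x2 and the counts are hypothetical. For one degree of freedom the p-value can be written with the complementary error function.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction).

    The table is [[a, b], [c, d]], e.g. rows = cases/controls and
    columns = counts of the two alleles at one marker (hypothetical layout).
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    stat, p = chi_square_2x2(10, 20, 20, 10)
    print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

    In a genome-wide scan a test of this kind would be repeated for every marker, which is why the abstract describes it as an initial screen.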


    Genome-wide association analyses of quantitative traits: the GAW16 experience

    GENETIC EPIDEMIOLOGY, Issue S1 2009
    Saurabh Ghosh
    Abstract The group that formed on the theme of genome-wide association analyses of quantitative traits (Group 2) in the Genetic Analysis Workshop 16 comprised eight sets of investigators. Three data sets were available: one on autoantibodies related to rheumatoid arthritis provided by the North American Rheumatoid Arthritis Consortium; the second on anthropometric, lipid, and biochemical measures provided by the Framingham Heart Study (FHS); and the third a simulated data set modeled after FHS. The different investigators in the group addressed a large set of statistical challenges and applied a wide spectrum of association methods in analyzing quantitative traits at the genome-wide level. While some previously reported genes were validated, some novel chromosomal regions provided significant evidence of association in multiple contributions in the group. In this report, we discuss the different strategies explored by the different investigators with the common goal of improving the power to detect association. Genet. Epidemiol. 33 (Suppl. 1):S13–S18, 2009. © 2009 Wiley-Liss, Inc. [source]


    Multistage analysis strategies for genome-wide association studies: summary of group 3 contributions to Genetic Analysis Workshop 16

    GENETIC EPIDEMIOLOGY, Issue S1 2009
    Rosalind J. Neuman
    Abstract This contribution summarizes the work done by six independent teams of investigators to identify the genetic and non-genetic variants that work together or independently to predispose to disease. The theme addressed in these studies is multistage strategies in the context of genome-wide association studies (GWAS). The work performed comes from Group 3 of the Genetic Analysis Workshop 16 held in St. Louis, Missouri in September 2008. These six studies represent a diversity of multistage methods of which five are applied to the North American Rheumatoid Arthritis Consortium rheumatoid arthritis case-control data, and one method is applied to the low-density lipoprotein phenotype in the Framingham Heart Study simulated data. In the first stage of analyses, the majority of studies used a variety of screening techniques to reduce the noise of single-nucleotide polymorphisms purportedly not involved in the phenotype of interest. Three studies analyzed the data using penalized regression models, either LASSO or the elastic net. The main result was a reconfirmation of the involvement of variants in the HLA region on chromosome 6 with rheumatoid arthritis. The hope is that the intense computational methods highlighted in this group of papers will become useful tools in future GWAS. Genet. Epidemiol. 33 (Suppl. 1):S19–S23, 2009. © 2009 Wiley-Liss, Inc. [source]


    Summary of contributions to GAW Group 15: family-based samples are useful in identifying common polymorphisms associated with complex traits

    GENETIC EPIDEMIOLOGY, Issue S1 2009
    Stacey Knight
    Abstract Traditionally, family-based samples have been used for genetic analyses of single-gene traits caused by rare but highly penetrant risk variants. The utility of family-based genetic data for analyzing common complex traits is unclear and contains numerous challenges. To assess the utility as well as to address these challenges, members of Genetic Analysis Workshop 16 Group 15 analyzed Framingham Heart Study data using family-based designs ranging from parent–offspring trios to large pedigrees. We investigated different methods including traditional linkage tests, family-based association tests, and population-based tests that correct for relatedness between subjects, and tests to detect parent-of-origin effects. The analyses presented an assortment of positive findings. One contribution found increased power to detect epistatic effects through linkage using ascertainment of sibships based on extreme quantitative values or presence of disease associated with the quantitative value. Another contribution found four single-nucleotide polymorphisms (SNPs) showing a maternal effect, two SNPs with an imprinting effect, and one SNP having both effects on a binary high blood pressure trait. Finally, three contributions illustrated the advantage of using population-based methods to detect association to complex binary or quantitative traits. Our findings highlight the contribution of family-based samples to the genetic dissection of complex traits. Genet. Epidemiol. 33 (Suppl. 1):S99–S104, 2009. © 2009 Wiley-Liss, Inc. [source]


    Bivariate combined linkage and association mapping of quantitative trait loci

    GENETIC EPIDEMIOLOGY, Issue 5 2008
    Jeesun Jung
    Abstract In this paper, bivariate/multivariate variance component models are proposed for high-resolution combined linkage and association mapping of quantitative trait loci (QTL), based on combinations of pedigree and population data. Suppose that a quantitative trait locus is located in a chromosome region that exerts pleiotropic effects on multiple quantitative traits. In the region, multiple markers such as single nucleotide polymorphisms are typed. Two regression models, "genotype effect model" and "additive effect model", are proposed to model the association between the markers and the trait locus. The linkage information, i.e., recombination fractions between the QTL and the markers, is modeled in the variance and covariance matrix. By analytical formulae, we show that the "genotype effect model" can be used to model the additive and dominant effects simultaneously; the "additive effect model" only takes care of additive effect. Based on the two models, F-test statistics are proposed to test association between the QTL and markers. By analytical power analysis, we show that bivariate models can be more powerful than univariate models. For moderate-sized samples, the proposed models lead to correct type I error rates; and so the models are reasonably robust. As a practical example, the method is applied to analyze the genetic inheritance of rheumatoid arthritis for the data of The North American Rheumatoid Arthritis Consortium, Problem 2, Genetic Analysis Workshop 15, which confirms the advantage of the proposed bivariate models. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc. [source]


    Genome-wide association analyses of expression phenotypes

    GENETIC EPIDEMIOLOGY, Issue S1 2007
    Gary K. Chen
    Abstract A number of issues arise when analyzing the large amount of data from high-throughput genotype and expression microarray experiments, including design and interpretation of genome-wide association studies of expression phenotypes. These issues were considered by contributions submitted to Group 1 of the Genetic Analysis Workshop 15 (GAW15), which focused on the association of quantitative expression data. These contributions evaluated diverse hypotheses, including those relevant to cancer and obesity research, and used various analytic techniques, many of which were derived from information theory. Several observations from these reports stand out. First, one needs to consider the genetic model of the trait of interest and carefully select which single nucleotide polymorphisms and individuals are included early in the design stage of a study. Second, by targeting specific pathways when analyzing genome-wide data, one can generate more interpretable results than agnostic approaches. Finally, for datasets with small sample sizes but a large number of features like the Genetic Analysis Workshop 15 dataset, machine learning approaches may be more practical than traditional parametric approaches. Genet. Epidemiol. 31 (Suppl. 1):S7–S11, 2007. © 2007 Wiley-Liss, Inc. [source]


    Genetic association with rheumatoid arthritis – Genetic Analysis Workshop 15: summary of contributions from Group 2

    GENETIC EPIDEMIOLOGY, Issue S1 2007
    Marsha A. Wilcox
    Abstract The papers in presentation group 2 of Genetic Analysis Workshop 15 (GAW15) conducted association analyses of rheumatoid arthritis data. The analyses were carried out primarily in the data provided by the North American Rheumatoid Arthritis Consortium (NARAC). One group conducted analyses in the data provided by the Canadian Rheumatoid Arthritis Genetics Study (CRAGS). Analysis strategies included genome-wide scans, the examination of candidate genes, and investigations of a region of interest on chromosome 18q21. Most authors employed relatively new methods, proposed extensions of existing methods, or introduced completely novel methods for aspects of association analysis. There were several common observations; a group of papers using a variety of methods found stronger association on chromosomes 6 and 18 and in candidate gene PTPN22 among women with early onset. Generally, models that considered haplotypes or multiple markers showed stronger evidence for association than did single marker analyses. Genet. Epidemiol. 31 (Suppl. 1):S12–S21, 2007. © 2007 Wiley-Liss, Inc. [source]


    Multistage designs in the genomic era: Providing balance in complex disease studies

    GENETIC EPIDEMIOLOGY, Issue S1 2007
    Marie-Pierre Dubé
    Abstract In this summary paper, we describe the contributions included in the Multistage Design group (Group 14) at the Genetic Analysis Workshop 15, which was held during November 12–14, 2006. Our group contrasted and compared different approaches to reducing complexity in a genetic study through implementation of staged designs. Most groups used the simulated dataset (problem 3), which provided ample opportunities for evaluating various staged designs. A wide range of multistage designs that targeted different aspects of complexity were explored. We categorized these approaches as reducing phenotypic complexity, model complexity, analytic complexity or genetic complexity. In general we learned that: (1) when staged designs are carefully planned and implemented, the power loss compared to a single-stage analysis can be minimized and study cost is greatly reduced; (2) a joint analysis of the results from each stage is generally more powerful than treating the second stage as a replication analysis. Genet. Epidemiol. 31 (Suppl. 1):S118–S123, 2007. © 2007 Wiley-Liss, Inc. [source]


    Multiple testing in the genomics era: Findings from Genetic Analysis Workshop 15, Group 15

    GENETIC EPIDEMIOLOGY, Issue S1 2007
    Lisa J. Martin
    Abstract Recent advances in molecular technologies have resulted in the ability to screen hundreds of thousands of single nucleotide polymorphisms and tens of thousands of gene expression profiles. While these data have the potential to inform investigations into disease etiologies and advance medicine, the question of how to adequately control both type I and type II error rates remains. Genetic Analysis Workshop 15 datasets provided a unique opportunity for participants to evaluate multiple testing strategies applicable to microarray and single nucleotide polymorphism data. The Genetic Analysis Workshop 15 multiple testing and false discovery rate group (Group 15) investigated three general categories for multiple testing corrections, which are summarized in this review: statistical independence, error rate adjustment, and data reduction. We show that while each approach may have certain advantages, adequate error control is largely dependent upon the question under consideration and often requires the use of multiple analytic strategies. Genet. Epidemiol. 31 (Suppl. 1):S124–S131, 2007. © 2007 Wiley-Liss, Inc. [source]
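
    The "error rate adjustment" category named in the abstract above can be illustrated with two standard procedures: the Bonferroni correction (family-wise error rate) and the Benjamini-Hochberg step-up procedure (false discovery rate). This is a generic sketch, not taken from the Group 15 contributions; the function names and the p-values in the usage example are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Finds the largest rank k such that the k-th smallest p-value is at most
    alpha * k / m, then rejects the k hypotheses with the smallest p-values.
    Returns a list of booleans in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    pvals = [0.5, 0.004, 0.035, 0.015, 0.025]
    print("Bonferroni:        ", bonferroni(pvals))
    print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

    On these made-up values the false discovery rate procedure rejects more hypotheses than the Bonferroni correction, which reflects the trade-off between type I and type II error that the abstract raises.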


    Summary of contributions to GAW15 Group 16: Processing/normalization of expression traits

    GENETIC EPIDEMIOLOGY, Issue S1 2007
    Aurélie Labbe
    Abstract Here, we summarize the contributions to group 16 of Genetic Analysis Workshop 15, held in Florida, U.S.A. The theme of this group was preprocessing of expression quantitative trait loci (eQTL) studies using the Affymetrix platform. The objective of the Genetic Analysis Workshop 15 problem 1 dataset was to use transcript levels that are measured using DNA microarrays as quantitative traits and localize the genes or other features of the DNA that control gene expression by quantitative trait loci linkage analyses. All contributors of this group used the microarray expression profiles (problem 1) data. Various approaches and questions were examined to investigate the effects of preprocessing methods and/or gene filtering on the interpretation of data, specifically on heritability estimates of gene expression and on linkage results. In addition, some contributors focused on the statistical issues involved in large-scale genetic analyses of quantitative traits that account for or build composite phenotypes from a large number of correlated traits. Since the true eQTLs are not known in the problem 1 data, results from the 11 studies cannot be fully evaluated for the methods employed. However, several common trends were found. All reports concluded that preprocessing statistical analyses may have an important impact on eQTL analyses and on the identification of cis-/trans-regulators and/or major biological pathways. Genet. Epidemiol. 31 (Suppl. 1):S132–S138, 2007. © 2007 Wiley-Liss, Inc. [source]


    Incorporating covariates in mapping heterogeneous traits: a hierarchical model using empirical Bayes estimation

    GENETIC EPIDEMIOLOGY, Issue 7 2007
    Swati Biswas
    Abstract Complex genetic traits are inherently heterogeneous, i.e., they may be caused by different genes, or non-genetic factors, in different individuals. So, for mapping genes responsible for these diseases using linkage analysis, heterogeneity must be accounted for in the model. Heterogeneity across different families can be modeled using a mixture distribution by letting each family have its own heterogeneity parameter denoting the probability that its disease-causing gene is linked to the marker map under consideration. A substantial gain in power is expected if covariates that can discriminate between the families of linked and unlinked types are incorporated in this modeling framework. To this end, we propose a hierarchical Bayesian model, in which the families are grouped according to various (categorized) levels of covariate(s). The heterogeneity parameters of families within each group are assigned a common prior, whose parameters are further assigned hyper-priors. The hyper-parameters are obtained by utilizing the empirical Bayes estimates. We also address related issues such as evaluating whether the covariate(s) under consideration are informative and grouping of families. We compare the proposed approach with one that does not utilize covariates and show that our approach leads to considerable gains in power to detect linkage and in precision of interval estimates through various simulation scenarios. An application to the asthma datasets of Genetic Analysis Workshop 12 also illustrates this gain in a real data analysis. Additionally, we compare the performances of microsatellite markers and single nucleotide polymorphisms for our approach and find that the latter clearly outperforms the former. Genet. Epidemiol. 2007. © 2007 Wiley-Liss, Inc. [source]


    Comparison of single-nucleotide polymorphisms and microsatellite markers for linkage analysis in the COGA and simulated data sets for Genetic Analysis Workshop 14: Presentation Groups 1, 2, and 3

    GENETIC EPIDEMIOLOGY, Issue S1 2005
    Marsha A. Wilcox
    Abstract The papers in presentation groups 1–3 of Genetic Analysis Workshop 14 (GAW14) compared microsatellite (MS) markers and single-nucleotide polymorphism (SNP) markers for a variety of factors, using multiple methods in both data sets provided to GAW participants. Group 1 focused on data provided from the Collaborative Study on the Genetics of Alcoholism (COGA). Group 2 focused on data simulated for the workshop. Group 3 contained analyses of both data sets. Issues examined included: information content, signal strength, localization of the signal, use of haplotype blocks, population structure, power, type I error, control of type I error, the effect of linkage disequilibrium, and computational challenges. There were several broad resulting observations. 1) Information content was higher for dense SNP marker panels than for MS panels, and dense SNP marker sets appeared to provide slightly higher linkage scores and slightly higher power to detect linkage than MS markers. 2) Dense SNP panels also gave higher type I errors, suggesting that increased test thresholds may be needed to maintain the correct error rate. 3) Dense SNP panels provided better trait localization, but only in the COGA data, in which the MS markers were relatively loosely spaced. 4) The strength of linkage signals did not vary with the density of SNP panels, once the marker density was ≥1 SNP/cM. 5) Analyses with SNPs were computationally challenging, and identified areas where improvements in analysis tools will be necessary to make analysis practical for widespread use. Genet. Epidemiol. 29 (Suppl. 1):S7–S28, 2005. © 2005 Wiley-Liss, Inc. [source]


    Linkage mapping methods applied to the COGA data set: Presentation Group 4 of Genetic Analysis Workshop 14

    GENETIC EPIDEMIOLOGY, Issue S1 2005
    E. Warwick Daw
    Abstract Presentation Group 4 participants analyzed the Collaborative Study on the Genetics of Alcoholism data provided for Genetic Analysis Workshop 14. This group examined various aspects of linkage analysis and related issues. Seven papers included linkage analyses, while the eighth calculated identity-by-descent (IBD) probabilities. Six papers analyzed linkage to an alcoholism phenotype: ALDX1 (four papers), ALDX2 (one paper), or a combination of both (one paper). Methods used included Bayesian variable selection coupled with Haseman-Elston regression, recursive partitioning to identify phenotype and covariate groupings that interact with evidence for linkage, nonparametric linkage regression modeling, affected sib-pair linkage analysis with discordant sib-pair controls, simulation-based homozygosity mapping in a single pedigree, and application of a propensity score to collapse covariates in a general conditional logistic model. Alcoholism linkage was found with ≥2 of these approaches on chromosomes 2, 4, 6, 7, 9, 14, and 21. The remaining linkage paper compared the utility of several single-nucleotide polymorphism (SNP) and microsatellite marker maps for Monte Carlo Markov chain combined oligogenic segregation and linkage analysis, and analyzed one of the electrophysiological endophenotypes, ttth1, on chromosome 7. Linkage was found with all marker sets. The last paper compared the multipoint IBD information content of several SNP sets and the microsatellite set, and found that while all SNP sets examined contained more information than the microsatellite set, most of the information contained in the SNP sets was captured by a subset of the SNP markers with 1-cM marker spacing. From these papers, we highlight three points: a 1-cM SNP map seems to capture most of the linkage information, so denser maps do not appear necessary; careful and appropriate use of covariates can aid linkage analysis; and sources of increased gene-sharing between relatives should be accounted for in analyses. Genet. Epidemiol. 29 (Suppl. 1):S29–S34, 2005. © 2005 Wiley-Liss, Inc. [source]


    Approaches to detecting gene × gene interaction in Genetic Analysis Workshop 14 pedigrees

    GENETIC EPIDEMIOLOGY, Issue S1 2005
    Brion S. Maher
    Abstract Whether driven by the general lack of success in finding single-gene contributions to complex disease, by increased knowledge about the potential involvement of specific biological interactions in complex disease, or by recent dramatic increases in computational power, a large number of approaches to detect locus × locus interactions were recently proposed and implemented. The six Genetic Analysis Workshop 14 (GAW14) papers summarized here each applied either existing or refined approaches with the goal of detecting gene × gene, or locus × locus, interactions in the GAW14 data. Five of six papers analyzed the simulated data; the other analyzed the Collaborative Study on the Genetics of Alcoholism data. The analytic strategies implemented for detecting interactions included multifactor dimensionality reduction, conditional linkage analysis, nonparametric linkage correlation, two-locus parametric linkage analysis, and a joint test of linkage and association. Overall, most of the groups found limited success in consistently detecting all of the simulated interactions due, in large part, to the nature of the generating model. Genet. Epidemiol. 29 (Suppl. 1):S116–S119, 2005. © 2005 Wiley-Liss, Inc. [source]


    Parent-of-origin, imprinting, mitochondrial, and X-linked effects in traits related to alcohol dependence: Presentation Group 18 of Genetic Analysis Workshop 14

    GENETIC EPIDEMIOLOGY, Issue S1 2005
    Konstantin Strauch
    Abstract The participants of Presentation Group 18 of Genetic Analysis Workshop 14 analyzed the Collaborative Study on the Genetics of Alcoholism data set to investigate sex-specific effects for phenotypes related to alcohol dependence. In particular, the participants looked at imprinting (which is also known as parent-of-origin effect), differences between recombination fractions for the two sexes, and mitochondrial and X-chromosomal effects. Five of the seven groups employed newly developed or existing methods that take imprinting into account when testing for linkage, or test for imprinting itself. Single-marker and multipoint analyses were performed for microsatellite as well as single-nucleotide polymorphism markers, and several groups used a sex-specific genetic map in addition to a sex-averaged map. Evidence for paternal imprinting (i.e., maternal expression) was consistently obtained by at least two groups at genetic regions on chromosomes 10, 12, and 21 that possibly harbor genes responsible for alcoholism. Evidence for maternal imprinting (which is equivalent to paternal expression) was consistently found at a locus on chromosome 11. Two groups applied extensions of variance components analysis that model a mitochondrial or X-chromosomal effect to latent class variables and electrophysiological traits employed in the diagnosis of alcoholism. The analysis, without using genetic markers, revealed mitochondrial or X-chromosomal effects for several of these traits. Accounting for sex-specific environmental variances appeared to be crucial for the identification of an X-chromosomal factor. In linkage analysis using marker data, modeling a mitochondrial variance component increased the linkage signals obtained for autosomal loci. Genet. Epidemiol. 29 (Suppl. 1):S125–S132, 2005. © 2005 Wiley-Liss, Inc. [source]


    Evaluations of maximization procedures for estimating linkage parameters under heterogeneity

    GENETIC EPIDEMIOLOGY, Issue 3 2004
    Swati Biswas
    Abstract Locus heterogeneity is a major problem plaguing the mapping of disease genes responsible for complex genetic traits via linkage analysis. A common feature of several available methods to account for heterogeneity is that they involve maximizing a multidimensional likelihood to obtain maximum likelihood estimates. The high dimensionality of the likelihood surface may be due to multiple heterogeneity (mixing) parameters, linkage parameters, and/or regression coefficients corresponding to multiple covariates. Here, we focus on this nontrivial computational aspect of incorporating heterogeneity by considering several likelihood maximization procedures, including the expectation maximization (EM) algorithm and the stochastic expectation maximization (SEM) algorithm. The wide applicability of these procedures is demonstrated first through a general formulation of accounting for heterogeneity, and then by applying them to two specific formulations. Furthermore, our simulation studies as well as an application to the Genetic Analysis Workshop 12 asthma datasets show that, among other observations, SEM performs better than EM. As an aside, we illustrate a limitation of the popular admixture approach for incorporating heterogeneity, proved elsewhere. We also show how to obtain standard errors (SEs) for EM and SEM estimates, using methods available in the literature. These SEs can then be combined with the corresponding estimates to provide confidence intervals of the parameters. © 2004 Wiley-Liss, Inc. [source]


    Genetic analysis of phenotypes derived from longitudinal data: Presentation Group 1 of Genetic Analysis Workshop 13

    GENETIC EPIDEMIOLOGY, Issue S1 2003
    Konstantin Strauch
    Abstract The participants of Presentation Group 1 used the GAW13 data to derive new phenotypes, which were then analyzed for linkage and, in one case, for association to the genetic markers. Since the trait measurements ranged over longer time periods, the participants looked at the time dependence of particular traits in addition to the trait itself. The phenotypes analyzed with the Framingham data can be roughly divided into 1) body weight-related traits, which also include a type 2 diabetes progression trait, and 2) traits related to systolic blood pressure. Both trait classes are associated with metabolic syndrome. For traits related to body weight, linkage was consistently identified by at least two participating groups to genetic regions on chromosomes 4, 8, 11, and 18. For systolic blood pressure, or its derivatives, at least two groups obtained linkage for regions on chromosomes 4, 6, 8, 11, 14, 16, and 19. Five of the 13 participating groups focused on the simulated data. Due to the rather sparse grid of microsatellite markers, an association analysis for several traits was not successful. Linkage analysis of hypertension and body mass index using LODs and heterogeneity LODs (HLODs) had low power. For the glucose phenotype, a combination of random coefficient regression models and variance component linkage analysis turned out to be strikingly powerful in the identification of a trait locus simulated on chromosome 5. Haseman-Elston regression methods, applied to the same phenotype, had low power, but the above-mentioned chromosome 5 locus was not included in this analysis. Genet. Epidemiol. 25 (Suppl. 1):S5–S17, 2003. © 2003 Wiley-Liss, Inc. [source]


    Longitudinal data analysis in pedigree studies

    GENETIC EPIDEMIOLOGY, Issue S1 2003
    W. James Gauderman
    Abstract Longitudinal family studies provide a valuable resource for investigating genetic and environmental factors that influence long-term averages and changes over time in a complex trait. This paper summarizes 13 contributions to Genetic Analysis Workshop 13, which include a wide range of methods for genetic analysis of longitudinal data in families. The methods can be grouped into two basic approaches: 1) two-step modeling, in which repeated observations are first reduced to one summary statistic per subject (e.g., a mean or slope), after which this statistic is used in a standard genetic analysis, or 2) joint modeling, in which genetic and longitudinal model parameters are estimated simultaneously in a single analysis. In applications to Framingham Heart Study data, contributors collectively reported evidence for genes that affected trait mean on chromosomes 1, 2, 3, 5, 8, 9, 10, 13, and 17, but most did not find genes affecting slope. Applications to simulated data suggested that even for a gene that only affected slope, use of a mean-type statistic could provide greater power than a slope-type statistic for detecting that gene. We report on the results of a small experiment that sheds some light on this apparently paradoxical finding, and indicate how one might form a more powerful test for finding a slope-affecting gene. Several areas for future research are discussed. Genet. Epidemiol. 25 (Suppl. 1):S18–S28, 2003. © 2003 Wiley-Liss, Inc. [source]
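
    The first step of the two-step modeling approach described in the abstract above, reducing each subject's repeated observations to a mean or a slope, can be sketched as follows. This is an illustrative assumption about how such a summary might be computed (an ordinary least-squares slope over measurement times), not the procedure of any particular contribution; the function name summarize_subject and the measurements are hypothetical.

```python
def summarize_subject(times, values):
    """Reduce one subject's repeated measurements to (mean, slope).

    The slope is the ordinary least-squares slope of value on time.
    Either summary could then serve as the phenotype in a standard
    genetic analysis (the second step, not shown here).
    """
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    return y_mean, sxy / sxx


if __name__ == "__main__":
    # Hypothetical systolic blood pressure readings at four exams.
    mean, slope = summarize_subject([0, 1, 2, 3], [120, 122, 124, 126])
    print(f"mean = {mean}, slope = {slope}")
```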


    Tests of Association for Quantitative Traits in Nuclear Families Using Principal Components to Correct for Population Stratification

    ANNALS OF HUMAN GENETICS, Issue 6 2009
    Lei Zhang
    SUMMARY Traditional transmission disequilibrium test (TDT) based methods for genetic association analyses are robust to population stratification at the cost of a substantial loss of power. We here describe a novel method for family-based association studies that corrects for population stratification with the use of an extension of principal component analysis (PCA). Specifically, we adopt PCA on unrelated parents in each family. We then infer principal components for children from those for their parents through a TDT-like strategy. Two test statistics within the variance-components model are proposed for association tests. Simulation results show that the proposed tests have correct type I error rates regardless of population stratification, and have greatly improved power over two popular TDT-based methods: QTDT and FBAT. The application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility of the proposed method. [source]


    A Regression-based Association Test for Case-control Studies that Uses Inferred Ancestral Haplotype Similarity

    ANNALS OF HUMAN GENETICS, Issue 5 2009
    Youfang Liu
    Summary Association methods based on haplotype similarity (HS) can overcome power and stability issues encountered in standard haplotype analyses. Current HS methods can be generally classified into evolutionary and two-sample approaches. We propose a new regression-based HS association method for case-control studies that incorporates covariate information and combines the advantages of the two classes of approaches by using inferred ancestral haplotypes. We first estimate the ancestral haplotypes of case individuals and then, for each individual, an ancestral-haplotype-based similarity score is computed by comparing that individual's observed genotype with the estimated ancestral haplotypes. Trait values are then regressed on the similarity scores. Covariates can easily be incorporated into this regression framework. To account for the bias in the raw p-values due to the use of case data in constructing ancestral haplotypes, as well as to account for variation in ancestral haplotype estimation, a permutation procedure is adopted to obtain empirical p-values. Compared with the standard haplotype score test and the multilocus T² test, our method improves power when neither the allele frequency nor linkage disequilibrium between the disease locus and its neighboring SNPs is too low and is comparable in other scenarios. We applied our method to the Genetic Analysis Workshop 15 simulated SNP data and successfully pinpointed a stretch of SNPs that covers the fine-scale region where the causal locus is located. [source]