    Potential roles for BMP and Pax genes in the development of iris smooth muscle

    Abbie M. Jensen
    Abstract The embryonic optic cup generates four types of tissue: neural retina, pigmented epithelium, ciliary epithelium, and iris smooth muscle. Remarkably little attention has focused on the development of the iris smooth muscle since Lewis ([1903] J. Am. Anat. 2:405-416) described its origins from the anterior rim of the optic cup neuroepithelium. As an initial step toward understanding iris smooth muscle development, I first determined the spatial and temporal pattern of the development of the iris smooth muscle in the chick by using the HNK1 antibody, which labels developing iris smooth muscle. HNK1 labeling shows that iris smooth muscle development is correlated in time and space with the development of the ciliary epithelial folds. Second, because neural crest is the only other neural tissue that has been shown to generate smooth muscle (Le Lievre and Le Douarin [1975] J. Embryo. Exp. Morphol. 34:125-154), I sought to determine whether iris smooth muscle development shares similarities with neural crest development. Two members of the BMP superfamily, BMP4 and BMP7, which may regulate neural crest development, are highly expressed by cells at the site of iris smooth muscle generation. Third, because humans and mice that are heterozygous for Pax6 mutations have no irides (Hill et al. [1991] Nature 354:522-525; Hanson et al. [1994] Nat. Genet. 6:168-173), I determined the expression of Pax6. I also examined the expression of Pax3 in the developing anterior optic cup. The developing iris smooth muscle coexpresses Pax6 and Pax3. I suggest that some of the eye defects caused by mutations in Pax6, BMP4, and BMP7 may be due to abnormal iris smooth muscle. Developmental Dynamics 232:385-392, 2005. © 2004 Wiley-Liss, Inc.

    Population dynamics of the ectomycorrhizal fungal species Tricholoma populinum and Tricholoma scalpturatum associated with black poplar under differing environmental conditions

    Hervé Gryta
    Summary Fungi combine sexual reproduction and clonal propagation. The balance between these two reproductive modes affects establishment dynamics, and ultimately the evolutionary potential of populations. The pattern of colonization was studied in two species of ectomycorrhizal fungi: Tricholoma populinum and Tricholoma scalpturatum. The former is considered to be a host specialist whereas T. scalpturatum is a generalist taxon. Fruit bodies of both basidiomycete species were mapped and collected over several years from a black poplar (Populus nigra) stand, at two different sites. Multilocus genotypes (= genets) were identified based on the analysis of random amplified polymorphic DNA (RAPD) patterns, inter-simple sequence repeat (ISSR) patterns and restriction fragment length polymorphisms (RFLPs) in the ribosomal DNA intergenic spacer (rDNA IGS). The genetic analyses revealed differences in local population dynamics between the two species. Tricholoma scalpturatum tended to capture new space through sexual spores whereas T. populinum did this by clonal growth, suggesting trade-offs in allocation of resources at the genet level. Genet numbers and sizes strongly differ between the two study sites, perhaps as a result of abiotic disturbance on mycelial establishment and genet behaviour.

    Molecular Genetic Study on Angelman Syndrome Patients without a Chromosomal Deletion

    EPILEPSIA
    Shinji Saitoh
    Purpose: Angelman syndrome (AS) is a neurobehavioral disorder characterized by severe mental retardation, easily evoked laughter, ataxic gait, and epilepsy. Epilepsy associated with AS is characterized by early childhood onset generalized seizures with profound EEG abnormalities. Therefore, AS is a good human model for genetic epilepsy syndromes. Approximately 70% of AS cases are caused by maternal deletions of chromosome 15q11-q13, whereas 30% are not associated with a chromosomal deletion. These non-deletion AS cases are caused by paternal uniparental disomy (UPD), imprinting mutation (IM), or loss-of-function mutations of the UBE3A gene, each of which carries a different recurrence risk. To elucidate the molecular etiology of non-deletion AS, we investigated 34 AS patients without a chromosomal deletion. Methods: Thirty sporadic AS patients and 4 familial AS patients (2 families of 2 sibs) were enrolled in the study. The diagnosis of AS was based on Williams' criteria (Williams et al., Am J Med Genet 1995, 56: 237). Genomic DNA was extracted from peripheral blood by a standard procedure. A DNA methylation test at the SNRPN locus and genotyping using 7 highly informative PCR-based polymorphisms within 15q11-q13 were carried out to identify UPD and IM. When both UPD and IM were ruled out, the patients were classified as non-UPD, non-IM. For these non-UPD, non-IM patients, UBE3A mutations were screened by PCR-SSCP analysis using 10 sets of primers covering all coding exons. Results: Among 30 sporadic patients, 1 UPD and 3 IM patients were identified, and the remaining 26 patients were classified as non-UPD, non-IM. Among 4 familial patients, 2 sibs from 1 family were detected as IM, whereas 2 sibs from another family were classified as non-UPD, non-IM. No UBE3A mutations were identified within 26 sporadic and 2 familial non-UPD, non-IM patients. Conclusion: Three molecular classes were identified for non-deletion AS patients. Therefore, the underlying genetic mechanism was demonstrated to be complex for AS patients without a chromosomal deletion. Combination of the DNA methylation test and PCR-based polymorphisms was sufficient to detect UPD and IM patients. Because recurrence risk is low for UPD and high for IM, systematic molecular investigation including the DNA methylation test and PCR-based polymorphisms should be done for non-deletion AS patients for genetic counseling purposes. A majority of non-deletion patients were classified as non-UPD, non-IM. Although approximately 30% of non-UPD, non-IM patients are reported to have UBE3A mutations, no such mutations were identified in our study. An underlying molecular mechanism was not revealed for this group of patients, and therefore assessment of recurrence risk was difficult. Further investigation is necessary for non-UPD, non-IM patients.

    Imputation aware meta-analysis of genome-wide association studies

    Noah Zaitlen
    Abstract Genome-wide association studies have recently identified many new loci associated with human complex diseases. These newly discovered variants typically have weak effects requiring studies with large numbers of individuals to achieve the statistical power necessary to identify them. Likely, there exist even more associated variants, which remain to be found if even larger association studies can be assembled. Meta-analysis provides a straightforward means of increasing study sample sizes without collecting new samples by combining existing data sets. One obstacle to combining studies is that they are often performed on platforms with different marker sets. Current studies overcome this issue by imputing genotypes missing from each of the studies and then performing standard meta-analysis techniques. We show that this approach may result in a loss of power since errors in imputation are not accounted for. We present a new method for performing meta-analysis over imputed single nucleotide polymorphisms, show that it is optimal with respect to power, and discuss practical implementation issues. Through simulation experiments, we show that our imputation aware meta-analysis approach outperforms or matches standard meta-analysis approaches. Genet. Epidemiol. 34:537-542, 2010. © 2010 Wiley-Liss, Inc.
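
    For orientation, the "standard meta-analysis techniques" that the abstract uses as its baseline are commonly an inverse-variance fixed-effect combination of per-study effect estimates. The sketch below shows only that baseline, under the assumption that each study reports an effect estimate and its standard error for the same SNP; it is not the imputation-aware method of the paper, and the function name `fixed_effect_meta` is a hypothetical label chosen for illustration.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis (baseline sketch).

    betas: per-study effect estimates for one SNP (e.g. log odds ratios)
    ses:   matching per-study standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se

    # Two-sided p-value under a standard normal null: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Two hypothetical studies with equal precision.
    print(fixed_effect_meta([0.1, 0.3], [0.1, 0.1]))
```

    Because the weights depend only on the reported standard errors, this baseline treats imputed and directly genotyped markers alike, which is the limitation the abstract points to.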

    Estimation of P-value of MAX test with double triangle diagram for 2×3 SNP case-control tables

    Katsura Hirosawa
    Abstract Single nucleotide polymorphisms (SNPs) are the most popular markers in genetic epidemiology. Multiple tests have been applied to evaluate the genetic effect of SNPs, such as Pearson's test with two degrees of freedom and three tests with one degree of freedom (χ2 tests for dominant and recessive modes and the Cochran-Armitage trend test for the additive mode), as well as the MAX3 test and the MAX test, which are combinations of the four tests mentioned earlier. Because the MAX test is a combination of Pearson's test of two degrees of freedom and two tests of one degree of freedom, the probability density function (pdf) of the MAX statistic does not match the pdf of the χ2 distribution with either one or two degrees of freedom. In order to calculate the P-value of the MAX test, we introduced a new diagram, the Double Triangle Diagram, an extension of the de Finetti diagram in population genetics that characterizes all of the tests for 2×3 tables. In the diagram, the contour lines of the MAX statistic consisted of elliptic curves and two tangent lines to the ellipses in the space. We normalized the ellipses into regular circles and expressed the P-value of the MAX test in an integral form. Although a part of the integral was not analytically solvable, it was calculable with arbitrary accuracy by dividing the area under the pdf into finite rectangles. We confirmed that P-values from our method followed a uniform distribution from 0 to 1 in three example marginal count sets and concluded that our method was appropriate to give the P-value of the MAX test for 2×3 tables. Genet. Epidemiol. 34:543-551, 2010. © 2010 Wiley-Liss, Inc.
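
    The four component tests named in the abstract can be computed from a single 2×3 genotype table. The sketch below is an illustration under stated assumptions: genotype counts are ordered by number of risk alleles (0, 1, 2), every genotype column is non-empty, and the dominant and recessive χ2 tests are obtained as Cochran-Armitage trend statistics with scores (0, 1, 1) and (0, 0, 1), which coincide with the uncorrected Pearson χ2 on the collapsed 2×2 tables. The maximum of the three one-degree-of-freedom statistics is reported under the name MAX3 as that statistic is commonly defined; this is an assumption, not the paper's definition of MAX. All function names are hypothetical. The code does not implement the Double Triangle Diagram and gives no P-value for the maximum, which, as the abstract notes, does not follow a χ2 distribution.

```python
import math


def pearson_chi2(cases, controls):
    """Pearson chi-square (2 df) for a 2x3 table; assumes no empty column."""
    n = sum(cases) + sum(controls)
    stat = 0.0
    for row in (cases, controls):
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * (cases[j] + controls[j]) / n
            stat += (observed - expected) ** 2 / expected
    return stat


def trend_chi2(cases, controls, scores):
    """Cochran-Armitage trend chi-square (1 df) for given genotype scores."""
    r_total, s_total = sum(cases), sum(controls)
    n = r_total + s_total
    cols = [r + s for r, s in zip(cases, controls)]
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * c for x, c in zip(scores, cols))
    sum_x2n = sum(x * x * c for x, c in zip(scores, cols))
    numerator = n * (n * sum_xr - r_total * sum_xn) ** 2
    denominator = r_total * s_total * (n * sum_x2n - sum_xn ** 2)
    return numerator / denominator


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)


def association_stats(cases, controls):
    """Component statistics for one SNP; counts ordered by risk-allele number."""
    stats = {
        "pearson_2df": pearson_chi2(cases, controls),
        "dominant": trend_chi2(cases, controls, (0, 1, 1)),
        "recessive": trend_chi2(cases, controls, (0, 0, 1)),
        "additive": trend_chi2(cases, controls, (0, 1, 2)),
    }
    # Assumed definition: maximum over the three 1-df statistics.
    stats["max3"] = max(stats["dominant"], stats["recessive"], stats["additive"])
    return stats


if __name__ == "__main__":
    # Hypothetical counts: cases (10, 20, 30), controls (30, 20, 10).
    print(association_stats((10, 20, 30), (30, 20, 10)))
```

    Referring the maximum to a χ2 distribution with one degree of freedom would overstate significance, because the maximum is taken over correlated statistics; that gap is what a dedicated P-value calculation for MAX has to close.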

    Effect of including environmental data in investigations of gene-disease associations in the presence of qualitative interactions

    Elizabeth Williamson
    Abstract Complex diseases are likely to be caused by the interplay of genetic and environmental factors. Despite this, gene-disease associations are frequently investigated using models that focus solely on a marginal gene effect, ignoring environmental factors entirely. Failing to take into account a gene-environment interaction can weaken the apparent gene-disease association, leading to loss in statistical power and, potentially, inability to identify genuine risk factors. If a gene-environment interaction exists, therefore, a joint analysis allowing the effect of the gene to differ between groups defined by the environmental exposure can have greater statistical power than a marginal gene-disease model. However, environmental data are subject to measurement error. Substantial losses in statistical power for detecting gene-environment interactions can arise from measurement error in the environmental exposure. It is unclear, however, what effect measurement error may have on the power of the joint analysis. We consider the potential benefits, in terms of statistical power, of collecting concurrent environmental data within large cohorts in order to enhance gene detection. We further consider whether these benefits remain in the presence of misclassification in both the gene and the environmental exposure. We find that when an effect of the gene is apparent only in the presence of the environmental exposure, the joint analysis has greater power than a marginal gene-disease analysis. This comparative increase in power remains in the presence of likely levels of misclassification of either the gene or environmental exposure. Genet. Epidemiol. 34:552-560, 2010. © 2010 Wiley-Liss, Inc.

    Genome-wide association studies using haplotype clustering with a new haplotype similarity

    Lina Jin
    Abstract Association analysis, which aims to detect associations between genetic variation and observable traits, has played an increasing part in understanding the genetic basis of diseases. Among these methods, haplotype-based association studies are believed to possess prominent advantages, especially for rare diseases in case-control studies. However, modeling these haplotypes is subject to statistical problems caused by rare haplotypes. Fortunately, haplotype clustering offers an appealing solution. In this research, we have developed a new haplotype similarity suited to the "affinity propagation" clustering algorithm, which accounts well for rare haplotypes and so controls the degrees-of-freedom problem. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers merits in detecting disease-marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34:633-641, 2010. © 2010 Wiley-Liss, Inc.

    A statistical method for scanning the genome for regions with rare disease alleles

    Article first published online: 21 JUN 2010
    Abstract Studying the role of rare alleles in common disease has been prevented by the impractical task of determining the DNA sequence of large numbers of individuals. Next-generation DNA sequencing technologies are being developed that will make it possible for genetic studies of common disease to study the full frequency spectrum of genetic variation, including rare alleles. This report describes a method for scanning the genome for disease susceptibility regions that show an increased number of rare alleles among a sample of disease cases versus an ethnically matched sample of controls. The method was based on a hidden Markov model and the statistical support for a disease susceptibility region characterized by rare alleles was measured by a likelihood ratio statistic. Due to the lack of empirical data, the method was evaluated through simulation. The performance of the method was tested under the null and alternative hypotheses under a range of sequence-generating and hidden Markov model parameters. The results showed that the statistical method performs well at identifying true disease susceptibility regions and that performance was primarily affected by the amount of variation in the neutral sequence and the number of rare disease alleles found in the disease susceptibility region. Genet. Epidemiol. 34:386-395, 2010. © 2010 Wiley-Liss, Inc.

    Detection of SNP-SNP interactions in trios of parents with schizophrenic children

    Qing Li
    Abstract Schizophrenia (SZ) is a heritable and complex psychiatric disorder with an estimated worldwide prevalence of about 1%. Research on the risk factors for SZ has thus far yielded few clues to causes, but has pointed to a heterogeneous etiology that likely involves multiple genes and gene-environment interactions. In this manuscript, we apply a novel method (trio logic regression, Li et al., 2009) to case-parent trio data from a SZ candidate gene study conducted on families of Ashkenazi Jewish descent, and demonstrate the method's ability to detect multi-gene models for SZ risk in the family-based design. In particular, we demonstrate how this method revealed a genotype-phenotype association that includes an allele without marginal effect. Genet. Epidemiol. 34:396-406, 2010. © 2010 Wiley-Liss, Inc.

    Using evidence for population stratification bias in combined individual- and family-level genetic association analyses of quantitative traits

    Lucia Mirea
    Abstract Genetic association studies are generally performed either by examining differences in the genotype distribution between individuals or by testing for preferential allele transmission within families. In the absence of population stratification bias (PSB), integrated analyses of individual and family data can increase power to identify susceptibility loci [Abecasis et al., 2000. Am. J. Hum. Genet. 66:279-292; Chen and Lin, 2008. Genet. Epidemiol. 32:520-527; Epstein et al., 2005. Am. J. Hum. Genet. 76:592-608]. In existing methods, the presence of PSB is initially assessed by comparing results from between-individual and within-family analyses, and then combined analyses are performed only if no significant PSB is detected. However, this strategy requires specification of an arbitrary testing level αPSB, typically 5%, to declare PSB significance. As a novel alternative, we propose to directly use the PSB evidence in weights that combine results from between-individual and within-family analyses. The weighted approach generalizes previous methods by using a continuous weighting function that depends only on the observed P-value instead of a binary weight that depends on αPSB. Using simulations, we demonstrate that for quantitative trait analysis, the weighted approach provides a good compromise between type I error control and power to detect association in studies with few genotyped markers and limited information regarding population structure. Genet. Epidemiol. 34:502-511, 2010. © 2010 Wiley-Liss, Inc.

    Modeling maternal-offspring gene-gene interactions: the extended-MFG test

    Erica J. Childs
    Abstract Maternal-fetal genotype (MFG) incompatibility is an interaction between the genes of a mother and offspring at a particular locus that adversely affects the developing fetus, thereby increasing susceptibility to disease. Statistical methods for examining MFG incompatibility as a disease risk factor have been developed for nuclear families. Because families collected as part of a study can be large and complex, containing multiple generations and marriage loops, we create the Extended-MFG (EMFG) Test, a model-based likelihood approach, to allow for arbitrary family structures. We modify the MFG test by replacing the nuclear-family based "mating type" approach with Ott's representation of a pedigree likelihood and calculating MFG incompatibility along with the Mendelian transmission probability. In order to allow for extension to arbitrary family structures, we make a slightly more stringent assumption of random mating with respect to the locus of interest. Simulations show that the EMFG test has appropriate type-I error rate, power, and precise parameter estimation when random mating holds. Our simulations and real data example illustrate that the chief advantages of the EMFG test over the earlier nuclear family version of the MFG test are improved accuracy of parameter estimation and power gains in the presence of missing genotypes. Genet. Epidemiol. 34:512-521, 2010. © 2010 Wiley-Liss, Inc.

    Detecting interacting genetic loci with effects on quantitative traits where the nature and order of the interaction are unknown

    Joanna L. Davies
    Abstract Standard techniques for single marker quantitative trait mapping perform poorly in detecting complex interacting genetic influences. When a genetic marker interacts with other genetic markers and/or environmental factors to influence a quantitative trait, a sample of individuals will show different effects according to their exposure to other interacting factors. This paper presents a Bayesian mixture model, which effectively models heterogeneous genetic effects apparent at a single marker. We compute approximate Bayes factors which provide an efficient strategy for screening genetic markers (genome-wide) for evidence of a heterogeneous effect on a quantitative trait. We present a simulation study which demonstrates that the approximation is good and provide a real data example which identifies a population-specific genetic effect on gene expression in the HapMap CEU and YRI populations. We advocate the use of the model as a strategy for identifying candidate interacting markers without any knowledge of the nature or order of the interaction. The source of heterogeneity can be modeled as an extension. Genet. Epidemiol. 34:299-308, 2010. © 2009 Wiley-Liss, Inc.

    Sibship analysis of associations between SNP haplotypes and a continuous trait with application to mammographic density

    J. Stone
    Abstract Haplotype-based association studies have been proposed as a powerful comprehensive approach to identify causal genetic variation underlying complex diseases. Data comparisons within families offer the additional advantage of dealing naturally with complex sources of noise, confounding and population stratification. Two problems encountered when investigating associations between haplotypes and a continuous trait using data from sibships are (i) the need to define within-sibship comparisons for sibships of size greater than two and (ii) the difficulty of resolving the joint distribution of haplotype pairs within sibships in the absence of parental genotypes. We therefore propose first a method of orthogonal transformation of both outcomes and exposures that allows the decomposition of between- and within-sibship regression effects when sibship size is greater than two. We conducted a simulation study, which confirmed that analysis using all members of a sibship is statistically more powerful than methods based on cross-sectional analysis or using subsets of sib-pairs. Second, we propose a simple permutation approach to avoid errors of inference due to the within-sibship correlation of any errors in haplotype assignment. These methods were applied to investigate the association between mammographic density (MD), a continuously distributed and heritable risk factor for breast cancer, and single nucleotide polymorphisms (SNPs) and haplotypes from the VDR gene using data from a study of 430 twins and sisters. We found evidence of association between MD and a 4-SNP VDR haplotype. In conclusion, our proposed method retains the benefits of the between- and within-pair analysis for pairs of siblings and can be implemented in standard software. Genet. Epidemiol. 34:309-318, 2010. © 2009 Wiley-Liss, Inc.

    Association tests using kernel-based measures of multi-locus genotype similarity between individuals

    Indranil Mukhopadhyay
    Abstract In a genetic association study, it is often desirable to perform an overall test of whether any or all single-nucleotide polymorphisms (SNPs) in a gene are associated with a phenotype. Several such tests exist, but most of them are powerful only under very specific assumptions about the genetic effects of the individual SNPs. In addition, some of the existing tests assume that the direction of the effect of each SNP is known, which is a highly unlikely scenario. Here, we propose a new kernel-based association test of joint association of several SNPs. Our test is non-parametric and robust, and does not make any assumption about the directions of individual SNP effects. It can be used to test multiple correlated SNPs within a gene and can also be used to test independent SNPs or genes in a biological pathway. Our test uses an analysis of variance paradigm to compare variation between cases and controls to the variation within the groups. The variation is measured using kernel functions for each marker, and then a composite statistic is constructed to combine the markers into a single test. We present simulation results comparing our statistic to the U-statistic-based method by Schaid et al. ([2005] Am. J. Hum. Genet. 76:780-793) and another statistic by Wessel and Schork ([2006] Am. J. Hum. Genet. 79:792-806). We consider a variety of different disease models and assumptions about how many SNPs within the gene are actually associated with disease. Our results indicate that our statistic has higher power than other statistics under most realistic conditions. Genet. Epidemiol. 34:213-221, 2010. © 2009 Wiley-Liss, Inc.

    Gene, region and pathway level analyses in whole-genome studies

    Omar De la Cruz
    Abstract In the setting of genome-wide association studies, we propose a method for assigning a measure of significance to pre-defined sets of markers in the genome. The sets can be genes, conserved regions, or groups of genes such as pathways. Using the proposed methods and algorithms, evidence for association between a particular functional unit and a disease status can be obtained not just by the presence of a strong signal from a SNP within it, but also by the combination of several simultaneous weaker signals that are not strongly correlated. This approach has several advantages. First, moderately strong signals from different SNPs are combined to obtain a much stronger signal for the set, therefore increasing power. Second, in combination with methods that provide information on untyped markers, it leads to results that can be readily combined across studies and platforms that might use different SNPs. Third, the results are easy to interpret, since they refer to functional sets of markers that are likely to behave as a unit in their phenotypic effect. Finally, the availability of gene-level P-values for association is the first step in developing methods that integrate information from pathways and networks with genome-wide association data, and these can lead to a better understanding of the genetic architecture of complex traits. The power of the approach is investigated in simulated and real datasets. Novel Crohn's disease associations are found using the WTCCC data. Genet. Epidemiol. 34:222-231, 2010. © 2009 Wiley-Liss, Inc.

    Assessment of SNP streak statistics using gene drop simulation with linkage disequilibrium

    Article first published online: 6 JUL 200
    Abstract We describe methods and programs for simulating the genotypes of individuals in a pedigree at large numbers of linked loci when the alleles of the founders are under linkage disequilibrium. Both simulation and estimation of linkage disequilibrium models are shown to be feasible on a genome-wide scale. The methods are applied to evaluate the statistical significance of streaks of loci at which sets of related individuals share a common allele. The effects of properly allowing for linkage disequilibrium are shown to be important as they explain many of the large observations. This is illustrated by reanalysis of a previously reported linkage of prostate cancer to chromosome 1p23. Genet. Epidemiol. 34:119-124, 2010. © 2009 Wiley-Liss, Inc.

    Methods for detecting interactions between genetic polymorphisms and prenatal environment exposure with a mother-child design

    Shuang Wang
    Abstract Prenatal exposures such as polycyclic aromatic hydrocarbons and early postnatal environmental exposures are of particular concern because of the heightened susceptibility of the fetus and infant to diverse environmental pollutants. Marked inter-individual variation in response to the same level of exposure was observed in both mothers and their newborns, indicating that susceptibility might be due to genetic factors. With the mother-child pair design, existing methods developed for parent-child trio data or random sample data are either not applicable or not designed to optimally use the information. To take full advantage of this unique design, which provides partial information on genetic transmission and has both maternal and newborn outcome status collected, we developed a likelihood-based method that uses both the maternal and the newborn information together and jointly models gene-environment interactions on maternal and newborn outcomes. In intensive simulation studies, the proposed method demonstrated much improved power in detecting gene-environment interactions. Application to real mother-child pair data from a study conducted in Krakow, Poland, suggested four significant gene-environment interactions after multiple comparisons adjustment. Genet. Epidemiol. 34:125-132, 2010. © 2009 Wiley-Liss, Inc.

    Association test of multiallelic gene copy numbers in family trios

    Sadeep Shrestha
    Abstract While recent genomic surveys reveal growing numbers of di-allelic copy number variations, it is genes with multiallelic (>2) copy numbers that have shown association with distinct phenotypes. Current high-throughput laboratory methods are restricted to enumerating total gene copy numbers (GCNs) per individual and not the "genotype," i.e. gene copy per chromosome. Thus, association studies of multiallelic GCNs have been limited to comparison of median copies in different groups. Our new nonparametric statistical approach is based on GCN information within a trio-based study design. We present theoretical derivation of the statistics and results of simulation studies that show robustness of our approach and power under several genetic models. Genet. Epidemiol. 34:2-6, 2010. © 2009 Wiley-Liss, Inc.

    Case-only genome-wide interaction study of disease risk, prognosis and treatment

    Brandon L. Pierce
    Abstract Case-control genome-wide association (GWA) studies have facilitated the identification of susceptibility loci for many complex diseases; however, these studies are often not adequately powered to detect gene-environment (GE) and gene-gene (GG) interactions. Case-only studies are more efficient than case-control studies for detecting interactions and require no data on control subjects. In this article, we discuss the concept and utility of the case-only genome-wide interaction (COGWI) study, in which common genetic variants, measured genome-wide, are screened for association with environmental exposures or genetic variants of interest. An observed G-E (or G-G) association, as measured by the case-only odds ratio (OR), suggests interaction, but only if the interacting factors are unassociated in the population from which the cases were drawn. The case-only OR is equivalent to the interaction risk ratio. In addition to risk-related interactions, we discuss how the COGWI design can be used to efficiently detect GG, GE and pharmacogenetic interactions related to disease outcomes in the context of observational clinical studies or randomized clinical trials. Such studies can be conducted using only data on individuals experiencing an outcome of interest or individuals not experiencing the outcome of interest. Sharing data among GWA and COGWI studies of disease risk and outcome can further enhance efficiency. Sample size requirements for COGWI studies, as compared to case-control GWA studies, are provided. In the current era of genome-wide analyses, the COGWI design is an efficient and straightforward method for detecting GG, GE and pharmacogenetic interactions related to disease risk, prognosis and treatment response. Genet. Epidemiol. 34:7-15, 2010. © 2009 Wiley-Liss, Inc.
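
    The case-only odds ratio described in the abstract can be illustrated with a single 2×2 table of cases cross-classified by a binary genotype G and a binary exposure E. The sketch below is a minimal example, not the COGWI pipeline: it assumes both factors are binary, that all four cells are non-zero, and that a Woolf (log-scale) confidence interval is acceptable. The function name `case_only_or` and the argument names are hypothetical.

```python
import math


def case_only_or(n_g_e, n_g_only, n_e_only, n_neither, z=1.96):
    """Case-only odds ratio between genotype G and exposure E.

    All counts are taken among cases only:
      n_g_e      carriers who are exposed
      n_g_only   carriers who are unexposed
      n_e_only   non-carriers who are exposed
      n_neither  non-carriers who are unexposed
    Returns (odds_ratio, ci_lower, ci_upper) with a Woolf interval.
    """
    cells = (n_g_e, n_g_only, n_e_only, n_neither)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (n_g_e * n_neither) / (n_g_only * n_e_only)

    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1.0 / c for c in cells))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log),
            math.exp(log_or + z * se_log))


if __name__ == "__main__":
    # Hypothetical counts among 120 cases.
    print(case_only_or(40, 20, 20, 40))
```

    As the abstract stresses, an odds ratio away from 1 points to interaction only if G and E are unassociated in the source population; the calculation itself cannot check that condition.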

    Case-control association testing in the presence of unknown relationships

    Yoonha Choi
    Abstract Genome-wide association studies result in inflated false-positive results when unrecognized cryptic relatedness exists. A number of methods have been proposed for testing association between markers and disease with a correction for known pedigree-based relationships. However, in most case-control studies, relationships are generally unknown, yet the design is predicated on the assumption of at least ancestral relatedness among cases. Here, we focus on adjusting cryptic relatedness when the genealogy of the sample is unknown, particularly in the context of samples from isolated populations where cryptic relatedness may be problematic. We estimate cryptic relatedness using maximum-likelihood methods and use a corrected χ2 test with estimated kinship coefficients for testing in the context of unknown cryptic relatedness. Estimated kinship coefficients characterize precisely the relatedness between truly related people, but are biased for unrelated pairs. The proposed test substantially reduces spurious positive results, producing a uniform null distribution of P-values. Especially with missing pedigree information, estimated kinship coefficients can still be used to correct non-independence among individuals. The corrected test was applied to real data sets from genetic isolates and created a distribution of P-values that was close to uniform. Thus, the proposed test corrects the non-uniform distribution of P-values obtained with the uncorrected test and illustrates the advantage of the approach on real data. Genet. Epidemiol. 33:668-678, 2009. © 2009 Wiley-Liss, Inc.

    A propensity score approach to correction for bias due to population stratification using genetic and non-genetic factors

    Huaqing Zhao
    Abstract Confounding due to population stratification (PS) arises when differences in both allele and disease frequencies exist in a population of mixed racial/ethnic subpopulations. Genomic control, structured association, principal components analysis (PCA), and multidimensional scaling (MDS) approaches have been proposed to address this bias using genetic markers. However, confounding due to PS can also be due to non-genetic factors. Propensity scores are widely used to address confounding in observational studies but have not been adapted to deal with PS in genetic association studies. We propose a genomic propensity score (GPS) approach to correct for bias due to PS that considers both genetic and non-genetic factors. We compare the GPS method with PCA and MDS using simulation studies. Our results show that GPS can adequately adjust and consistently correct for bias due to PS. Under no/mild, moderate, and severe PS, GPS yielded estimates with bias close to 0 (mean=−0.0044, standard error=0.0087). Under moderate or severe PS, the GPS method consistently outperforms the PCA method in terms of bias, coverage probability (CP), and type I error. Under moderate PS, the GPS method consistently outperforms the MDS method in terms of CP. PCA maintains relatively high power compared to both MDS and GPS methods under the simulated situations. GPS and MDS are comparable in terms of statistical properties such as bias, type I error, and power. The GPS method provides a novel and robust tool for obtaining less-biased estimates of genetic associations that can consider both genetic and non-genetic factors. Genet. Epidemiol. 33:679-690, 2009. © 2009 Wiley-Liss, Inc.

    Genome-wide association studies for discrete traits

    Duncan C. Thomas
    Abstract Genome-wide association studies of discrete traits generally use simple methods of analysis based on χ² tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained by using various methods that analyze multiple markers in combination. Methods based on sliding windows, wavelets, Bayesian shrinkage, or penalized likelihood methods, among others, were explored by various participants of Genetic Analysis Workshop 16 Group 1 to combine information across multiple markers within a region, while others used Bayesian variable selection methods for genome-wide multivariate analyses of all markers simultaneously. Imputation can be used to fill in missing markers on individual subjects within a study or in a meta-analysis of studies using different panels. Although multiple imputation theoretically should give more robust tests of association, one participant contribution found little difference between results of single and multiple imputation. Careful control of population stratification is essential, and two contributions found that previously reported associations with two genes disappeared after more precise control. Other issues considered by this group included subgroup analysis, gene-gene interactions, and the use of biomarkers. Genet. Epidemiol. 33 (Suppl. 1):S8-S12, 2009. © 2009 Wiley-Liss, Inc.

    Genome-wide association analyses of quantitative traits: the GAW16 experience

    Article first published online: 18 NOV 200
    Abstract The group that formed on the theme of genome-wide association analyses of quantitative traits (Group 2) in the Genetic Analysis Workshop 16 comprised eight sets of investigators. Three data sets were available: one on autoantibodies related to rheumatoid arthritis provided by the North American Rheumatoid Arthritis Consortium; the second on anthropometric, lipid, and biochemical measures provided by the Framingham Heart Study (FHS); and the third a simulated data set modeled after FHS. The different investigators in the group addressed a large set of statistical challenges and applied a wide spectrum of association methods in analyzing quantitative traits at the genome-wide level. While some previously reported genes were validated, some novel chromosomal regions provided significant evidence of association in multiple contributions in the group. In this report, we discuss the different strategies explored by the different investigators with the common goal of improving the power to detect association. Genet. Epidemiol. 33 (Suppl. 1):S13-S18, 2009. © 2009 Wiley-Liss, Inc.

    Multistage analysis strategies for genome-wide association studies: summary of group 3 contributions to Genetic Analysis Workshop 16

    Rosalind J. Neuman
    Abstract This contribution summarizes the work done by six independent teams of investigators to identify the genetic and non-genetic variants that work together or independently to predispose to disease. The theme addressed in these studies is multistage strategies in the context of genome-wide association studies (GWAS). The work performed comes from Group 3 of the Genetic Analysis Workshop 16 held in St. Louis, Missouri in September 2008. These six studies represent a diversity of multistage methods of which five are applied to the North American Rheumatoid Arthritis Consortium rheumatoid arthritis case-control data, and one method is applied to the low-density lipoprotein phenotype in the Framingham Heart Study simulated data. In the first stage of analyses, the majority of studies used a variety of screening techniques to reduce the noise of single-nucleotide polymorphisms purportedly not involved in the phenotype of interest. Three studies analyzed the data using penalized regression models, either LASSO or the elastic net. The main result was a reconfirmation of the involvement of variants in the HLA region on chromosome 6 with rheumatoid arthritis. The hope is that the intense computational methods highlighted in this group of papers will become useful tools in future GWAS. Genet. Epidemiol. 33 (Suppl. 1):S19-S23, 2009. © 2009 Wiley-Liss, Inc.

    Use of longitudinal data in genetic studies in the genome-wide association studies era: summary of Group 14

    Berit Kerner
    Abstract Participants analyzed actual and simulated longitudinal data from the Framingham Heart Study for various metabolic and cardiovascular traits. The genetic information incorporated into these investigations ranged from selected single-nucleotide polymorphisms to genome-wide association arrays. Genotypes were incorporated using a broad range of methodological approaches including conditional logistic regression, linear mixed models, generalized estimating equations, linear growth curve estimation, growth modeling, growth mixture modeling, population attributable risk fraction based on survival functions under the proportional hazards models, and multivariate adaptive splines for the analysis of longitudinal data. The specific scientific questions addressed by these different approaches also varied, ranging from a more precise definition of the phenotype, bias reduction in control selection, estimation of effect sizes and genotype associated risk, to direct incorporation of genetic data into longitudinal modeling approaches and the exploration of population heterogeneity with regard to longitudinal trajectories. The group reached several overall conclusions: (1) The additional information provided by longitudinal data may be useful in genetic analyses. (2) The precision of the phenotype definition as well as control selection in nested designs may be improved, especially if traits demonstrate a trend over time or have strong age-of-onset effects. (3) Analyzing genetic data stratified for high-risk subgroups defined by a unique development over time could be useful for the detection of rare mutations in common multifactorial diseases. (4) Estimation of the population impact of genomic risk variants could be more precise. The challenges and computational complexity demanded by genome-wide single-nucleotide polymorphism data were also discussed. Genet. Epidemiol. 33 (Suppl. 1):S93-S98, 2009. © 2009 Wiley-Liss, Inc.

    Summary of contributions to GAW Group 15: family-based samples are useful in identifying common polymorphisms associated with complex traits

    Stacey Knight
    Abstract Traditionally, family-based samples have been used for genetic analyses of single-gene traits caused by rare but highly penetrant risk variants. The utility of family-based genetic data for analyzing common complex traits is unclear and contains numerous challenges. To assess the utility as well as to address these challenges, members of Genetic Analysis Workshop 16 Group 15 analyzed Framingham Heart Study data using family-based designs ranging from parent-offspring trios to large pedigrees. We investigated different methods including traditional linkage tests, family-based association tests, and population-based tests that correct for relatedness between subjects, and tests to detect parent-of-origin effects. The analyses presented an assortment of positive findings. One contribution found increased power to detect epistatic effects through linkage using ascertainment of sibships based on extreme quantitative values or presence of disease associated with the quantitative value. Another contribution found four single-nucleotide polymorphisms (SNPs) showing a maternal effect, two SNPs with an imprinting effect, and one SNP having both effects on a binary high blood pressure trait. Finally, three contributions illustrated the advantage of using population-based methods to detect association to complex binary or quantitative traits. Our findings highlight the contribution of family-based samples to the genetic dissection of complex traits. Genet. Epidemiol. 33 (Suppl. 1):S99-S104, 2009. © 2009 Wiley-Liss, Inc.

    Adapting the logical basis of tests for Hardy-Weinberg Equilibrium to the real needs of association studies in human and medical genetics

    Katrina A. B. Goddard
    Abstract The standard procedure to assess genetic equilibrium is a χ² test of goodness-of-fit. As is the case with any statistical procedure of that type, the null hypothesis is that the distribution underlying the data is in agreement with the model. Thus, a significant result indicates incompatibility of the observed data with the model, which is clearly at variance with the aim in the majority of applications: to exclude the existence of gross violations of the equilibrium condition. In current practice, we try to avoid this basic logical difficulty by increasing the significance bound to the P-value (e.g. from 5 to 10%) and inferring compatibility of the data with Hardy-Weinberg Equilibrium (HWE) from an insignificant result. Unfortunately, such direct inversion of a statistical testing procedure fails to produce a valid test of the hypothesis of interest, namely, that the data are in sufficiently good agreement with the model under which the P-value is calculated. We present a logically unflawed solution to the problem of establishing (approximate) compatibility of an observed genotype distribution with HWE. The test is available in one- and two-sided versions. For both versions, we provide tools for exact power calculation. We demonstrate the merits of the new approach through comparison with the traditional χ² goodness-of-fit test in 260 genotype distributions from 43 published genetic studies of complex diseases where departure from HWE was noted in either the case or control sample. In addition, we show that the new test is useful for the analysis of genome-wide association studies. Genet. Epidemiol. 33:569-580, 2009. © 2009 Wiley-Liss, Inc.
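    For orientation, the traditional χ² goodness-of-fit test that the abstract above takes as its point of comparison can be sketched as follows. This is the standard textbook procedure for a biallelic locus, not the equivalence test the authors propose; the function name `hwe_chi_square` and the example counts are hypothetical. The P-value uses the fact that a χ² variable with 1 degree of freedom is a squared standard normal.

```python
import math


def hwe_chi_square(n_aa_hom, n_het, n_bb_hom):
    """Traditional chi-square goodness-of-fit test for Hardy-Weinberg
    equilibrium at a biallelic locus.

    Arguments are the observed counts of the two homozygotes and the
    heterozygote. Returns (statistic, p_value) with 1 degree of freedom.
    Hypothetical helper for illustration only.
    """
    n = n_aa_hom + n_het + n_bb_hom
    if n <= 0:
        raise ValueError("need at least one genotyped individual")

    # Allele frequency estimated from the genotype counts.
    p = (2 * n_aa_hom + n_het) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; test is undefined")

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa_hom, n_het, n_bb_hom)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up counts, purely to show the call shape.
    stat, p_value = hwe_chi_square(30, 40, 30)
    print(f"chi-square = {stat:.3f}, P = {p_value:.4f}")
```

    As the abstract argues, a non-significant result from this test does not by itself establish compatibility with HWE; the sketch only shows the baseline the new approach is compared against.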

    STrengthening the REporting of Genetic Association Studies (STREGA): an extension of the STROBE statement

    Julian Little
    Abstract Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the STrengthening the Reporting of OBservational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modelling haplotype variation, Hardy-Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis. Genet. Epidemiol. 33:581-598, 2009. © 2009 Wiley-Liss, Inc.

    Replication of genetic associations as pseudoreplication due to shared genealogy

    Noah A. Rosenberg
    Abstract The genotypes of individuals in replicate genetic association studies have some level of correlation due to shared descent in the complete pedigree of all living humans. As a result of this genealogical sharing, replicate studies that search for genotype-phenotype associations using linkage disequilibrium between marker loci and disease-susceptibility loci can be considered as "pseudoreplicates" rather than true replicates. We examine the size of the pseudoreplication effect in association studies simulated from evolutionary models of the history of a population, evaluating the excess probability that both of a pair of studies detect a disease association compared to the probability expected under the assumption that the two studies are independent. Each of nine combinations of a demographic model and a penetrance model leads to a detectable pseudoreplication effect, suggesting that the degree of support that can be attributed to a replicated genetic association result is less than that which can be attributed to a replicated result in a context of true independence. Genet. Epidemiol. 33:479-487, 2009. © 2009 Wiley-Liss, Inc.

    SNP selection and multidimensional scaling to quantify population structure

    Kelci Miclaus
    Abstract In the new era of large-scale collaborative Genome Wide Association Studies (GWAS), population stratification has become a critical issue that must be addressed. In order to build upon the methods developed to control the confounding effect of a structured population, it is extremely important to visualize and quantify that effect. In this work, we develop methodology for single nucleotide polymorphism (SNP) selection and subsequent population stratification visualization based on deviation from Hardy-Weinberg equilibrium in conjunction with non-metric multidimensional scaling (MDS), a distance-based multivariate technique. Through simulation, it is shown that SNP selection based on Hardy-Weinberg disequilibrium (HWD) is robust against confounding linkage disequilibrium patterns that have been problematic in past studies and methods, and that it produces a differentiated SNP set. Non-metric MDS is shown to be a multivariate visualization tool preferable to principal components in conjunction with HWD SNP selection through theoretical and empirical study from HapMap samples. The proposed selection tool offers a simple and effective way to select appropriate substructure-informative markers for use in exploring the effect that population stratification may have in association studies. Genet. Epidemiol. 33:488-496, 2009. © 2009 Wiley-Liss, Inc.