Home About us Contact | |||
Gene-environment Interactions (gene-environment + interaction)
Selected AbstractsGene-Environment Interaction in Patterns of Adolescent Drinking: Regional Residency Moderates Longitudinal Influences on Alcohol UseALCOHOLISM, Issue 5 2001Richard J. Rose Background: Drinking frequency escalates rapidly during adolescence. Abstinence declines markedly, and drinking monthly or more often becomes normative. Individual differences in adolescent drinking patterns are large, and some patterns are predictive of subsequent drinking problems; little, however, is known of the gene,environment interactions that create them. Methods: Five consecutive and complete birth cohorts of Finnish twins, born 1975,1979, were enrolled sequentially into a longitudinal study and assessed, with postal questionnaires, at ages 16, 17, and 18.5 years. The sample included 1786 same-sex twin pairs, of whom 1240 pairs were concordantly drinking at age 16. Maximum likelihood models were fit in longitudinal analyses of the three waves of drinking data to assess changes in genetic and environmental influences on alcohol use across adolescence. Secondary analyses contrasted twin pairs residing in rural versus those in urban environments to investigate gene,environment interactions. Results: Longitudinal analyses revealed that genetic factors influencing drinking patterns increased in importance across the 30-month period, and effects arising from common environmental influences declined. Distributions of drinking frequencies in twins residing in urban and rural environments were highly similar, but influences on drinking varied between the two environments. Genetic factors assumed a larger role among adolescents residing in urban areas, while common environmental influences were more important in rural settings. Formal modeling of the data established a significant gene,environment interaction. Conclusions: The results document the changing impact of genetic and environmental influences on alcohol use across adolescence. Importantly, the results also reveal a significant gene,environment interaction in patterns of adolescent drinking and invite more detailed analyses of the pathways and mechanisms by which environments modulate genetic effects. [source] Effect of Gene-environment Interactions on Mental Development in African American, Dominican, and Caucasian Mothers and NewbornsANNALS OF HUMAN GENETICS, Issue 1 2010Shuang Wang Summary The health impact of environmental toxins has gained increasing recognition over the years. Polycyclic aromatic hydrocarbons (PAHs) and environmental tobacco smoke (ETS) are known to affect nervous system development in children, but no studies have investigated how polymorphisms in PAH metabolic genes affect child cognitive development following PAH exposure during pregnancy. In two parallel prospective cohort studies of non-smoking African American and Dominican mothers and children in New York City and of Caucasian mothers and children in Krakow, Poland, we explored the effect of gene-PAH interaction on child mental development index (MDI). Genes known to play important roles in the metabolic activation or detoxification of PAHs were selected. Genetic variations in these genes could influence susceptibility to adverse effects of PAHs in polluted air. We explored the effects of interactions between prenatal PAH exposure and 21 polymorphisms or haplotypes in these genes on MDI at 12, 24, and 36 months among 547 newborns and 806 mothers from three different ethnic groups. Significant interaction effects between haplotypes and PAHs were observed in mothers and their newborns in all three ethnic groups after Bonferroni correction. The strongest and most consistent effect observed was between PAH and haplotype ACCGGC of the CYP1B1 gene. [source] Gene-environment interaction and obesityNUTRITION REVIEWS, Issue 12 2008Lu Qi The epidemic of obesity has become a major public health problem. Common-form obesity is underpinned by both environmental and genetic factors. Epidemiological studies have documented that increased intakes of energy and reduced consumption of high-fiber foods, as well as sedentary lifestyle, were among the major driving forces for the epidemic of obesity. Recent genome-wide association studies have identified several genes convincingly related to obesity risk, including the fat mass and obesity associated gene and the melanocortin-4 receptor gene. Testing gene-environment interaction is a relatively new field. This article reviews recent advances in identifying the genetic and environmental risk factors (lifestyle and diet) for obesity. The evidence for gene-environment interaction, especially from observational studies and randomized intervention trials, is examined specifically. Knowledge about the interplay between genetic and environmental components may facilitate the choice of more effective and specific measures for obesity prevention based on the personalized genetic make-up. [source] European Mathematical Genetics Meeting, Heidelberg, Germany, 12th,13th April 2007ANNALS OF HUMAN GENETICS, Issue 4 2007Article first published online: 28 MAY 200 Saurabh Ghosh 11 Indian Statistical Institute, Kolkata, India High correlations between two quantitative traits may be either due to common genetic factors or common environmental factors or a combination of both. In this study, we develop statistical methods to extract the contribution of a common QTL to the total correlation between the components of a bivariate phenotype. Using data on bivariate phenotypes and marker genotypes for sib-pairs, we propose a test for linkage between a common QTL and a marker locus based on the conditional cross-sib trait correlations (trait 1 of sib 1 , trait 2 of sib 2 and conversely) given the identity-by-descent sharing at the marker locus. The null hypothesis cannot be rejected unless there exists a common QTL. We use Monte-Carlo simulations to evaluate the performance of the proposed test under different trait parameters and quantitative trait distributions. An application of the method is illustrated using data on two alcohol-related phenotypes from the Collaborative Study On The Genetics Of Alcoholism project. Rémi Kazma 1 , Catherine Bonaďti-Pellié 1 , Emmanuelle Génin 12 INSERM UMR-S535 and Université Paris Sud, Villejuif, 94817, France Keywords: Gene-environment interaction, sibling recurrence risk, exposure correlation Gene-environment interactions may play important roles in complex disease susceptibility but their detection is often difficult. Here we show how gene-environment interactions can be detected by investigating the degree of familial aggregation according to the exposure of the probands. In case of gene-environment interaction, the distribution of genotypes of affected individuals, and consequently the risk in relatives, depends on their exposure. We developed a test comparing the risks in sibs according to the proband exposure. To evaluate the properties of this new test, we derived the formulas for calculating the expected risks in sibs according to the exposure of probands for various values of exposure frequency, relative risk due to exposure alone, frequencies of latent susceptibility genotypes, genetic relative risks and interaction coefficients. We find that the ratio of risks when the proband is exposed and not exposed is a good indicator of the interaction effect. We evaluate the power of the test for various sample sizes of affected individuals. We conclude that this test is valuable for diseases with moderate familial aggregation, only when the role of the exposure has been clearly evidenced. Since a correlation for exposure among sibs might lead to a difference in risks among sibs in the different proband exposure strata, we also add an exposure correlation coefficient in the model. Interestingly, we find that when this correlation is correctly accounted for, the power of the test is not decreased and might even be significantly increased. Andrea Callegaro 1 , Hans J.C. Van Houwelingen 1 , Jeanine Houwing-Duistermaat 13 Dept. of Medical Statistics and Bioinformatics, Leiden University Medical Center, The Netherlands Keywords: Survival analysis, age at onset, score test, linkage analysis Non parametric linkage (NPL) analysis compares the identical by descent (IBD) sharing in sibling pairs to the expected IBD sharing under the hypothesis of no linkage. Often information is available on the marginal cumulative hazards (for example breast cancer incidence curves). Our aim is to extend the NPL methods by taking into account the age at onset of selected sibling pairs using these known marginal hazards. Li and Zhong (2002) proposed a (retrospective) likelihood ratio test based on an additive frailty model for genetic linkage analysis. From their model we derive a score statistic for selected samples which turns out to be a weighed NPL method. The weights depend on the marginal cumulative hazards and on the frailty parameter. A second approach is based on a simple gamma shared frailty model. Here, we simply test whether the score function of the frailty parameter depends on the excess IBD. We compare the performance of these methods using simulated data. Céline Bellenguez 1 , Carole Ober 2 , Catherine Bourgain 14 INSERM U535 and University Paris Sud, Villejuif, France 5 Department of Human Genetics, The University of Chicago, USA Keywords: Linkage analysis, linkage disequilibrium, high density SNP data Compared with microsatellite markers, high density SNP maps should be more informative for linkage analyses. However, because they are much closer, SNPs present important linkage disequilibrium (LD), which biases classical nonparametric multipoint analyses. This problem is even stronger in population isolates where LD extends over larger regions with a more stochastic pattern. We investigate the issue of linkage analysis with a 500K SNP map in a large and inbred 1840-member Hutterite pedigree, phenotyped for asthma. Using an efficient pedigree breaking strategy, we first identified linked regions with a 5cM microsatellite map, on which we focused to evaluate the SNP map. The only method that models LD in the NPL analysis is limited in both the pedigree size and the number of markers (Abecasis and Wigginton, 2005) and therefore could not be used. Instead, we studied methods that identify sets of SNPs with maximum linkage information content in our pedigree and no LD-driven bias. Both algorithms that directly remove pairs of SNPs in high LD and clustering methods were evaluated. Null simulations were performed to control that Zlr calculated with the SNP sets were not falsely inflated. Preliminary results suggest that although LD is strong in such populations, linkage information content slightly better than that of microsatellite maps can be extracted from dense SNP maps, provided that a careful marker selection is conducted. In particular, we show that the specific LD pattern requires considering LD between a wide range of marker pairs rather than only in predefined blocks. Peter Van Loo 1,2,3 , Stein Aerts 1,2 , Diether Lambrechts 4,5 , Bernard Thienpont 2 , Sunit Maity 4,5 , Bert Coessens 3 , Frederik De Smet 4,5 , Leon-Charles Tranchevent 3 , Bart De Moor 2 , Koen Devriendt 3 , Peter Marynen 1,2 , Bassem Hassan 1,2 , Peter Carmeliet 4,5 , Yves Moreau 36 Department of Molecular and Developmental Genetics, VIB, Belgium 7 Department of Human Genetics, University of Leuven, Belgium 8 Bioinformatics group, Department of Electrical Engineering, University of Leuven, Belgium 9 Department of Transgene Technology and Gene Therapy, VIB, Belgium 10 Center for Transgene Technology and Gene Therapy, University of Leuven, Belgium Keywords: Bioinformatics, gene prioritization, data fusion The identification of genes involved in health and disease remains a formidable challenge. Here, we describe a novel bioinformatics method to prioritize candidate genes underlying pathways or diseases, based on their similarity to genes known to be involved in these processes. It is freely accessible as an interactive software tool, ENDEAVOUR, at http://www.esat.kuleuven.be/endeavour. Unlike previous methods, ENDEAVOUR generates distinct prioritizations from multiple heterogeneous data sources, which are then integrated, or fused, into one global ranking using order statistics. ENDEAVOUR prioritizes candidate genes in a three-step process. First, information about a disease or pathway is gathered from a set of known "training" genes by consulting multiple data sources. Next, the candidate genes are ranked based on similarity with the training properties obtained in the first step, resulting in one prioritized list for each data source. Finally, ENDEAVOUR fuses each of these rankings into a single global ranking, providing an overall prioritization of the candidate genes. Validation of ENDEAVOUR revealed it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates of 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified YPEL1 as a novel gene involved in craniofacial development from a 2-Mb chromosomal region, deleted in some patients with DiGeorge-like birth defects. Finally, we are currently evaluating a pipeline combining array-CGH, ENDEAVOUR and in vivo validation in zebrafish to identify novel genes involved in congenital heart defects. Mark Broom 1 , Graeme Ruxton 2 , Rebecca Kilner 311 Mathematics Dept., University of Sussex, UK 12 Division of Environmental and Evolutionary Biology, University of Glasgow, UK 13 Department of Zoology, University of Cambridge, UK Keywords: Evolutionarily stable strategy, parasitism, asymmetric game Brood parasites chicks vary in the harm that they do to their companions in the nest. In this presentation we use game-theoretic methods to model this variation. Our model considers hosts which potentially abandon single nestlings and instead choose to re-allocate their reproductive effort to future breeding, irrespective of whether the abandoned chick is the host's young or a brood parasite's. The parasite chick must decide whether or not to kill host young by balancing the benefits from reduced competition in the nest against the risk of desertion by host parents. The model predicts that three different types of evolutionarily stable strategies can exist. (1) Hosts routinely rear depleted broods, the brood parasite always kills host young and the host never then abandons the nest. (2) When adult survival after deserting single offspring is very high, hosts always abandon broods of a single nestling and the parasite never kills host offspring, effectively holding them as hostages to prevent nest desertion. (3) Intermediate strategies, in which parasites sometimes kill their nest-mates and host parents sometimes desert nests that contain only a single chick, can also be evolutionarily stable. We provide quantitative descriptions of how the values given to ecological and behavioral parameters of the host-parasite system influence the likelihood of each strategy and compare our results with real host-brood parasite associations in nature. Martin Harrison 114 Mathematics Dept, University of Sussex, UK Keywords: Brood parasitism, games, host, parasite The interaction between hosts and parasites in bird populations has been studied extensively. Game theoretical methods have been used to model this interaction previously, but this has not been studied extensively taking into account the sequential nature of this game. We consider a model allowing the host and parasite to make a number of decisions, which depend on a number of natural factors. The host lays an egg, a parasite bird will arrive at the nest with a certain probability and then chooses to destroy a number of the host eggs and lay one of it's own. With some destruction occurring, either natural or through the actions of the parasite, the host chooses to continue, eject an egg (hoping to eject the parasite) or abandon the nest. Once the eggs have hatched the game then falls to the parasite chick versus the host. The chick chooses to destroy or eject a number of eggs. The final decision is made by the host, choosing whether to raise or abandon the chicks that are in the nest. We consider various natural parameters and probabilities which influence these decisions. We then use this model to look at real-world situations of the interactions of the Reed Warbler and two different parasites, the Common Cuckoo and the Brown-Headed Cowbird. These two parasites have different methods in the way that they parasitize the nests of their hosts. The hosts in turn have a different reaction to these parasites. Arne Jochens 1 , Amke Caliebe 2 , Uwe Roesler 1 , Michael Krawczak 215 Mathematical Seminar, University of Kiel, Germany 16 Institute of Medical Informatics and Statistics, University of Kiel, Germany Keywords: Stepwise mutation model, microsatellite, recursion equation, temporal behaviour We consider the stepwise mutation model which occurs, e.g., in microsatellite loci. Let X(t,i) denote the allelic state of individual i at time t. We compute expectation, variance and covariance of X(t,i), i=1,,,N, and provide a recursion equation for P(X(t,i)=z). Because the variance of X(t,i) goes to infinity as t grows, for the description of the temporal behaviour, we regard the scaled process X(t,i)-X(t,1). The results furnish a better understanding of the behaviour of the stepwise mutation model and may in future be used to derive tests for neutrality under this model. Paul O'Reilly 1 , Ewan Birney 2 , David Balding 117 Statistical Genetics, Department of Epidemiology and Public Health, Imperial, College London, UK 18 European Bioinformatics Institute, EMBL, Cambridge, UK Keywords: Positive selection, Recombination rate, LD, Genome-wide, Natural Selection In recent years efforts to develop population genetics methods that estimate rates of recombination and levels of natural selection in the human genome have intensified. However, since the two processes have an intimately related impact on genetic variation their inference is vulnerable to confounding. Genomic regions subject to recent selection are likely to have a relatively recent common ancestor and consequently less opportunity for historical recombinations that are detectable in contemporary populations. Here we show that selection can reduce the population-based recombination rate estimate substantially. In genome-wide studies for detecting selection we observe a tendency to highlight loci that are subject to low levels of recombination. We find that the outlier approach commonly adopted in such studies may have low power unless variable recombination is accounted for. We introduce a new genome-wide method for detecting selection that exploits the sensitivity to recent selection of methods for estimating recombination rates, while accounting for variable recombination using pedigree data. Through simulations we demonstrate the high power of the Ped/Pop approach to discriminate between neutral and adaptive evolution, particularly in the context of choosing outliers from a genome-wide distribution. Although methods have been developed showing good power to detect selection ,in action', the corresponding window of opportunity is small. In contrast, the power of the Ped/Pop method is maintained for many generations after the fixation of an advantageous variant Sarah Griffiths 1 , Frank Dudbridge 120 MRC Biostatistics Unit, Cambridge, UK Keywords: Genetic association, multimarker tag, haplotype, likelihood analysis In association studies it is generally too expensive to genotype all variants in all subjects. We can exploit linkage disequilibrium between SNPs to select a subset that captures the variation in a training data set obtained either through direct resequencing or a public resource such as the HapMap. These ,tag SNPs' are then genotyped in the whole sample. Multimarker tagging is a more aggressive adaptation of pairwise tagging that allows for combinations of two or more tag SNPs to predict an untyped SNP. Here we describe a new method for directly testing the association of an untyped SNP using a multimarker tag. Previously, other investigators have suggested testing a specific tag haplotype, or performing a weighted analysis using weights derived from the training data. However these approaches do not properly account for the imperfect correlation between the tag haplotype and the untyped SNP. Here we describe a straightforward approach to testing untyped SNPs using a missing-data likelihood analysis, including the tag markers as nuisance parameters. The training data is stacked on top of the main body of genotype data so there is information on how the tag markers predict the genotype of the untyped SNP. The uncertainty in this prediction is automatically taken into account in the likelihood analysis. This approach yields more power and also a more accurate prediction of the odds ratio of the untyped SNP. Anke Schulz 1 , Christine Fischer 2 , Jenny Chang-Claude 1 , Lars Beckmann 121 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany 22 Institute of Human Genetics, University of Heidelberg, Germany Keywords: Haplotype, haplotype sharing, entropy, Mantel statistics, marker selection We previously introduced a new method to map genes involved in complex diseases, using haplotype sharing-based Mantel statistics to correlate genetic and phenotypic similarity. Although the Mantel statistic is powerful in narrowing down candidate regions, the precise localization of a gene is hampered in genomic regions where linkage disequilibrium is so high that neighboring markers are found to be significant at similar magnitude and we are not able to discriminate between them. Here, we present a new approach to localize susceptibility genes by combining haplotype sharing-based Mantel statistics with an iterative entropy-based marker selection algorithm. For each marker at which the Mantel statistic is evaluated, the algorithm selects a subset of surrounding markers. The subset is chosen to maximize multilocus linkage disequilibrium, which is measured by the normalized entropy difference introduced by Nothnagel et al. (2002). We evaluated the algorithm with respect to type I error and power. Its ability to localize the disease variant was compared to the localization (i) without marker selection and (ii) considering haplotype block structure. Case-control samples were simulated from a set of 18 haplotypes, consisting of 15 SNPs in two haplotype blocks. The new algorithm gave correct type I error and yielded similar power to detect the disease locus compared to the alternative approaches. The neighboring markers were clearly less often significant than the causal locus, and also less often significant compared to the alternative approaches. Thus the new algorithm improved the precision of the localization of susceptibility genes. Mark M. Iles 123 Section of Epidemiology and Biostatistics, LIMM, University of Leeds, UK Keywords: tSNP, tagging, association, HapMap Tagging SNPs (tSNPs) are commonly used to capture genetic diversity cost-effectively. However, it is important that the efficacy of tSNPs is correctly estimated, otherwise coverage may be insufficient. If the pilot sample from which tSNPs are chosen is too small or the initial marker map too sparse, tSNP efficacy may be overestimated. An existing estimation method based on bootstrapping goes some way to correct for insufficient sample size and overfitting, but does not completely solve the problem. We describe a novel method, based on exclusion of haplotypes, that improves on the bootstrap approach. Using simulated data, the extent of the sample size problem is investigated and the performance of the bootstrap and the novel method are compared. We incorporate an existing method adjusting for marker density by ,SNP-dropping'. We find that insufficient sample size can cause large overestimates in tSNP efficacy, even with as many as 100 individuals, and the problem worsens as the region studied increases in size. Both the bootstrap and novel method correct much of this overestimate, with our novel method consistently outperforming the bootstrap method. We conclude that a combination of insufficient sample size and overfitting may lead to overestimation of tSNP efficacy and underpowering of studies based on tSNPs. Our novel approach corrects for much of this bias and is superior to the previous method. Sample sizes larger than previously suggested may still be required for accurate estimation of tSNP efficacy. This has obvious ramifications for the selection of tSNPs from HapMap data. Claudio Verzilli 1 , Juliet Chapman 1 , Aroon Hingorani 2 , Juan Pablo-Casas 1 , Tina Shah 2 , Liam Smeeth 1 , John Whittaker 124 Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK 25 Division of Medicine, University College London, UK Keywords: Meta-analysis, Genetic association studies We present a Bayesian hierarchical model for the meta-analysis of candidate gene studies with a continuous outcome. Such studies often report results from association tests for different, possibly study-specific and non-overlapping markers (typically SNPs) in the same genetic region. Meta analyses of the results at each marker in isolation are seldom appropriate as they ignore the correlation that may exist between markers due to linkage disequlibrium (LD) and cannot assess the relative importance of variants at each marker. Also such marker-wise meta analyses are restricted to only those studies that have typed the marker in question, with a potential loss of power. A better strategy is one which incorporates information about the LD between markers so that any combined estimate of the effect of each variant is corrected for the effect of other variants, as in multiple regression. Here we develop a Bayesian hierarchical linear regression that models the observed genotype group means and uses pairwise LD measurements between markers as prior information to make posterior inference on adjusted effects. The approach is applied to the meta analysis of 24 studies assessing the effect of 7 variants in the C-reactive protein (CRP) gene region on plasma CRP levels, an inflammatory biomarker shown in observational studies to be positively associated with cardiovascular disease. Cathryn M. Lewis 1 , Christopher G. Mathew 1 , Theresa M. Marteau 226 Dept. of Medical and Molecular Genetics, King's College London, UK 27 Department of Psychology, King's College London, UK Keywords: Risk, genetics, CARD15, smoking, model Recently progress has been made in identifying mutations that confer susceptibility to complex diseases, with the potential to use these mutations in determining disease risk. We developed methods to estimate disease risk based on genotype relative risks (for a gene G), exposure to an environmental factor (E), and family history (with recurrence risk ,R for a relative of type R). ,R must be partitioned into the risk due to G (which is modelled independently) and the residual risk. The risk model was then applied to Crohn's disease (CD), a severe gastrointestinal disease for which smoking increases disease risk approximately 2-fold, and mutations in CARD15 confer increased risks of 2.25 (for carriers of a single mutation) and 9.3 (for carriers of two mutations). CARD15 accounts for only a small proportion of the genetic component of CD, with a gene-specific ,S, CARD15 of 1.16, from a total sibling relative risk of ,S= 27. CD risks were estimated for high-risk individuals who are siblings of a CD case, and who also smoke. The CD risk to such individuals who carry two CARD15 mutations is approximately 0.34, and for those carrying a single CARD15 mutation the risk is 0.08, compared to a population prevalence of approximately 0.001. These results imply that complex disease genes may be valuable in estimating with greater precision than has hitherto been possible disease risks in specific, easily identified subgroups of the population with a view to prevention. Yurii Aulchenko 128 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Compression, information, bzip2, genome-wide SNP data, statistical genetics With advances in molecular technology, studies accessing millions of genetic polymorphisms in thousands of study subjects will soon become common. Such studies generate large amounts of data, whose effective storage and management is a challenge to the modern statistical genetics. Standard file compression utilities, such as Zip, Gzip and Bzip2, may be helpful to minimise the storage requirements. Less obvious is the fact that the data compression techniques may be also used in the analysis of genetic data. It is known that the efficiency of a particular compression algorithm depends on the probability structure of the data. In this work, we compared different standard and customised tools using the data from human HapMap project. Secondly, we investigate the potential uses of data compression techniques for the analysis of linkage, association and linkage disequilibrium Suzanne Leal 1 , Bingshan Li 129 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, USA Keywords: Consanguineous pedigrees, missing genotype data Missing genotype data can increase false-positive evidence for linkage when either parametric or nonparametric analysis is carried out ignoring intermarker linkage disequilibrium (LD). Previously it was demonstrated by Huang et al (2005) that no bias occurs in this situation for affected sib-pairs with unrelated parents when either both parents are genotyped or genotype data is available for two additional unaffected siblings when parental genotypes are missing. However, this is not the case for consanguineous pedigrees, where missing genotype data for any pedigree member within a consanguinity loop can increase false-positive evidence of linkage. The false-positive evidence for linkage is further increased when cryptic consanguinity is present. The amount of false-positive evidence for linkage is highly dependent on which family members are genotyped. When parental genotype data is available, the false-positive evidence for linkage is usually not as strong as when parental genotype data is unavailable. Which family members will aid in the reduction of false-positive evidence of linkage is highly dependent on which other family members are genotyped. For a pedigree with an affected proband whose first-cousin parents have been genotyped, further reduction in the false-positive evidence of linkage can be obtained by including genotype data from additional affected siblings of the proband or genotype data from the proband's sibling-grandparents. When parental genotypes are not available, false-positive evidence for linkage can be reduced by including in the analysis genotype data from either unaffected siblings of the proband or the proband's married-in-grandparents. Najaf Amin 1 , Yurii Aulchenko 130 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Genomic Control, pedigree structure, quantitative traits The Genomic Control (GC) method was originally developed to control for population stratification and cryptic relatedness in association studies. This method assumes that the effect of population substructure on the test statistics is essentially constant across the genome, and therefore unassociated markers can be used to estimate the effect of confounding onto the test statistic. The properties of GC method were extensively investigated for different stratification scenarios, and compared to alternative methods, such as the transmission-disequilibrium test. The potential of this method to correct not for occasional cryptic relations, but for regular pedigree structure, however, was not investigated before. In this work we investigate the potential of the GC method for pedigree-based association analysis of quantitative traits. The power and type one error of the method was compared to standard methods, such as the measured genotype (MG) approach and quantitative trait transmission-disequilibrium test. In human pedigrees, with trait heritability varying from 30 to 80%, the power of MG and GC approach was always higher than that of TDT. GC had correct type 1 error and its power was close to that of MG under moderate heritability (30%), but decreased with higher heritability. William Astle 1 , Chris Holmes 2 , David Balding 131 Department of Epidemiology and Public Health, Imperial College London, UK 32 Department of Statistics, University of Oxford, UK Keywords: Population structure, association studies, genetic epidemiology, statistical genetics In the analysis of population association studies, Genomic Control (Devlin & Roeder, 1999) (GC) adjusts the Armitage test statistic to correct the type I error for the effects of population substructure, but its power is often sub-optimal. Turbo Genomic Control (TGC) generalises GC to incorporate co-variation of relatedness and phenotype, retaining control over type I error while improving power. TGC is similar to the method of Yu et al. (2006), but we extend it to binary (case-control) in addition to quantitative phenotypes, we implement improved estimation of relatedness coefficients, and we derive an explicit statistic that generalizes the Armitage test statistic and is fast to compute. TGC also has similarities to EIGENSTRAT (Price et al., 2006) which is a new method based on principle components analysis. The problems of population structure(Clayton et al., 2005) and cryptic relatedness (Voight & Pritchard, 2005) are essentially the same: if patterns of shared ancestry differ between cases and controls, whether distant (coancestry) or recent (cryptic relatedness), false positives can arise and power can be diminished. With large numbers of widely-spaced genetic markers, coancestry can now be measured accurately for each pair of individuals via patterns of allele-sharing. Instead of modelling subpopulations, we work instead with a coancestry coefficient for each pair of individuals in the study. We explain the relationships between TGC, GC and EIGENSTRAT. We present simulation studies and real data analyses to illustrate the power advantage of TGC in a range of scenarios incorporating both substructure and cryptic relatedness. References Clayton, D. G.et al. (2005) Population structure, differential bias and genomic control in a large-scale case-control association study. Nature Genetics37(11) November 2005. Devlin, B. & Roeder, K. (1999) Genomic control for association studies. Biometics55(4) December 1999. Price, A. L.et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics38(8) (August 2006). Voight, B. J. & Pritchard, J. K. (2005) Confounding from cryptic relatedness in case-control association studies. Public Library of Science Genetics1(3) September 2005. Yu, J.et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics38(2) February 2006. Hervé Perdry 1 , Marie-Claude Babron 1 , Françoise Clerget-Darpoux 133 INSERM U535 and Univ. Paris Sud, UMR-S 535, Villejuif, France Keywords: Modifier genes, case-parents trios, ordered transmission disequilibrium test A modifying locus is a polymorphic locus, distinct from the disease locus, which leads to differences in the disease phenotype, either by modifying the penetrance of the disease allele, or by modifying the expression of the disease. The effect of such a locus is a clinical heterogeneity that can be reflected by the values of an appropriate covariate, such as the age of onset, or the severity of the disease. We designed the Ordered Transmission Disequilibrium Test (OTDT) to test for a relation between the clinical heterogeneity, expressed by the covariate, and marker genotypes of a candidate gene. The method applies to trio families with one affected child and his parents. Each family member is genotyped at a bi-allelic marker M of a candidate gene. To each of the families is associated a covariate value; the families are ordered on the values of this covariate. As the TDT (Spielman et al. 1993), the OTDT is based on the observation of the transmission rate T of a given allele at M. The OTDT aims to find a critical value of the covariate which separates the sample of families in two subsamples in which the transmission rates are significantly different. We investigate the power of the method by simulations under various genetic models and covariate distributions. Acknowledgments H Perdry is funded by ARSEP. Pascal Croiseau 1 , Heather Cordell 2 , Emmanuelle Génin 134 INSERM U535 and University Paris Sud, UMR-S535, Villejuif, France 35 Institute of Human Genetics, Newcastle University, UK Keywords: Association, missing data, conditionnal logistic regression Missing data is an important problem in association studies. Several methods used to test for association need that individuals be genotyped at the full set of markers. Individuals with missing data need to be excluded from the analysis. This could involve an important decrease in sample size and a loss of information. If the disease susceptibility locus (DSL) is poorly typed, it is also possible that a marker in linkage disequilibrium gives a stronger association signal than the DSL. One may then falsely conclude that the marker is more likely to be the DSL. We recently developed a Multiple Imputation method to infer missing data on case-parent trios Starting from the observed data, a few number of complete data sets are generated by Markov-Chain Monte Carlo approach. These complete datasets are analysed using standard statistical package and the results are combined as described in Little & Rubin (2002). Here we report the results of simulations performed to examine, for different patterns of missing data, how often the true DSL gives the highest association score among different loci in LD. We found that multiple imputation usually correctly detect the DSL site even if the percentage of missing data is high. This is not the case for the naďve approach that consists in discarding trios with missing data. In conclusion, Multiple imputation presents the advantage of being easy to use and flexible and is therefore a promising tool in the search for DSL involved in complex diseases. Salma Kotti 1 , Heike Bickeböller 2 , Françoise Clerget-Darpoux 136 University Paris Sud, UMR-S535, Villejuif, France 37 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany Keywords: Genotype relative risk, internal controls, Family based analyses Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRRs. We will analytically derive the GRR estimators for the 1:1 and 1:3 matching and will present the results at the meeting. Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRR. We will analytically derive the GRR estimator for the 1:1 and 1:3 matching and will present the results at the meeting. Luigi Palla 1 , David Siegmund 239 Department of Mathematics,Free University Amsterdam, The Netherlands 40 Department of Statistics, Stanford University, California, USA Keywords: TDT, assortative mating, inbreeding, statistical power A substantial amount of Assortative Mating (AM) is often recorded on physical and psychological, dichotomous as well as quantitative traits that are supposed to have a multifactorial genetic component. In particular AM has the effect of increasing the genetic variance, even more than inbreeding because it acts across loci beside within loci, when the trait has a multifactorial origin. Under the assumption of a polygenic model for AM dating back to Wright (1921) and refined by Crow and Felsenstein (1968,1982), the effect of assortative mating on the power to detect genetic association in the Transmission Disequilibrium Test (TDT) is explored as parameters, such as the effective number of genes and the allelic frequency vary. The power is reflected by the non centrality parameter of the TDT and is expressed as a function of the number of trios, the relative risk of the heterozygous genotype and the allele frequency (Siegmund and Yakir, 2007). The noncentrality parameter of the relevant score statistic is updated considering the effect of AM which is expressed in terms of an ,effective' inbreeding coefficient. In particular, for dichotomous traits it is apparent that the higher the number of genes involved in the trait, the lower the loss in power due to AM. Finally an attempt is made to extend this relation to the Q-TDT (Rabinowitz, 1997), which involves considering the effect of AM also on the phenotypic variance of the trait of interest, under the assumption that AM affects only its additive genetic component. References Crow, & Felsenstein, (1968). The effect of assortative mating on the genetic composition of a population. Eugen.Quart.15, 87,97. Rabinowitz,, 1997. A Transmission Disequilibrium Test for Quantitative Trait Loci. Human Heredity47, 342,350. Siegmund, & Yakir, (2007) Statistics of gene mapping, Springer. Wright, (1921). System of mating.III. Assortative mating based on somatic resemblance. Genetics6, 144,161. Jérémie Nsengimana 1 , Ben D Brown 2 , Alistair S Hall 2 , Jenny H Barrett 141 Leeds Institute of Molecular Medicine, University of Leeds, UK 42 Leeds Institute for Genetics, Health and Therapeutics, University of Leeds, UK Keywords: Inflammatory genes, haplotype, coronary artery disease Genetic Risk of Acute Coronary Events (GRACE) is an initiative to collect cases of coronary artery disease (CAD) and their unaffected siblings in the UK and to use them to map genetic variants increasing disease risk. The aim of the present study was to test the association between CAD and 51 single nucleotide polymorphisms (SNPs) and their haplotypes from 35 inflammatory genes. Genotype data were available for 1154 persons affected before age 66 (including 48% before age 50) and their 1545 unaffected siblings (891 discordant families). Each SNP was tested for association to CAD, and haplotypes within genes or gene clusters were tested using FBAT (Rabinowitz & Laird, 2000). For the most significant results, genetic effect size was estimated using conditional logistic regression (CLR) within STATA adjusting for other risk factors. Haplotypes were assigned using HAPLORE (Zhang et al., 2005), which considers all parental mating types consistent with offspring genotypes and assigns them a probability of occurence. This probability was used in CLR to weight the haplotypes. In the single SNP analysis, several SNPs showed some evidence for association, including one SNP in the interleukin-1A gene. Analysing haplotypes in the interleukin-1 gene cluster, a common 3-SNP haplotype was found to increase the risk of CAD (P = 0.009). In an additive genetic model adjusting for covariates the odds ratio (OR) for this haplotype is 1.56 (95% CI: 1.16-2.10, p = 0.004) for early-onset CAD (before age 50). This study illustrates the utility of haplotype analysis in family-based association studies to investigate candidate genes. References Rabinowitz, D. & Laird, N. M. (2000) Hum Hered50, 211,223. Zhang, K., Sun, F. & Zhao, H. (2005) Bioinformatics21, 90,103. Andrea Foulkes 1 , Recai Yucel 1 , Xiaohong Li 143 Division of Biostatistics, University of Massachusetts, USA Keywords: Haploytpe, high-dimensional, mixed modeling The explosion of molecular level information coupled with large epidemiological studies presents an exciting opportunity to uncover the genetic underpinnings of complex diseases; however, several analytical challenges remain to be addressed. Characterizing the components to complex diseases inevitably requires consideration of synergies across multiple genetic loci and environmental and demographic factors. In addition, it is critical to capture information on allelic phase, that is whether alleles within a gene are in cis (on the same chromosome) or in trans (on different chromosomes.) In associations studies of unrelated individuals, this alignment of alleles within a chromosomal copy is generally not observed. We address the potential ambiguity in allelic phase in this high dimensional data setting using mixed effects models. Both a semi-parametric and fully likelihood-based approach to estimation are considered to account for missingness in cluster identifiers. In the first case, we apply a multiple imputation procedure coupled with a first stage expectation maximization algorithm for parameter estimation. A bootstrap approach is employed to assess sensitivity to variability induced by parameter estimation. Secondly, a fully likelihood-based approach using an expectation conditional maximization algorithm is described. Notably, these models allow for characterizing high-order gene-gene interactions while providing a flexible statistical framework to account for the confounding or mediating role of person specific covariates. The proposed method is applied to data arising from a cohort of human immunodeficiency virus type-1 (HIV-1) infected individuals at risk for therapy associated dyslipidemia. Simulation studies demonstrate reasonable power and control of family-wise type 1 error rates. Vivien Marquard 1 , Lars Beckmann 1 , Jenny Chang-Claude 144 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Genotyping errors, type I error, haplotype-based association methods It has been shown in several simulation studies that genotyping errors may have a great impact on the type I error of statistical methods used in genetic association analysis of complex diseases. Our aim was to investigate type I error rates in a case-control study, when differential and non-differential genotyping errors were introduced in realistic scenarios. We simulated case-control data sets, where individual genotypes were drawn from a haplotype distribution of 18 haplotypes with 15 markers in the APM1 gene. Genotyping errors were introduced following the unrestricted and symmetric with 0 edges error models described by Heid et al. (2006). In six scenarios, errors resulted from changes of one allele to another with predefined probabilities of 1%, 2.5% or 10%, respectively. A multiple number of errors per haplotype was possible and could vary between 0 and 15, the number of markers investigated. We examined three association methods: Mantel statistics using haplotype-sharing; a haplotype-specific score test; and Armitage trend test for single markers. The type I error rates were not influenced for any of all the three methods for a genotyping error rate of less than 1%. For higher error rates and differential errors, the type I error of the Mantel statistic was only slightly and of the Armitage trend test moderately increased. The type I error rates of the score test were highly increased. The type I error rates were correct for all three methods for non-differential errors. Further investigations will be carried out with different frequencies of differential error rates and focus on power. Arne Neumann 1 , Dörthe Malzahn 1 , Martina Müller 2 , Heike Bickeböller 145 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany 46 GSF-National Research Center for Environment and Health, Neuherberg & IBE-Institute of Epidemiology, Ludwig-Maximilians University München, Germany Keywords: Interaction, longitudinal, nonparametric Longitudinal data show the time dependent course of phenotypic traits. In this contribution, we consider longitudinal cohort studies and investigate the association between two candidate genes and a dependent quantitative longitudinal phenotype. The set-up defines a factorial design which allows us to test simultaneously for the overall gene effect of the loci as well as for possible gene-gene and gene time interaction. The latter would induce genetically based time-profile differences in the longitudinal phenotype. We adopt a non-parametric statistical test to genetic epidemiological cohort studies and investigate its performance by simulation studies. The statistical test was originally developed for longitudinal clinical studies (Brunner, Munzel, Puri, 1999 J Multivariate Anal 70:286-317). It is non-parametric in the sense that no assumptions are made about the underlying distribution of the quantitative phenotype. Longitudinal observations belonging to the same individual can be arbitrarily dependent on one another for the different time points whereas trait observations of different individuals are independent. The two loci are assumed to be statistically independent. Our simulations show that the nonparametric test is comparable with ANOVA in terms of power of detecting gene-gene and gene-time interaction in an ANOVA favourable setting. Rebecca Hein 1 , Lars Beckmann 1 , Jenny Chang-Claude 147 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Indirect association studies, interaction effects, linkage disequilibrium, marker allele frequency Association studies accounting for gene-environment interactions (GxE) may be useful for detecting genetic effects and identifying important environmental effect modifiers. Current technology facilitates very dense marker spacing in genetic association studies; however, the true disease variant(s) may not be genotyped. In this situation, an association between a gene and a phenotype may still be detectable, using genetic markers associated with the true disease variant(s) (indirect association). Zondervan and Cardon [2004] showed that the odds ratios (OR) of markers which are associated with the disease variant depend highly on the linkage disequilibrium (LD) between the variant and the markers, and whether the allele frequencies match and thereby influence the sample size needed to detect genetic association. We examined the influence of LD and allele frequencies on the sample size needed to detect GxE in indirect association studies, and provide tables for sample size estimation. For discordant allele frequencies and incomplete LD, sample sizes can be unfeasibly large. The influence of both factors is stronger for disease loci with small rather than moderate to high disease allele frequencies. A decline in D' of e.g. 5% has less impact on sample size than increasing the difference in allele frequencies by the same percentage. Assuming 80% power, large interaction effects can be detected using smaller sample sizes than those needed for the detection of main effects. The detection of interaction effects involving rare alleles may not be possible. Focussing only on marker density can be a limited strategy in indirect association studies for GxE. Cyril Dalmasso 1 , Emmanuelle Génin 2 , Catherine Bourgain 2 , Philippe Broët 148 JE 2492 , Univ. Paris-Sud, France 49 INSERM UMR-S 535 and University Paris Sud, Villejuif, France Keywords: Linkage analysis, Multiple testing, False Discovery Rate, Mixture model In the context of genome-wide linkage analyses, where a large number of statistical tests are simultaneously performed, the False Discovery Rate (FDR) that is defined as the expected proportion of false discoveries among all discoveries is nowadays widely used for taking into account the multiple testing problem. Other related criteria have been considered such as the local False Discovery Rate (lFDR) that is a variant of the FDR giving to each test its own measure of significance. The lFDR is defined as the posterior probability that a null hypothesis is true. Most of the proposed methods for estimating the lFDR or the FDR rely on distributional assumption under the null hypothesis. However, in observational studies, the empirical null distribution may be very different from the theoretical one. In this work, we propose a mixture model based approach that provides estimates of the lFDR and the FDR in the context of large-scale variance component linkage analyses. In particular, this approach allows estimating the empirical null distribution, this latter being a key quantity for any simultaneous inference procedure. The proposed method is applied on a real dataset. Arief Gusnanto 1 , Frank Dudbridge 150 MRC Biostatistics Unit, Cambridge UK Keywords: Significance, genome-wide, association, permutation, multiplicity Genome-wide association scans have introduced statistical challenges, mainly in the multiplicity of thousands of tests. The question of what constitutes a significant finding remains somewhat unresolved. Permutation testing is very time-consuming, whereas Bayesian arguments struggle to distinguish direct from indirect association. It seems attractive to summarise the multiplicity in a simple form that allows users to avoid time-consuming permutations. A standard significance level would facilitate reporting of results and reduce the need for permutation tests. This is potentially important because current scans do not have full coverage of the whole genome, and yet, the implicit multiplicity is genome-wide. We discuss some proposed summaries, with reference to the empirical null distribution of the multiple tests, approximated through a large number of random permutations. Using genome-wide data from the Wellcome Trust Case-Control Consortium, we use a sub-sampling approach with increasing density to estimate the nominal p-value to obtain family-wise significance of 5%. The results indicate that the significance level is converging to about 1e-7 as the marker spacing becomes infinitely dense. We considered the concept of an effective number of independent tests, and showed that when used in a Bonferroni correction, the number varies with the overall significance level, but is roughly constant in the region of interest. We compared several estimators of the effective number of tests, and showed that in the region of significance of interest, Patterson's eigenvalue based estimator gives approximately the right family-wise error rate. Michael Nothnagel 1 , Amke Caliebe 1 , Michael Krawczak 151 Institute of Medical Informatics and Statistics, University Clinic Schleswig-Holstein, University of Kiel, Germany Keywords: Association scans, Bayesian framework, posterior odds, genetic risk, multiplicative model Whole-genome association scans have been suggested to be a cost-efficient way to survey genetic variation and to map genetic disease factors. We used a Bayesian framework to investigate the posterior odds of a genuine association under multiplicative disease models. We demonstrate that the p value alone is not a sufficient means to evaluate the findings in association studies. We suggest that likelihood ratios should accompany p values in association reports. We argue, that, given the reported results of whole-genome scans, more associations should have been successfully replicated if the consistently made assumptions about considerable genetic risks were correct. We conclude that it is very likely that the vast majority of relative genetic risks are only of the order of 1.2 or lower. Clive Hoggart 1 , Maria De Iorio 1 , John Whittakker 2 , David Balding 152 Department of Epidemiology and Public Health, Imperial College London, UK 53 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: Genome-wide association analyses, shrinkage priors, Lasso Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants of small effect, which is a plausible scenario for many complex diseases. Moreover, many simulation studies assume a single causal variant and so more complex realities are ignored. Analysing large numbers of variants simultaneously is now becoming feasible, thanks to developments in Bayesian stochastic search methods. We pose the problem of SNP selection as variable selection in a regression model. In contrast to single SNP tests this approach simultaneously models the effect of all SNPs. SNPs are selected by a Bayesian interpretation of the lasso (Tibshirani, 1996); the maximum a posterior (MAP) estimate of the regression coefficients, which have been given independent, double exponential prior distributions. The double exponential distribution is an example of a shrinkage prior, MAP estimates with shrinkage priors can be zero, thus all SNPs with non zero regression coefficients are selected. In addition to the commonly-used double exponential (Laplace) prior, we also implement the normal exponential gamma prior distribution. We show that use of the Laplace prior improves SNP selection in comparison with single -SNP tests, and that the normal exponential gamma prior leads to a further improvement. Our method is fast and can handle very large numbers of SNPs: we demonstrate its performance using both simulated and real genome-wide data sets with 500 K SNPs, which can be analysed in 2 hours on a desktop workstation. Mickael Guedj 1,2 , Jerome Wojcik 2 , Gregory Nuel 154 Laboratoire Statistique et Génome, Université d'Evry, Evry France 55 Serono Pharmaceutical Research Institute, Plan-les-Ouates, Switzerland Keywords: Local Replication, Local Score, Association In gene-mapping, replication of initial findings has been put forwards as the approach of choice for filtering false-positives from true signals for underlying loci. In practice, such replications are however too poorly observed. Besides the statistical and technical-related factors (lack of power, multiple-testing, stratification, quality control,) inconsistent conclusions obtained from independent populations might result from real biological differences. In particular, the high degree of variation in the strength of LD among populations of different origins is a major challenge to the discovery of genes. Seeking for Local Replications (defined as the presence of a signal of association in a same genomic region among populations) instead of strict replications (same locus, same risk allele) may lead to more reliable results. Recently, a multi-markers approach based on the Local Score statistic has been proposed as a simple and efficient way to select candidate genomic regions at the first stage of genome-wide association studies. Here we propose an extension of this approach adapted to replicated association studies. Based on simulations, this method appears promising. In particular it outperforms classical simple-marker strategies to detect modest-effect genes. Additionally it constitutes, to our knowledge, a first framework dedicated to the detection of such Local Replications. Juliet Chapman 1 , Claudio Verzilli 1 , John Whittaker 156 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: FDR, Association studies, Bayesian model selection As genomewide association studies become commonplace there is debate as to how such studies might be analysed and what we might hope to gain from the data. It is clear that standard single locus approaches are limited in that they do not adjust for the effects of other loci and problematic since it is not obvious how to adjust for multiple comparisons. False discovery rates have been suggested, but it is unclear how well these will cope with highly correlated genetic data. We consider the validity of standard false discovery rates in large scale association studies. We also show that a Bayesian procedure has advantages in detecting causal loci amongst a large number of dependant SNPs and investigate properties of a Bayesian FDR. Peter Kraft 157 Harvard School of Public Health, Boston USA Keywords: Gene-environment interaction, genome-wide association scans Appropriately analyzed two-stage designs,where a subset of available subjects are genotyped on a genome-wide panel of markers at the first stage and then a much smaller subset of the most promising markers are genotyped on the remaining subjects,can have nearly as much power as a single-stage study where all subjects are genotyped on the genome-wide panel yet can be much less expensive. Typically, the "most promising" markers are selected based on evidence for a marginal association between genotypes and disease. Subsequently, the few markers found to be associated with disease at the end of the second stage are interrogated for evidence of gene-environment interaction, mainly to understand their impact on disease etiology and public health impact. However, this approach may miss variants which have a sizeable effect restricted to one exposure stratum and therefore only a modest marginal effect. We have proposed to use information on the joint effects of genes and a discrete list of environmental exposures at the initial screening stage to select promising markers for the second stage [Kraft et al Hum Hered 2007]. This approach optimizes power to detect variants that have a sizeable marginal effect and variants that have a small marginal effect but a sizeable effect in a stratum defined by an environmental exposure. As an example, I discuss a proposed genome-wide association scan for Type II diabetes susceptibility variants based in several large nested case-control studies. Beate Glaser 1 , Peter Holmans 158 Biostatistics and Bioinformatics Unit, Cardiff University, School of Medicine, Heath Park, Cardiff, UK Keywords: Combined case-control and trios analysis, Power, False-positive rate, Simulation, Association studies The statistical power of genetic association studies can be enhanced by combining the analysis of case-control with parent-offspring trio samples. Various combined analysis techniques have been recently developed; as yet, there have been no comparisons of their power. This work was performed with the aim of identifying the most powerful method among available combined techniques including test statistics developed by Kazeem and Farrall (2005), Nagelkerke and colleagues (2004) and Dudbridge (2006), as well as a simple combination of ,2-statistics from single samples. Simulation studies were performed to investigate their power under different additive, multiplicative, dominant and recessive disease models. False-positive rates were determined by studying the type I error rates under null models including models with unequal allele frequencies between the single case-control and trios samples. We identified three techniques with equivalent power and false-positive rates, which included modifications of the three main approaches: 1) the unmodified combined Odds ratio estimate by Kazeem & Farrall (2005), 2) a modified approach of the combined risk ratio estimate by Nagelkerke & colleagues (2004) and 3) a modified technique for a combined risk ratio estimate by Dudbridge (2006). Our work highlights the importance of studies investigating test performance criteria of novel methods, as they will help users to select the optimal approach within a range of available analysis techniques. David Almorza 1 , M.V. Kandus 2 , Juan Carlos Salerno 2 , Rafael Boggio 359 Facultad de Ciencias del Trabajo, University of Cádiz, Spain 60 Instituto de Genética IGEAF, Buenos Aires, Argentina 61 Universidad Nacional de La Plata, Buenos Aires, Argentina Keywords: Principal component analysis, maize, ear weight, inbred lines The objective of this work was to evaluate the relationship among different traits of the ear of maize inbred lines and to group genotypes according to its performance. Ten inbred lines developed at IGEAF (INTA Castelar) and five public inbred lines as checks were used. A field trial was carried out in Castelar, Buenos Aires (34° 36' S , 58° 39' W) using a complete randomize design with three replications. At harvest, individual weight (P.E.), diameter (D.E.), row number (N.H.) and length (L.E.) of the ear were assessed. A principal component analysis, PCA, (Infostat 2005) was used, and the variability of the data was depicted with a biplot. Principal components 1 and 2 (CP1 and CP2) explained 90% of the data variability. CP1 was correlated with P.E., L.E. and D.E., meanwhile CP2 was correlated with N.H. We found that individual weight (P.E.) was more correlated with diameter of the ear (D.E.) than with length (L.E). Five groups of inbred lines were distinguished: with high P.E. and mean N.H. (04-70, 04-73, 04-101 and MO17), with high P.E. but less N.H. (04-61 and B14), with mean P.E. and N.H. (B73, 04-123 and 04-96), with high N.H. but less P.E. (LP109, 04-8, 04-91 and 04-76) and with low P.E. and low N.H. (LP521 and 04-104). The use of PCA showed which variables had more incidence in ear weight and how is the correlation among them. Moreover, the different groups found with this analysis allow the evaluation of inbred lines by several traits simultaneously. Sven Knüppel 1 , Anja Bauerfeind 1 , Klaus Rohde 162 Department of Bioinformatics, MDC Berlin, Germany Keywords: Haplotypes, association studies, case-control, nuclear families The area of gene chip technology provides a plethora of phase-unknown SNP genotypes in order to find significant association to some genetic trait. To circumvent possibly low information content of a single SNP one groups successive SNPs and estimates haplotypes. Haplotype estimation, however, may reveal ambiguous haplotype pairs and bias the application of statistical methods. Zaykin et al. (Hum Hered, 53:79-91, 2002) proposed the construction of a design matrix to take this ambiguity into account. Here we present a set of functions written for the Statistical package R, which carries out haplotype estimation on the basis of the EM-algorithm for individuals (case-control) or nuclear families. The construction of a design matrix on basis of estimated haplotypes or haplotype pairs allows application of standard methods for association studies (linear, logistic regression), as well as statistical methods as haplotype sharing statistics and TDT-Test. Applications of these methods to genome-wide association screens will be demonstrated. Manuela Zucknick 1 , Chris Holmes 2 , Sylvia Richardson 163 Department of Epidemiology and Public Health, Imperial College London, UK 64 Department of Statistics, Oxford Center for Gene Function, University of Oxford, UK Keywords: Bayesian, variable selection, MCMC, large p, small n, structured dependence In large-scale genomic applications vast numbers of markers or genes are scanned to find a few candidates which are linked to a particular phenotype. Statistically, this is a variable selection problem in the "large p, small n" situation where many more variables than samples are available. An additional feature is the complex dependence structure which is often observed among the markers/genes due to linkage disequilibrium or their joint involvement in biological processes. Bayesian variable selection methods using indicator variables are well suited to the problem. Binary phenotypes like disease status are common and both Bayesian probit and logistic regression can be applied in this context. We argue that logistic regression models are both easier to tune and to interpret than probit models and implement the approach by Holmes & Held (2006). Because the model space is vast, MCMC methods are used as stochastic search algorithms with the aim to quickly find regions of high posterior probability. In a trade-off between fast-updating but slow-moving single-gene Metropolis-Hastings samplers and computationally expensive full Gibbs sampling, we propose to employ the dependence structure among the genes/markers to help decide which variables to update together. Also, parallel tempering methods are used to aid bold moves and help avoid getting trapped in local optima. Mixing and convergence of the resulting Markov chains are evaluated and compared to standard samplers in both a simulation study and in an application to a gene expression data set. Reference Holmes, C. C. & Held, L. (2006) Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis1, 145,168. Dawn Teare 165 MMGE, University of Sheffield, UK Keywords: CNP, family-based analysis, MCMC Evidence is accumulating that segmental copy number polymorphisms (CNPs) may represent a significant portion of human genetic variation. These highly polymorphic systems require handling as phenotypes rather than co-dominant markers, placing new demands on family-based analyses. We present an integrated approach to meet these challenges in the form of a graphical model, where the underlying discrete CNP phenotype is inferred from the (single or replicate) quantitative measure within the analysis, whilst assuming an allele based system segregating through the pedigree. [source] Commentary on Chen et al. (2009): Gene-environment interactions in nicotine dependenceADDICTION, Issue 10 2009JAAKKO KAPRIO No abstract is available for this article. [source] European Mathematical Genetics Meeting, Heidelberg, Germany, 12th,13th April 2007ANNALS OF HUMAN GENETICS, Issue 4 2007Article first published online: 28 MAY 200 Saurabh Ghosh 11 Indian Statistical Institute, Kolkata, India High correlations between two quantitative traits may be either due to common genetic factors or common environmental factors or a combination of both. In this study, we develop statistical methods to extract the contribution of a common QTL to the total correlation between the components of a bivariate phenotype. Using data on bivariate phenotypes and marker genotypes for sib-pairs, we propose a test for linkage between a common QTL and a marker locus based on the conditional cross-sib trait correlations (trait 1 of sib 1 , trait 2 of sib 2 and conversely) given the identity-by-descent sharing at the marker locus. The null hypothesis cannot be rejected unless there exists a common QTL. We use Monte-Carlo simulations to evaluate the performance of the proposed test under different trait parameters and quantitative trait distributions. An application of the method is illustrated using data on two alcohol-related phenotypes from the Collaborative Study On The Genetics Of Alcoholism project. Rémi Kazma 1 , Catherine Bonaďti-Pellié 1 , Emmanuelle Génin 12 INSERM UMR-S535 and Université Paris Sud, Villejuif, 94817, France Keywords: Gene-environment interaction, sibling recurrence risk, exposure correlation Gene-environment interactions may play important roles in complex disease susceptibility but their detection is often difficult. Here we show how gene-environment interactions can be detected by investigating the degree of familial aggregation according to the exposure of the probands. In case of gene-environment interaction, the distribution of genotypes of affected individuals, and consequently the risk in relatives, depends on their exposure. We developed a test comparing the risks in sibs according to the proband exposure. To evaluate the properties of this new test, we derived the formulas for calculating the expected risks in sibs according to the exposure of probands for various values of exposure frequency, relative risk due to exposure alone, frequencies of latent susceptibility genotypes, genetic relative risks and interaction coefficients. We find that the ratio of risks when the proband is exposed and not exposed is a good indicator of the interaction effect. We evaluate the power of the test for various sample sizes of affected individuals. We conclude that this test is valuable for diseases with moderate familial aggregation, only when the role of the exposure has been clearly evidenced. Since a correlation for exposure among sibs might lead to a difference in risks among sibs in the different proband exposure strata, we also add an exposure correlation coefficient in the model. Interestingly, we find that when this correlation is correctly accounted for, the power of the test is not decreased and might even be significantly increased. Andrea Callegaro 1 , Hans J.C. Van Houwelingen 1 , Jeanine Houwing-Duistermaat 13 Dept. of Medical Statistics and Bioinformatics, Leiden University Medical Center, The Netherlands Keywords: Survival analysis, age at onset, score test, linkage analysis Non parametric linkage (NPL) analysis compares the identical by descent (IBD) sharing in sibling pairs to the expected IBD sharing under the hypothesis of no linkage. Often information is available on the marginal cumulative hazards (for example breast cancer incidence curves). Our aim is to extend the NPL methods by taking into account the age at onset of selected sibling pairs using these known marginal hazards. Li and Zhong (2002) proposed a (retrospective) likelihood ratio test based on an additive frailty model for genetic linkage analysis. From their model we derive a score statistic for selected samples which turns out to be a weighed NPL method. The weights depend on the marginal cumulative hazards and on the frailty parameter. A second approach is based on a simple gamma shared frailty model. Here, we simply test whether the score function of the frailty parameter depends on the excess IBD. We compare the performance of these methods using simulated data. Céline Bellenguez 1 , Carole Ober 2 , Catherine Bourgain 14 INSERM U535 and University Paris Sud, Villejuif, France 5 Department of Human Genetics, The University of Chicago, USA Keywords: Linkage analysis, linkage disequilibrium, high density SNP data Compared with microsatellite markers, high density SNP maps should be more informative for linkage analyses. However, because they are much closer, SNPs present important linkage disequilibrium (LD), which biases classical nonparametric multipoint analyses. This problem is even stronger in population isolates where LD extends over larger regions with a more stochastic pattern. We investigate the issue of linkage analysis with a 500K SNP map in a large and inbred 1840-member Hutterite pedigree, phenotyped for asthma. Using an efficient pedigree breaking strategy, we first identified linked regions with a 5cM microsatellite map, on which we focused to evaluate the SNP map. The only method that models LD in the NPL analysis is limited in both the pedigree size and the number of markers (Abecasis and Wigginton, 2005) and therefore could not be used. Instead, we studied methods that identify sets of SNPs with maximum linkage information content in our pedigree and no LD-driven bias. Both algorithms that directly remove pairs of SNPs in high LD and clustering methods were evaluated. Null simulations were performed to control that Zlr calculated with the SNP sets were not falsely inflated. Preliminary results suggest that although LD is strong in such populations, linkage information content slightly better than that of microsatellite maps can be extracted from dense SNP maps, provided that a careful marker selection is conducted. In particular, we show that the specific LD pattern requires considering LD between a wide range of marker pairs rather than only in predefined blocks. Peter Van Loo 1,2,3 , Stein Aerts 1,2 , Diether Lambrechts 4,5 , Bernard Thienpont 2 , Sunit Maity 4,5 , Bert Coessens 3 , Frederik De Smet 4,5 , Leon-Charles Tranchevent 3 , Bart De Moor 2 , Koen Devriendt 3 , Peter Marynen 1,2 , Bassem Hassan 1,2 , Peter Carmeliet 4,5 , Yves Moreau 36 Department of Molecular and Developmental Genetics, VIB, Belgium 7 Department of Human Genetics, University of Leuven, Belgium 8 Bioinformatics group, Department of Electrical Engineering, University of Leuven, Belgium 9 Department of Transgene Technology and Gene Therapy, VIB, Belgium 10 Center for Transgene Technology and Gene Therapy, University of Leuven, Belgium Keywords: Bioinformatics, gene prioritization, data fusion The identification of genes involved in health and disease remains a formidable challenge. Here, we describe a novel bioinformatics method to prioritize candidate genes underlying pathways or diseases, based on their similarity to genes known to be involved in these processes. It is freely accessible as an interactive software tool, ENDEAVOUR, at http://www.esat.kuleuven.be/endeavour. Unlike previous methods, ENDEAVOUR generates distinct prioritizations from multiple heterogeneous data sources, which are then integrated, or fused, into one global ranking using order statistics. ENDEAVOUR prioritizes candidate genes in a three-step process. First, information about a disease or pathway is gathered from a set of known "training" genes by consulting multiple data sources. Next, the candidate genes are ranked based on similarity with the training properties obtained in the first step, resulting in one prioritized list for each data source. Finally, ENDEAVOUR fuses each of these rankings into a single global ranking, providing an overall prioritization of the candidate genes. Validation of ENDEAVOUR revealed it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates of 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified YPEL1 as a novel gene involved in craniofacial development from a 2-Mb chromosomal region, deleted in some patients with DiGeorge-like birth defects. Finally, we are currently evaluating a pipeline combining array-CGH, ENDEAVOUR and in vivo validation in zebrafish to identify novel genes involved in congenital heart defects. Mark Broom 1 , Graeme Ruxton 2 , Rebecca Kilner 311 Mathematics Dept., University of Sussex, UK 12 Division of Environmental and Evolutionary Biology, University of Glasgow, UK 13 Department of Zoology, University of Cambridge, UK Keywords: Evolutionarily stable strategy, parasitism, asymmetric game Brood parasites chicks vary in the harm that they do to their companions in the nest. In this presentation we use game-theoretic methods to model this variation. Our model considers hosts which potentially abandon single nestlings and instead choose to re-allocate their reproductive effort to future breeding, irrespective of whether the abandoned chick is the host's young or a brood parasite's. The parasite chick must decide whether or not to kill host young by balancing the benefits from reduced competition in the nest against the risk of desertion by host parents. The model predicts that three different types of evolutionarily stable strategies can exist. (1) Hosts routinely rear depleted broods, the brood parasite always kills host young and the host never then abandons the nest. (2) When adult survival after deserting single offspring is very high, hosts always abandon broods of a single nestling and the parasite never kills host offspring, effectively holding them as hostages to prevent nest desertion. (3) Intermediate strategies, in which parasites sometimes kill their nest-mates and host parents sometimes desert nests that contain only a single chick, can also be evolutionarily stable. We provide quantitative descriptions of how the values given to ecological and behavioral parameters of the host-parasite system influence the likelihood of each strategy and compare our results with real host-brood parasite associations in nature. Martin Harrison 114 Mathematics Dept, University of Sussex, UK Keywords: Brood parasitism, games, host, parasite The interaction between hosts and parasites in bird populations has been studied extensively. Game theoretical methods have been used to model this interaction previously, but this has not been studied extensively taking into account the sequential nature of this game. We consider a model allowing the host and parasite to make a number of decisions, which depend on a number of natural factors. The host lays an egg, a parasite bird will arrive at the nest with a certain probability and then chooses to destroy a number of the host eggs and lay one of it's own. With some destruction occurring, either natural or through the actions of the parasite, the host chooses to continue, eject an egg (hoping to eject the parasite) or abandon the nest. Once the eggs have hatched the game then falls to the parasite chick versus the host. The chick chooses to destroy or eject a number of eggs. The final decision is made by the host, choosing whether to raise or abandon the chicks that are in the nest. We consider various natural parameters and probabilities which influence these decisions. We then use this model to look at real-world situations of the interactions of the Reed Warbler and two different parasites, the Common Cuckoo and the Brown-Headed Cowbird. These two parasites have different methods in the way that they parasitize the nests of their hosts. The hosts in turn have a different reaction to these parasites. Arne Jochens 1 , Amke Caliebe 2 , Uwe Roesler 1 , Michael Krawczak 215 Mathematical Seminar, University of Kiel, Germany 16 Institute of Medical Informatics and Statistics, University of Kiel, Germany Keywords: Stepwise mutation model, microsatellite, recursion equation, temporal behaviour We consider the stepwise mutation model which occurs, e.g., in microsatellite loci. Let X(t,i) denote the allelic state of individual i at time t. We compute expectation, variance and covariance of X(t,i), i=1,,,N, and provide a recursion equation for P(X(t,i)=z). Because the variance of X(t,i) goes to infinity as t grows, for the description of the temporal behaviour, we regard the scaled process X(t,i)-X(t,1). The results furnish a better understanding of the behaviour of the stepwise mutation model and may in future be used to derive tests for neutrality under this model. Paul O'Reilly 1 , Ewan Birney 2 , David Balding 117 Statistical Genetics, Department of Epidemiology and Public Health, Imperial, College London, UK 18 European Bioinformatics Institute, EMBL, Cambridge, UK Keywords: Positive selection, Recombination rate, LD, Genome-wide, Natural Selection In recent years efforts to develop population genetics methods that estimate rates of recombination and levels of natural selection in the human genome have intensified. However, since the two processes have an intimately related impact on genetic variation their inference is vulnerable to confounding. Genomic regions subject to recent selection are likely to have a relatively recent common ancestor and consequently less opportunity for historical recombinations that are detectable in contemporary populations. Here we show that selection can reduce the population-based recombination rate estimate substantially. In genome-wide studies for detecting selection we observe a tendency to highlight loci that are subject to low levels of recombination. We find that the outlier approach commonly adopted in such studies may have low power unless variable recombination is accounted for. We introduce a new genome-wide method for detecting selection that exploits the sensitivity to recent selection of methods for estimating recombination rates, while accounting for variable recombination using pedigree data. Through simulations we demonstrate the high power of the Ped/Pop approach to discriminate between neutral and adaptive evolution, particularly in the context of choosing outliers from a genome-wide distribution. Although methods have been developed showing good power to detect selection ,in action', the corresponding window of opportunity is small. In contrast, the power of the Ped/Pop method is maintained for many generations after the fixation of an advantageous variant Sarah Griffiths 1 , Frank Dudbridge 120 MRC Biostatistics Unit, Cambridge, UK Keywords: Genetic association, multimarker tag, haplotype, likelihood analysis In association studies it is generally too expensive to genotype all variants in all subjects. We can exploit linkage disequilibrium between SNPs to select a subset that captures the variation in a training data set obtained either through direct resequencing or a public resource such as the HapMap. These ,tag SNPs' are then genotyped in the whole sample. Multimarker tagging is a more aggressive adaptation of pairwise tagging that allows for combinations of two or more tag SNPs to predict an untyped SNP. Here we describe a new method for directly testing the association of an untyped SNP using a multimarker tag. Previously, other investigators have suggested testing a specific tag haplotype, or performing a weighted analysis using weights derived from the training data. However these approaches do not properly account for the imperfect correlation between the tag haplotype and the untyped SNP. Here we describe a straightforward approach to testing untyped SNPs using a missing-data likelihood analysis, including the tag markers as nuisance parameters. The training data is stacked on top of the main body of genotype data so there is information on how the tag markers predict the genotype of the untyped SNP. The uncertainty in this prediction is automatically taken into account in the likelihood analysis. This approach yields more power and also a more accurate prediction of the odds ratio of the untyped SNP. Anke Schulz 1 , Christine Fischer 2 , Jenny Chang-Claude 1 , Lars Beckmann 121 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany 22 Institute of Human Genetics, University of Heidelberg, Germany Keywords: Haplotype, haplotype sharing, entropy, Mantel statistics, marker selection We previously introduced a new method to map genes involved in complex diseases, using haplotype sharing-based Mantel statistics to correlate genetic and phenotypic similarity. Although the Mantel statistic is powerful in narrowing down candidate regions, the precise localization of a gene is hampered in genomic regions where linkage disequilibrium is so high that neighboring markers are found to be significant at similar magnitude and we are not able to discriminate between them. Here, we present a new approach to localize susceptibility genes by combining haplotype sharing-based Mantel statistics with an iterative entropy-based marker selection algorithm. For each marker at which the Mantel statistic is evaluated, the algorithm selects a subset of surrounding markers. The subset is chosen to maximize multilocus linkage disequilibrium, which is measured by the normalized entropy difference introduced by Nothnagel et al. (2002). We evaluated the algorithm with respect to type I error and power. Its ability to localize the disease variant was compared to the localization (i) without marker selection and (ii) considering haplotype block structure. Case-control samples were simulated from a set of 18 haplotypes, consisting of 15 SNPs in two haplotype blocks. The new algorithm gave correct type I error and yielded similar power to detect the disease locus compared to the alternative approaches. The neighboring markers were clearly less often significant than the causal locus, and also less often significant compared to the alternative approaches. Thus the new algorithm improved the precision of the localization of susceptibility genes. Mark M. Iles 123 Section of Epidemiology and Biostatistics, LIMM, University of Leeds, UK Keywords: tSNP, tagging, association, HapMap Tagging SNPs (tSNPs) are commonly used to capture genetic diversity cost-effectively. However, it is important that the efficacy of tSNPs is correctly estimated, otherwise coverage may be insufficient. If the pilot sample from which tSNPs are chosen is too small or the initial marker map too sparse, tSNP efficacy may be overestimated. An existing estimation method based on bootstrapping goes some way to correct for insufficient sample size and overfitting, but does not completely solve the problem. We describe a novel method, based on exclusion of haplotypes, that improves on the bootstrap approach. Using simulated data, the extent of the sample size problem is investigated and the performance of the bootstrap and the novel method are compared. We incorporate an existing method adjusting for marker density by ,SNP-dropping'. We find that insufficient sample size can cause large overestimates in tSNP efficacy, even with as many as 100 individuals, and the problem worsens as the region studied increases in size. Both the bootstrap and novel method correct much of this overestimate, with our novel method consistently outperforming the bootstrap method. We conclude that a combination of insufficient sample size and overfitting may lead to overestimation of tSNP efficacy and underpowering of studies based on tSNPs. Our novel approach corrects for much of this bias and is superior to the previous method. Sample sizes larger than previously suggested may still be required for accurate estimation of tSNP efficacy. This has obvious ramifications for the selection of tSNPs from HapMap data. Claudio Verzilli 1 , Juliet Chapman 1 , Aroon Hingorani 2 , Juan Pablo-Casas 1 , Tina Shah 2 , Liam Smeeth 1 , John Whittaker 124 Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK 25 Division of Medicine, University College London, UK Keywords: Meta-analysis, Genetic association studies We present a Bayesian hierarchical model for the meta-analysis of candidate gene studies with a continuous outcome. Such studies often report results from association tests for different, possibly study-specific and non-overlapping markers (typically SNPs) in the same genetic region. Meta analyses of the results at each marker in isolation are seldom appropriate as they ignore the correlation that may exist between markers due to linkage disequlibrium (LD) and cannot assess the relative importance of variants at each marker. Also such marker-wise meta analyses are restricted to only those studies that have typed the marker in question, with a potential loss of power. A better strategy is one which incorporates information about the LD between markers so that any combined estimate of the effect of each variant is corrected for the effect of other variants, as in multiple regression. Here we develop a Bayesian hierarchical linear regression that models the observed genotype group means and uses pairwise LD measurements between markers as prior information to make posterior inference on adjusted effects. The approach is applied to the meta analysis of 24 studies assessing the effect of 7 variants in the C-reactive protein (CRP) gene region on plasma CRP levels, an inflammatory biomarker shown in observational studies to be positively associated with cardiovascular disease. Cathryn M. Lewis 1 , Christopher G. Mathew 1 , Theresa M. Marteau 226 Dept. of Medical and Molecular Genetics, King's College London, UK 27 Department of Psychology, King's College London, UK Keywords: Risk, genetics, CARD15, smoking, model Recently progress has been made in identifying mutations that confer susceptibility to complex diseases, with the potential to use these mutations in determining disease risk. We developed methods to estimate disease risk based on genotype relative risks (for a gene G), exposure to an environmental factor (E), and family history (with recurrence risk ,R for a relative of type R). ,R must be partitioned into the risk due to G (which is modelled independently) and the residual risk. The risk model was then applied to Crohn's disease (CD), a severe gastrointestinal disease for which smoking increases disease risk approximately 2-fold, and mutations in CARD15 confer increased risks of 2.25 (for carriers of a single mutation) and 9.3 (for carriers of two mutations). CARD15 accounts for only a small proportion of the genetic component of CD, with a gene-specific ,S, CARD15 of 1.16, from a total sibling relative risk of ,S= 27. CD risks were estimated for high-risk individuals who are siblings of a CD case, and who also smoke. The CD risk to such individuals who carry two CARD15 mutations is approximately 0.34, and for those carrying a single CARD15 mutation the risk is 0.08, compared to a population prevalence of approximately 0.001. These results imply that complex disease genes may be valuable in estimating with greater precision than has hitherto been possible disease risks in specific, easily identified subgroups of the population with a view to prevention. Yurii Aulchenko 128 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Compression, information, bzip2, genome-wide SNP data, statistical genetics With advances in molecular technology, studies accessing millions of genetic polymorphisms in thousands of study subjects will soon become common. Such studies generate large amounts of data, whose effective storage and management is a challenge to the modern statistical genetics. Standard file compression utilities, such as Zip, Gzip and Bzip2, may be helpful to minimise the storage requirements. Less obvious is the fact that the data compression techniques may be also used in the analysis of genetic data. It is known that the efficiency of a particular compression algorithm depends on the probability structure of the data. In this work, we compared different standard and customised tools using the data from human HapMap project. Secondly, we investigate the potential uses of data compression techniques for the analysis of linkage, association and linkage disequilibrium Suzanne Leal 1 , Bingshan Li 129 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, USA Keywords: Consanguineous pedigrees, missing genotype data Missing genotype data can increase false-positive evidence for linkage when either parametric or nonparametric analysis is carried out ignoring intermarker linkage disequilibrium (LD). Previously it was demonstrated by Huang et al (2005) that no bias occurs in this situation for affected sib-pairs with unrelated parents when either both parents are genotyped or genotype data is available for two additional unaffected siblings when parental genotypes are missing. However, this is not the case for consanguineous pedigrees, where missing genotype data for any pedigree member within a consanguinity loop can increase false-positive evidence of linkage. The false-positive evidence for linkage is further increased when cryptic consanguinity is present. The amount of false-positive evidence for linkage is highly dependent on which family members are genotyped. When parental genotype data is available, the false-positive evidence for linkage is usually not as strong as when parental genotype data is unavailable. Which family members will aid in the reduction of false-positive evidence of linkage is highly dependent on which other family members are genotyped. For a pedigree with an affected proband whose first-cousin parents have been genotyped, further reduction in the false-positive evidence of linkage can be obtained by including genotype data from additional affected siblings of the proband or genotype data from the proband's sibling-grandparents. When parental genotypes are not available, false-positive evidence for linkage can be reduced by including in the analysis genotype data from either unaffected siblings of the proband or the proband's married-in-grandparents. Najaf Amin 1 , Yurii Aulchenko 130 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Genomic Control, pedigree structure, quantitative traits The Genomic Control (GC) method was originally developed to control for population stratification and cryptic relatedness in association studies. This method assumes that the effect of population substructure on the test statistics is essentially constant across the genome, and therefore unassociated markers can be used to estimate the effect of confounding onto the test statistic. The properties of GC method were extensively investigated for different stratification scenarios, and compared to alternative methods, such as the transmission-disequilibrium test. The potential of this method to correct not for occasional cryptic relations, but for regular pedigree structure, however, was not investigated before. In this work we investigate the potential of the GC method for pedigree-based association analysis of quantitative traits. The power and type one error of the method was compared to standard methods, such as the measured genotype (MG) approach and quantitative trait transmission-disequilibrium test. In human pedigrees, with trait heritability varying from 30 to 80%, the power of MG and GC approach was always higher than that of TDT. GC had correct type 1 error and its power was close to that of MG under moderate heritability (30%), but decreased with higher heritability. William Astle 1 , Chris Holmes 2 , David Balding 131 Department of Epidemiology and Public Health, Imperial College London, UK 32 Department of Statistics, University of Oxford, UK Keywords: Population structure, association studies, genetic epidemiology, statistical genetics In the analysis of population association studies, Genomic Control (Devlin & Roeder, 1999) (GC) adjusts the Armitage test statistic to correct the type I error for the effects of population substructure, but its power is often sub-optimal. Turbo Genomic Control (TGC) generalises GC to incorporate co-variation of relatedness and phenotype, retaining control over type I error while improving power. TGC is similar to the method of Yu et al. (2006), but we extend it to binary (case-control) in addition to quantitative phenotypes, we implement improved estimation of relatedness coefficients, and we derive an explicit statistic that generalizes the Armitage test statistic and is fast to compute. TGC also has similarities to EIGENSTRAT (Price et al., 2006) which is a new method based on principle components analysis. The problems of population structure(Clayton et al., 2005) and cryptic relatedness (Voight & Pritchard, 2005) are essentially the same: if patterns of shared ancestry differ between cases and controls, whether distant (coancestry) or recent (cryptic relatedness), false positives can arise and power can be diminished. With large numbers of widely-spaced genetic markers, coancestry can now be measured accurately for each pair of individuals via patterns of allele-sharing. Instead of modelling subpopulations, we work instead with a coancestry coefficient for each pair of individuals in the study. We explain the relationships between TGC, GC and EIGENSTRAT. We present simulation studies and real data analyses to illustrate the power advantage of TGC in a range of scenarios incorporating both substructure and cryptic relatedness. References Clayton, D. G.et al. (2005) Population structure, differential bias and genomic control in a large-scale case-control association study. Nature Genetics37(11) November 2005. Devlin, B. & Roeder, K. (1999) Genomic control for association studies. Biometics55(4) December 1999. Price, A. L.et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics38(8) (August 2006). Voight, B. J. & Pritchard, J. K. (2005) Confounding from cryptic relatedness in case-control association studies. Public Library of Science Genetics1(3) September 2005. Yu, J.et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics38(2) February 2006. Hervé Perdry 1 , Marie-Claude Babron 1 , Françoise Clerget-Darpoux 133 INSERM U535 and Univ. Paris Sud, UMR-S 535, Villejuif, France Keywords: Modifier genes, case-parents trios, ordered transmission disequilibrium test A modifying locus is a polymorphic locus, distinct from the disease locus, which leads to differences in the disease phenotype, either by modifying the penetrance of the disease allele, or by modifying the expression of the disease. The effect of such a locus is a clinical heterogeneity that can be reflected by the values of an appropriate covariate, such as the age of onset, or the severity of the disease. We designed the Ordered Transmission Disequilibrium Test (OTDT) to test for a relation between the clinical heterogeneity, expressed by the covariate, and marker genotypes of a candidate gene. The method applies to trio families with one affected child and his parents. Each family member is genotyped at a bi-allelic marker M of a candidate gene. To each of the families is associated a covariate value; the families are ordered on the values of this covariate. As the TDT (Spielman et al. 1993), the OTDT is based on the observation of the transmission rate T of a given allele at M. The OTDT aims to find a critical value of the covariate which separates the sample of families in two subsamples in which the transmission rates are significantly different. We investigate the power of the method by simulations under various genetic models and covariate distributions. Acknowledgments H Perdry is funded by ARSEP. Pascal Croiseau 1 , Heather Cordell 2 , Emmanuelle Génin 134 INSERM U535 and University Paris Sud, UMR-S535, Villejuif, France 35 Institute of Human Genetics, Newcastle University, UK Keywords: Association, missing data, conditionnal logistic regression Missing data is an important problem in association studies. Several methods used to test for association need that individuals be genotyped at the full set of markers. Individuals with missing data need to be excluded from the analysis. This could involve an important decrease in sample size and a loss of information. If the disease susceptibility locus (DSL) is poorly typed, it is also possible that a marker in linkage disequilibrium gives a stronger association signal than the DSL. One may then falsely conclude that the marker is more likely to be the DSL. We recently developed a Multiple Imputation method to infer missing data on case-parent trios Starting from the observed data, a few number of complete data sets are generated by Markov-Chain Monte Carlo approach. These complete datasets are analysed using standard statistical package and the results are combined as described in Little & Rubin (2002). Here we report the results of simulations performed to examine, for different patterns of missing data, how often the true DSL gives the highest association score among different loci in LD. We found that multiple imputation usually correctly detect the DSL site even if the percentage of missing data is high. This is not the case for the naďve approach that consists in discarding trios with missing data. In conclusion, Multiple imputation presents the advantage of being easy to use and flexible and is therefore a promising tool in the search for DSL involved in complex diseases. Salma Kotti 1 , Heike Bickeböller 2 , Françoise Clerget-Darpoux 136 University Paris Sud, UMR-S535, Villejuif, France 37 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany Keywords: Genotype relative risk, internal controls, Family based analyses Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRRs. We will analytically derive the GRR estimators for the 1:1 and 1:3 matching and will present the results at the meeting. Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRR. We will analytically derive the GRR estimator for the 1:1 and 1:3 matching and will present the results at the meeting. Luigi Palla 1 , David Siegmund 239 Department of Mathematics,Free University Amsterdam, The Netherlands 40 Department of Statistics, Stanford University, California, USA Keywords: TDT, assortative mating, inbreeding, statistical power A substantial amount of Assortative Mating (AM) is often recorded on physical and psychological, dichotomous as well as quantitative traits that are supposed to have a multifactorial genetic component. In particular AM has the effect of increasing the genetic variance, even more than inbreeding because it acts across loci beside within loci, when the trait has a multifactorial origin. Under the assumption of a polygenic model for AM dating back to Wright (1921) and refined by Crow and Felsenstein (1968,1982), the effect of assortative mating on the power to detect genetic association in the Transmission Disequilibrium Test (TDT) is explored as parameters, such as the effective number of genes and the allelic frequency vary. The power is reflected by the non centrality parameter of the TDT and is expressed as a function of the number of trios, the relative risk of the heterozygous genotype and the allele frequency (Siegmund and Yakir, 2007). The noncentrality parameter of the relevant score statistic is updated considering the effect of AM which is expressed in terms of an ,effective' inbreeding coefficient. In particular, for dichotomous traits it is apparent that the higher the number of genes involved in the trait, the lower the loss in power due to AM. Finally an attempt is made to extend this relation to the Q-TDT (Rabinowitz, 1997), which involves considering the effect of AM also on the phenotypic variance of the trait of interest, under the assumption that AM affects only its additive genetic component. References Crow, & Felsenstein, (1968). The effect of assortative mating on the genetic composition of a population. Eugen.Quart.15, 87,97. Rabinowitz,, 1997. A Transmission Disequilibrium Test for Quantitative Trait Loci. Human Heredity47, 342,350. Siegmund, & Yakir, (2007) Statistics of gene mapping, Springer. Wright, (1921). System of mating.III. Assortative mating based on somatic resemblance. Genetics6, 144,161. Jérémie Nsengimana 1 , Ben D Brown 2 , Alistair S Hall 2 , Jenny H Barrett 141 Leeds Institute of Molecular Medicine, University of Leeds, UK 42 Leeds Institute for Genetics, Health and Therapeutics, University of Leeds, UK Keywords: Inflammatory genes, haplotype, coronary artery disease Genetic Risk of Acute Coronary Events (GRACE) is an initiative to collect cases of coronary artery disease (CAD) and their unaffected siblings in the UK and to use them to map genetic variants increasing disease risk. The aim of the present study was to test the association between CAD and 51 single nucleotide polymorphisms (SNPs) and their haplotypes from 35 inflammatory genes. Genotype data were available for 1154 persons affected before age 66 (including 48% before age 50) and their 1545 unaffected siblings (891 discordant families). Each SNP was tested for association to CAD, and haplotypes within genes or gene clusters were tested using FBAT (Rabinowitz & Laird, 2000). For the most significant results, genetic effect size was estimated using conditional logistic regression (CLR) within STATA adjusting for other risk factors. Haplotypes were assigned using HAPLORE (Zhang et al., 2005), which considers all parental mating types consistent with offspring genotypes and assigns them a probability of occurence. This probability was used in CLR to weight the haplotypes. In the single SNP analysis, several SNPs showed some evidence for association, including one SNP in the interleukin-1A gene. Analysing haplotypes in the interleukin-1 gene cluster, a common 3-SNP haplotype was found to increase the risk of CAD (P = 0.009). In an additive genetic model adjusting for covariates the odds ratio (OR) for this haplotype is 1.56 (95% CI: 1.16-2.10, p = 0.004) for early-onset CAD (before age 50). This study illustrates the utility of haplotype analysis in family-based association studies to investigate candidate genes. References Rabinowitz, D. & Laird, N. M. (2000) Hum Hered50, 211,223. Zhang, K., Sun, F. & Zhao, H. (2005) Bioinformatics21, 90,103. Andrea Foulkes 1 , Recai Yucel 1 , Xiaohong Li 143 Division of Biostatistics, University of Massachusetts, USA Keywords: Haploytpe, high-dimensional, mixed modeling The explosion of molecular level information coupled with large epidemiological studies presents an exciting opportunity to uncover the genetic underpinnings of complex diseases; however, several analytical challenges remain to be addressed. Characterizing the components to complex diseases inevitably requires consideration of synergies across multiple genetic loci and environmental and demographic factors. In addition, it is critical to capture information on allelic phase, that is whether alleles within a gene are in cis (on the same chromosome) or in trans (on different chromosomes.) In associations studies of unrelated individuals, this alignment of alleles within a chromosomal copy is generally not observed. We address the potential ambiguity in allelic phase in this high dimensional data setting using mixed effects models. Both a semi-parametric and fully likelihood-based approach to estimation are considered to account for missingness in cluster identifiers. In the first case, we apply a multiple imputation procedure coupled with a first stage expectation maximization algorithm for parameter estimation. A bootstrap approach is employed to assess sensitivity to variability induced by parameter estimation. Secondly, a fully likelihood-based approach using an expectation conditional maximization algorithm is described. Notably, these models allow for characterizing high-order gene-gene interactions while providing a flexible statistical framework to account for the confounding or mediating role of person specific covariates. The proposed method is applied to data arising from a cohort of human immunodeficiency virus type-1 (HIV-1) infected individuals at risk for therapy associated dyslipidemia. Simulation studies demonstrate reasonable power and control of family-wise type 1 error rates. Vivien Marquard 1 , Lars Beckmann 1 , Jenny Chang-Claude 144 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Genotyping errors, type I error, haplotype-based association methods It has been shown in several simulation studies that genotyping errors may have a great impact on the type I error of statistical methods used in genetic association analysis of complex diseases. Our aim was to investigate type I error rates in a case-control study, when differential and non-differential genotyping errors were introduced in realistic scenarios. We simulated case-control data sets, where individual genotypes were drawn from a haplotype distribution of 18 haplotypes with 15 markers in the APM1 gene. Genotyping errors were introduced following the unrestricted and symmetric with 0 edges error models described by Heid et al. (2006). In six scenarios, errors resulted from changes of one allele to another with predefined probabilities of 1%, 2.5% or 10%, respectively. A multiple number of errors per haplotype was possible and could vary between 0 and 15, the number of markers investigated. We examined three association methods: Mantel statistics using haplotype-sharing; a haplotype-specific score test; and Armitage trend test for single markers. The type I error rates were not influenced for any of all the three methods for a genotyping error rate of less than 1%. For higher error rates and differential errors, the type I error of the Mantel statistic was only slightly and of the Armitage trend test moderately increased. The type I error rates of the score test were highly increased. The type I error rates were correct for all three methods for non-differential errors. Further investigations will be carried out with different frequencies of differential error rates and focus on power. Arne Neumann 1 , Dörthe Malzahn 1 , Martina Müller 2 , Heike Bickeböller 145 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany 46 GSF-National Research Center for Environment and Health, Neuherberg & IBE-Institute of Epidemiology, Ludwig-Maximilians University München, Germany Keywords: Interaction, longitudinal, nonparametric Longitudinal data show the time dependent course of phenotypic traits. In this contribution, we consider longitudinal cohort studies and investigate the association between two candidate genes and a dependent quantitative longitudinal phenotype. The set-up defines a factorial design which allows us to test simultaneously for the overall gene effect of the loci as well as for possible gene-gene and gene time interaction. The latter would induce genetically based time-profile differences in the longitudinal phenotype. We adopt a non-parametric statistical test to genetic epidemiological cohort studies and investigate its performance by simulation studies. The statistical test was originally developed for longitudinal clinical studies (Brunner, Munzel, Puri, 1999 J Multivariate Anal 70:286-317). It is non-parametric in the sense that no assumptions are made about the underlying distribution of the quantitative phenotype. Longitudinal observations belonging to the same individual can be arbitrarily dependent on one another for the different time points whereas trait observations of different individuals are independent. The two loci are assumed to be statistically independent. Our simulations show that the nonparametric test is comparable with ANOVA in terms of power of detecting gene-gene and gene-time interaction in an ANOVA favourable setting. Rebecca Hein 1 , Lars Beckmann 1 , Jenny Chang-Claude 147 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Indirect association studies, interaction effects, linkage disequilibrium, marker allele frequency Association studies accounting for gene-environment interactions (GxE) may be useful for detecting genetic effects and identifying important environmental effect modifiers. Current technology facilitates very dense marker spacing in genetic association studies; however, the true disease variant(s) may not be genotyped. In this situation, an association between a gene and a phenotype may still be detectable, using genetic markers associated with the true disease variant(s) (indirect association). Zondervan and Cardon [2004] showed that the odds ratios (OR) of markers which are associated with the disease variant depend highly on the linkage disequilibrium (LD) between the variant and the markers, and whether the allele frequencies match and thereby influence the sample size needed to detect genetic association. We examined the influence of LD and allele frequencies on the sample size needed to detect GxE in indirect association studies, and provide tables for sample size estimation. For discordant allele frequencies and incomplete LD, sample sizes can be unfeasibly large. The influence of both factors is stronger for disease loci with small rather than moderate to high disease allele frequencies. A decline in D' of e.g. 5% has less impact on sample size than increasing the difference in allele frequencies by the same percentage. Assuming 80% power, large interaction effects can be detected using smaller sample sizes than those needed for the detection of main effects. The detection of interaction effects involving rare alleles may not be possible. Focussing only on marker density can be a limited strategy in indirect association studies for GxE. Cyril Dalmasso 1 , Emmanuelle Génin 2 , Catherine Bourgain 2 , Philippe Broët 148 JE 2492 , Univ. Paris-Sud, France 49 INSERM UMR-S 535 and University Paris Sud, Villejuif, France Keywords: Linkage analysis, Multiple testing, False Discovery Rate, Mixture model In the context of genome-wide linkage analyses, where a large number of statistical tests are simultaneously performed, the False Discovery Rate (FDR) that is defined as the expected proportion of false discoveries among all discoveries is nowadays widely used for taking into account the multiple testing problem. Other related criteria have been considered such as the local False Discovery Rate (lFDR) that is a variant of the FDR giving to each test its own measure of significance. The lFDR is defined as the posterior probability that a null hypothesis is true. Most of the proposed methods for estimating the lFDR or the FDR rely on distributional assumption under the null hypothesis. However, in observational studies, the empirical null distribution may be very different from the theoretical one. In this work, we propose a mixture model based approach that provides estimates of the lFDR and the FDR in the context of large-scale variance component linkage analyses. In particular, this approach allows estimating the empirical null distribution, this latter being a key quantity for any simultaneous inference procedure. The proposed method is applied on a real dataset. Arief Gusnanto 1 , Frank Dudbridge 150 MRC Biostatistics Unit, Cambridge UK Keywords: Significance, genome-wide, association, permutation, multiplicity Genome-wide association scans have introduced statistical challenges, mainly in the multiplicity of thousands of tests. The question of what constitutes a significant finding remains somewhat unresolved. Permutation testing is very time-consuming, whereas Bayesian arguments struggle to distinguish direct from indirect association. It seems attractive to summarise the multiplicity in a simple form that allows users to avoid time-consuming permutations. A standard significance level would facilitate reporting of results and reduce the need for permutation tests. This is potentially important because current scans do not have full coverage of the whole genome, and yet, the implicit multiplicity is genome-wide. We discuss some proposed summaries, with reference to the empirical null distribution of the multiple tests, approximated through a large number of random permutations. Using genome-wide data from the Wellcome Trust Case-Control Consortium, we use a sub-sampling approach with increasing density to estimate the nominal p-value to obtain family-wise significance of 5%. The results indicate that the significance level is converging to about 1e-7 as the marker spacing becomes infinitely dense. We considered the concept of an effective number of independent tests, and showed that when used in a Bonferroni correction, the number varies with the overall significance level, but is roughly constant in the region of interest. We compared several estimators of the effective number of tests, and showed that in the region of significance of interest, Patterson's eigenvalue based estimator gives approximately the right family-wise error rate. Michael Nothnagel 1 , Amke Caliebe 1 , Michael Krawczak 151 Institute of Medical Informatics and Statistics, University Clinic Schleswig-Holstein, University of Kiel, Germany Keywords: Association scans, Bayesian framework, posterior odds, genetic risk, multiplicative model Whole-genome association scans have been suggested to be a cost-efficient way to survey genetic variation and to map genetic disease factors. We used a Bayesian framework to investigate the posterior odds of a genuine association under multiplicative disease models. We demonstrate that the p value alone is not a sufficient means to evaluate the findings in association studies. We suggest that likelihood ratios should accompany p values in association reports. We argue, that, given the reported results of whole-genome scans, more associations should have been successfully replicated if the consistently made assumptions about considerable genetic risks were correct. We conclude that it is very likely that the vast majority of relative genetic risks are only of the order of 1.2 or lower. Clive Hoggart 1 , Maria De Iorio 1 , John Whittakker 2 , David Balding 152 Department of Epidemiology and Public Health, Imperial College London, UK 53 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: Genome-wide association analyses, shrinkage priors, Lasso Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants of small effect, which is a plausible scenario for many complex diseases. Moreover, many simulation studies assume a single causal variant and so more complex realities are ignored. Analysing large numbers of variants simultaneously is now becoming feasible, thanks to developments in Bayesian stochastic search methods. We pose the problem of SNP selection as variable selection in a regression model. In contrast to single SNP tests this approach simultaneously models the effect of all SNPs. SNPs are selected by a Bayesian interpretation of the lasso (Tibshirani, 1996); the maximum a posterior (MAP) estimate of the regression coefficients, which have been given independent, double exponential prior distributions. The double exponential distribution is an example of a shrinkage prior, MAP estimates with shrinkage priors can be zero, thus all SNPs with non zero regression coefficients are selected. In addition to the commonly-used double exponential (Laplace) prior, we also implement the normal exponential gamma prior distribution. We show that use of the Laplace prior improves SNP selection in comparison with single -SNP tests, and that the normal exponential gamma prior leads to a further improvement. Our method is fast and can handle very large numbers of SNPs: we demonstrate its performance using both simulated and real genome-wide data sets with 500 K SNPs, which can be analysed in 2 hours on a desktop workstation. Mickael Guedj 1,2 , Jerome Wojcik 2 , Gregory Nuel 154 Laboratoire Statistique et Génome, Université d'Evry, Evry France 55 Serono Pharmaceutical Research Institute, Plan-les-Ouates, Switzerland Keywords: Local Replication, Local Score, Association In gene-mapping, replication of initial findings has been put forwards as the approach of choice for filtering false-positives from true signals for underlying loci. In practice, such replications are however too poorly observed. Besides the statistical and technical-related factors (lack of power, multiple-testing, stratification, quality control,) inconsistent conclusions obtained from independent populations might result from real biological differences. In particular, the high degree of variation in the strength of LD among populations of different origins is a major challenge to the discovery of genes. Seeking for Local Replications (defined as the presence of a signal of association in a same genomic region among populations) instead of strict replications (same locus, same risk allele) may lead to more reliable results. Recently, a multi-markers approach based on the Local Score statistic has been proposed as a simple and efficient way to select candidate genomic regions at the first stage of genome-wide association studies. Here we propose an extension of this approach adapted to replicated association studies. Based on simulations, this method appears promising. In particular it outperforms classical simple-marker strategies to detect modest-effect genes. Additionally it constitutes, to our knowledge, a first framework dedicated to the detection of such Local Replications. Juliet Chapman 1 , Claudio Verzilli 1 , John Whittaker 156 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: FDR, Association studies, Bayesian model selection As genomewide association studies become commonplace there is debate as to how such studies might be analysed and what we might hope to gain from the data. It is clear that standard single locus approaches are limited in that they do not adjust for the effects of other loci and problematic since it is not obvious how to adjust for multiple comparisons. False discovery rates have been suggested, but it is unclear how well these will cope with highly correlated genetic data. We consider the validity of standard false discovery rates in large scale association studies. We also show that a Bayesian procedure has advantages in detecting causal loci amongst a large number of dependant SNPs and investigate properties of a Bayesian FDR. Peter Kraft 157 Harvard School of Public Health, Boston USA Keywords: Gene-environment interaction, genome-wide association scans Appropriately analyzed two-stage designs,where a subset of available subjects are genotyped on a genome-wide panel of markers at the first stage and then a much smaller subset of the most promising markers are genotyped on the remaining subjects,can have nearly as much power as a single-stage study where all subjects are genotyped on the genome-wide panel yet can be much less expensive. Typically, the "most promising" markers are selected based on evidence for a marginal association between genotypes and disease. Subsequently, the few markers found to be associated with disease at the end of the second stage are interrogated for evidence of gene-environment interaction, mainly to understand their impact on disease etiology and public health impact. However, this approach may miss variants which have a sizeable effect restricted to one exposure stratum and therefore only a modest marginal effect. We have proposed to use information on the joint effects of genes and a discrete list of environmental exposures at the initial screening stage to select promising markers for the second stage [Kraft et al Hum Hered 2007]. This approach optimizes power to detect variants that have a sizeable marginal effect and variants that have a small marginal effect but a sizeable effect in a stratum defined by an environmental exposure. As an example, I discuss a proposed genome-wide association scan for Type II diabetes susceptibility variants based in several large nested case-control studies. Beate Glaser 1 , Peter Holmans 158 Biostatistics and Bioinformatics Unit, Cardiff University, School of Medicine, Heath Park, Cardiff, UK Keywords: Combined case-control and trios analysis, Power, False-positive rate, Simulation, Association studies The statistical power of genetic association studies can be enhanced by combining the analysis of case-control with parent-offspring trio samples. Various combined analysis techniques have been recently developed; as yet, there have been no comparisons of their power. This work was performed with the aim of identifying the most powerful method among available combined techniques including test statistics developed by Kazeem and Farrall (2005), Nagelkerke and colleagues (2004) and Dudbridge (2006), as well as a simple combination of ,2-statistics from single samples. Simulation studies were performed to investigate their power under different additive, multiplicative, dominant and recessive disease models. False-positive rates were determined by studying the type I error rates under null models including models with unequal allele frequencies between the single case-control and trios samples. We identified three techniques with equivalent power and false-positive rates, which included modifications of the three main approaches: 1) the unmodified combined Odds ratio estimate by Kazeem & Farrall (2005), 2) a modified approach of the combined risk ratio estimate by Nagelkerke & colleagues (2004) and 3) a modified technique for a combined risk ratio estimate by Dudbridge (2006). Our work highlights the importance of studies investigating test performance criteria of novel methods, as they will help users to select the optimal approach within a range of available analysis techniques. David Almorza 1 , M.V. Kandus 2 , Juan Carlos Salerno 2 , Rafael Boggio 359 Facultad de Ciencias del Trabajo, University of Cádiz, Spain 60 Instituto de Genética IGEAF, Buenos Aires, Argentina 61 Universidad Nacional de La Plata, Buenos Aires, Argentina Keywords: Principal component analysis, maize, ear weight, inbred lines The objective of this work was to evaluate the relationship among different traits of the ear of maize inbred lines and to group genotypes according to its performance. Ten inbred lines developed at IGEAF (INTA Castelar) and five public inbred lines as checks were used. A field trial was carried out in Castelar, Buenos Aires (34° 36' S , 58° 39' W) using a complete randomize design with three replications. At harvest, individual weight (P.E.), diameter (D.E.), row number (N.H.) and length (L.E.) of the ear were assessed. A principal component analysis, PCA, (Infostat 2005) was used, and the variability of the data was depicted with a biplot. Principal components 1 and 2 (CP1 and CP2) explained 90% of the data variability. CP1 was correlated with P.E., L.E. and D.E., meanwhile CP2 was correlated with N.H. We found that individual weight (P.E.) was more correlated with diameter of the ear (D.E.) than with length (L.E). Five groups of inbred lines were distinguished: with high P.E. and mean N.H. (04-70, 04-73, 04-101 and MO17), with high P.E. but less N.H. (04-61 and B14), with mean P.E. and N.H. (B73, 04-123 and 04-96), with high N.H. but less P.E. (LP109, 04-8, 04-91 and 04-76) and with low P.E. and low N.H. (LP521 and 04-104). The use of PCA showed which variables had more incidence in ear weight and how is the correlation among them. Moreover, the different groups found with this analysis allow the evaluation of inbred lines by several traits simultaneously. Sven Knüppel 1 , Anja Bauerfeind 1 , Klaus Rohde 162 Department of Bioinformatics, MDC Berlin, Germany Keywords: Haplotypes, association studies, case-control, nuclear families The area of gene chip technology provides a plethora of phase-unknown SNP genotypes in order to find significant association to some genetic trait. To circumvent possibly low information content of a single SNP one groups successive SNPs and estimates haplotypes. Haplotype estimation, however, may reveal ambiguous haplotype pairs and bias the application of statistical methods. Zaykin et al. (Hum Hered, 53:79-91, 2002) proposed the construction of a design matrix to take this ambiguity into account. Here we present a set of functions written for the Statistical package R, which carries out haplotype estimation on the basis of the EM-algorithm for individuals (case-control) or nuclear families. The construction of a design matrix on basis of estimated haplotypes or haplotype pairs allows application of standard methods for association studies (linear, logistic regression), as well as statistical methods as haplotype sharing statistics and TDT-Test. Applications of these methods to genome-wide association screens will be demonstrated. Manuela Zucknick 1 , Chris Holmes 2 , Sylvia Richardson 163 Department of Epidemiology and Public Health, Imperial College London, UK 64 Department of Statistics, Oxford Center for Gene Function, University of Oxford, UK Keywords: Bayesian, variable selection, MCMC, large p, small n, structured dependence In large-scale genomic applications vast numbers of markers or genes are scanned to find a few candidates which are linked to a particular phenotype. Statistically, this is a variable selection problem in the "large p, small n" situation where many more variables than samples are available. An additional feature is the complex dependence structure which is often observed among the markers/genes due to linkage disequilibrium or their joint involvement in biological processes. Bayesian variable selection methods using indicator variables are well suited to the problem. Binary phenotypes like disease status are common and both Bayesian probit and logistic regression can be applied in this context. We argue that logistic regression models are both easier to tune and to interpret than probit models and implement the approach by Holmes & Held (2006). Because the model space is vast, MCMC methods are used as stochastic search algorithms with the aim to quickly find regions of high posterior probability. In a trade-off between fast-updating but slow-moving single-gene Metropolis-Hastings samplers and computationally expensive full Gibbs sampling, we propose to employ the dependence structure among the genes/markers to help decide which variables to update together. Also, parallel tempering methods are used to aid bold moves and help avoid getting trapped in local optima. Mixing and convergence of the resulting Markov chains are evaluated and compared to standard samplers in both a simulation study and in an application to a gene expression data set. Reference Holmes, C. C. & Held, L. (2006) Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis1, 145,168. Dawn Teare 165 MMGE, University of Sheffield, UK Keywords: CNP, family-based analysis, MCMC Evidence is accumulating that segmental copy number polymorphisms (CNPs) may represent a significant portion of human genetic variation. These highly polymorphic systems require handling as phenotypes rather than co-dominant markers, placing new demands on family-based analyses. We present an integrated approach to meet these challenges in the form of a graphical model, where the underlying discrete CNP phenotype is inferred from the (single or replicate) quantitative measure within the analysis, whilst assuming an allele based system segregating through the pedigree. [source] Effect of including environmental data in investigations of gene-disease associations in the presence of qualitative interactionsGENETIC EPIDEMIOLOGY, Issue 6 2010Elizabeth Williamson Abstract Complex diseases are likely to be caused by the interplay of genetic and environmental factors. Despite this, gene-disease associations are frequently investigated using models that focus solely on a marginal gene effect, ignoring environmental factors entirely. Failing to take into account a gene-environment interaction can weaken the apparent gene-disease association, leading to loss in statistical power and, potentially, inability to identify genuine risk factors. If a gene-environment interaction exists, therefore, a joint analysis allowing the effect of the gene to differ between groups defined by the environmental exposure can have greater statistical power than a marginal gene-disease model. However, environmental data are subject to measurement error. Substantial losses in statistical power for detecting gene-environment interactions can arise from measurement error in the environmental exposure. It is unclear, however, what effect measurement error may have on the power of the joint analysis. We consider the potential benefits, in terms of statistical power, of collecting concurrent environmental data within large cohorts in order to enhance gene detection. We further consider whether these benefits remain in the presence of misclassification in both the gene and the environmental exposure. We find that when an effect of the gene is apparent only in the presence of the environmental exposure, the joint analysis has greater power than a marginal gene-disease analysis. This comparative increase in power remains in the presence of likely levels of misclassification of either the gene or environmental exposure. Genet. Epidemiol. 34:552,560, 2010. Š 2010 Wiley-Liss, Inc. [source] Genetic association tests in the presence of epistasis or gene-environment interactionGENETIC EPIDEMIOLOGY, Issue 7 2008Kai WangArticle first published online: 24 APR 200 Abstract A genetic variant is very likely to manifest its effect on disease through its main effect as well as through its interaction with other genetic variants or environmental factors. Power to detect genetic variants can be greatly improved by modeling their main effects and their interaction effects through a common set of parameters or "generalized association parameters" (Chatterjee et al. [2006] Am. J. Hum. Genet. 79:1002,1016) because of the reduced number of degrees of freedom. Following this idea, I propose two models that extend the work by Chatterjee and colleagues. Particularly, I consider not only the case of relatively weak interaction effect compared to the main effect but also the case of relatively weak main effect. This latter case is perhaps more relevant to genetic association studies. The proposed methods are invariant to the choice of the allele for scoring genotypes or the choice of the reference genotype score. For each model, the asymptotic distribution of the likelihood ratio statistic is derived. Simulation studies suggest that the proposed methods are more powerful than existing ones under certain circumstances. Genet. Epidemiol. 2008. Š 2008 Wiley-Liss, Inc. [source] Stratified case sampling and the use of family controlsGENETIC EPIDEMIOLOGY, Issue 3 2001Kimberly D. Siegmund Abstract We compare the asymptotic relative efficiency (ARE) of different study designs for estimating gene and gene-environment interaction effects using matched case-control data. In the sampling schemes considered, cases are selected differentially based on their family history of disease. Controls are selected either from unrelated subjects or from among the case's unaffected siblings and cousins. Parameters are estimated using weighted conditional logistic regression, where the likelihood contributions for each subject are weighted by the fraction of cases sampled sharing the same family history. Results showed that compared to random sampling, over-sampling cases with a positive family history increased the efficiency for estimating the main effect of a gene for sib-control designs (103,254% ARE) and decreased efficiency for cousin-control and population-control designs (68,94% ARE and 67,84% ARE, respectively). Population controls and random sampling of cases were most efficient for a recessive gene or a dominant gene with an relative risk less than 9. For estimating gene-environment interactions, over-sampling positive-family-history cases again led to increased efficiency using sib controls (111,180% ARE) and decreased efficiency using population controls (68,87% ARE). Using case-cousin pairs, the results differed based on the genetic model and the size of the interaction effect; biased sampling was only slightly more efficient than random sampling for large interaction effects under a dominant gene model (relative risk ratio = 8, 106% ARE). Overall, the most efficient study design for studying gene-environment interaction was the case-sib-control design with over-sampling of positive-family-history-cases. Genet. Epidemiol. 20:316,327, 2001. Š 2001 Wiley-Liss, Inc. [source] The IBD international genetics consortium provides further evidence for linkage to IBD4 and shows gene-environment interactionINFLAMMATORY BOWEL DISEASES, Issue 1 2005Marie Pierik MD Abstract Background and Aims: The inflammatory bowel diseases (IBDs) Crohn's disease (CD) and ulcerative colitis are complex disorders with an important genetic determinant. One gene associated with CD has been identified: NOD2/CARD15. Two independent genome-wide scans found significant evidence (logarithm of odds [LOD] 3.6) and suggestive evidence (LOD 2.8) for linkage on locus 14q11-12, also known as the IBD4 locus. To further characterize this locus, we assessed gene-environment interaction (IBD4 × smoking) and phenotypic heterogeneity in a large cohort of IBD-affected sibling pairs as part of an ongoing international collaborative effort. Patients and Methods: A total of 733 IBD families, comprising 892 affected sibling pairs, were genotyped for microsatellites D14S261, D14S283, D14S972, and D14S275, spanning the IBD4 locus. Information on gender, ethnicity, age at onset, smoking at diagnosis, extraintestinal manifestations, and disease location was available. Results: A significant distortion in the mean allele sharing (MAS) between affected siblings was observed for CD patients only at each of the four markers (54.6%, 52.8%, 50.4%, and 53.3%, respectively). Maximum linkage for CD was observed at marker D14S261 (multipoint nonparametric linkage score 2.36; P , 0.01; MAS 54.6%). MAS was higher in CD families in which all siblings or at least one sibling smoked compared with nonsmoking CD families (MAS, 58.90%, 57.50%, and 52.80%, respectively). Conclusions: The IBD International Genetics Consortium replicated the IBD4 locus on chromosome 14q for CD and also showed evidence for a gene-environment interaction at this locus. Further studies are needed to explore the mechanism by which smoking influences IBD4. [source] Gene-environment interaction and obesityNUTRITION REVIEWS, Issue 12 2008Lu Qi The epidemic of obesity has become a major public health problem. Common-form obesity is underpinned by both environmental and genetic factors. Epidemiological studies have documented that increased intakes of energy and reduced consumption of high-fiber foods, as well as sedentary lifestyle, were among the major driving forces for the epidemic of obesity. Recent genome-wide association studies have identified several genes convincingly related to obesity risk, including the fat mass and obesity associated gene and the melanocortin-4 receptor gene. Testing gene-environment interaction is a relatively new field. This article reviews recent advances in identifying the genetic and environmental risk factors (lifestyle and diet) for obesity. The evidence for gene-environment interaction, especially from observational studies and randomized intervention trials, is examined specifically. Knowledge about the interplay between genetic and environmental components may facilitate the choice of more effective and specific measures for obesity prevention based on the personalized genetic make-up. [source] Genetic polymorphisms in glutathione S-transferases and cytochrome P450s, tobacco smoking, and risk of non-Hodgkin lymphomaAMERICAN JOURNAL OF HEMATOLOGY, Issue 5 2009Briseis A. Kilfoy We investigated variation in glutathione S-transferases (GSTs) and cytochrome P450s (CYPs), and smoking in a population-based case-control study of NHL including 1,115 women. Although risk of NHL was not altered by variant polymorphisms in GSTs or CYPs, it was significantly changed for DLBCL when considered in conjunction with smoking behavior, though only in nonsmokers. An increased risk of DLBCL in nonsmokers was associated with the variant G allele for GSTP1 (OR = 1.6, 95% CI 1.0,2.3) and CYP1A1 (OR = 2.4; 95% CI 1.0,5.7), but a decreased risk for the variant G allele for CYP1B1 (OR = 0.6, 95% CI 0.4,1.0). Our results confer support investigation of the gene-environment interaction in a larger study population of DLBCL. Am. J. Hematol. 2009. Š 2009 Wiley-Liss, Inc. [source] Photoperiod at conception predicts C677T-MTHFR genotype: A novel gene-environment interactionAMERICAN JOURNAL OF HUMAN BIOLOGY, Issue 4 2010Mark Lucock Data is presented, which suggest that the day length a woman experiences during the periconceptional period predicts the C677T-MTHFR genotype of her child. Logistic regression analysis involving 375 neonates born in the same geographical location within a three year period demonstrated that photoperiod (minutes) at conception predicts both genotype (P = 0.0139) and mutant allele carriage (P = 0.0161); the trend clearly showing that the 677T-MTHFR allele frequency increases as photoperiod increases. We propose a number of explanations, including a hypothesis in which a long photoperiod around conception decreases maternal systemic folate because of UVA induced dermal oxidative degradation of 5-methyl-H4folate, leading to a lower cellular 5,10-methylene-H4folate status. In this scenario, 5,10-methylene-H4folate would be more efficiently used for dTMP and DNA synthesis by 677T-MTHFR embryos than wildtype embryos giving the 677T-MTHFR embryos increased viability, and hence increasing mutant T-allele frequency. Alternate hypotheses include: increased seasonal availability of folate rich foods that genetically buffer any negative effect of 677T-MTHFR in embryos; seasonal oxidative stress lowering embryo-toxic homocysteine; an undefined hormonal effect of photoperiod on the neuroendocrine axis, which mediates genotype/embryo selection. The effect of photoperiod on genotype seems clear, but the speculative molecular mechanism underpinning the effect needs careful examination. Am. J. Hum. Biol., 2010. Š 2010 Wiley-Liss, Inc. [source] On the Use of Allelic Transmission Rates for Assessing Gene-by-Environment Interaction in Case-Parent TriosANNALS OF HUMAN GENETICS, Issue 5 2010Ji-Hyung Shin Summary Allelic transmission rates from parents to cases are frequently stratified by an environmental risk factor E and compared, with heterogeneity interpreted as gene-environment interaction or G×E. Though generally invalid, such analyses continue to appear. We revisit why heterogeneity is not equivalent to G×E in a range of settings not considered previously. The objective is a fuller understanding of the bias in transmission rates and what is driving it. Extending previously published findings, we derive parental mating-type probabilities in cases and use them to obtain transmission rates, which we then compare to G×E. Through simulation, we investigate the practical implications of the bias for a transmission-based test of G×E. We find that general population characteristics distort the picture of G×E obtained from transmission rates: the stratum-specific mating-type probabilities under G , E dependence and the allele frequency under independence. Furthermore, the transmission-based test has inflated error rates relative to a likelihood-based test. Our investigation provides further insight into how and why transmission-based tests and descriptive summaries can mislead about G×E. For exploring G×E, we suggest graphical displays of the transmission rates within parental mating types, as they are robust to population stratification and the penetrance model. [source] European Mathematical Genetics Meeting, Heidelberg, Germany, 12th,13th April 2007ANNALS OF HUMAN GENETICS, Issue 4 2007Article first published online: 28 MAY 200 Saurabh Ghosh 11 Indian Statistical Institute, Kolkata, India High correlations between two quantitative traits may be either due to common genetic factors or common environmental factors or a combination of both. In this study, we develop statistical methods to extract the contribution of a common QTL to the total correlation between the components of a bivariate phenotype. Using data on bivariate phenotypes and marker genotypes for sib-pairs, we propose a test for linkage between a common QTL and a marker locus based on the conditional cross-sib trait correlations (trait 1 of sib 1 , trait 2 of sib 2 and conversely) given the identity-by-descent sharing at the marker locus. The null hypothesis cannot be rejected unless there exists a common QTL. We use Monte-Carlo simulations to evaluate the performance of the proposed test under different trait parameters and quantitative trait distributions. An application of the method is illustrated using data on two alcohol-related phenotypes from the Collaborative Study On The Genetics Of Alcoholism project. Rémi Kazma 1 , Catherine Bonaďti-Pellié 1 , Emmanuelle Génin 12 INSERM UMR-S535 and Université Paris Sud, Villejuif, 94817, France Keywords: Gene-environment interaction, sibling recurrence risk, exposure correlation Gene-environment interactions may play important roles in complex disease susceptibility but their detection is often difficult. Here we show how gene-environment interactions can be detected by investigating the degree of familial aggregation according to the exposure of the probands. In case of gene-environment interaction, the distribution of genotypes of affected individuals, and consequently the risk in relatives, depends on their exposure. We developed a test comparing the risks in sibs according to the proband exposure. To evaluate the properties of this new test, we derived the formulas for calculating the expected risks in sibs according to the exposure of probands for various values of exposure frequency, relative risk due to exposure alone, frequencies of latent susceptibility genotypes, genetic relative risks and interaction coefficients. We find that the ratio of risks when the proband is exposed and not exposed is a good indicator of the interaction effect. We evaluate the power of the test for various sample sizes of affected individuals. We conclude that this test is valuable for diseases with moderate familial aggregation, only when the role of the exposure has been clearly evidenced. Since a correlation for exposure among sibs might lead to a difference in risks among sibs in the different proband exposure strata, we also add an exposure correlation coefficient in the model. Interestingly, we find that when this correlation is correctly accounted for, the power of the test is not decreased and might even be significantly increased. Andrea Callegaro 1 , Hans J.C. Van Houwelingen 1 , Jeanine Houwing-Duistermaat 13 Dept. of Medical Statistics and Bioinformatics, Leiden University Medical Center, The Netherlands Keywords: Survival analysis, age at onset, score test, linkage analysis Non parametric linkage (NPL) analysis compares the identical by descent (IBD) sharing in sibling pairs to the expected IBD sharing under the hypothesis of no linkage. Often information is available on the marginal cumulative hazards (for example breast cancer incidence curves). Our aim is to extend the NPL methods by taking into account the age at onset of selected sibling pairs using these known marginal hazards. Li and Zhong (2002) proposed a (retrospective) likelihood ratio test based on an additive frailty model for genetic linkage analysis. From their model we derive a score statistic for selected samples which turns out to be a weighed NPL method. The weights depend on the marginal cumulative hazards and on the frailty parameter. A second approach is based on a simple gamma shared frailty model. Here, we simply test whether the score function of the frailty parameter depends on the excess IBD. We compare the performance of these methods using simulated data. Céline Bellenguez 1 , Carole Ober 2 , Catherine Bourgain 14 INSERM U535 and University Paris Sud, Villejuif, France 5 Department of Human Genetics, The University of Chicago, USA Keywords: Linkage analysis, linkage disequilibrium, high density SNP data Compared with microsatellite markers, high density SNP maps should be more informative for linkage analyses. However, because they are much closer, SNPs present important linkage disequilibrium (LD), which biases classical nonparametric multipoint analyses. This problem is even stronger in population isolates where LD extends over larger regions with a more stochastic pattern. We investigate the issue of linkage analysis with a 500K SNP map in a large and inbred 1840-member Hutterite pedigree, phenotyped for asthma. Using an efficient pedigree breaking strategy, we first identified linked regions with a 5cM microsatellite map, on which we focused to evaluate the SNP map. The only method that models LD in the NPL analysis is limited in both the pedigree size and the number of markers (Abecasis and Wigginton, 2005) and therefore could not be used. Instead, we studied methods that identify sets of SNPs with maximum linkage information content in our pedigree and no LD-driven bias. Both algorithms that directly remove pairs of SNPs in high LD and clustering methods were evaluated. Null simulations were performed to control that Zlr calculated with the SNP sets were not falsely inflated. Preliminary results suggest that although LD is strong in such populations, linkage information content slightly better than that of microsatellite maps can be extracted from dense SNP maps, provided that a careful marker selection is conducted. In particular, we show that the specific LD pattern requires considering LD between a wide range of marker pairs rather than only in predefined blocks. Peter Van Loo 1,2,3 , Stein Aerts 1,2 , Diether Lambrechts 4,5 , Bernard Thienpont 2 , Sunit Maity 4,5 , Bert Coessens 3 , Frederik De Smet 4,5 , Leon-Charles Tranchevent 3 , Bart De Moor 2 , Koen Devriendt 3 , Peter Marynen 1,2 , Bassem Hassan 1,2 , Peter Carmeliet 4,5 , Yves Moreau 36 Department of Molecular and Developmental Genetics, VIB, Belgium 7 Department of Human Genetics, University of Leuven, Belgium 8 Bioinformatics group, Department of Electrical Engineering, University of Leuven, Belgium 9 Department of Transgene Technology and Gene Therapy, VIB, Belgium 10 Center for Transgene Technology and Gene Therapy, University of Leuven, Belgium Keywords: Bioinformatics, gene prioritization, data fusion The identification of genes involved in health and disease remains a formidable challenge. Here, we describe a novel bioinformatics method to prioritize candidate genes underlying pathways or diseases, based on their similarity to genes known to be involved in these processes. It is freely accessible as an interactive software tool, ENDEAVOUR, at http://www.esat.kuleuven.be/endeavour. Unlike previous methods, ENDEAVOUR generates distinct prioritizations from multiple heterogeneous data sources, which are then integrated, or fused, into one global ranking using order statistics. ENDEAVOUR prioritizes candidate genes in a three-step process. First, information about a disease or pathway is gathered from a set of known "training" genes by consulting multiple data sources. Next, the candidate genes are ranked based on similarity with the training properties obtained in the first step, resulting in one prioritized list for each data source. Finally, ENDEAVOUR fuses each of these rankings into a single global ranking, providing an overall prioritization of the candidate genes. Validation of ENDEAVOUR revealed it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates of 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified YPEL1 as a novel gene involved in craniofacial development from a 2-Mb chromosomal region, deleted in some patients with DiGeorge-like birth defects. Finally, we are currently evaluating a pipeline combining array-CGH, ENDEAVOUR and in vivo validation in zebrafish to identify novel genes involved in congenital heart defects. Mark Broom 1 , Graeme Ruxton 2 , Rebecca Kilner 311 Mathematics Dept., University of Sussex, UK 12 Division of Environmental and Evolutionary Biology, University of Glasgow, UK 13 Department of Zoology, University of Cambridge, UK Keywords: Evolutionarily stable strategy, parasitism, asymmetric game Brood parasites chicks vary in the harm that they do to their companions in the nest. In this presentation we use game-theoretic methods to model this variation. Our model considers hosts which potentially abandon single nestlings and instead choose to re-allocate their reproductive effort to future breeding, irrespective of whether the abandoned chick is the host's young or a brood parasite's. The parasite chick must decide whether or not to kill host young by balancing the benefits from reduced competition in the nest against the risk of desertion by host parents. The model predicts that three different types of evolutionarily stable strategies can exist. (1) Hosts routinely rear depleted broods, the brood parasite always kills host young and the host never then abandons the nest. (2) When adult survival after deserting single offspring is very high, hosts always abandon broods of a single nestling and the parasite never kills host offspring, effectively holding them as hostages to prevent nest desertion. (3) Intermediate strategies, in which parasites sometimes kill their nest-mates and host parents sometimes desert nests that contain only a single chick, can also be evolutionarily stable. We provide quantitative descriptions of how the values given to ecological and behavioral parameters of the host-parasite system influence the likelihood of each strategy and compare our results with real host-brood parasite associations in nature. Martin Harrison 114 Mathematics Dept, University of Sussex, UK Keywords: Brood parasitism, games, host, parasite The interaction between hosts and parasites in bird populations has been studied extensively. Game theoretical methods have been used to model this interaction previously, but this has not been studied extensively taking into account the sequential nature of this game. We consider a model allowing the host and parasite to make a number of decisions, which depend on a number of natural factors. The host lays an egg, a parasite bird will arrive at the nest with a certain probability and then chooses to destroy a number of the host eggs and lay one of it's own. With some destruction occurring, either natural or through the actions of the parasite, the host chooses to continue, eject an egg (hoping to eject the parasite) or abandon the nest. Once the eggs have hatched the game then falls to the parasite chick versus the host. The chick chooses to destroy or eject a number of eggs. The final decision is made by the host, choosing whether to raise or abandon the chicks that are in the nest. We consider various natural parameters and probabilities which influence these decisions. We then use this model to look at real-world situations of the interactions of the Reed Warbler and two different parasites, the Common Cuckoo and the Brown-Headed Cowbird. These two parasites have different methods in the way that they parasitize the nests of their hosts. The hosts in turn have a different reaction to these parasites. Arne Jochens 1 , Amke Caliebe 2 , Uwe Roesler 1 , Michael Krawczak 215 Mathematical Seminar, University of Kiel, Germany 16 Institute of Medical Informatics and Statistics, University of Kiel, Germany Keywords: Stepwise mutation model, microsatellite, recursion equation, temporal behaviour We consider the stepwise mutation model which occurs, e.g., in microsatellite loci. Let X(t,i) denote the allelic state of individual i at time t. We compute expectation, variance and covariance of X(t,i), i=1,,,N, and provide a recursion equation for P(X(t,i)=z). Because the variance of X(t,i) goes to infinity as t grows, for the description of the temporal behaviour, we regard the scaled process X(t,i)-X(t,1). The results furnish a better understanding of the behaviour of the stepwise mutation model and may in future be used to derive tests for neutrality under this model. Paul O'Reilly 1 , Ewan Birney 2 , David Balding 117 Statistical Genetics, Department of Epidemiology and Public Health, Imperial, College London, UK 18 European Bioinformatics Institute, EMBL, Cambridge, UK Keywords: Positive selection, Recombination rate, LD, Genome-wide, Natural Selection In recent years efforts to develop population genetics methods that estimate rates of recombination and levels of natural selection in the human genome have intensified. However, since the two processes have an intimately related impact on genetic variation their inference is vulnerable to confounding. Genomic regions subject to recent selection are likely to have a relatively recent common ancestor and consequently less opportunity for historical recombinations that are detectable in contemporary populations. Here we show that selection can reduce the population-based recombination rate estimate substantially. In genome-wide studies for detecting selection we observe a tendency to highlight loci that are subject to low levels of recombination. We find that the outlier approach commonly adopted in such studies may have low power unless variable recombination is accounted for. We introduce a new genome-wide method for detecting selection that exploits the sensitivity to recent selection of methods for estimating recombination rates, while accounting for variable recombination using pedigree data. Through simulations we demonstrate the high power of the Ped/Pop approach to discriminate between neutral and adaptive evolution, particularly in the context of choosing outliers from a genome-wide distribution. Although methods have been developed showing good power to detect selection ,in action', the corresponding window of opportunity is small. In contrast, the power of the Ped/Pop method is maintained for many generations after the fixation of an advantageous variant Sarah Griffiths 1 , Frank Dudbridge 120 MRC Biostatistics Unit, Cambridge, UK Keywords: Genetic association, multimarker tag, haplotype, likelihood analysis In association studies it is generally too expensive to genotype all variants in all subjects. We can exploit linkage disequilibrium between SNPs to select a subset that captures the variation in a training data set obtained either through direct resequencing or a public resource such as the HapMap. These ,tag SNPs' are then genotyped in the whole sample. Multimarker tagging is a more aggressive adaptation of pairwise tagging that allows for combinations of two or more tag SNPs to predict an untyped SNP. Here we describe a new method for directly testing the association of an untyped SNP using a multimarker tag. Previously, other investigators have suggested testing a specific tag haplotype, or performing a weighted analysis using weights derived from the training data. However these approaches do not properly account for the imperfect correlation between the tag haplotype and the untyped SNP. Here we describe a straightforward approach to testing untyped SNPs using a missing-data likelihood analysis, including the tag markers as nuisance parameters. The training data is stacked on top of the main body of genotype data so there is information on how the tag markers predict the genotype of the untyped SNP. The uncertainty in this prediction is automatically taken into account in the likelihood analysis. This approach yields more power and also a more accurate prediction of the odds ratio of the untyped SNP. Anke Schulz 1 , Christine Fischer 2 , Jenny Chang-Claude 1 , Lars Beckmann 121 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany 22 Institute of Human Genetics, University of Heidelberg, Germany Keywords: Haplotype, haplotype sharing, entropy, Mantel statistics, marker selection We previously introduced a new method to map genes involved in complex diseases, using haplotype sharing-based Mantel statistics to correlate genetic and phenotypic similarity. Although the Mantel statistic is powerful in narrowing down candidate regions, the precise localization of a gene is hampered in genomic regions where linkage disequilibrium is so high that neighboring markers are found to be significant at similar magnitude and we are not able to discriminate between them. Here, we present a new approach to localize susceptibility genes by combining haplotype sharing-based Mantel statistics with an iterative entropy-based marker selection algorithm. For each marker at which the Mantel statistic is evaluated, the algorithm selects a subset of surrounding markers. The subset is chosen to maximize multilocus linkage disequilibrium, which is measured by the normalized entropy difference introduced by Nothnagel et al. (2002). We evaluated the algorithm with respect to type I error and power. Its ability to localize the disease variant was compared to the localization (i) without marker selection and (ii) considering haplotype block structure. Case-control samples were simulated from a set of 18 haplotypes, consisting of 15 SNPs in two haplotype blocks. The new algorithm gave correct type I error and yielded similar power to detect the disease locus compared to the alternative approaches. The neighboring markers were clearly less often significant than the causal locus, and also less often significant compared to the alternative approaches. Thus the new algorithm improved the precision of the localization of susceptibility genes. Mark M. Iles 123 Section of Epidemiology and Biostatistics, LIMM, University of Leeds, UK Keywords: tSNP, tagging, association, HapMap Tagging SNPs (tSNPs) are commonly used to capture genetic diversity cost-effectively. However, it is important that the efficacy of tSNPs is correctly estimated, otherwise coverage may be insufficient. If the pilot sample from which tSNPs are chosen is too small or the initial marker map too sparse, tSNP efficacy may be overestimated. An existing estimation method based on bootstrapping goes some way to correct for insufficient sample size and overfitting, but does not completely solve the problem. We describe a novel method, based on exclusion of haplotypes, that improves on the bootstrap approach. Using simulated data, the extent of the sample size problem is investigated and the performance of the bootstrap and the novel method are compared. We incorporate an existing method adjusting for marker density by ,SNP-dropping'. We find that insufficient sample size can cause large overestimates in tSNP efficacy, even with as many as 100 individuals, and the problem worsens as the region studied increases in size. Both the bootstrap and novel method correct much of this overestimate, with our novel method consistently outperforming the bootstrap method. We conclude that a combination of insufficient sample size and overfitting may lead to overestimation of tSNP efficacy and underpowering of studies based on tSNPs. Our novel approach corrects for much of this bias and is superior to the previous method. Sample sizes larger than previously suggested may still be required for accurate estimation of tSNP efficacy. This has obvious ramifications for the selection of tSNPs from HapMap data. Claudio Verzilli 1 , Juliet Chapman 1 , Aroon Hingorani 2 , Juan Pablo-Casas 1 , Tina Shah 2 , Liam Smeeth 1 , John Whittaker 124 Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK 25 Division of Medicine, University College London, UK Keywords: Meta-analysis, Genetic association studies We present a Bayesian hierarchical model for the meta-analysis of candidate gene studies with a continuous outcome. Such studies often report results from association tests for different, possibly study-specific and non-overlapping markers (typically SNPs) in the same genetic region. Meta analyses of the results at each marker in isolation are seldom appropriate as they ignore the correlation that may exist between markers due to linkage disequlibrium (LD) and cannot assess the relative importance of variants at each marker. Also such marker-wise meta analyses are restricted to only those studies that have typed the marker in question, with a potential loss of power. A better strategy is one which incorporates information about the LD between markers so that any combined estimate of the effect of each variant is corrected for the effect of other variants, as in multiple regression. Here we develop a Bayesian hierarchical linear regression that models the observed genotype group means and uses pairwise LD measurements between markers as prior information to make posterior inference on adjusted effects. The approach is applied to the meta analysis of 24 studies assessing the effect of 7 variants in the C-reactive protein (CRP) gene region on plasma CRP levels, an inflammatory biomarker shown in observational studies to be positively associated with cardiovascular disease. Cathryn M. Lewis 1 , Christopher G. Mathew 1 , Theresa M. Marteau 226 Dept. of Medical and Molecular Genetics, King's College London, UK 27 Department of Psychology, King's College London, UK Keywords: Risk, genetics, CARD15, smoking, model Recently progress has been made in identifying mutations that confer susceptibility to complex diseases, with the potential to use these mutations in determining disease risk. We developed methods to estimate disease risk based on genotype relative risks (for a gene G), exposure to an environmental factor (E), and family history (with recurrence risk ,R for a relative of type R). ,R must be partitioned into the risk due to G (which is modelled independently) and the residual risk. The risk model was then applied to Crohn's disease (CD), a severe gastrointestinal disease for which smoking increases disease risk approximately 2-fold, and mutations in CARD15 confer increased risks of 2.25 (for carriers of a single mutation) and 9.3 (for carriers of two mutations). CARD15 accounts for only a small proportion of the genetic component of CD, with a gene-specific ,S, CARD15 of 1.16, from a total sibling relative risk of ,S= 27. CD risks were estimated for high-risk individuals who are siblings of a CD case, and who also smoke. The CD risk to such individuals who carry two CARD15 mutations is approximately 0.34, and for those carrying a single CARD15 mutation the risk is 0.08, compared to a population prevalence of approximately 0.001. These results imply that complex disease genes may be valuable in estimating with greater precision than has hitherto been possible disease risks in specific, easily identified subgroups of the population with a view to prevention. Yurii Aulchenko 128 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Compression, information, bzip2, genome-wide SNP data, statistical genetics With advances in molecular technology, studies accessing millions of genetic polymorphisms in thousands of study subjects will soon become common. Such studies generate large amounts of data, whose effective storage and management is a challenge to the modern statistical genetics. Standard file compression utilities, such as Zip, Gzip and Bzip2, may be helpful to minimise the storage requirements. Less obvious is the fact that the data compression techniques may be also used in the analysis of genetic data. It is known that the efficiency of a particular compression algorithm depends on the probability structure of the data. In this work, we compared different standard and customised tools using the data from human HapMap project. Secondly, we investigate the potential uses of data compression techniques for the analysis of linkage, association and linkage disequilibrium Suzanne Leal 1 , Bingshan Li 129 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, USA Keywords: Consanguineous pedigrees, missing genotype data Missing genotype data can increase false-positive evidence for linkage when either parametric or nonparametric analysis is carried out ignoring intermarker linkage disequilibrium (LD). Previously it was demonstrated by Huang et al (2005) that no bias occurs in this situation for affected sib-pairs with unrelated parents when either both parents are genotyped or genotype data is available for two additional unaffected siblings when parental genotypes are missing. However, this is not the case for consanguineous pedigrees, where missing genotype data for any pedigree member within a consanguinity loop can increase false-positive evidence of linkage. The false-positive evidence for linkage is further increased when cryptic consanguinity is present. The amount of false-positive evidence for linkage is highly dependent on which family members are genotyped. When parental genotype data is available, the false-positive evidence for linkage is usually not as strong as when parental genotype data is unavailable. Which family members will aid in the reduction of false-positive evidence of linkage is highly dependent on which other family members are genotyped. For a pedigree with an affected proband whose first-cousin parents have been genotyped, further reduction in the false-positive evidence of linkage can be obtained by including genotype data from additional affected siblings of the proband or genotype data from the proband's sibling-grandparents. When parental genotypes are not available, false-positive evidence for linkage can be reduced by including in the analysis genotype data from either unaffected siblings of the proband or the proband's married-in-grandparents. Najaf Amin 1 , Yurii Aulchenko 130 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Genomic Control, pedigree structure, quantitative traits The Genomic Control (GC) method was originally developed to control for population stratification and cryptic relatedness in association studies. This method assumes that the effect of population substructure on the test statistics is essentially constant across the genome, and therefore unassociated markers can be used to estimate the effect of confounding onto the test statistic. The properties of GC method were extensively investigated for different stratification scenarios, and compared to alternative methods, such as the transmission-disequilibrium test. The potential of this method to correct not for occasional cryptic relations, but for regular pedigree structure, however, was not investigated before. In this work we investigate the potential of the GC method for pedigree-based association analysis of quantitative traits. The power and type one error of the method was compared to standard methods, such as the measured genotype (MG) approach and quantitative trait transmission-disequilibrium test. In human pedigrees, with trait heritability varying from 30 to 80%, the power of MG and GC approach was always higher than that of TDT. GC had correct type 1 error and its power was close to that of MG under moderate heritability (30%), but decreased with higher heritability. William Astle 1 , Chris Holmes 2 , David Balding 131 Department of Epidemiology and Public Health, Imperial College London, UK 32 Department of Statistics, University of Oxford, UK Keywords: Population structure, association studies, genetic epidemiology, statistical genetics In the analysis of population association studies, Genomic Control (Devlin & Roeder, 1999) (GC) adjusts the Armitage test statistic to correct the type I error for the effects of population substructure, but its power is often sub-optimal. Turbo Genomic Control (TGC) generalises GC to incorporate co-variation of relatedness and phenotype, retaining control over type I error while improving power. TGC is similar to the method of Yu et al. (2006), but we extend it to binary (case-control) in addition to quantitative phenotypes, we implement improved estimation of relatedness coefficients, and we derive an explicit statistic that generalizes the Armitage test statistic and is fast to compute. TGC also has similarities to EIGENSTRAT (Price et al., 2006) which is a new method based on principle components analysis. The problems of population structure(Clayton et al., 2005) and cryptic relatedness (Voight & Pritchard, 2005) are essentially the same: if patterns of shared ancestry differ between cases and controls, whether distant (coancestry) or recent (cryptic relatedness), false positives can arise and power can be diminished. With large numbers of widely-spaced genetic markers, coancestry can now be measured accurately for each pair of individuals via patterns of allele-sharing. Instead of modelling subpopulations, we work instead with a coancestry coefficient for each pair of individuals in the study. We explain the relationships between TGC, GC and EIGENSTRAT. We present simulation studies and real data analyses to illustrate the power advantage of TGC in a range of scenarios incorporating both substructure and cryptic relatedness. References Clayton, D. G.et al. (2005) Population structure, differential bias and genomic control in a large-scale case-control association study. Nature Genetics37(11) November 2005. Devlin, B. & Roeder, K. (1999) Genomic control for association studies. Biometics55(4) December 1999. Price, A. L.et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics38(8) (August 2006). Voight, B. J. & Pritchard, J. K. (2005) Confounding from cryptic relatedness in case-control association studies. Public Library of Science Genetics1(3) September 2005. Yu, J.et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics38(2) February 2006. Hervé Perdry 1 , Marie-Claude Babron 1 , Françoise Clerget-Darpoux 133 INSERM U535 and Univ. Paris Sud, UMR-S 535, Villejuif, France Keywords: Modifier genes, case-parents trios, ordered transmission disequilibrium test A modifying locus is a polymorphic locus, distinct from the disease locus, which leads to differences in the disease phenotype, either by modifying the penetrance of the disease allele, or by modifying the expression of the disease. The effect of such a locus is a clinical heterogeneity that can be reflected by the values of an appropriate covariate, such as the age of onset, or the severity of the disease. We designed the Ordered Transmission Disequilibrium Test (OTDT) to test for a relation between the clinical heterogeneity, expressed by the covariate, and marker genotypes of a candidate gene. The method applies to trio families with one affected child and his parents. Each family member is genotyped at a bi-allelic marker M of a candidate gene. To each of the families is associated a covariate value; the families are ordered on the values of this covariate. As the TDT (Spielman et al. 1993), the OTDT is based on the observation of the transmission rate T of a given allele at M. The OTDT aims to find a critical value of the covariate which separates the sample of families in two subsamples in which the transmission rates are significantly different. We investigate the power of the method by simulations under various genetic models and covariate distributions. Acknowledgments H Perdry is funded by ARSEP. Pascal Croiseau 1 , Heather Cordell 2 , Emmanuelle Génin 134 INSERM U535 and University Paris Sud, UMR-S535, Villejuif, France 35 Institute of Human Genetics, Newcastle University, UK Keywords: Association, missing data, conditionnal logistic regression Missing data is an important problem in association studies. Several methods used to test for association need that individuals be genotyped at the full set of markers. Individuals with missing data need to be excluded from the analysis. This could involve an important decrease in sample size and a loss of information. If the disease susceptibility locus (DSL) is poorly typed, it is also possible that a marker in linkage disequilibrium gives a stronger association signal than the DSL. One may then falsely conclude that the marker is more likely to be the DSL. We recently developed a Multiple Imputation method to infer missing data on case-parent trios Starting from the observed data, a few number of complete data sets are generated by Markov-Chain Monte Carlo approach. These complete datasets are analysed using standard statistical package and the results are combined as described in Little & Rubin (2002). Here we report the results of simulations performed to examine, for different patterns of missing data, how often the true DSL gives the highest association score among different loci in LD. We found that multiple imputation usually correctly detect the DSL site even if the percentage of missing data is high. This is not the case for the naďve approach that consists in discarding trios with missing data. In conclusion, Multiple imputation presents the advantage of being easy to use and flexible and is therefore a promising tool in the search for DSL involved in complex diseases. Salma Kotti 1 , Heike Bickeböller 2 , Françoise Clerget-Darpoux 136 University Paris Sud, UMR-S535, Villejuif, France 37 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany Keywords: Genotype relative risk, internal controls, Family based analyses Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRRs. We will analytically derive the GRR estimators for the 1:1 and 1:3 matching and will present the results at the meeting. Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRR. We will analytically derive the GRR estimator for the 1:1 and 1:3 matching and will present the results at the meeting. Luigi Palla 1 , David Siegmund 239 Department of Mathematics,Free University Amsterdam, The Netherlands 40 Department of Statistics, Stanford University, California, USA Keywords: TDT, assortative mating, inbreeding, statistical power A substantial amount of Assortative Mating (AM) is often recorded on physical and psychological, dichotomous as well as quantitative traits that are supposed to have a multifactorial genetic component. In particular AM has the effect of increasing the genetic variance, even more than inbreeding because it acts across loci beside within loci, when the trait has a multifactorial origin. Under the assumption of a polygenic model for AM dating back to Wright (1921) and refined by Crow and Felsenstein (1968,1982), the effect of assortative mating on the power to detect genetic association in the Transmission Disequilibrium Test (TDT) is explored as parameters, such as the effective number of genes and the allelic frequency vary. The power is reflected by the non centrality parameter of the TDT and is expressed as a function of the number of trios, the relative risk of the heterozygous genotype and the allele frequency (Siegmund and Yakir, 2007). The noncentrality parameter of the relevant score statistic is updated considering the effect of AM which is expressed in terms of an ,effective' inbreeding coefficient. In particular, for dichotomous traits it is apparent that the higher the number of genes involved in the trait, the lower the loss in power due to AM. Finally an attempt is made to extend this relation to the Q-TDT (Rabinowitz, 1997), which involves considering the effect of AM also on the phenotypic variance of the trait of interest, under the assumption that AM affects only its additive genetic component. References Crow, & Felsenstein, (1968). The effect of assortative mating on the genetic composition of a population. Eugen.Quart.15, 87,97. Rabinowitz,, 1997. A Transmission Disequilibrium Test for Quantitative Trait Loci. Human Heredity47, 342,350. Siegmund, & Yakir, (2007) Statistics of gene mapping, Springer. Wright, (1921). System of mating.III. Assortative mating based on somatic resemblance. Genetics6, 144,161. Jérémie Nsengimana 1 , Ben D Brown 2 , Alistair S Hall 2 , Jenny H Barrett 141 Leeds Institute of Molecular Medicine, University of Leeds, UK 42 Leeds Institute for Genetics, Health and Therapeutics, University of Leeds, UK Keywords: Inflammatory genes, haplotype, coronary artery disease Genetic Risk of Acute Coronary Events (GRACE) is an initiative to collect cases of coronary artery disease (CAD) and their unaffected siblings in the UK and to use them to map genetic variants increasing disease risk. The aim of the present study was to test the association between CAD and 51 single nucleotide polymorphisms (SNPs) and their haplotypes from 35 inflammatory genes. Genotype data were available for 1154 persons affected before age 66 (including 48% before age 50) and their 1545 unaffected siblings (891 discordant families). Each SNP was tested for association to CAD, and haplotypes within genes or gene clusters were tested using FBAT (Rabinowitz & Laird, 2000). For the most significant results, genetic effect size was estimated using conditional logistic regression (CLR) within STATA adjusting for other risk factors. Haplotypes were assigned using HAPLORE (Zhang et al., 2005), which considers all parental mating types consistent with offspring genotypes and assigns them a probability of occurence. This probability was used in CLR to weight the haplotypes. In the single SNP analysis, several SNPs showed some evidence for association, including one SNP in the interleukin-1A gene. Analysing haplotypes in the interleukin-1 gene cluster, a common 3-SNP haplotype was found to increase the risk of CAD (P = 0.009). In an additive genetic model adjusting for covariates the odds ratio (OR) for this haplotype is 1.56 (95% CI: 1.16-2.10, p = 0.004) for early-onset CAD (before age 50). This study illustrates the utility of haplotype analysis in family-based association studies to investigate candidate genes. References Rabinowitz, D. & Laird, N. M. (2000) Hum Hered50, 211,223. Zhang, K., Sun, F. & Zhao, H. (2005) Bioinformatics21, 90,103. Andrea Foulkes 1 , Recai Yucel 1 , Xiaohong Li 143 Division of Biostatistics, University of Massachusetts, USA Keywords: Haploytpe, high-dimensional, mixed modeling The explosion of molecular level information coupled with large epidemiological studies presents an exciting opportunity to uncover the genetic underpinnings of complex diseases; however, several analytical challenges remain to be addressed. Characterizing the components to complex diseases inevitably requires consideration of synergies across multiple genetic loci and environmental and demographic factors. In addition, it is critical to capture information on allelic phase, that is whether alleles within a gene are in cis (on the same chromosome) or in trans (on different chromosomes.) In associations studies of unrelated individuals, this alignment of alleles within a chromosomal copy is generally not observed. We address the potential ambiguity in allelic phase in this high dimensional data setting using mixed effects models. Both a semi-parametric and fully likelihood-based approach to estimation are considered to account for missingness in cluster identifiers. In the first case, we apply a multiple imputation procedure coupled with a first stage expectation maximization algorithm for parameter estimation. A bootstrap approach is employed to assess sensitivity to variability induced by parameter estimation. Secondly, a fully likelihood-based approach using an expectation conditional maximization algorithm is described. Notably, these models allow for characterizing high-order gene-gene interactions while providing a flexible statistical framework to account for the confounding or mediating role of person specific covariates. The proposed method is applied to data arising from a cohort of human immunodeficiency virus type-1 (HIV-1) infected individuals at risk for therapy associated dyslipidemia. Simulation studies demonstrate reasonable power and control of family-wise type 1 error rates. Vivien Marquard 1 , Lars Beckmann 1 , Jenny Chang-Claude 144 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Genotyping errors, type I error, haplotype-based association methods It has been shown in several simulation studies that genotyping errors may have a great impact on the type I error of statistical methods used in genetic association analysis of complex diseases. Our aim was to investigate type I error rates in a case-control study, when differential and non-differential genotyping errors were introduced in realistic scenarios. We simulated case-control data sets, where individual genotypes were drawn from a haplotype distribution of 18 haplotypes with 15 markers in the APM1 gene. Genotyping errors were introduced following the unrestricted and symmetric with 0 edges error models described by Heid et al. (2006). In six scenarios, errors resulted from changes of one allele to another with predefined probabilities of 1%, 2.5% or 10%, respectively. A multiple number of errors per haplotype was possible and could vary between 0 and 15, the number of markers investigated. We examined three association methods: Mantel statistics using haplotype-sharing; a haplotype-specific score test; and Armitage trend test for single markers. The type I error rates were not influenced for any of all the three methods for a genotyping error rate of less than 1%. For higher error rates and differential errors, the type I error of the Mantel statistic was only slightly and of the Armitage trend test moderately increased. The type I error rates of the score test were highly increased. The type I error rates were correct for all three methods for non-differential errors. Further investigations will be carried out with different frequencies of differential error rates and focus on power. Arne Neumann 1 , Dörthe Malzahn 1 , Martina Müller 2 , Heike Bickeböller 145 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany 46 GSF-National Research Center for Environment and Health, Neuherberg & IBE-Institute of Epidemiology, Ludwig-Maximilians University München, Germany Keywords: Interaction, longitudinal, nonparametric Longitudinal data show the time dependent course of phenotypic traits. In this contribution, we consider longitudinal cohort studies and investigate the association between two candidate genes and a dependent quantitative longitudinal phenotype. The set-up defines a factorial design which allows us to test simultaneously for the overall gene effect of the loci as well as for possible gene-gene and gene time interaction. The latter would induce genetically based time-profile differences in the longitudinal phenotype. We adopt a non-parametric statistical test to genetic epidemiological cohort studies and investigate its performance by simulation studies. The statistical test was originally developed for longitudinal clinical studies (Brunner, Munzel, Puri, 1999 J Multivariate Anal 70:286-317). It is non-parametric in the sense that no assumptions are made about the underlying distribution of the quantitative phenotype. Longitudinal observations belonging to the same individual can be arbitrarily dependent on one another for the different time points whereas trait observations of different individuals are independent. The two loci are assumed to be statistically independent. Our simulations show that the nonparametric test is comparable with ANOVA in terms of power of detecting gene-gene and gene-time interaction in an ANOVA favourable setting. Rebecca Hein 1 , Lars Beckmann 1 , Jenny Chang-Claude 147 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Indirect association studies, interaction effects, linkage disequilibrium, marker allele frequency Association studies accounting for gene-environment interactions (GxE) may be useful for detecting genetic effects and identifying important environmental effect modifiers. Current technology facilitates very dense marker spacing in genetic association studies; however, the true disease variant(s) may not be genotyped. In this situation, an association between a gene and a phenotype may still be detectable, using genetic markers associated with the true disease variant(s) (indirect association). Zondervan and Cardon [2004] showed that the odds ratios (OR) of markers which are associated with the disease variant depend highly on the linkage disequilibrium (LD) between the variant and the markers, and whether the allele frequencies match and thereby influence the sample size needed to detect genetic association. We examined the influence of LD and allele frequencies on the sample size needed to detect GxE in indirect association studies, and provide tables for sample size estimation. For discordant allele frequencies and incomplete LD, sample sizes can be unfeasibly large. The influence of both factors is stronger for disease loci with small rather than moderate to high disease allele frequencies. A decline in D' of e.g. 5% has less impact on sample size than increasing the difference in allele frequencies by the same percentage. Assuming 80% power, large interaction effects can be detected using smaller sample sizes than those needed for the detection of main effects. The detection of interaction effects involving rare alleles may not be possible. Focussing only on marker density can be a limited strategy in indirect association studies for GxE. Cyril Dalmasso 1 , Emmanuelle Génin 2 , Catherine Bourgain 2 , Philippe Broët 148 JE 2492 , Univ. Paris-Sud, France 49 INSERM UMR-S 535 and University Paris Sud, Villejuif, France Keywords: Linkage analysis, Multiple testing, False Discovery Rate, Mixture model In the context of genome-wide linkage analyses, where a large number of statistical tests are simultaneously performed, the False Discovery Rate (FDR) that is defined as the expected proportion of false discoveries among all discoveries is nowadays widely used for taking into account the multiple testing problem. Other related criteria have been considered such as the local False Discovery Rate (lFDR) that is a variant of the FDR giving to each test its own measure of significance. The lFDR is defined as the posterior probability that a null hypothesis is true. Most of the proposed methods for estimating the lFDR or the FDR rely on distributional assumption under the null hypothesis. However, in observational studies, the empirical null distribution may be very different from the theoretical one. In this work, we propose a mixture model based approach that provides estimates of the lFDR and the FDR in the context of large-scale variance component linkage analyses. In particular, this approach allows estimating the empirical null distribution, this latter being a key quantity for any simultaneous inference procedure. The proposed method is applied on a real dataset. Arief Gusnanto 1 , Frank Dudbridge 150 MRC Biostatistics Unit, Cambridge UK Keywords: Significance, genome-wide, association, permutation, multiplicity Genome-wide association scans have introduced statistical challenges, mainly in the multiplicity of thousands of tests. The question of what constitutes a significant finding remains somewhat unresolved. Permutation testing is very time-consuming, whereas Bayesian arguments struggle to distinguish direct from indirect association. It seems attractive to summarise the multiplicity in a simple form that allows users to avoid time-consuming permutations. A standard significance level would facilitate reporting of results and reduce the need for permutation tests. This is potentially important because current scans do not have full coverage of the whole genome, and yet, the implicit multiplicity is genome-wide. We discuss some proposed summaries, with reference to the empirical null distribution of the multiple tests, approximated through a large number of random permutations. Using genome-wide data from the Wellcome Trust Case-Control Consortium, we use a sub-sampling approach with increasing density to estimate the nominal p-value to obtain family-wise significance of 5%. The results indicate that the significance level is converging to about 1e-7 as the marker spacing becomes infinitely dense. We considered the concept of an effective number of independent tests, and showed that when used in a Bonferroni correction, the number varies with the overall significance level, but is roughly constant in the region of interest. We compared several estimators of the effective number of tests, and showed that in the region of significance of interest, Patterson's eigenvalue based estimator gives approximately the right family-wise error rate. Michael Nothnagel 1 , Amke Caliebe 1 , Michael Krawczak 151 Institute of Medical Informatics and Statistics, University Clinic Schleswig-Holstein, University of Kiel, Germany Keywords: Association scans, Bayesian framework, posterior odds, genetic risk, multiplicative model Whole-genome association scans have been suggested to be a cost-efficient way to survey genetic variation and to map genetic disease factors. We used a Bayesian framework to investigate the posterior odds of a genuine association under multiplicative disease models. We demonstrate that the p value alone is not a sufficient means to evaluate the findings in association studies. We suggest that likelihood ratios should accompany p values in association reports. We argue, that, given the reported results of whole-genome scans, more associations should have been successfully replicated if the consistently made assumptions about considerable genetic risks were correct. We conclude that it is very likely that the vast majority of relative genetic risks are only of the order of 1.2 or lower. Clive Hoggart 1 , Maria De Iorio 1 , John Whittakker 2 , David Balding 152 Department of Epidemiology and Public Health, Imperial College London, UK 53 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: Genome-wide association analyses, shrinkage priors, Lasso Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants of small effect, which is a plausible scenario for many complex diseases. Moreover, many simulation studies assume a single causal variant and so more complex realities are ignored. Analysing large numbers of variants simultaneously is now becoming feasible, thanks to developments in Bayesian stochastic search methods. We pose the problem of SNP selection as variable selection in a regression model. In contrast to single SNP tests this approach simultaneously models the effect of all SNPs. SNPs are selected by a Bayesian interpretation of the lasso (Tibshirani, 1996); the maximum a posterior (MAP) estimate of the regression coefficients, which have been given independent, double exponential prior distributions. The double exponential distribution is an example of a shrinkage prior, MAP estimates with shrinkage priors can be zero, thus all SNPs with non zero regression coefficients are selected. In addition to the commonly-used double exponential (Laplace) prior, we also implement the normal exponential gamma prior distribution. We show that use of the Laplace prior improves SNP selection in comparison with single -SNP tests, and that the normal exponential gamma prior leads to a further improvement. Our method is fast and can handle very large numbers of SNPs: we demonstrate its performance using both simulated and real genome-wide data sets with 500 K SNPs, which can be analysed in 2 hours on a desktop workstation. Mickael Guedj 1,2 , Jerome Wojcik 2 , Gregory Nuel 154 Laboratoire Statistique et Génome, Université d'Evry, Evry France 55 Serono Pharmaceutical Research Institute, Plan-les-Ouates, Switzerland Keywords: Local Replication, Local Score, Association In gene-mapping, replication of initial findings has been put forwards as the approach of choice for filtering false-positives from true signals for underlying loci. In practice, such replications are however too poorly observed. Besides the statistical and technical-related factors (lack of power, multiple-testing, stratification, quality control,) inconsistent conclusions obtained from independent populations might result from real biological differences. In particular, the high degree of variation in the strength of LD among populations of different origins is a major challenge to the discovery of genes. Seeking for Local Replications (defined as the presence of a signal of association in a same genomic region among populations) instead of strict replications (same locus, same risk allele) may lead to more reliable results. Recently, a multi-markers approach based on the Local Score statistic has been proposed as a simple and efficient way to select candidate genomic regions at the first stage of genome-wide association studies. Here we propose an extension of this approach adapted to replicated association studies. Based on simulations, this method appears promising. In particular it outperforms classical simple-marker strategies to detect modest-effect genes. Additionally it constitutes, to our knowledge, a first framework dedicated to the detection of such Local Replications. Juliet Chapman 1 , Claudio Verzilli 1 , John Whittaker 156 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: FDR, Association studies, Bayesian model selection As genomewide association studies become commonplace there is debate as to how such studies might be analysed and what we might hope to gain from the data. It is clear that standard single locus approaches are limited in that they do not adjust for the effects of other loci and problematic since it is not obvious how to adjust for multiple comparisons. False discovery rates have been suggested, but it is unclear how well these will cope with highly correlated genetic data. We consider the validity of standard false discovery rates in large scale association studies. We also show that a Bayesian procedure has advantages in detecting causal loci amongst a large number of dependant SNPs and investigate properties of a Bayesian FDR. Peter Kraft 157 Harvard School of Public Health, Boston USA Keywords: Gene-environment interaction, genome-wide association scans Appropriately analyzed two-stage designs,where a subset of available subjects are genotyped on a genome-wide panel of markers at the first stage and then a much smaller subset of the most promising markers are genotyped on the remaining subjects,can have nearly as much power as a single-stage study where all subjects are genotyped on the genome-wide panel yet can be much less expensive. Typically, the "most promising" markers are selected based on evidence for a marginal association between genotypes and disease. Subsequently, the few markers found to be associated with disease at the end of the second stage are interrogated for evidence of gene-environment interaction, mainly to understand their impact on disease etiology and public health impact. However, this approach may miss variants which have a sizeable effect restricted to one exposure stratum and therefore only a modest marginal effect. We have proposed to use information on the joint effects of genes and a discrete list of environmental exposures at the initial screening stage to select promising markers for the second stage [Kraft et al Hum Hered 2007]. This approach optimizes power to detect variants that have a sizeable marginal effect and variants that have a small marginal effect but a sizeable effect in a stratum defined by an environmental exposure. As an example, I discuss a proposed genome-wide association scan for Type II diabetes susceptibility variants based in several large nested case-control studies. Beate Glaser 1 , Peter Holmans 158 Biostatistics and Bioinformatics Unit, Cardiff University, School of Medicine, Heath Park, Cardiff, UK Keywords: Combined case-control and trios analysis, Power, False-positive rate, Simulation, Association studies The statistical power of genetic association studies can be enhanced by combining the analysis of case-control with parent-offspring trio samples. Various combined analysis techniques have been recently developed; as yet, there have been no comparisons of their power. This work was performed with the aim of identifying the most powerful method among available combined techniques including test statistics developed by Kazeem and Farrall (2005), Nagelkerke and colleagues (2004) and Dudbridge (2006), as well as a simple combination of ,2-statistics from single samples. Simulation studies were performed to investigate their power under different additive, multiplicative, dominant and recessive disease models. False-positive rates were determined by studying the type I error rates under null models including models with unequal allele frequencies between the single case-control and trios samples. We identified three techniques with equivalent power and false-positive rates, which included modifications of the three main approaches: 1) the unmodified combined Odds ratio estimate by Kazeem & Farrall (2005), 2) a modified approach of the combined risk ratio estimate by Nagelkerke & colleagues (2004) and 3) a modified technique for a combined risk ratio estimate by Dudbridge (2006). Our work highlights the importance of studies investigating test performance criteria of novel methods, as they will help users to select the optimal approach within a range of available analysis techniques. David Almorza 1 , M.V. Kandus 2 , Juan Carlos Salerno 2 , Rafael Boggio 359 Facultad de Ciencias del Trabajo, University of Cádiz, Spain 60 Instituto de Genética IGEAF, Buenos Aires, Argentina 61 Universidad Nacional de La Plata, Buenos Aires, Argentina Keywords: Principal component analysis, maize, ear weight, inbred lines The objective of this work was to evaluate the relationship among different traits of the ear of maize inbred lines and to group genotypes according to its performance. Ten inbred lines developed at IGEAF (INTA Castelar) and five public inbred lines as checks were used. A field trial was carried out in Castelar, Buenos Aires (34° 36' S , 58° 39' W) using a complete randomize design with three replications. At harvest, individual weight (P.E.), diameter (D.E.), row number (N.H.) and length (L.E.) of the ear were assessed. A principal component analysis, PCA, (Infostat 2005) was used, and the variability of the data was depicted with a biplot. Principal components 1 and 2 (CP1 and CP2) explained 90% of the data variability. CP1 was correlated with P.E., L.E. and D.E., meanwhile CP2 was correlated with N.H. We found that individual weight (P.E.) was more correlated with diameter of the ear (D.E.) than with length (L.E). Five groups of inbred lines were distinguished: with high P.E. and mean N.H. (04-70, 04-73, 04-101 and MO17), with high P.E. but less N.H. (04-61 and B14), with mean P.E. and N.H. (B73, 04-123 and 04-96), with high N.H. but less P.E. (LP109, 04-8, 04-91 and 04-76) and with low P.E. and low N.H. (LP521 and 04-104). The use of PCA showed which variables had more incidence in ear weight and how is the correlation among them. Moreover, the different groups found with this analysis allow the evaluation of inbred lines by several traits simultaneously. Sven Knüppel 1 , Anja Bauerfeind 1 , Klaus Rohde 162 Department of Bioinformatics, MDC Berlin, Germany Keywords: Haplotypes, association studies, case-control, nuclear families The area of gene chip technology provides a plethora of phase-unknown SNP genotypes in order to find significant association to some genetic trait. To circumvent possibly low information content of a single SNP one groups successive SNPs and estimates haplotypes. Haplotype estimation, however, may reveal ambiguous haplotype pairs and bias the application of statistical methods. Zaykin et al. (Hum Hered, 53:79-91, 2002) proposed the construction of a design matrix to take this ambiguity into account. Here we present a set of functions written for the Statistical package R, which carries out haplotype estimation on the basis of the EM-algorithm for individuals (case-control) or nuclear families. The construction of a design matrix on basis of estimated haplotypes or haplotype pairs allows application of standard methods for association studies (linear, logistic regression), as well as statistical methods as haplotype sharing statistics and TDT-Test. Applications of these methods to genome-wide association screens will be demonstrated. Manuela Zucknick 1 , Chris Holmes 2 , Sylvia Richardson 163 Department of Epidemiology and Public Health, Imperial College London, UK 64 Department of Statistics, Oxford Center for Gene Function, University of Oxford, UK Keywords: Bayesian, variable selection, MCMC, large p, small n, structured dependence In large-scale genomic applications vast numbers of markers or genes are scanned to find a few candidates which are linked to a particular phenotype. Statistically, this is a variable selection problem in the "large p, small n" situation where many more variables than samples are available. An additional feature is the complex dependence structure which is often observed among the markers/genes due to linkage disequilibrium or their joint involvement in biological processes. Bayesian variable selection methods using indicator variables are well suited to the problem. Binary phenotypes like disease status are common and both Bayesian probit and logistic regression can be applied in this context. We argue that logistic regression models are both easier to tune and to interpret than probit models and implement the approach by Holmes & Held (2006). Because the model space is vast, MCMC methods are used as stochastic search algorithms with the aim to quickly find regions of high posterior probability. In a trade-off between fast-updating but slow-moving single-gene Metropolis-Hastings samplers and computationally expensive full Gibbs sampling, we propose to employ the dependence structure among the genes/markers to help decide which variables to update together. Also, parallel tempering methods are used to aid bold moves and help avoid getting trapped in local optima. Mixing and convergence of the resulting Markov chains are evaluated and compared to standard samplers in both a simulation study and in an application to a gene expression data set. Reference Holmes, C. C. & Held, L. (2006) Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis1, 145,168. Dawn Teare 165 MMGE, University of Sheffield, UK Keywords: CNP, family-based analysis, MCMC Evidence is accumulating that segmental copy number polymorphisms (CNPs) may represent a significant portion of human genetic variation. These highly polymorphic systems require handling as phenotypes rather than co-dominant markers, placing new demands on family-based analyses. We present an integrated approach to meet these challenges in the form of a graphical model, where the underlying discrete CNP phenotype is inferred from the (single or replicate) quantitative measure within the analysis, whilst assuming an allele based system segregating through the pedigree. [source] Variations in high-density lipoprotein cholesterol in relation to physical activity and Taq 1B polymorphism of the cholesteryl ester transfer protein geneCLINICAL GENETICS, Issue 5 2004M Mukherjee The aim of the study was to determine any association of physical activity and Taq 1B polymorphism in the cholesteryl ester transfer protein gene on high-density lipoprotein (HDL) cholesterol. Five hundred and four subjects, 390 males and 114 females consisting of an equal number of age- and sex-matched healthy controls and patients with coronary artery disease, were included. The mean age (ąSD) of the patients and controls were 57.5 ą 10.6 years and 56.8 ą 11.0 years, respectively. All the patients underwent coronary angiography; 33, 58, 63, and 98 patients had normal coronaries, single-, two-, or triple-vessel disease, respectively. A third of the patients had suffered from a myocardial infarction. The genotype distribution conforming to Hardy,Weinberg equilibrium was similar for cases and controls. The mean HDL cholesterol increased from B1B1 through B2B2 genotype in controls and sedentary male patients. Self-reported leisure time physical activity, consisting mostly of an hour of morning walk daily, was associated with a rise in mean HDL cholesterol in male controls (33.6 ą 7.9 mg/dl to 36.2 ą 8.9 mg/dl, p = 0.037) and patients (32.4 ą 7.9 mg/dl to 35.7 ą 11.0 mg/dl; p = 0.018). The exercise-associated rise in HDL cholesterol was most pronounced in controls (32.1 ą 9.1 mg/dl to 36.8 ą 9.3 mg/dl, p = 0.05) and male patients (30.5 ą 7.4 mg/dl to 37.2 ą 9.7 mg/dl, p = 0.007) with B1B1 rather than B1B2 or B2B2 genotype. The results suggest a possible gene-environment interaction in the regulation of HDL cholesterol that needs to be confirmed in other populations and larger samples to rule out a chance occurrence. [source] REVIEW: Consilient research approaches in studying gene × environment interactions in alcohol researchADDICTION BIOLOGY, Issue 2 2010Kenneth J. Sher ABSTRACT This review article discusses the importance of identifying gene-environment interactions for understanding the etiology and course of alcohol use disorders and related conditions. A number of critical challenges are discussed, including the fact that there is no organizing typology for classifying different types of environmental exposures, many key human environmental risk factors for alcohol dependence have no clear equivalents in other species, much of the genetic variance of alcohol dependence in human is not ,alcohol specific', and the potential range of gene-environment interactions that could be considered is so vast that maintaining statistical control of Type 1 errors is a daunting task. Despite these and other challenges, there appears to be a number of promising approaches that could be taken in order to achieve consilience and ecologically valid translation between human alcohol dependence and animal models. Foremost among these is to distinguish environmental exposures that are thought to have enduring effects on alcohol use motivation (and self-regulation) from situational environmental exposures that facilitate the expression of such motivations but do not, by themselves, have enduring effects. In order to enhance consilience, various domains of human approach motivation should be considered so that relevant environmental exposures can be sampled, as well as the appropriate species to study them in (i.e. where such motivations are ecologically relevant). Foremost among these are social environments, which are central to the initiation and escalation of human alcohol consumption. The value of twin studies, human laboratory studies and pharmacogenetic studies is also highlighted. [source] The evolutionary genetics of personality,EUROPEAN JOURNAL OF PERSONALITY, Issue 5 2007Lars Penke Abstract Genetic influences on personality differences are ubiquitous, but their nature is not well understood. A theoretical framework might help, and can be provided by evolutionary genetics. We assess three evolutionary genetic mechanisms that could explain genetic variance in personality differences: selective neutrality, mutation-selection balance, and balancing selection. Based on evolutionary genetic theory and empirical results from behaviour genetics and personality psychology, we conclude that selective neutrality is largely irrelevant, that mutation-selection balance seems best at explaining genetic variance in intelligence, and that balancing selection by environmental heterogeneity seems best at explaining genetic variance in personality traits. We propose a general model of heritable personality differences that conceptualises intelligence as fitness components and personality traits as individual reaction norms of genotypes across environments, with different fitness consequences in different environmental niches. We also discuss the place of mental health in the model. This evolutionary genetic framework highlights the role of gene-environment interactions in the study of personality, yields new insight into the person-situation-debate and the structure of personality, and has practical implications for both quantitative and molecular genetic studies of personality. Copyright Š 2007 John Wiley & Sons, Ltd. [source] Effect of including environmental data in investigations of gene-disease associations in the presence of qualitative interactionsGENETIC EPIDEMIOLOGY, Issue 6 2010Elizabeth Williamson Abstract Complex diseases are likely to be caused by the interplay of genetic and environmental factors. Despite this, gene-disease associations are frequently investigated using models that focus solely on a marginal gene effect, ignoring environmental factors entirely. Failing to take into account a gene-environment interaction can weaken the apparent gene-disease association, leading to loss in statistical power and, potentially, inability to identify genuine risk factors. If a gene-environment interaction exists, therefore, a joint analysis allowing the effect of the gene to differ between groups defined by the environmental exposure can have greater statistical power than a marginal gene-disease model. However, environmental data are subject to measurement error. Substantial losses in statistical power for detecting gene-environment interactions can arise from measurement error in the environmental exposure. It is unclear, however, what effect measurement error may have on the power of the joint analysis. We consider the potential benefits, in terms of statistical power, of collecting concurrent environmental data within large cohorts in order to enhance gene detection. We further consider whether these benefits remain in the presence of misclassification in both the gene and the environmental exposure. We find that when an effect of the gene is apparent only in the presence of the environmental exposure, the joint analysis has greater power than a marginal gene-disease analysis. This comparative increase in power remains in the presence of likely levels of misclassification of either the gene or environmental exposure. Genet. Epidemiol. 34:552,560, 2010. Š 2010 Wiley-Liss, Inc. [source] Detection of SNP-SNP interactions in trios of parents with schizophrenic childrenGENETIC EPIDEMIOLOGY, Issue 5 2010Qing Li Abstract Schizophrenia (SZ) is a heritable and complex psychiatric disorder with an estimated worldwide prevalence of about 1%. Research on the risk factors for SZ has thus far yielded few clues to causes, but has pointed to a heterogeneous etiology that likely involves multiple genes and gene-environment interactions. In this manuscript, we apply a novel method (trio logic regression, Li et al., 2009) to case-parent trio data from a SZ candidate gene study conducted on families of Ashkenazi Jewish descent, and demonstrate the method's ability to detect multi-gene models for SZ risk in the family-based design. In particular, we demonstrate how this method revealed a genotype-phenotype association that includes an allele without marginal effect. Genet. Epidemiol. 34: 396,406, 2010. Š 2010 Wiley-Liss, Inc. [source] Methods for detecting interactions between genetic polymorphisms and prenatal environment exposure with a mother-child designGENETIC EPIDEMIOLOGY, Issue 2 2010Shuang Wang Abstract Prenatal exposures such as polycyclic aromatic hydrocarbons and early postnatal environmental exposures are of particular concern because of the heightened susceptibility of the fetus and infant to diverse environmental pollutants. Marked inter-individual variation in response to the same level of exposure was observed in both mothers and their newborns, indicating that susceptibility might be due to genetic factors. With the mother-child pair design, existing methods developed for parent-child trio data or random sample data are either not applicable or not designed to optimally use the information. To take full advantage of this unique design, which provides partial information on genetic transmission and has both maternal and newborn outcome status collected, we developed a likelihood-based method that uses both the maternal and the newborn information together and jointly models gene-environment interactions on maternal and newborn outcomes. Through intensive simulation studies, the proposed method has demonstrated much improved power in detecting gene-environment interactions. The application on a real mother-child pair data from a study conducted in Krakow, Poland, suggested four significant gene-environment interactions after multiple comparisons adjustment. Genet. Epidemiol. 34: 125,132, 2010. Š 2009 Wiley-Liss, Inc. [source] A partial least-square approach for modeling gene-gene and gene-environment interactions when multiple markers are genotypedGENETIC EPIDEMIOLOGY, Issue 1 2009Tao Wang Abstract Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense single nucleotype polymorphisms (SNPs) in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle a large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches, the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey's one-degree-of-freedom model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women's Health Initiative, this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with body mass index. Genet. Epidemiol. 2008. Š 2008 Wiley-Liss, Inc. [source] Quantitative trait association in parent offspring trios: Extension of case/pseudocontrol method and comparison of prospective and retrospective approachesGENETIC EPIDEMIOLOGY, Issue 8 2007Eleanor Wheeler Abstract The case/pseudocontrol method provides a convenient framework for family-based association analysis of case-parent trios, incorporating several previously proposed methods such as the transmission/disequilibrium test and log-linear modelling of parent-of-origin effects. The method allows genotype and haplotype analysis at an arbitrary number of linked and unlinked multiallelic loci, as well as modelling of more complex effects such as epistasis, parent-of-origin effects, maternal genotype and mother-child interaction effects, and gene-environment interactions. Here we extend the method for analysis of quantitative as opposed to dichotomous (e.g. disease) traits. The resulting method can be thought of as a retrospective approach, modelling genotype given trait value, in contrast to prospective approaches that model trait given genotype. Through simulations and analytical derivations, we examine the power and properties of our proposed approach, and compare it to several previously proposed single-locus methods for quantitative trait association analysis. We investigate the performance of the different methods when extended to allow analysis of haplotype, maternal genotype and parent-of-origin effects. With randomly ascertained families, with or without population stratification, the prospective approach (modeling trait value given genotype) is found to be generally most effective, although the retrospective approach has some advantages with regard to estimation and interpretability of parameter estimates when applied to selected samples. Genet. Epidemiol. 2007. Š 2007 Wiley-Liss, Inc. [source] A unified framework for transmission-disequilibrium test analysis of discrete and continuous traitsGENETIC EPIDEMIOLOGY, Issue 1 2002Ying Liu Abstract This paper presents a unified framework for transmission-disequilibrium tests for discrete and continuous traits. A conditional score test is derived that maximizes power to detect small effects for any exponential family distribution, which includes binary and normal distributions, and distributions that are skewed or have non-normal kurtosis. The specific distributional form need not be specified, and the method applies to sibships of arbitrary size. Formulas for the distribution of the test statistic are given for models including complex genetic effects (additive, dominant, and recessive gene action), covariates, multiple gene models including gene-gene interactions or heterogeneity, and gene-environment interactions. We develop refinements of our method for trait-based sampling designs and multiple siblings that can have dramatic effects on power. Genet. Epidemiol. 22:26,40, 2002. Š 2002 Wiley-Liss, Inc. [source] Stratified case sampling and the use of family controlsGENETIC EPIDEMIOLOGY, Issue 3 2001Kimberly D. Siegmund Abstract We compare the asymptotic relative efficiency (ARE) of different study designs for estimating gene and gene-environment interaction effects using matched case-control data. In the sampling schemes considered, cases are selected differentially based on their family history of disease. Controls are selected either from unrelated subjects or from among the case's unaffected siblings and cousins. Parameters are estimated using weighted conditional logistic regression, where the likelihood contributions for each subject are weighted by the fraction of cases sampled sharing the same family history. Results showed that compared to random sampling, over-sampling cases with a positive family history increased the efficiency for estimating the main effect of a gene for sib-control designs (103,254% ARE) and decreased efficiency for cousin-control and population-control designs (68,94% ARE and 67,84% ARE, respectively). Population controls and random sampling of cases were most efficient for a recessive gene or a dominant gene with an relative risk less than 9. For estimating gene-environment interactions, over-sampling positive-family-history cases again led to increased efficiency using sib controls (111,180% ARE) and decreased efficiency using population controls (68,87% ARE). Using case-cousin pairs, the results differed based on the genetic model and the size of the interaction effect; biased sampling was only slightly more efficient than random sampling for large interaction effects under a dominant gene model (relative risk ratio = 8, 106% ARE). Overall, the most efficient study design for studying gene-environment interaction was the case-sib-control design with over-sampling of positive-family-history-cases. Genet. Epidemiol. 20:316,327, 2001. Š 2001 Wiley-Liss, Inc. [source] The role of gene,environment interaction in the aetiology of human cancer: examples from cancers of the large bowel, lung and breastJOURNAL OF INTERNAL MEDICINE, Issue 6 2001L. A. Mucci Abstract. Mucci LA, Wedren S, Tamimi RM, Trichopoulos D, Adami H.-O. (Harvard School of Public Health, Boston, MA, USA; and Karolinska Institutet, Stockholm, Sweden) The role of gene,environment interaction in the aetiology of human cancer: examples from cancers of the large bowel, lung and breast. J Intern Med 2001; 249: 477,493. It has become increasingly clear that cancer can be considered neither purely genetic nor purely environmental. A relatively new area of cancer research has focused on the interaction between genes and environment in the same causal mechanism. Primary candidates for gene-environment interaction studies have been genes that encode enzymes involved in the metabolism of established cancer risk factors. There are common variant forms of these genes (polymorphisms), which may alter metabolism and increase or decrease exposure to carcinogens, thus impacting the risk of cancer. We present an overview of enzymes involved in carcinogen metabolism, present epidemiological tools to evaluate gene-environment interactions, and provide examples from cancers of the breast, lung and large bowel. [source] Use of Genetic Analyses to Refine Phenotypes Related to Alcohol Tolerance and DependenceALCOHOLISM, Issue 2 2001John C. Crabbe Various explanations for the dependence on alcohol are attributed to the development of tolerance to some of alcohol's effects, alterations in sensitivity to its rewarding effects, and unknown pathologic consequences of repeated exposure. All these aspects of dependence have been modeled in laboratory rodents, and these studies have consistently shown a significant influence of genetics. Genetic mapping studies have identified the genomic location of the specific genes for some of these contributing phenotypes. In addition, studies have shown that some genes in mice seem to affect both alcohol self-administration and alcohol withdrawal severity: genetic predisposition to high levels of drinking covaries with genetic predisposition to low withdrawal severity, and vice versa. Finally, the role of genetic background on which genes are expressed is important, as are the specifics of the environment in which genetically defined animals are tested. Understanding dependence will require disentangling the multiple interactions of many contributing phenotypes, and genetic analyses are proving very helpful. However, rigorous understanding of both gene-gene and gene-environment interactions will be required to interpret genetic experiments clearly. [source] Genetic and Environmental Contributions to Racial Disparities in Preterm BirthMOUNT SINAI JOURNAL OF MEDICINE: A JOURNAL OF PERSONALIZED AND TRANSLATIONAL MEDICINE, Issue 2 2010Siobhan M. Dolan MD Abstract The preterm birth rate exceeds 12% in the United States, and preterm birth continues to be a clinical and public health challenge globally. Even though preterm birth is a major contributor to infant mortality and lifelong morbidity, there are few effective strategies to predict preterm birth and few clinical interventions to prevent it. Genomic research approaches that identify risk factors at the intersection of genetics and the environment will likely provide insights. Both genetic and environmental factors are known to contribute to the racial disparity seen in preterm birth. Through the identification of relevant gene-environment interactions that contribute to preterm birth and may underlie the racial disparity in preterm birth, research that will translate to clinical practice and ultimately prevent a number of preterm births is possible. Mt Sinai J Med 77:160,165, 2010. Š 2010 Mount Sinai School of Medicine [source] A taxonomy of biological informationOIKOS, Issue 2 2010Richard H. Wagner Reproduction, and thus information transfer across generations, is the most essential process of life, yet biologists lack a consensus on terms to define biological information. Unfortunately, multiple definitions of the same terms and other disagreements have long inhibited the development of a general framework for integrating the various categories of biological information. Currently, the only consensus is over two general categories, genetic information, which is encoded in DNA, and non-genetic information, which is extracted from the environment. Non-genetic information is the key to understanding gene-environment interactions and is the raw material of fields such as developmental plasticity, behavior, communication, social learning and cultural evolution. In effect, differences in information possessed by individuals produce phenotypic variation. We thus define biological information as ,factors that can affect the phenotype in ways that may influence fitness'. This definition encompasses all information that is potentially relevant to organisms, which includes the physical environment. Biological information can be acquired passively from genes or via processes such as epigenetics, parental effects and habitat inheritance, or actively by organisms sensing facts about their environment. The confusion over definitions mainly concerns non-genetic information, which takes many more forms than genetic information. Much of the confusion derives from definitions based on how information is used rather than on the facts from which it is extracted. We recognize that a fact becomes information once it is detected. Information can thus be viewed analogously to energy in being either potential or realized. Another source of confusion is in the use of words outside their usual meanings. We therefore present intuitive definitions and classify them according to categories of facts in a hierarchical framework. Clarifying these concepts and terms may help researchers to manipulate facts, allowing a fuller study of biological information. [source] European Mathematical Genetics Meeting, Heidelberg, Germany, 12th,13th April 2007ANNALS OF HUMAN GENETICS, Issue 4 2007Article first published online: 28 MAY 200 Saurabh Ghosh 11 Indian Statistical Institute, Kolkata, India High correlations between two quantitative traits may be either due to common genetic factors or common environmental factors or a combination of both. In this study, we develop statistical methods to extract the contribution of a common QTL to the total correlation between the components of a bivariate phenotype. Using data on bivariate phenotypes and marker genotypes for sib-pairs, we propose a test for linkage between a common QTL and a marker locus based on the conditional cross-sib trait correlations (trait 1 of sib 1 , trait 2 of sib 2 and conversely) given the identity-by-descent sharing at the marker locus. The null hypothesis cannot be rejected unless there exists a common QTL. We use Monte-Carlo simulations to evaluate the performance of the proposed test under different trait parameters and quantitative trait distributions. An application of the method is illustrated using data on two alcohol-related phenotypes from the Collaborative Study On The Genetics Of Alcoholism project. Rémi Kazma 1 , Catherine Bonaďti-Pellié 1 , Emmanuelle Génin 12 INSERM UMR-S535 and Université Paris Sud, Villejuif, 94817, France Keywords: Gene-environment interaction, sibling recurrence risk, exposure correlation Gene-environment interactions may play important roles in complex disease susceptibility but their detection is often difficult. Here we show how gene-environment interactions can be detected by investigating the degree of familial aggregation according to the exposure of the probands. In case of gene-environment interaction, the distribution of genotypes of affected individuals, and consequently the risk in relatives, depends on their exposure. We developed a test comparing the risks in sibs according to the proband exposure. To evaluate the properties of this new test, we derived the formulas for calculating the expected risks in sibs according to the exposure of probands for various values of exposure frequency, relative risk due to exposure alone, frequencies of latent susceptibility genotypes, genetic relative risks and interaction coefficients. We find that the ratio of risks when the proband is exposed and not exposed is a good indicator of the interaction effect. We evaluate the power of the test for various sample sizes of affected individuals. We conclude that this test is valuable for diseases with moderate familial aggregation, only when the role of the exposure has been clearly evidenced. Since a correlation for exposure among sibs might lead to a difference in risks among sibs in the different proband exposure strata, we also add an exposure correlation coefficient in the model. Interestingly, we find that when this correlation is correctly accounted for, the power of the test is not decreased and might even be significantly increased. Andrea Callegaro 1 , Hans J.C. Van Houwelingen 1 , Jeanine Houwing-Duistermaat 13 Dept. of Medical Statistics and Bioinformatics, Leiden University Medical Center, The Netherlands Keywords: Survival analysis, age at onset, score test, linkage analysis Non parametric linkage (NPL) analysis compares the identical by descent (IBD) sharing in sibling pairs to the expected IBD sharing under the hypothesis of no linkage. Often information is available on the marginal cumulative hazards (for example breast cancer incidence curves). Our aim is to extend the NPL methods by taking into account the age at onset of selected sibling pairs using these known marginal hazards. Li and Zhong (2002) proposed a (retrospective) likelihood ratio test based on an additive frailty model for genetic linkage analysis. From their model we derive a score statistic for selected samples which turns out to be a weighed NPL method. The weights depend on the marginal cumulative hazards and on the frailty parameter. A second approach is based on a simple gamma shared frailty model. Here, we simply test whether the score function of the frailty parameter depends on the excess IBD. We compare the performance of these methods using simulated data. Céline Bellenguez 1 , Carole Ober 2 , Catherine Bourgain 14 INSERM U535 and University Paris Sud, Villejuif, France 5 Department of Human Genetics, The University of Chicago, USA Keywords: Linkage analysis, linkage disequilibrium, high density SNP data Compared with microsatellite markers, high density SNP maps should be more informative for linkage analyses. However, because they are much closer, SNPs present important linkage disequilibrium (LD), which biases classical nonparametric multipoint analyses. This problem is even stronger in population isolates where LD extends over larger regions with a more stochastic pattern. We investigate the issue of linkage analysis with a 500K SNP map in a large and inbred 1840-member Hutterite pedigree, phenotyped for asthma. Using an efficient pedigree breaking strategy, we first identified linked regions with a 5cM microsatellite map, on which we focused to evaluate the SNP map. The only method that models LD in the NPL analysis is limited in both the pedigree size and the number of markers (Abecasis and Wigginton, 2005) and therefore could not be used. Instead, we studied methods that identify sets of SNPs with maximum linkage information content in our pedigree and no LD-driven bias. Both algorithms that directly remove pairs of SNPs in high LD and clustering methods were evaluated. Null simulations were performed to control that Zlr calculated with the SNP sets were not falsely inflated. Preliminary results suggest that although LD is strong in such populations, linkage information content slightly better than that of microsatellite maps can be extracted from dense SNP maps, provided that a careful marker selection is conducted. In particular, we show that the specific LD pattern requires considering LD between a wide range of marker pairs rather than only in predefined blocks. Peter Van Loo 1,2,3 , Stein Aerts 1,2 , Diether Lambrechts 4,5 , Bernard Thienpont 2 , Sunit Maity 4,5 , Bert Coessens 3 , Frederik De Smet 4,5 , Leon-Charles Tranchevent 3 , Bart De Moor 2 , Koen Devriendt 3 , Peter Marynen 1,2 , Bassem Hassan 1,2 , Peter Carmeliet 4,5 , Yves Moreau 36 Department of Molecular and Developmental Genetics, VIB, Belgium 7 Department of Human Genetics, University of Leuven, Belgium 8 Bioinformatics group, Department of Electrical Engineering, University of Leuven, Belgium 9 Department of Transgene Technology and Gene Therapy, VIB, Belgium 10 Center for Transgene Technology and Gene Therapy, University of Leuven, Belgium Keywords: Bioinformatics, gene prioritization, data fusion The identification of genes involved in health and disease remains a formidable challenge. Here, we describe a novel bioinformatics method to prioritize candidate genes underlying pathways or diseases, based on their similarity to genes known to be involved in these processes. It is freely accessible as an interactive software tool, ENDEAVOUR, at http://www.esat.kuleuven.be/endeavour. Unlike previous methods, ENDEAVOUR generates distinct prioritizations from multiple heterogeneous data sources, which are then integrated, or fused, into one global ranking using order statistics. ENDEAVOUR prioritizes candidate genes in a three-step process. First, information about a disease or pathway is gathered from a set of known "training" genes by consulting multiple data sources. Next, the candidate genes are ranked based on similarity with the training properties obtained in the first step, resulting in one prioritized list for each data source. Finally, ENDEAVOUR fuses each of these rankings into a single global ranking, providing an overall prioritization of the candidate genes. Validation of ENDEAVOUR revealed it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates of 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified YPEL1 as a novel gene involved in craniofacial development from a 2-Mb chromosomal region, deleted in some patients with DiGeorge-like birth defects. Finally, we are currently evaluating a pipeline combining array-CGH, ENDEAVOUR and in vivo validation in zebrafish to identify novel genes involved in congenital heart defects. Mark Broom 1 , Graeme Ruxton 2 , Rebecca Kilner 311 Mathematics Dept., University of Sussex, UK 12 Division of Environmental and Evolutionary Biology, University of Glasgow, UK 13 Department of Zoology, University of Cambridge, UK Keywords: Evolutionarily stable strategy, parasitism, asymmetric game Brood parasites chicks vary in the harm that they do to their companions in the nest. In this presentation we use game-theoretic methods to model this variation. Our model considers hosts which potentially abandon single nestlings and instead choose to re-allocate their reproductive effort to future breeding, irrespective of whether the abandoned chick is the host's young or a brood parasite's. The parasite chick must decide whether or not to kill host young by balancing the benefits from reduced competition in the nest against the risk of desertion by host parents. The model predicts that three different types of evolutionarily stable strategies can exist. (1) Hosts routinely rear depleted broods, the brood parasite always kills host young and the host never then abandons the nest. (2) When adult survival after deserting single offspring is very high, hosts always abandon broods of a single nestling and the parasite never kills host offspring, effectively holding them as hostages to prevent nest desertion. (3) Intermediate strategies, in which parasites sometimes kill their nest-mates and host parents sometimes desert nests that contain only a single chick, can also be evolutionarily stable. We provide quantitative descriptions of how the values given to ecological and behavioral parameters of the host-parasite system influence the likelihood of each strategy and compare our results with real host-brood parasite associations in nature. Martin Harrison 114 Mathematics Dept, University of Sussex, UK Keywords: Brood parasitism, games, host, parasite The interaction between hosts and parasites in bird populations has been studied extensively. Game theoretical methods have been used to model this interaction previously, but this has not been studied extensively taking into account the sequential nature of this game. We consider a model allowing the host and parasite to make a number of decisions, which depend on a number of natural factors. The host lays an egg, a parasite bird will arrive at the nest with a certain probability and then chooses to destroy a number of the host eggs and lay one of it's own. With some destruction occurring, either natural or through the actions of the parasite, the host chooses to continue, eject an egg (hoping to eject the parasite) or abandon the nest. Once the eggs have hatched the game then falls to the parasite chick versus the host. The chick chooses to destroy or eject a number of eggs. The final decision is made by the host, choosing whether to raise or abandon the chicks that are in the nest. We consider various natural parameters and probabilities which influence these decisions. We then use this model to look at real-world situations of the interactions of the Reed Warbler and two different parasites, the Common Cuckoo and the Brown-Headed Cowbird. These two parasites have different methods in the way that they parasitize the nests of their hosts. The hosts in turn have a different reaction to these parasites. Arne Jochens 1 , Amke Caliebe 2 , Uwe Roesler 1 , Michael Krawczak 215 Mathematical Seminar, University of Kiel, Germany 16 Institute of Medical Informatics and Statistics, University of Kiel, Germany Keywords: Stepwise mutation model, microsatellite, recursion equation, temporal behaviour We consider the stepwise mutation model which occurs, e.g., in microsatellite loci. Let X(t,i) denote the allelic state of individual i at time t. We compute expectation, variance and covariance of X(t,i), i=1,,,N, and provide a recursion equation for P(X(t,i)=z). Because the variance of X(t,i) goes to infinity as t grows, for the description of the temporal behaviour, we regard the scaled process X(t,i)-X(t,1). The results furnish a better understanding of the behaviour of the stepwise mutation model and may in future be used to derive tests for neutrality under this model. Paul O'Reilly 1 , Ewan Birney 2 , David Balding 117 Statistical Genetics, Department of Epidemiology and Public Health, Imperial, College London, UK 18 European Bioinformatics Institute, EMBL, Cambridge, UK Keywords: Positive selection, Recombination rate, LD, Genome-wide, Natural Selection In recent years efforts to develop population genetics methods that estimate rates of recombination and levels of natural selection in the human genome have intensified. However, since the two processes have an intimately related impact on genetic variation their inference is vulnerable to confounding. Genomic regions subject to recent selection are likely to have a relatively recent common ancestor and consequently less opportunity for historical recombinations that are detectable in contemporary populations. Here we show that selection can reduce the population-based recombination rate estimate substantially. In genome-wide studies for detecting selection we observe a tendency to highlight loci that are subject to low levels of recombination. We find that the outlier approach commonly adopted in such studies may have low power unless variable recombination is accounted for. We introduce a new genome-wide method for detecting selection that exploits the sensitivity to recent selection of methods for estimating recombination rates, while accounting for variable recombination using pedigree data. Through simulations we demonstrate the high power of the Ped/Pop approach to discriminate between neutral and adaptive evolution, particularly in the context of choosing outliers from a genome-wide distribution. Although methods have been developed showing good power to detect selection ,in action', the corresponding window of opportunity is small. In contrast, the power of the Ped/Pop method is maintained for many generations after the fixation of an advantageous variant Sarah Griffiths 1 , Frank Dudbridge 120 MRC Biostatistics Unit, Cambridge, UK Keywords: Genetic association, multimarker tag, haplotype, likelihood analysis In association studies it is generally too expensive to genotype all variants in all subjects. We can exploit linkage disequilibrium between SNPs to select a subset that captures the variation in a training data set obtained either through direct resequencing or a public resource such as the HapMap. These ,tag SNPs' are then genotyped in the whole sample. Multimarker tagging is a more aggressive adaptation of pairwise tagging that allows for combinations of two or more tag SNPs to predict an untyped SNP. Here we describe a new method for directly testing the association of an untyped SNP using a multimarker tag. Previously, other investigators have suggested testing a specific tag haplotype, or performing a weighted analysis using weights derived from the training data. However these approaches do not properly account for the imperfect correlation between the tag haplotype and the untyped SNP. Here we describe a straightforward approach to testing untyped SNPs using a missing-data likelihood analysis, including the tag markers as nuisance parameters. The training data is stacked on top of the main body of genotype data so there is information on how the tag markers predict the genotype of the untyped SNP. The uncertainty in this prediction is automatically taken into account in the likelihood analysis. This approach yields more power and also a more accurate prediction of the odds ratio of the untyped SNP. Anke Schulz 1 , Christine Fischer 2 , Jenny Chang-Claude 1 , Lars Beckmann 121 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany 22 Institute of Human Genetics, University of Heidelberg, Germany Keywords: Haplotype, haplotype sharing, entropy, Mantel statistics, marker selection We previously introduced a new method to map genes involved in complex diseases, using haplotype sharing-based Mantel statistics to correlate genetic and phenotypic similarity. Although the Mantel statistic is powerful in narrowing down candidate regions, the precise localization of a gene is hampered in genomic regions where linkage disequilibrium is so high that neighboring markers are found to be significant at similar magnitude and we are not able to discriminate between them. Here, we present a new approach to localize susceptibility genes by combining haplotype sharing-based Mantel statistics with an iterative entropy-based marker selection algorithm. For each marker at which the Mantel statistic is evaluated, the algorithm selects a subset of surrounding markers. The subset is chosen to maximize multilocus linkage disequilibrium, which is measured by the normalized entropy difference introduced by Nothnagel et al. (2002). We evaluated the algorithm with respect to type I error and power. Its ability to localize the disease variant was compared to the localization (i) without marker selection and (ii) considering haplotype block structure. Case-control samples were simulated from a set of 18 haplotypes, consisting of 15 SNPs in two haplotype blocks. The new algorithm gave correct type I error and yielded similar power to detect the disease locus compared to the alternative approaches. The neighboring markers were clearly less often significant than the causal locus, and also less often significant compared to the alternative approaches. Thus the new algorithm improved the precision of the localization of susceptibility genes. Mark M. Iles 123 Section of Epidemiology and Biostatistics, LIMM, University of Leeds, UK Keywords: tSNP, tagging, association, HapMap Tagging SNPs (tSNPs) are commonly used to capture genetic diversity cost-effectively. However, it is important that the efficacy of tSNPs is correctly estimated, otherwise coverage may be insufficient. If the pilot sample from which tSNPs are chosen is too small or the initial marker map too sparse, tSNP efficacy may be overestimated. An existing estimation method based on bootstrapping goes some way to correct for insufficient sample size and overfitting, but does not completely solve the problem. We describe a novel method, based on exclusion of haplotypes, that improves on the bootstrap approach. Using simulated data, the extent of the sample size problem is investigated and the performance of the bootstrap and the novel method are compared. We incorporate an existing method adjusting for marker density by ,SNP-dropping'. We find that insufficient sample size can cause large overestimates in tSNP efficacy, even with as many as 100 individuals, and the problem worsens as the region studied increases in size. Both the bootstrap and novel method correct much of this overestimate, with our novel method consistently outperforming the bootstrap method. We conclude that a combination of insufficient sample size and overfitting may lead to overestimation of tSNP efficacy and underpowering of studies based on tSNPs. Our novel approach corrects for much of this bias and is superior to the previous method. Sample sizes larger than previously suggested may still be required for accurate estimation of tSNP efficacy. This has obvious ramifications for the selection of tSNPs from HapMap data. Claudio Verzilli 1 , Juliet Chapman 1 , Aroon Hingorani 2 , Juan Pablo-Casas 1 , Tina Shah 2 , Liam Smeeth 1 , John Whittaker 124 Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK 25 Division of Medicine, University College London, UK Keywords: Meta-analysis, Genetic association studies We present a Bayesian hierarchical model for the meta-analysis of candidate gene studies with a continuous outcome. Such studies often report results from association tests for different, possibly study-specific and non-overlapping markers (typically SNPs) in the same genetic region. Meta analyses of the results at each marker in isolation are seldom appropriate as they ignore the correlation that may exist between markers due to linkage disequlibrium (LD) and cannot assess the relative importance of variants at each marker. Also such marker-wise meta analyses are restricted to only those studies that have typed the marker in question, with a potential loss of power. A better strategy is one which incorporates information about the LD between markers so that any combined estimate of the effect of each variant is corrected for the effect of other variants, as in multiple regression. Here we develop a Bayesian hierarchical linear regression that models the observed genotype group means and uses pairwise LD measurements between markers as prior information to make posterior inference on adjusted effects. The approach is applied to the meta analysis of 24 studies assessing the effect of 7 variants in the C-reactive protein (CRP) gene region on plasma CRP levels, an inflammatory biomarker shown in observational studies to be positively associated with cardiovascular disease. Cathryn M. Lewis 1 , Christopher G. Mathew 1 , Theresa M. Marteau 226 Dept. of Medical and Molecular Genetics, King's College London, UK 27 Department of Psychology, King's College London, UK Keywords: Risk, genetics, CARD15, smoking, model Recently progress has been made in identifying mutations that confer susceptibility to complex diseases, with the potential to use these mutations in determining disease risk. We developed methods to estimate disease risk based on genotype relative risks (for a gene G), exposure to an environmental factor (E), and family history (with recurrence risk ,R for a relative of type R). ,R must be partitioned into the risk due to G (which is modelled independently) and the residual risk. The risk model was then applied to Crohn's disease (CD), a severe gastrointestinal disease for which smoking increases disease risk approximately 2-fold, and mutations in CARD15 confer increased risks of 2.25 (for carriers of a single mutation) and 9.3 (for carriers of two mutations). CARD15 accounts for only a small proportion of the genetic component of CD, with a gene-specific ,S, CARD15 of 1.16, from a total sibling relative risk of ,S= 27. CD risks were estimated for high-risk individuals who are siblings of a CD case, and who also smoke. The CD risk to such individuals who carry two CARD15 mutations is approximately 0.34, and for those carrying a single CARD15 mutation the risk is 0.08, compared to a population prevalence of approximately 0.001. These results imply that complex disease genes may be valuable in estimating with greater precision than has hitherto been possible disease risks in specific, easily identified subgroups of the population with a view to prevention. Yurii Aulchenko 128 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Compression, information, bzip2, genome-wide SNP data, statistical genetics With advances in molecular technology, studies accessing millions of genetic polymorphisms in thousands of study subjects will soon become common. Such studies generate large amounts of data, whose effective storage and management is a challenge to the modern statistical genetics. Standard file compression utilities, such as Zip, Gzip and Bzip2, may be helpful to minimise the storage requirements. Less obvious is the fact that the data compression techniques may be also used in the analysis of genetic data. It is known that the efficiency of a particular compression algorithm depends on the probability structure of the data. In this work, we compared different standard and customised tools using the data from human HapMap project. Secondly, we investigate the potential uses of data compression techniques for the analysis of linkage, association and linkage disequilibrium Suzanne Leal 1 , Bingshan Li 129 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, USA Keywords: Consanguineous pedigrees, missing genotype data Missing genotype data can increase false-positive evidence for linkage when either parametric or nonparametric analysis is carried out ignoring intermarker linkage disequilibrium (LD). Previously it was demonstrated by Huang et al (2005) that no bias occurs in this situation for affected sib-pairs with unrelated parents when either both parents are genotyped or genotype data is available for two additional unaffected siblings when parental genotypes are missing. However, this is not the case for consanguineous pedigrees, where missing genotype data for any pedigree member within a consanguinity loop can increase false-positive evidence of linkage. The false-positive evidence for linkage is further increased when cryptic consanguinity is present. The amount of false-positive evidence for linkage is highly dependent on which family members are genotyped. When parental genotype data is available, the false-positive evidence for linkage is usually not as strong as when parental genotype data is unavailable. Which family members will aid in the reduction of false-positive evidence of linkage is highly dependent on which other family members are genotyped. For a pedigree with an affected proband whose first-cousin parents have been genotyped, further reduction in the false-positive evidence of linkage can be obtained by including genotype data from additional affected siblings of the proband or genotype data from the proband's sibling-grandparents. When parental genotypes are not available, false-positive evidence for linkage can be reduced by including in the analysis genotype data from either unaffected siblings of the proband or the proband's married-in-grandparents. Najaf Amin 1 , Yurii Aulchenko 130 Department of Epidemiology & Biostatistics, Erasmus Medical Centre Rotterdam, The Netherlands Keywords: Genomic Control, pedigree structure, quantitative traits The Genomic Control (GC) method was originally developed to control for population stratification and cryptic relatedness in association studies. This method assumes that the effect of population substructure on the test statistics is essentially constant across the genome, and therefore unassociated markers can be used to estimate the effect of confounding onto the test statistic. The properties of GC method were extensively investigated for different stratification scenarios, and compared to alternative methods, such as the transmission-disequilibrium test. The potential of this method to correct not for occasional cryptic relations, but for regular pedigree structure, however, was not investigated before. In this work we investigate the potential of the GC method for pedigree-based association analysis of quantitative traits. The power and type one error of the method was compared to standard methods, such as the measured genotype (MG) approach and quantitative trait transmission-disequilibrium test. In human pedigrees, with trait heritability varying from 30 to 80%, the power of MG and GC approach was always higher than that of TDT. GC had correct type 1 error and its power was close to that of MG under moderate heritability (30%), but decreased with higher heritability. William Astle 1 , Chris Holmes 2 , David Balding 131 Department of Epidemiology and Public Health, Imperial College London, UK 32 Department of Statistics, University of Oxford, UK Keywords: Population structure, association studies, genetic epidemiology, statistical genetics In the analysis of population association studies, Genomic Control (Devlin & Roeder, 1999) (GC) adjusts the Armitage test statistic to correct the type I error for the effects of population substructure, but its power is often sub-optimal. Turbo Genomic Control (TGC) generalises GC to incorporate co-variation of relatedness and phenotype, retaining control over type I error while improving power. TGC is similar to the method of Yu et al. (2006), but we extend it to binary (case-control) in addition to quantitative phenotypes, we implement improved estimation of relatedness coefficients, and we derive an explicit statistic that generalizes the Armitage test statistic and is fast to compute. TGC also has similarities to EIGENSTRAT (Price et al., 2006) which is a new method based on principle components analysis. The problems of population structure(Clayton et al., 2005) and cryptic relatedness (Voight & Pritchard, 2005) are essentially the same: if patterns of shared ancestry differ between cases and controls, whether distant (coancestry) or recent (cryptic relatedness), false positives can arise and power can be diminished. With large numbers of widely-spaced genetic markers, coancestry can now be measured accurately for each pair of individuals via patterns of allele-sharing. Instead of modelling subpopulations, we work instead with a coancestry coefficient for each pair of individuals in the study. We explain the relationships between TGC, GC and EIGENSTRAT. We present simulation studies and real data analyses to illustrate the power advantage of TGC in a range of scenarios incorporating both substructure and cryptic relatedness. References Clayton, D. G.et al. (2005) Population structure, differential bias and genomic control in a large-scale case-control association study. Nature Genetics37(11) November 2005. Devlin, B. & Roeder, K. (1999) Genomic control for association studies. Biometics55(4) December 1999. Price, A. L.et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics38(8) (August 2006). Voight, B. J. & Pritchard, J. K. (2005) Confounding from cryptic relatedness in case-control association studies. Public Library of Science Genetics1(3) September 2005. Yu, J.et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics38(2) February 2006. Hervé Perdry 1 , Marie-Claude Babron 1 , Françoise Clerget-Darpoux 133 INSERM U535 and Univ. Paris Sud, UMR-S 535, Villejuif, France Keywords: Modifier genes, case-parents trios, ordered transmission disequilibrium test A modifying locus is a polymorphic locus, distinct from the disease locus, which leads to differences in the disease phenotype, either by modifying the penetrance of the disease allele, or by modifying the expression of the disease. The effect of such a locus is a clinical heterogeneity that can be reflected by the values of an appropriate covariate, such as the age of onset, or the severity of the disease. We designed the Ordered Transmission Disequilibrium Test (OTDT) to test for a relation between the clinical heterogeneity, expressed by the covariate, and marker genotypes of a candidate gene. The method applies to trio families with one affected child and his parents. Each family member is genotyped at a bi-allelic marker M of a candidate gene. To each of the families is associated a covariate value; the families are ordered on the values of this covariate. As the TDT (Spielman et al. 1993), the OTDT is based on the observation of the transmission rate T of a given allele at M. The OTDT aims to find a critical value of the covariate which separates the sample of families in two subsamples in which the transmission rates are significantly different. We investigate the power of the method by simulations under various genetic models and covariate distributions. Acknowledgments H Perdry is funded by ARSEP. Pascal Croiseau 1 , Heather Cordell 2 , Emmanuelle Génin 134 INSERM U535 and University Paris Sud, UMR-S535, Villejuif, France 35 Institute of Human Genetics, Newcastle University, UK Keywords: Association, missing data, conditionnal logistic regression Missing data is an important problem in association studies. Several methods used to test for association need that individuals be genotyped at the full set of markers. Individuals with missing data need to be excluded from the analysis. This could involve an important decrease in sample size and a loss of information. If the disease susceptibility locus (DSL) is poorly typed, it is also possible that a marker in linkage disequilibrium gives a stronger association signal than the DSL. One may then falsely conclude that the marker is more likely to be the DSL. We recently developed a Multiple Imputation method to infer missing data on case-parent trios Starting from the observed data, a few number of complete data sets are generated by Markov-Chain Monte Carlo approach. These complete datasets are analysed using standard statistical package and the results are combined as described in Little & Rubin (2002). Here we report the results of simulations performed to examine, for different patterns of missing data, how often the true DSL gives the highest association score among different loci in LD. We found that multiple imputation usually correctly detect the DSL site even if the percentage of missing data is high. This is not the case for the naďve approach that consists in discarding trios with missing data. In conclusion, Multiple imputation presents the advantage of being easy to use and flexible and is therefore a promising tool in the search for DSL involved in complex diseases. Salma Kotti 1 , Heike Bickeböller 2 , Françoise Clerget-Darpoux 136 University Paris Sud, UMR-S535, Villejuif, France 37 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany Keywords: Genotype relative risk, internal controls, Family based analyses Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRRs. We will analytically derive the GRR estimators for the 1:1 and 1:3 matching and will present the results at the meeting. Family based analyses using internal controls are very popular both for detecting the effect of a genetic factor and for estimating the relative disease risk on the corresponding genotypes. Two different procedures are often applied to reconstitute internal controls. The first one considers one pseudocontrol genotype formed by the parental non-transmitted alleles called also 1:1 matching of alleles, while the second corresponds to three pseudocontrols corresponding to all genotypes formed by the parental alleles except the one of the case (1:3 matching). Many studies have compared between the two procedures in terms of the power and have concluded that the difference depends on the underlying genetic model and the allele frequencies. However, the estimation of the Genotype Relative Risk (GRR) under the two procedures has not been studied. Based on the fact that on the 1:1 matching, the control group is composed of the alleles untransmitted to the affected child and on the 1:3 matching, the control group is composed amongst alleles already transmitted to the affected child, we expect a difference on the GRR estimation. In fact, we suspect that the second procedure leads to biased estimation of the GRR. We will analytically derive the GRR estimator for the 1:1 and 1:3 matching and will present the results at the meeting. Luigi Palla 1 , David Siegmund 239 Department of Mathematics,Free University Amsterdam, The Netherlands 40 Department of Statistics, Stanford University, California, USA Keywords: TDT, assortative mating, inbreeding, statistical power A substantial amount of Assortative Mating (AM) is often recorded on physical and psychological, dichotomous as well as quantitative traits that are supposed to have a multifactorial genetic component. In particular AM has the effect of increasing the genetic variance, even more than inbreeding because it acts across loci beside within loci, when the trait has a multifactorial origin. Under the assumption of a polygenic model for AM dating back to Wright (1921) and refined by Crow and Felsenstein (1968,1982), the effect of assortative mating on the power to detect genetic association in the Transmission Disequilibrium Test (TDT) is explored as parameters, such as the effective number of genes and the allelic frequency vary. The power is reflected by the non centrality parameter of the TDT and is expressed as a function of the number of trios, the relative risk of the heterozygous genotype and the allele frequency (Siegmund and Yakir, 2007). The noncentrality parameter of the relevant score statistic is updated considering the effect of AM which is expressed in terms of an ,effective' inbreeding coefficient. In particular, for dichotomous traits it is apparent that the higher the number of genes involved in the trait, the lower the loss in power due to AM. Finally an attempt is made to extend this relation to the Q-TDT (Rabinowitz, 1997), which involves considering the effect of AM also on the phenotypic variance of the trait of interest, under the assumption that AM affects only its additive genetic component. References Crow, & Felsenstein, (1968). The effect of assortative mating on the genetic composition of a population. Eugen.Quart.15, 87,97. Rabinowitz,, 1997. A Transmission Disequilibrium Test for Quantitative Trait Loci. Human Heredity47, 342,350. Siegmund, & Yakir, (2007) Statistics of gene mapping, Springer. Wright, (1921). System of mating.III. Assortative mating based on somatic resemblance. Genetics6, 144,161. Jérémie Nsengimana 1 , Ben D Brown 2 , Alistair S Hall 2 , Jenny H Barrett 141 Leeds Institute of Molecular Medicine, University of Leeds, UK 42 Leeds Institute for Genetics, Health and Therapeutics, University of Leeds, UK Keywords: Inflammatory genes, haplotype, coronary artery disease Genetic Risk of Acute Coronary Events (GRACE) is an initiative to collect cases of coronary artery disease (CAD) and their unaffected siblings in the UK and to use them to map genetic variants increasing disease risk. The aim of the present study was to test the association between CAD and 51 single nucleotide polymorphisms (SNPs) and their haplotypes from 35 inflammatory genes. Genotype data were available for 1154 persons affected before age 66 (including 48% before age 50) and their 1545 unaffected siblings (891 discordant families). Each SNP was tested for association to CAD, and haplotypes within genes or gene clusters were tested using FBAT (Rabinowitz & Laird, 2000). For the most significant results, genetic effect size was estimated using conditional logistic regression (CLR) within STATA adjusting for other risk factors. Haplotypes were assigned using HAPLORE (Zhang et al., 2005), which considers all parental mating types consistent with offspring genotypes and assigns them a probability of occurence. This probability was used in CLR to weight the haplotypes. In the single SNP analysis, several SNPs showed some evidence for association, including one SNP in the interleukin-1A gene. Analysing haplotypes in the interleukin-1 gene cluster, a common 3-SNP haplotype was found to increase the risk of CAD (P = 0.009). In an additive genetic model adjusting for covariates the odds ratio (OR) for this haplotype is 1.56 (95% CI: 1.16-2.10, p = 0.004) for early-onset CAD (before age 50). This study illustrates the utility of haplotype analysis in family-based association studies to investigate candidate genes. References Rabinowitz, D. & Laird, N. M. (2000) Hum Hered50, 211,223. Zhang, K., Sun, F. & Zhao, H. (2005) Bioinformatics21, 90,103. Andrea Foulkes 1 , Recai Yucel 1 , Xiaohong Li 143 Division of Biostatistics, University of Massachusetts, USA Keywords: Haploytpe, high-dimensional, mixed modeling The explosion of molecular level information coupled with large epidemiological studies presents an exciting opportunity to uncover the genetic underpinnings of complex diseases; however, several analytical challenges remain to be addressed. Characterizing the components to complex diseases inevitably requires consideration of synergies across multiple genetic loci and environmental and demographic factors. In addition, it is critical to capture information on allelic phase, that is whether alleles within a gene are in cis (on the same chromosome) or in trans (on different chromosomes.) In associations studies of unrelated individuals, this alignment of alleles within a chromosomal copy is generally not observed. We address the potential ambiguity in allelic phase in this high dimensional data setting using mixed effects models. Both a semi-parametric and fully likelihood-based approach to estimation are considered to account for missingness in cluster identifiers. In the first case, we apply a multiple imputation procedure coupled with a first stage expectation maximization algorithm for parameter estimation. A bootstrap approach is employed to assess sensitivity to variability induced by parameter estimation. Secondly, a fully likelihood-based approach using an expectation conditional maximization algorithm is described. Notably, these models allow for characterizing high-order gene-gene interactions while providing a flexible statistical framework to account for the confounding or mediating role of person specific covariates. The proposed method is applied to data arising from a cohort of human immunodeficiency virus type-1 (HIV-1) infected individuals at risk for therapy associated dyslipidemia. Simulation studies demonstrate reasonable power and control of family-wise type 1 error rates. Vivien Marquard 1 , Lars Beckmann 1 , Jenny Chang-Claude 144 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Genotyping errors, type I error, haplotype-based association methods It has been shown in several simulation studies that genotyping errors may have a great impact on the type I error of statistical methods used in genetic association analysis of complex diseases. Our aim was to investigate type I error rates in a case-control study, when differential and non-differential genotyping errors were introduced in realistic scenarios. We simulated case-control data sets, where individual genotypes were drawn from a haplotype distribution of 18 haplotypes with 15 markers in the APM1 gene. Genotyping errors were introduced following the unrestricted and symmetric with 0 edges error models described by Heid et al. (2006). In six scenarios, errors resulted from changes of one allele to another with predefined probabilities of 1%, 2.5% or 10%, respectively. A multiple number of errors per haplotype was possible and could vary between 0 and 15, the number of markers investigated. We examined three association methods: Mantel statistics using haplotype-sharing; a haplotype-specific score test; and Armitage trend test for single markers. The type I error rates were not influenced for any of all the three methods for a genotyping error rate of less than 1%. For higher error rates and differential errors, the type I error of the Mantel statistic was only slightly and of the Armitage trend test moderately increased. The type I error rates of the score test were highly increased. The type I error rates were correct for all three methods for non-differential errors. Further investigations will be carried out with different frequencies of differential error rates and focus on power. Arne Neumann 1 , Dörthe Malzahn 1 , Martina Müller 2 , Heike Bickeböller 145 Department of Genetic Epidemiology, Medical School, University of Göttingen, Germany 46 GSF-National Research Center for Environment and Health, Neuherberg & IBE-Institute of Epidemiology, Ludwig-Maximilians University München, Germany Keywords: Interaction, longitudinal, nonparametric Longitudinal data show the time dependent course of phenotypic traits. In this contribution, we consider longitudinal cohort studies and investigate the association between two candidate genes and a dependent quantitative longitudinal phenotype. The set-up defines a factorial design which allows us to test simultaneously for the overall gene effect of the loci as well as for possible gene-gene and gene time interaction. The latter would induce genetically based time-profile differences in the longitudinal phenotype. We adopt a non-parametric statistical test to genetic epidemiological cohort studies and investigate its performance by simulation studies. The statistical test was originally developed for longitudinal clinical studies (Brunner, Munzel, Puri, 1999 J Multivariate Anal 70:286-317). It is non-parametric in the sense that no assumptions are made about the underlying distribution of the quantitative phenotype. Longitudinal observations belonging to the same individual can be arbitrarily dependent on one another for the different time points whereas trait observations of different individuals are independent. The two loci are assumed to be statistically independent. Our simulations show that the nonparametric test is comparable with ANOVA in terms of power of detecting gene-gene and gene-time interaction in an ANOVA favourable setting. Rebecca Hein 1 , Lars Beckmann 1 , Jenny Chang-Claude 147 Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg, Germany Keywords: Indirect association studies, interaction effects, linkage disequilibrium, marker allele frequency Association studies accounting for gene-environment interactions (GxE) may be useful for detecting genetic effects and identifying important environmental effect modifiers. Current technology facilitates very dense marker spacing in genetic association studies; however, the true disease variant(s) may not be genotyped. In this situation, an association between a gene and a phenotype may still be detectable, using genetic markers associated with the true disease variant(s) (indirect association). Zondervan and Cardon [2004] showed that the odds ratios (OR) of markers which are associated with the disease variant depend highly on the linkage disequilibrium (LD) between the variant and the markers, and whether the allele frequencies match and thereby influence the sample size needed to detect genetic association. We examined the influence of LD and allele frequencies on the sample size needed to detect GxE in indirect association studies, and provide tables for sample size estimation. For discordant allele frequencies and incomplete LD, sample sizes can be unfeasibly large. The influence of both factors is stronger for disease loci with small rather than moderate to high disease allele frequencies. A decline in D' of e.g. 5% has less impact on sample size than increasing the difference in allele frequencies by the same percentage. Assuming 80% power, large interaction effects can be detected using smaller sample sizes than those needed for the detection of main effects. The detection of interaction effects involving rare alleles may not be possible. Focussing only on marker density can be a limited strategy in indirect association studies for GxE. Cyril Dalmasso 1 , Emmanuelle Génin 2 , Catherine Bourgain 2 , Philippe Broët 148 JE 2492 , Univ. Paris-Sud, France 49 INSERM UMR-S 535 and University Paris Sud, Villejuif, France Keywords: Linkage analysis, Multiple testing, False Discovery Rate, Mixture model In the context of genome-wide linkage analyses, where a large number of statistical tests are simultaneously performed, the False Discovery Rate (FDR) that is defined as the expected proportion of false discoveries among all discoveries is nowadays widely used for taking into account the multiple testing problem. Other related criteria have been considered such as the local False Discovery Rate (lFDR) that is a variant of the FDR giving to each test its own measure of significance. The lFDR is defined as the posterior probability that a null hypothesis is true. Most of the proposed methods for estimating the lFDR or the FDR rely on distributional assumption under the null hypothesis. However, in observational studies, the empirical null distribution may be very different from the theoretical one. In this work, we propose a mixture model based approach that provides estimates of the lFDR and the FDR in the context of large-scale variance component linkage analyses. In particular, this approach allows estimating the empirical null distribution, this latter being a key quantity for any simultaneous inference procedure. The proposed method is applied on a real dataset. Arief Gusnanto 1 , Frank Dudbridge 150 MRC Biostatistics Unit, Cambridge UK Keywords: Significance, genome-wide, association, permutation, multiplicity Genome-wide association scans have introduced statistical challenges, mainly in the multiplicity of thousands of tests. The question of what constitutes a significant finding remains somewhat unresolved. Permutation testing is very time-consuming, whereas Bayesian arguments struggle to distinguish direct from indirect association. It seems attractive to summarise the multiplicity in a simple form that allows users to avoid time-consuming permutations. A standard significance level would facilitate reporting of results and reduce the need for permutation tests. This is potentially important because current scans do not have full coverage of the whole genome, and yet, the implicit multiplicity is genome-wide. We discuss some proposed summaries, with reference to the empirical null distribution of the multiple tests, approximated through a large number of random permutations. Using genome-wide data from the Wellcome Trust Case-Control Consortium, we use a sub-sampling approach with increasing density to estimate the nominal p-value to obtain family-wise significance of 5%. The results indicate that the significance level is converging to about 1e-7 as the marker spacing becomes infinitely dense. We considered the concept of an effective number of independent tests, and showed that when used in a Bonferroni correction, the number varies with the overall significance level, but is roughly constant in the region of interest. We compared several estimators of the effective number of tests, and showed that in the region of significance of interest, Patterson's eigenvalue based estimator gives approximately the right family-wise error rate. Michael Nothnagel 1 , Amke Caliebe 1 , Michael Krawczak 151 Institute of Medical Informatics and Statistics, University Clinic Schleswig-Holstein, University of Kiel, Germany Keywords: Association scans, Bayesian framework, posterior odds, genetic risk, multiplicative model Whole-genome association scans have been suggested to be a cost-efficient way to survey genetic variation and to map genetic disease factors. We used a Bayesian framework to investigate the posterior odds of a genuine association under multiplicative disease models. We demonstrate that the p value alone is not a sufficient means to evaluate the findings in association studies. We suggest that likelihood ratios should accompany p values in association reports. We argue, that, given the reported results of whole-genome scans, more associations should have been successfully replicated if the consistently made assumptions about considerable genetic risks were correct. We conclude that it is very likely that the vast majority of relative genetic risks are only of the order of 1.2 or lower. Clive Hoggart 1 , Maria De Iorio 1 , John Whittakker 2 , David Balding 152 Department of Epidemiology and Public Health, Imperial College London, UK 53 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: Genome-wide association analyses, shrinkage priors, Lasso Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants of small effect, which is a plausible scenario for many complex diseases. Moreover, many simulation studies assume a single causal variant and so more complex realities are ignored. Analysing large numbers of variants simultaneously is now becoming feasible, thanks to developments in Bayesian stochastic search methods. We pose the problem of SNP selection as variable selection in a regression model. In contrast to single SNP tests this approach simultaneously models the effect of all SNPs. SNPs are selected by a Bayesian interpretation of the lasso (Tibshirani, 1996); the maximum a posterior (MAP) estimate of the regression coefficients, which have been given independent, double exponential prior distributions. The double exponential distribution is an example of a shrinkage prior, MAP estimates with shrinkage priors can be zero, thus all SNPs with non zero regression coefficients are selected. In addition to the commonly-used double exponential (Laplace) prior, we also implement the normal exponential gamma prior distribution. We show that use of the Laplace prior improves SNP selection in comparison with single -SNP tests, and that the normal exponential gamma prior leads to a further improvement. Our method is fast and can handle very large numbers of SNPs: we demonstrate its performance using both simulated and real genome-wide data sets with 500 K SNPs, which can be analysed in 2 hours on a desktop workstation. Mickael Guedj 1,2 , Jerome Wojcik 2 , Gregory Nuel 154 Laboratoire Statistique et Génome, Université d'Evry, Evry France 55 Serono Pharmaceutical Research Institute, Plan-les-Ouates, Switzerland Keywords: Local Replication, Local Score, Association In gene-mapping, replication of initial findings has been put forwards as the approach of choice for filtering false-positives from true signals for underlying loci. In practice, such replications are however too poorly observed. Besides the statistical and technical-related factors (lack of power, multiple-testing, stratification, quality control,) inconsistent conclusions obtained from independent populations might result from real biological differences. In particular, the high degree of variation in the strength of LD among populations of different origins is a major challenge to the discovery of genes. Seeking for Local Replications (defined as the presence of a signal of association in a same genomic region among populations) instead of strict replications (same locus, same risk allele) may lead to more reliable results. Recently, a multi-markers approach based on the Local Score statistic has been proposed as a simple and efficient way to select candidate genomic regions at the first stage of genome-wide association studies. Here we propose an extension of this approach adapted to replicated association studies. Based on simulations, this method appears promising. In particular it outperforms classical simple-marker strategies to detect modest-effect genes. Additionally it constitutes, to our knowledge, a first framework dedicated to the detection of such Local Replications. Juliet Chapman 1 , Claudio Verzilli 1 , John Whittaker 156 Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, UK Keywords: FDR, Association studies, Bayesian model selection As genomewide association studies become commonplace there is debate as to how such studies might be analysed and what we might hope to gain from the data. It is clear that standard single locus approaches are limited in that they do not adjust for the effects of other loci and problematic since it is not obvious how to adjust for multiple comparisons. False discovery rates have been suggested, but it is unclear how well these will cope with highly correlated genetic data. We consider the validity of standard false discovery rates in large scale association studies. We also show that a Bayesian procedure has advantages in detecting causal loci amongst a large number of dependant SNPs and investigate properties of a Bayesian FDR. Peter Kraft 157 Harvard School of Public Health, Boston USA Keywords: Gene-environment interaction, genome-wide association scans Appropriately analyzed two-stage designs,where a subset of available subjects are genotyped on a genome-wide panel of markers at the first stage and then a much smaller subset of the most promising markers are genotyped on the remaining subjects,can have nearly as much power as a single-stage study where all subjects are genotyped on the genome-wide panel yet can be much less expensive. Typically, the "most promising" markers are selected based on evidence for a marginal association between genotypes and disease. Subsequently, the few markers found to be associated with disease at the end of the second stage are interrogated for evidence of gene-environment interaction, mainly to understand their impact on disease etiology and public health impact. However, this approach may miss variants which have a sizeable effect restricted to one exposure stratum and therefore only a modest marginal effect. We have proposed to use information on the joint effects of genes and a discrete list of environmental exposures at the initial screening stage to select promising markers for the second stage [Kraft et al Hum Hered 2007]. This approach optimizes power to detect variants that have a sizeable marginal effect and variants that have a small marginal effect but a sizeable effect in a stratum defined by an environmental exposure. As an example, I discuss a proposed genome-wide association scan for Type II diabetes susceptibility variants based in several large nested case-control studies. Beate Glaser 1 , Peter Holmans 158 Biostatistics and Bioinformatics Unit, Cardiff University, School of Medicine, Heath Park, Cardiff, UK Keywords: Combined case-control and trios analysis, Power, False-positive rate, Simulation, Association studies The statistical power of genetic association studies can be enhanced by combining the analysis of case-control with parent-offspring trio samples. Various combined analysis techniques have been recently developed; as yet, there have been no comparisons of their power. This work was performed with the aim of identifying the most powerful method among available combined techniques including test statistics developed by Kazeem and Farrall (2005), Nagelkerke and colleagues (2004) and Dudbridge (2006), as well as a simple combination of ,2-statistics from single samples. Simulation studies were performed to investigate their power under different additive, multiplicative, dominant and recessive disease models. False-positive rates were determined by studying the type I error rates under null models including models with unequal allele frequencies between the single case-control and trios samples. We identified three techniques with equivalent power and false-positive rates, which included modifications of the three main approaches: 1) the unmodified combined Odds ratio estimate by Kazeem & Farrall (2005), 2) a modified approach of the combined risk ratio estimate by Nagelkerke & colleagues (2004) and 3) a modified technique for a combined risk ratio estimate by Dudbridge (2006). Our work highlights the importance of studies investigating test performance criteria of novel methods, as they will help users to select the optimal approach within a range of available analysis techniques. David Almorza 1 , M.V. Kandus 2 , Juan Carlos Salerno 2 , Rafael Boggio 359 Facultad de Ciencias del Trabajo, University of Cádiz, Spain 60 Instituto de Genética IGEAF, Buenos Aires, Argentina 61 Universidad Nacional de La Plata, Buenos Aires, Argentina Keywords: Principal component analysis, maize, ear weight, inbred lines The objective of this work was to evaluate the relationship among different traits of the ear of maize inbred lines and to group genotypes according to its performance. Ten inbred lines developed at IGEAF (INTA Castelar) and five public inbred lines as checks were used. A field trial was carried out in Castelar, Buenos Aires (34° 36' S , 58° 39' W) using a complete randomize design with three replications. At harvest, individual weight (P.E.), diameter (D.E.), row number (N.H.) and length (L.E.) of the ear were assessed. A principal component analysis, PCA, (Infostat 2005) was used, and the variability of the data was depicted with a biplot. Principal components 1 and 2 (CP1 and CP2) explained 90% of the data variability. CP1 was correlated with P.E., L.E. and D.E., meanwhile CP2 was correlated with N.H. We found that individual weight (P.E.) was more correlated with diameter of the ear (D.E.) than with length (L.E). Five groups of inbred lines were distinguished: with high P.E. and mean N.H. (04-70, 04-73, 04-101 and MO17), with high P.E. but less N.H. (04-61 and B14), with mean P.E. and N.H. (B73, 04-123 and 04-96), with high N.H. but less P.E. (LP109, 04-8, 04-91 and 04-76) and with low P.E. and low N.H. (LP521 and 04-104). The use of PCA showed which variables had more incidence in ear weight and how is the correlation among them. Moreover, the different groups found with this analysis allow the evaluation of inbred lines by several traits simultaneously. Sven Knüppel 1 , Anja Bauerfeind 1 , Klaus Rohde 162 Department of Bioinformatics, MDC Berlin, Germany Keywords: Haplotypes, association studies, case-control, nuclear families The area of gene chip technology provides a plethora of phase-unknown SNP genotypes in order to find significant association to some genetic trait. To circumvent possibly low information content of a single SNP one groups successive SNPs and estimates haplotypes. Haplotype estimation, however, may reveal ambiguous haplotype pairs and bias the application of statistical methods. Zaykin et al. (Hum Hered, 53:79-91, 2002) proposed the construction of a design matrix to take this ambiguity into account. Here we present a set of functions written for the Statistical package R, which carries out haplotype estimation on the basis of the EM-algorithm for individuals (case-control) or nuclear families. The construction of a design matrix on basis of estimated haplotypes or haplotype pairs allows application of standard methods for association studies (linear, logistic regression), as well as statistical methods as haplotype sharing statistics and TDT-Test. Applications of these methods to genome-wide association screens will be demonstrated. Manuela Zucknick 1 , Chris Holmes 2 , Sylvia Richardson 163 Department of Epidemiology and Public Health, Imperial College London, UK 64 Department of Statistics, Oxford Center for Gene Function, University of Oxford, UK Keywords: Bayesian, variable selection, MCMC, large p, small n, structured dependence In large-scale genomic applications vast numbers of markers or genes are scanned to find a few candidates which are linked to a particular phenotype. Statistically, this is a variable selection problem in the "large p, small n" situation where many more variables than samples are available. An additional feature is the complex dependence structure which is often observed among the markers/genes due to linkage disequilibrium or their joint involvement in biological processes. Bayesian variable selection methods using indicator variables are well suited to the problem. Binary phenotypes like disease status are common and both Bayesian probit and logistic regression can be applied in this context. We argue that logistic regression models are both easier to tune and to interpret than probit models and implement the approach by Holmes & Held (2006). Because the model space is vast, MCMC methods are used as stochastic search algorithms with the aim to quickly find regions of high posterior probability. In a trade-off between fast-updating but slow-moving single-gene Metropolis-Hastings samplers and computationally expensive full Gibbs sampling, we propose to employ the dependence structure among the genes/markers to help decide which variables to update together. Also, parallel tempering methods are used to aid bold moves and help avoid getting trapped in local optima. Mixing and convergence of the resulting Markov chains are evaluated and compared to standard samplers in both a simulation study and in an application to a gene expression data set. Reference Holmes, C. C. & Held, L. (2006) Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis1, 145,168. Dawn Teare 165 MMGE, University of Sheffield, UK Keywords: CNP, family-based analysis, MCMC Evidence is accumulating that segmental copy number polymorphisms (CNPs) may represent a significant portion of human genetic variation. These highly polymorphic systems require handling as phenotypes rather than co-dominant markers, placing new demands on family-based analyses. We present an integrated approach to meet these challenges in the form of a graphical model, where the underlying discrete CNP phenotype is inferred from the (single or replicate) quantitative measure within the analysis, whilst assuming an allele based system segregating through the pedigree. [source] |