Markov Chain: Selected Abstracts
Measuring and partitioning the high-order linkage disequilibrium by multiple order Markov chains
GENETIC EPIDEMIOLOGY, Issue 4, 2008. Yunjung Kim
A map of the background levels of disequilibrium between nearby markers can be useful for association mapping studies. In order to assess the background levels of linkage disequilibrium (LD), multilocus LD measures are more advantageous than pairwise LD measures, because the combined analysis of pairwise LD measures is not adequate to detect simultaneous allele associations among multiple markers. Various multilocus LD measures based on haplotypes have been proposed. However, most of these measures provide a single index of association among multiple markers and do not reveal the complex patterns and different levels of LD structure. In this paper, we employ non-homogeneous, multiple order Markov chain models as a statistical framework to measure and partition the LD among multiple markers into components due to different orders of marker associations. Using a sliding window of multiple markers on phased haplotype data, we compute the corresponding likelihoods for different Markov chain (MC) orders in each window. The log-likelihood difference between the lowest MC order model (MC0) and the highest MC order model in each window is used as a measure of the total LD, or the overall deviation from gametic equilibrium, for the window. Then, we partition the total LD into lower order disequilibria and estimate the effects from two-, three-, and higher order disequilibria. The relationship between different orders of LD and the log-likelihood difference involving two different orders of MC models is explored. By applying our method to the phased haplotype data in the ENCODE regions of the HapMap project, we are able to identify high/low multilocus LD regions. Our results reveal that most of the LD in the HapMap data is attributable to the LD between adjacent pairs of markers across the whole region. LD between adjacent pairs of markers appears to be more significant in high multilocus LD regions than in low multilocus LD regions. We also find that as the multilocus total LD increases, the effects of high-order LD tend to become weaker, owing to the lack of observed multilocus haplotypes. The overall estimates of first, second, third, and fourth order LD across the ENCODE regions are 64%, 23%, 9%, and 3%, respectively. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc. [source]

Analysis of multivariable controllers using degree of freedom data
INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 7-9, 2003. T. J. Harris
Most approaches for monitoring, diagnosis and performance analysis of multivariable control loops employ time series methods and use non-parametric statistics to analyse the process inputs and outputs. In this paper, we explore the use of a discrete variable that summarizes the status of the constraint set of the controller to analyse the long-run behaviour of control systems. We introduce a number of waiting and holding time statistics that describe the status of these data, which we call the degree of freedom data. We demonstrate how Markov chains might be used to model the status of the degree of freedom data. This model-based approach has the potential to provide considerable insight into the behaviour of a model-based control scheme with relative ease. We demonstrate the methodologies on simulated and industrial data. Copyright © 2003 John Wiley & Sons, Ltd. [source]
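The Harris abstract above models the constraint-set status of a controller as a Markov chain. As a rough illustration of that idea (the abstract gives no implementation details, so the three-state encoding, the simulated status log, and the geometric holding-time formula below are illustrative assumptions), an empirical transition matrix and expected holding times can be estimated from a logged status sequence:

```python
import numpy as np

def transition_matrix(states, n_states):
    """Empirical one-step transition matrix estimated from a discrete state sequence."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

# Stand-in for a logged degree-of-freedom sequence, e.g.
# 0 = unconstrained, 1 = one constraint active, 2 = two constraints active.
status = np.random.default_rng(0).choice(3, size=500, p=[0.7, 0.2, 0.1])
P = transition_matrix(status, 3)
holding = 1.0 / (1.0 - np.diag(P))   # expected sojourn per state under a geometric model
print(P)
print(holding)
```

Waiting- and holding-time statistics of the kind the paper describes would then be functionals of the estimated matrix P.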
Rapid haplotype reconstruction in pedigrees with dense marker maps
JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 1, 2004. J. J. Windig
Reconstruction of marker phases is not straightforward when parents are untyped. In these cases, information from other relatives has to be used. In dense marker maps, however, the space of possible haplotype configurations tends to be too large for procedures such as Markov chain Monte Carlo (MCMC) to finish within a reasonable time. We developed an algorithm that is fast and generally finds the most probable haplotype. The basic idea is to use, for each marker interval, the smallest informative marker brackets in the offspring. By using only information from the offspring and analysing each marker interval separately, the lengthy analysis of large numbers of different haplotype configurations is avoided. Nevertheless, the most probable haplotype can be found quickly, provided the marker map is dense and enough offspring are available. Simulations are provided to indicate how well the algorithm works at different combinations of marker density, number of offspring and number of alleles per marker. In situations where reconstruction of the most probable haplotype is not guaranteed, the algorithm may still provide a haplotype close to the optimum, i.e. a suitable starting point for numeric optimization algorithms. [source]

Conditional Heteroskedasticity Driven by Hidden Markov Chains
JOURNAL OF TIME SERIES ANALYSIS, Issue 2, 2001. Christian Francq
We consider a generalized autoregressive conditionally heteroskedastic (GARCH) equation where the coefficients depend on the state of a non-observed Markov chain. Necessary and sufficient conditions ensuring the existence of a stationary solution are given. In the case of ARCH regimes, the maximum likelihood estimates are shown to be consistent. The identification problem is also considered. This is illustrated by means of real and simulated data sets. [source]
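To make the Francq setting concrete, here is a minimal simulation sketch of an ARCH(1) process whose coefficients switch with an unobserved two-state Markov chain; the transition matrix and regime coefficients are invented for illustration and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.98, 0.02],      # assumed regime transition matrix
              [0.05, 0.95]])
omega = np.array([0.1, 0.5])     # assumed ARCH intercepts per regime
alpha = np.array([0.2, 0.7])     # assumed ARCH(1) coefficients per regime

T = 1000
eps = np.zeros(T)
s = 0                            # hidden regime, never observed by the modeller
for t in range(1, T):
    s = rng.choice(2, p=P[s])                          # regime evolves as a Markov chain
    sigma2 = omega[s] + alpha[s] * eps[t - 1] ** 2     # regime-dependent conditional variance
    eps[t] = np.sqrt(sigma2) * rng.standard_normal()
```

Estimation, as in the paper, would treat the regime sequence as latent and maximize the likelihood over the chain and ARCH parameters jointly.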
Finding Generators for Markov Chains via Empirical Transition Matrices, with Applications to Credit Ratings
MATHEMATICAL FINANCE, Issue 2, 2001. Robert B. Israel
In this paper we identify conditions under which a true generator does or does not exist for an empirically observed Markov transition matrix. We show how to search for valid generators and choose the "correct" one that is the most compatible with bond rating behaviors. We also show how to obtain an approximate generator when a true generator does not exist. We give illustrations using credit rating transition matrices published by Moody's and by Standard and Poor's. [source]

Applications of Markov Chains in Particulate Process Engineering: A Review
THE CANADIAN JOURNAL OF CHEMICAL ENGINEERING, Issue 6, 2004. Henri Berthiaux
Processes involving particles are known to exhibit extremely unpredictable behaviour, mainly due to the mesoscopic nature of granular media. Understanding particulate processes, not only for intellectual satisfaction but also for process design and operation, basically requires a systems approach to modelling. Because they combine simplicity and flexibility, stochastic models based on Markov chain theory are very valuable mathematical tools in this respect. However, they are still largely ignored by the chemical engineering research community. This motivates the present review, in which we examine the three traditional issues: mixing and transport, separation, and transformation. [source]
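A flavour of the mixing-and-transport models the review surveys: treat the cells of a mixer as the states of a Markov chain and propagate the tracer distribution with powers of the transition matrix. The five-cell geometry and probabilities below are invented for illustration:

```python
import numpy as np

# Assumed 5-cell mixer: each transition, material stays in its cell
# or moves to an adjacent cell with probability 0.2 per neighbour.
n = 5
P = np.zeros((n, n))
for i in range(n):
    if i > 0:
        P[i, i - 1] = 0.2
    if i < n - 1:
        P[i, i + 1] = 0.2
    P[i, i] = 1.0 - P[i].sum()   # remainder is the probability of staying put

state = np.zeros(n)
state[0] = 1.0                   # all tracer starts in the first cell
for _ in range(50):
    state = state @ P            # one mixing transition
print(state)                     # P is symmetric here, so the tracer approaches uniformity
```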
Covariate Adjustment of Event Histories Estimated from Markov Chains: The Additive Approach
BIOMETRICS, Issue 4, 2001. Odd O. Aalen
Markov chain models are frequently used for studying event histories that include transitions between several states. An empirical transition matrix for non-homogeneous Markov chains has previously been developed, including a detailed statistical theory based on counting processes and martingales. In this article, we show how to estimate transition probabilities dependent on covariates. This technique may, for example, be used for making estimates of individual prognosis in epidemiological or clinical studies. The covariates are included through nonparametric additive models on the transition intensities of the Markov chain. The additive model allows for estimation of covariate-dependent transition intensities, and again a detailed theory exists based on counting processes. The martingale setting now allows for a very natural combination of the empirical transition matrix and the additive model, resulting in estimates that can be expressed as stochastic integrals, and hence their properties are easily evaluated. Two medical examples will be given. In the first example, we study how the lung cancer mortality of uranium miners depends on smoking and radon exposure. In the second example, we study how the probability of being in response depends on patient group and prophylactic treatment for leukemia patients who have had a bone marrow transplantation. A program in R and S-PLUS that can carry out the analyses described here has been developed and is freely available on the Internet. [source]

Replica Exchange Light Transport
COMPUTER GRAPHICS FORUM, Issue 8, 2009. Shinya Kitaoka
I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism; I.3.3 [Computer Graphics]: Picture/Image Generation
We solve the light transport problem by introducing a novel unbiased Monte Carlo algorithm called replica exchange light transport, inspired by the replica exchange Monte Carlo method in the fields of computational physics and statistical information processing. The replica exchange Monte Carlo method is a sampling technique whose operation resembles simulated annealing in optimization algorithms, using a set of sampling distributions. We apply it to the solution of light transport integration by extending the probability density function of the integrand to a set of distributions. That set is composed of combinations of the path densities of different path generation types: uniform distributions in the integral domain, explicit and implicit paths in light (particle/photon) tracing, indirect paths in bidirectional path tracing, explicit and implicit paths in path tracing, and implicit caustics paths seen through specular surfaces, including the delta function, in path tracing. The replica exchange light transport algorithm generates a sequence of path samples from each distribution and samples the simultaneous distribution of those distributions as a stationary distribution by using the Markov chain Monte Carlo method. The algorithm then combines the obtained path samples from each distribution using multiple importance sampling. We compare the images generated with our algorithm to those generated with bidirectional path tracing and with Metropolis light transport based on the primary sample space. Our algorithm has better convergence properties than bidirectional path tracing and Metropolis light transport, and it is easy to implement as an extension of Metropolis light transport. [source]

Decision Making with Uncertain Judgments: A Stochastic Formulation of the Analytic Hierarchy Process
DECISION SCIENCES, Issue 3, 2003. Eugene D. Hahn
In the analytic hierarchy process (AHP), priorities are derived via a deterministic method, the eigenvalue decomposition. However, judgments may be subject to error. A stochastic characterization of the pairwise comparison judgment task is provided, and statistical models are introduced for deriving the underlying priorities. Specifically, a weighted hierarchical multinomial logit model is used to obtain the priorities. Inference is then conducted from the Bayesian viewpoint using Markov chain Monte Carlo methods. The stochastic methods are found to give results that are congruent with those of the eigenvector method in matrices of different sizes and different levels of inconsistency. Moreover, inferential statements can be made about the priorities when the stochastic approach is adopted, and these statements may be of considerable value to a decision maker.
The methods described are fully compatible with judgments from the standard version of AHP and can be used to construct a stochastic formulation of it. [source]

Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods
ECOLOGY LETTERS, Issue 7, 2007. Subhash R. Lele
We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise. [source]
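A minimal sketch of the data cloning recipe on a deliberately simple example (a Poisson mean with a random-walk Metropolis sampler; the toy counts, the flat prior on log lambda, and the tuning constants are all assumptions, not taken from the paper): the posterior for K copies of the data concentrates on the MLE, and K times its variance approximates the variance of the MLE.

```python
import numpy as np

def metropolis_poisson(y, n_iter=20000, step=0.1, seed=0):
    """Random-walk Metropolis on theta = log(lambda) with a flat prior on theta."""
    rng = np.random.default_rng(seed)
    def loglik(th):
        return np.sum(y) * th - len(y) * np.exp(th)   # Poisson log-likelihood, up to a constant
    theta, ll = 0.0, loglik(0.0)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        ll_prop = loglik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:       # Metropolis acceptance
            theta, ll = prop, ll_prop
        draws[i] = theta
    return draws[n_iter // 2:]                         # discard burn-in

y = np.array([3, 5, 2, 4, 6, 3, 4])   # toy count data
K = 50                                 # number of clones
lam = np.exp(metropolis_poisson(np.tile(y, K)))
print(lam.mean())                      # approaches the MLE (the sample mean of y) as K grows
print(K * lam.var())                   # approximates the variance of the MLE
```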
Spatio-temporal point process filtering methods with an application
ENVIRONMETRICS, Issue 3-4, 2010. Blažena Frcalová
The paper deals with point processes in space and time and the problem of filtering. Real data monitoring the spiking activity of a place cell in the hippocampus of a rat moving in an environment are evaluated. Two approaches to the modelling and methodology are discussed. The first one (known from the literature) is based on recursive equations which enable the description of an adaptive system. Sequential Monte Carlo methods, including the particle filter algorithm, are available for the solution. The second approach makes use of a continuous-time shot-noise Cox point process model. The inference of the driving intensity leads to a nonlinear filtering problem. Parametric models support the solution by means of Bayesian Markov chain Monte Carlo methods; moreover, the Cox model enables the detection of adaptiveness. Model selection is discussed, and numerical results are presented and interpreted. Copyright © 2009 John Wiley & Sons, Ltd. [source]

INAR(1) modeling of overdispersed count series with an environmental application
ENVIRONMETRICS, Issue 4, 2008. Harry Pavlopoulos
This paper is concerned with a novel version of the INAR(1) model, a non-linear auto-regressive Markov chain on the non-negative integers, with innovations following a finite mixture distribution of m Poisson laws. For m ≥ 2, the stationary marginal probability distribution of the chain is overdispersed relative to a Poisson, thus making INAR(1) suitable for modeling time series of counts with arbitrary overdispersion. The one-step transition probability function of the chain is also a finite mixture, of m Poisson-Binomial laws, facilitating likelihood-based inference for model parameters. An explicit EM algorithm is devised for inference by maximization of a conditional likelihood. Alternative options for inference are discussed, along with criteria for selecting m. Integer-valued prediction (IP) is developed by a parametric bootstrap approach to 'coherent' forecasting, and a test statistic based on predictions is introduced for assessing performance of the fitted model. The proposed model is fitted to time series of counts of pixels where spatially averaged rain rate exceeds a given threshold level, illustrating its capabilities in challenging cases of highly overdispersed count data. Copyright © 2007 John Wiley & Sons, Ltd. [source]

Chemical mass balance when an unknown source exists
ENVIRONMETRICS, Issue 8, 2004. Nobuhisa Kashiwagi
A chemical mass balance method is proposed for the case where the existence of an unknown source is suspected. In general, when the existence of an unknown source is assumed in statistical receptor modeling, unknown quantities such as the composition of the unknown source and the contributions of the assumed sources become unidentifiable. To estimate these unknown quantities while avoiding the identification problem, a Bayes model for chemical mass balance is constructed in the form of composition without using prior knowledge of the unknown quantities except for natural constraints. The covariance of ambient observations given in the form of composition is defined in several ways. Markov chain Monte Carlo is used for evaluating the posterior means and variances of the unknown quantities as well as the likelihood for the proposed model. The likelihood is used for selecting the best-fit covariance model. A simulation study is carried out to check the performance of the proposed method. Copyright © 2004 John Wiley & Sons, Ltd. [source]

Empirical Bayes estimators and non-parametric mixture models for space and time-space disease mapping and surveillance
ENVIRONMETRICS, Issue 5, 2003. Dankmar Böhning
The analysis of the geographic variation of disease and its representation on a map is an important topic in epidemiological research and in public health in general. Identification of spatial heterogeneity of relative risk using morbidity and mortality data is required. Frequently, interest is also in the analysis of space data with respect to time, where typically data are used which are aggregated in certain time windows such as 5 or 10 years. The occurrence measure of interest is usually the standardized mortality (morbidity) ratio (SMR). It is well known that disease maps in space, or in space and time, should not be based solely upon the crude SMR but rather on some smoothed version of it. This fact has led to a tremendous amount of theoretical development in spatial methodology, in particular in the area of hierarchical modeling in connection with fully Bayesian estimation techniques such as Markov chain Monte Carlo. It seems, however, that while these theoretical developments took place, on the practical side only very few of them have found their way into the daily practice of epidemiological work and surveillance routines. In this article we focus on developments that avoid the pitfalls of the crude SMR and simultaneously retain simplicity and, at least approximately, the validity of more complex models. After an illustration of the typical pitfalls of the crude SMR, the article is centered around three issues: (a) the separation of spatial random variation from spatial structural variation; (b) a simple mixture model for capturing spatial heterogeneity; (c) an extension of this model for capturing temporal information. The techniques are illustrated by numerous examples. Public domain software such as Dismap is mentioned, which enables easy mixture modeling in the context of disease mapping. Copyright © 2003 John Wiley & Sons, Ltd. [source]
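To illustrate the pitfall of the crude SMR and the kind of smoothing at stake, here is a generic Poisson-Gamma empirical Bayes shrinkage with Marshall-type moment estimates; this is a simpler stand-in for the non-parametric mixtures the paper advocates, and the counts are invented:

```python
import numpy as np

# Toy observed (O) and expected (E) disease counts for six regions (assumed values).
O = np.array([2.0, 40, 7, 0, 60, 5])
E = np.array([4.0, 12, 6.5, 1.2, 28, 9])

smr = O / E                                   # crude SMR: unstable where E is small

# Poisson-Gamma model: O_i ~ Poisson(theta_i * E_i), theta_i ~ Gamma(shape=a, rate=b).
m = O.sum() / E.sum()                         # overall relative risk
s2 = np.sum(E * (smr - m) ** 2) / E.sum()     # weighted variance of the crude SMRs
var_theta = max(s2 - m / E.mean(), 1e-9)      # remove the Poisson noise component
b = m / var_theta
a = m * b
eb = (O + a) / (E + b)                        # posterior means: SMRs shrunk toward m
print(np.round(smr, 2))
print(np.round(eb, 2))
```

Note how the region with O = 0 and tiny E moves from a crude SMR of 0 toward the overall level, exactly the stabilisation the abstract argues for.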
Bayesian hierarchical models in ecological studies of health-environment effects
ENVIRONMETRICS, Issue 2, 2003. Sylvia Richardson
We describe Bayesian hierarchical models and illustrate their use in epidemiological studies of the effects of environment on health. The framework of Bayesian hierarchical models refers to a generic model-building strategy in which unobserved quantities (e.g. statistical parameters, missing or mismeasured data, random effects, etc.) are organized into a small number of discrete levels with logically distinct and scientifically interpretable functions, and probabilistic relationships between them that capture inherent features of the data. It has proved to be successful for analysing many types of complex epidemiological and biomedical data. The general applicability of Bayesian hierarchical models has been enhanced by advances in computational algorithms, notably those belonging to the family of stochastic algorithms based on Markov chain Monte Carlo techniques. In this article, we review different types of design commonly used in studies of environment and health, give details on how to incorporate the hierarchical structure into the different components of the model (baseline risk, exposure) and discuss the model specification at the different levels of the hierarchy, with particular attention to the problem of aggregation (ecological) bias. Copyright © 2003 John Wiley & Sons, Ltd. [source]

Performance of TCP on low-bandwidth wireless links with delay spikes
EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 6, 2008. Pasi Lassila
We model the goodput of a single TCP source on a wireless link experiencing sudden increases in round trip time (RTT), that is, delay spikes. Such spikes trigger spurious timeouts that reduce the TCP goodput. Renewal reward theory is used to derive a straightforward expression for TCP goodput that takes into account limited sending rates (limited window size), lost packets due to congestion, and delay spike properties such as the average spike duration and the distribution of the spike intervals. The basic model is for independent and identically distributed (i.i.d.) spike intervals; correlated spike intervals are modelled by using a modulating background Markov chain. Validation by ns2 simulations shows excellent agreement for lossless scenarios and good accuracy for moderate loss scenarios (packet loss probabilities below 5%). Numerical studies have also been performed to assess the impact of different spike interval distributions on TCP performance. Copyright © 2007 John Wiley & Sons, Ltd. [source]
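The correlated-spike-interval mechanism in the Lassila model can be sketched as follows: a background two-state Markov chain (calm versus bursty) modulates the per-RTT spike probability, which makes successive spike intervals positively correlated. All parameter values are invented for illustration, and the goodput formula itself is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two-state background chain: 0 = calm (rare spikes), 1 = bursty (frequent spikes).
P = np.array([[0.99, 0.01],          # assumed modulating transition matrix
              [0.10, 0.90]])
spike_prob = np.array([0.001, 0.05])  # assumed per-RTT spike probability in each state

T, s = 100_000, 0
spikes = np.zeros(T, dtype=bool)
for t in range(T):
    s = rng.choice(2, p=P[s])         # background state evolves as a Markov chain
    spikes[t] = rng.uniform() < spike_prob[s]

intervals = np.diff(np.flatnonzero(spikes))   # spike intervals, now autocorrelated
print(intervals.mean())
print(np.corrcoef(intervals[:-1], intervals[1:])[0, 1])   # positive lag-1 correlation
```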
TCP-friendly transmission of voice over IP
EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, Issue 3, 2003. F. Beritelli
In the last few years an increasing amount of attention has been paid to technologies for the transmission of voice over IP (VoIP). At present, the UDP transport protocol is used to provide this service. However, when the same bottleneck link is shared with TCP flows, and in the presence of high network load and congestion, UDP sources capture most of the bandwidth, strongly penalizing TCP sources. To solve this problem, some congestion control should be introduced for UDP traffic as well, in such a way that this traffic becomes TCP-friendly. In this perspective, several TCP-friendly algorithms have been proposed in the literature. Among them, the most promising candidates for the immediate future are RAP and TFRC. However, although these algorithms were introduced to support real-time applications on the Internet, up to now the only target in optimizing them has been that of achieving fairness with TCP flows in the network. No attention has been paid to the applications using them and, in particular, to the quality of service (QoS) perceived by their users. The target of this paper is to analyze the problem of transmitting voice over IP when voice sources use one of these TCP-friendly algorithms. With this aim, a VoIP system architecture is introduced and the characteristics of each of its elements are discussed. To optimize the system, a multirate voice encoder is used so that it is feasible to work over a TCP layer, and a modification of both RAP and TFRC is proposed. Finally, in order to analyze the performance of the proposed system architecture and to compare the modified RAP and TFRC with the original algorithms, the sources have been modeled with an arrival process modulated by a Markov chain, and the model has been used to generate traffic in a simulation study performed with the ns-2 network simulator. Copyright © 2003 AEI. [source]

DETECTING CORRELATION BETWEEN CHARACTERS IN A COMPARATIVE ANALYSIS WITH UNCERTAIN PHYLOGENY
EVOLUTION, Issue 6, 2003. John P. Huelsenbeck
The importance of accommodating the phylogenetic history of a group when performing a comparative analysis is now widely recognized. The typical approaches either assume the tree is known without error, or they base inferences on a collection of well-supported trees or on a collection of trees generated under a stochastic model of cladogenesis. However, these approaches do not adequately account for the uncertainty of phylogenetic trees in a comparative analysis, especially when data relevant to the phylogeny of a group are available. Here, we develop a method for performing comparative analyses that is based on an extension of Felsenstein's independent contrasts method. Uncertainties in the phylogeny, branch lengths, and other parameters are accommodated by averaging over all possible trees, weighting each by the probability that the tree is correct. We do this in a Bayesian framework and use Markov chain Monte Carlo to perform the high-dimensional summations and integrations required by the analysis. We illustrate the method using comparative characters sampled from Anolis lizards. [source]

A BAYESIAN FRAMEWORK FOR THE ANALYSIS OF COSPECIATION
EVOLUTION, Issue 2, 2000. John P. Huelsenbeck
Information on the history of cospeciation and host switching for a group of host and parasite species is contained in the DNA sequences sampled from each. Here, we develop a Bayesian framework for the analysis of cospeciation. We suggest a simple model of host switching by a parasite on a host phylogeny, in which host switching events are assumed to occur at a constant rate over the entire evolutionary history of the associated hosts and parasites. The posterior probability density of the parameters of the model of host switching is evaluated numerically using Markov chain Monte Carlo. In particular, the method generates the probability density of the number of host switches and of the host switching rate. Moreover, the method provides information on the probability that an event of host switching is associated with a particular pair of branches. A Bayesian approach has several advantages over other methods for the analysis of cospeciation.
In particular, it does not assume that the host or parasite phylogenies are known without error; many alternative phylogenies are sampled in proportion to their probability of being correct. [source]

Calculation of IBD probabilities with dense SNP or sequence data
GENETIC EPIDEMIOLOGY, Issue 6, 2008. Jonathan M. Keith
The probabilities that two individuals share 0, 1, or 2 alleles identical by descent (IBD) at a given genotyped marker locus are quantities of fundamental importance for disease gene and quantitative trait mapping and in family-based tests of association. Until recently, genotyped markers were sufficiently sparse that founder haplotypes could be modelled as having been drawn from a population in linkage equilibrium for the purpose of estimating IBD probabilities. However, with the advent of high-throughput single nucleotide polymorphism genotyping assays, this is no longer a reasonable assumption. Indeed, the imminent arrival of individual sequencing will enable high-density single nucleotide polymorphism genotyping on a scale for which current algorithms are not equipped. In this paper, we present a simple new model in which founder haplotypes are modelled as a Markov chain. Another important innovation is that genotyping errors are explicitly incorporated into the model. We compare results obtained using the new model to those obtained using the popular genetic linkage analysis package Merlin, with and without the cluster model of linkage disequilibrium that is incorporated into that program. We find that the new model results in accuracy approaching that of Merlin with haplotype blocks, but achieves this with orders-of-magnitude faster run times. Moreover, the new algorithm scales linearly with the number of markers, irrespective of density, whereas Merlin scales supralinearly. We also confirm a previous finding that ignoring linkage disequilibrium in founder haplotypes can cause errors in the calculation of IBD probabilities. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc. [source]
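The core of the Keith model, founder haplotypes as a first-order Markov chain over alleles, is easy to state in code. The sketch below scores a biallelic haplotype under assumed initial frequencies and marker-to-marker transition matrices; the genotyping-error layer of the paper is omitted:

```python
import numpy as np

def haplotype_logprob(h, init, trans):
    """Log-probability of a 0/1 haplotype under a first-order Markov chain.
    init[a]: frequency of allele a at the first marker;
    trans[m][a, b]: P(allele b at marker m+1 | allele a at marker m)."""
    lp = np.log(init[h[0]])
    for m in range(len(h) - 1):
        lp += np.log(trans[m][h[m], h[m + 1]])
    return lp

# Assumed toy parameters for four biallelic markers in strong LD:
init = np.array([0.6, 0.4])
trans = [np.array([[0.9, 0.1],
                   [0.2, 0.8]])] * 3
print(haplotype_logprob([0, 0, 1, 1], init, trans))
```

Because the chain is first order, the cost of scoring a haplotype grows linearly in the number of markers, which is consistent with the linear scaling the abstract reports.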
A dictionary model for haplotyping, genotype calling, and association testing
GENETIC EPIDEMIOLOGY, Issue 7, 2007. Kristin L. Ayers
We propose a new method for haplotyping, genotype calling, and association testing based on a dictionary model for haplotypes. In this framework, a haplotype arises as a concatenation of conserved haplotype segments, drawn from a predefined dictionary according to segment-specific probabilities. The observed data consist of unphased multimarker genotypes gathered on a random sample of unrelated individuals. These genotypes are subject to mutation, genotyping errors, and missing data. The true pair of haplotypes corresponding to a person's multimarker genotype is reconstructed using a Markov chain that visits haplotype pairs according to their posterior probabilities. Our implementation of the chain alternates Gibbs steps, which rearrange the phase of a single marker, and Metropolis steps, which swap maternal and paternal haplotypes from a given marker onward. Outputs of the chain include the most likely haplotype pairs, the most likely genotypes at each marker, and the expected number of occurrences of each haplotype segment. Reconstruction accuracy is comparable to that achieved by the best existing algorithms. More importantly, the dictionary model yields expected counts of conserved haplotype segments. These imputed counts can serve as genetic predictors in association studies, as we illustrate with examples on cystic fibrosis, Friedreich's ataxia, and angiotensin-I converting enzyme levels. Genet. Epidemiol. © 2007 Wiley-Liss, Inc. [source]

Linkage mapping methods applied to the COGA data set: Presentation Group 4 of Genetic Analysis Workshop 14
GENETIC EPIDEMIOLOGY, Issue S1, 2005. E. Warwick Daw
Presentation Group 4 participants analyzed the Collaborative Study on the Genetics of Alcoholism data provided for Genetic Analysis Workshop 14. This group examined various aspects of linkage analysis and related issues. Seven papers included linkage analyses, while the eighth calculated identity-by-descent (IBD) probabilities. Six papers analyzed linkage to an alcoholism phenotype: ALDX1 (four papers), ALDX2 (one paper), or a combination of both (one paper). Methods used included Bayesian variable selection coupled with Haseman-Elston regression, recursive partitioning to identify phenotype and covariate groupings that interact with evidence for linkage, nonparametric linkage regression modeling, affected sib-pair linkage analysis with discordant sib-pair controls, simulation-based homozygosity mapping in a single pedigree, and application of a propensity score to collapse covariates in a general conditional logistic model. Alcoholism linkage was found with two or more of these approaches on chromosomes 2, 4, 6, 7, 9, 14, and 21. The remaining linkage paper compared the utility of several single-nucleotide polymorphism (SNP) and microsatellite marker maps for Markov chain Monte Carlo combined oligogenic segregation and linkage analysis, and analyzed one of the electrophysiological endophenotypes, ttth1, on chromosome 7. Linkage was found with all marker sets. The last paper compared the multipoint IBD information content of several SNP sets and the microsatellite set, and found that while all SNP sets examined contained more information than the microsatellite set, most of the information contained in the SNP sets was captured by a subset of SNP markers with roughly 1-cM spacing. From these papers, we highlight three points: a 1-cM SNP map seems to capture most of the linkage information, so denser maps do not appear necessary; careful and appropriate use of covariates can aid linkage analysis; and sources of increased gene-sharing between relatives should be accounted for in analyses. Genet. Epidemiol. 29(Suppl. 1):S29-S34, 2005. © 2005 Wiley-Liss, Inc. [source]
Finding starting points for Markov chain Monte Carlo analysis of genetic data from large and complex pedigrees
GENETIC EPIDEMIOLOGY, Issue 1, 2003. Yuqun Luo
Genetic data from founder populations are advantageous for studies of complex traits that are often plagued by the problem of genetic heterogeneity. However, the desire to analyze large and complex pedigrees that often arise from such populations, coupled with the need to handle many linked and highly polymorphic loci simultaneously, poses challenges to current standard approaches. A viable alternative for solving such problems is via Markov chain Monte Carlo (MCMC) procedures, where a Markov chain, defined on the state space of a latent variable (e.g., genotypic configuration or inheritance vector), is constructed. However, finding starting points for the Markov chains is a difficult problem when the pedigree is not single-locus peelable; methods proposed in the literature have not yielded completely satisfactory solutions. We propose a generalization of the heated Gibbs sampler with relaxed penetrances (HGRP) of Lin et al. ([1993] IMA J. Math. Appl. Med. Biol. 10:1-17) to search for starting points. HGRP guarantees that a starting point will be found if there is no error in the data, but the chain usually needs to be run for a long time if the pedigree is extremely large and complex. By introducing a forcing step, the current algorithm substantially reduces the state space and hence effectively speeds up the process of finding a starting point. Our algorithm also has a built-in preprocessing procedure for Mendelian error detection. The algorithm has been applied to both simulated and real data on two large and complex Hutterite pedigrees under many settings, and good results are obtained. The algorithm has been implemented in a user-friendly package called START. Genet. Epidemiol. 25:14-24, 2003. © 2003 Wiley-Liss, Inc. [source]

A score for Bayesian genome screening
GENETIC EPIDEMIOLOGY, Issue 3, 2003. E. Warwick Daw
Bayesian Markov chain Monte Carlo (MCMC) techniques have shown promise in dissecting complex genetic traits. The methods introduced by Heath ([1997] Am. J. Hum. Genet. 61:748-760), and implemented in the program Loki, have been able to localize genes for complex traits in both real and simulated data sets. Loki estimates the posterior probability of quantitative trait loci (QTL) at locations on a chromosome in an iterative MCMC process. Unfortunately, interpretation of the results and assessment of their significance have been difficult. Here, we introduce a score, the log of the posterior placement probability ratio (LOP), for assessing oligogenic QTL detection and localization. The LOP is the log of the posterior probability of linkage to the real chromosome divided by the posterior probability of linkage to an unlinked pseudochromosome, with marker informativeness similar to the marker data on the real chromosome. Since the LOP cannot be calculated exactly, we estimate it in simultaneous MCMC on both real and pseudochromosomes. We investigate empirically the distributional properties of the LOP in the presence and absence of trait genes. The LOP is not subject to trait model misspecification in the way a lod score may be, and we show that the LOP can detect linkage for loci of small effect when the lod score cannot. We show how, in the absence of linkage, an empirical distribution of the LOP may be estimated by simulation and used to provide an assessment of linkage detection significance. Genet. Epidemiol. 24:181-190, 2003. © 2003 Wiley-Liss, Inc. [source]

Bayesian Estimation of Limited Dependent Variable Spatial Autoregressive Models
GEOGRAPHICAL ANALYSIS, Issue 1, 2000. James P. LeSage
A Gibbs sampling (Markov chain Monte Carlo) method for estimating spatial autoregressive limited dependent variable models is presented. The method can accommodate data sets containing spatial outliers and general forms of non-constant variance. It is argued that there are several advantages to the method proposed here relative to that proposed and illustrated in McMillen (1992) for spatial probit models. [source]
Case studies in Bayesian segmentation applied to CD control
INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 5, 2003. A. R. Taylor
Identifying step changes in historical process and controller output variables can lead to improved process understanding and fault resolution in control system performance analysis. This paper describes an application of Bayesian methods in the search for statistically significant temporal segmentations of the data collected by a cross-directional (CD) control system in an industrial web-forming process. CD control systems give rise to vector observations, which are often transformed through orthogonal bases for control and performance analysis. In this paper, two models which exploit basis-function representations of vector time series data are segmented. The first is a power spectrum model based on the asymptotic chi-squared approximation, which allows large data sets to be processed. The second approach, more capable of detecting small changes but as a result more computationally demanding, is a special case of the multivariate linear model. Given the statistical model of the data, inference regarding the number and location of the change-points is based on numerical Bayesian methods known as Markov chain Monte Carlo (MCMC). The methods are applied to real data, and the resulting segmentation relates to real process events. Copyright © 2003 John Wiley & Sons, Ltd. [source]

Hierarchical Bayesian modelling of wind and sea surface temperature from the Portuguese coast
INTERNATIONAL JOURNAL OF CLIMATOLOGY, Issue 9, 2010. Ricardo T. Lemos
In this work, we revisit a recent analysis that pointed to an overall relaxation of the Portuguese coastal upwelling system between 1941 and 2000, and apply more elaborate statistical techniques to assess that evidence. Our goal is to fit a model for environmental variables that accommodates seasonal cycles, long-term trends, short-term fluctuations with some degree of autocorrelation, and cross-correlations between measuring sites and variables. Reference cell coding is used to investigate similarities in behaviour among sites. Parameter estimation is performed in a single modelling step, thereby producing more reliable credibility intervals than previous studies. This is of special importance in the assessment of trend significance. We employ a Bayesian approach with a purposely developed Markov chain Monte Carlo method to explore the posterior distribution of the parameters. Our results substantiate most previous findings and provide new insight into the relationship between wind and sea surface temperature off the Portuguese coast. Copyright © 2009 Royal Meteorological Society [source]

Integration of mobility and intrusion detection for wireless ad hoc networks
INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, Issue 6, 2007. Bo Sun
One of the main challenges in building intrusion detection systems (IDSs) for mobile ad hoc networks (MANETs) is to integrate mobility impacts and to adjust the behaviour of IDSs correspondingly. In this paper, we first introduce two different approaches, a Markov chain-based approach and an approach based on Hotelling's T² test, to construct local IDSs for MANETs. We then demonstrate that nodes' moving speed, a commonly used parameter in tuning IDS performance, is not an effective metric for tuning IDS performance under different mobility models. To solve this problem, we further propose an adaptive scheme, in which suitable normal profiles and corresponding proper thresholds can be selected adaptively by each local IDS through periodically measuring its local link change rate, a proposed unified performance metric.
We study the proposed adaptive mechanism at different mobility levels, using different mobility models such as the random waypoint model, the random drunken model, and the obstacle mobility model. Simulation results show that our proposed adaptive scheme is less dependent on the underlying mobility models and can further reduce the false positive ratio. Copyright © 2006 John Wiley & Sons, Ltd. [source]

Mapping the forms of meaning in small worlds
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 7, 2008. Bruno Gaume
Prox is a stochastic method to map the local and global structures of real-world complex networks, which are called small worlds. Prox transforms a graph into a Markov chain, the states of which are the nodes of the graph in question. Particles wander from one node to another within the graph by following the graph's edges. It is the dynamics of the particles' trajectories that map the structural properties of the graphs under study. Concrete examples are presented in a graph of synonyms to illustrate this approach. © 2008 Wiley Periodicals, Inc. [source]

Efficiency of nested Markov chain Monte Carlo for polarizable potentials and perturbed Hamiltonians
INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, Issue 13, 2010. Florent Calvo
Nested Markov chain Monte Carlo is a rigorous way to enhance sampling of a given energy landscape using an auxiliary, approximate potential energy surface. Its practical efficiency mainly depends on how cheap the auxiliary potential is and how different it is from the reference system. In this article, a combined efficiency index is proposed and assessed for two important families of energy surfaces. As illustrated for water clusters, many-body polarizable potentials can be approximated by simplifying the polarization contribution and keeping only the two-body terms. In small systems, neglecting polarization entirely is also acceptable. When the reference potential energy is obtained from diagonalization of a quantum mechanical Hamiltonian, a first-order perturbation scheme can be used to estimate the energy difference occurring on a Monte Carlo move. Our results indicate that this perturbation approximation performs well, provided that the number of steps between successive diagonalizations is adjusted beforehand. © 2010 Wiley Periodicals, Inc. Int. J. Quantum Chem. 110:2342-2346, 2010 [source]
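For readers unfamiliar with the scheme, here is a minimal sketch of nested Markov chain Monte Carlo on a toy one-dimensional problem (double-well reference energy, harmonic auxiliary surrogate; all functions and tuning constants are assumptions for illustration). An inner chain explores the cheap surface, and the resulting sub-trajectory is accepted or rejected against the expensive surface with a swap-like test, so the reference potential is evaluated only at the endpoints of each sub-trajectory:

```python
import numpy as np

def nested_mcmc(E_ref, E_aux, x0, beta, n_outer=200, n_inner=20, step=0.3, seed=0):
    """Nested Markov chain Monte Carlo: run n_inner Metropolis steps on the cheap
    auxiliary surface E_aux, then accept the whole sub-trajectory with the test
    exp(-beta * [(E_ref(y) - E_ref(x)) - (E_aux(y) - E_aux(x))])."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for _ in range(n_outer):
        y = x
        for _ in range(n_inner):                  # inner chain on E_aux only
            z = y + step * rng.standard_normal()
            if np.log(rng.uniform()) < -beta * (E_aux(z) - E_aux(y)):
                y = z
        d_ref = E_ref(y) - E_ref(x)               # expensive surface: endpoints only
        d_aux = E_aux(y) - E_aux(x)
        if np.log(rng.uniform()) < -beta * (d_ref - d_aux):
            x = y                                 # outer (swap-like) acceptance
        samples.append(x)
    return np.array(samples)

# Toy example: double-well reference, harmonic surrogate (both assumed).
E_ref = lambda x: (x ** 2 - 1.0) ** 2
E_aux = lambda x: 0.5 * x ** 2
chain = nested_mcmc(E_ref, E_aux, 0.0, beta=2.0)
print(chain.mean(), chain.var())
```

The outer test preserves detailed balance with respect to exp(-beta * E_ref) provided the inner moves are themselves reversible on the auxiliary surface, which symmetric random-walk Metropolis steps are.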