Correlation Structure (correlation + structure)

Selected Abstracts


A Note on the Spatial Correlation Structure of County-Level Growth in the U.S.

JOURNAL OF REGIONAL SCIENCE, Issue 3 2001
Christopher H. Wheeler
This paper examines the spatial correlation structure of county-level growth across the contiguous United States. Estimated spatial correlograms using data on four different measures of aggregate economic activity (population, employment, income, and earnings) over the period 1984-1994 indicate that cross-county interdependence is limited to relatively short ranges of distance. For each of the measures, the average correlation between the growth rates of two counties approaches zero within a range of approximately 200 miles. Moreover, the rate at which correlations decline with distance is not uniform. Inside of roughly 40 miles, correlations show only a very slow rate of decline, whereas beyond this range they drop off at a substantially higher rate. [source]
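
The correlogram idea is straightforward to illustrate: standardize the growth rates, bin county pairs by separation distance, and average the cross-products within each bin. The sketch below is a generic moment-based correlogram on simulated coordinates and growth rates, assuming planar Euclidean distances; it is not Wheeler's data or exact estimator.

```python
import numpy as np

def spatial_correlogram(coords, growth, bin_edges_miles):
    """Moment-based correlogram: average product of standardized growth
    rates for pairs of locations whose separation falls in each distance bin."""
    z = (growth - growth.mean()) / growth.std()
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(len(growth), k=1)          # count each pair once
    pair_d, pair_zz = d[iu], (z[:, None] * z[None, :])[iu]
    return np.array([pair_zz[(pair_d >= lo) & (pair_d < hi)].mean()
                     for lo, hi in zip(bin_edges_miles[:-1], bin_edges_miles[1:])])

# toy example: 500 "counties" on a 1000 x 1000 mile plane with spatially
# smooth growth, so the estimated correlation decays with distance
rng = np.random.default_rng(0)
xy = rng.uniform(0, 1000, size=(500, 2))
centers = rng.uniform(0, 1000, size=(20, 2))
growth = sum(np.exp(-np.linalg.norm(xy - c, axis=1) / 150) * rng.normal()
             for c in centers) + 0.1 * rng.normal(size=500)
print(spatial_correlogram(xy, growth, np.arange(0, 400, 40)).round(3))
```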


Pyridazine and phthalazine derivatives with potential antimicrobial activity

JOURNAL OF HETEROCYCLIC CHEMISTRY, Issue 5 2007
Roxana M. Butnariu
Fifteen new pyridazine and phthalazine derivatives were prepared (in good to excellent yields) and tested in vitro as antimicrobial compounds. All of the compounds proved to have remarkable activity against Gram-positive germs, the results on Sarcina lutea being particularly striking. Correlations between structure and biological activity have been established. The stereo- and regiochemistry involved in these reactions are discussed. [source]


Risk Modeling of Dependence among Project Task Durations

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 6 2007
I-Tung Yang
The assessments, however, can be strongly influenced by the dependence between task durations. In light of the need to address the dependence, the present study proposes a computer simulation model to incorporate and augment NORTA, a method for multivariate random number generation. The proposed model allows arbitrarily specified marginal distributions for task durations (need not be members of the same distribution family) and any desired correlation structure. This level of flexibility is of great practical value when systematic data is not available and planners have to rely on experts' subjective estimation. The application of the proposed model is demonstrated through scheduling a road pavement project. The proposed model is validated by showing that the sample correlation coefficients between task durations closely match the originally specified ones. Empirical comparisons between the proposed model and two conventional approaches, PERT and conventional simulation (without correlations), are used to illustrate the usefulness of the proposed model. [source]
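
NORTA-style generation can be sketched in a few lines: draw correlated standard normals, map them to uniforms, and push the uniforms through each task's own inverse CDF. Below is a minimal illustration with assumed triangular and lognormal duration marginals (not taken from the paper); a complete NORTA implementation would additionally calibrate the normal correlation matrix so that the induced output correlations match the target exactly.

```python
import numpy as np
from scipy import stats

def norta_sample(corr, marginals, n, rng):
    """Draw n vectors whose i-th component follows marginals[i], with dependence
    induced by a Gaussian copula with correlation matrix 'corr'. (Here 'corr'
    is used directly, i.e. as an approximation to the calibrated NORTA matrix.)"""
    z = rng.multivariate_normal(np.zeros(len(marginals)), corr, size=n)
    u = stats.norm.cdf(z)                                   # correlated uniforms
    return np.column_stack([m.ppf(u[:, i]) for i, m in enumerate(marginals)])

rng = np.random.default_rng(1)
target_corr = np.array([[1.0, 0.6, 0.3],
                        [0.6, 1.0, 0.4],
                        [0.3, 0.4, 1.0]])
# hypothetical task-duration marginals (days), chosen only for illustration
marginals = [stats.triang(c=0.4, loc=5, scale=10),   # task A: 5-15 days
             stats.lognorm(s=0.35, scale=8),         # task B: right-skewed
             stats.triang(c=0.5, loc=2, scale=6)]    # task C: 2-8 days
durations = norta_sample(target_corr, marginals, 10_000, rng)
print(np.corrcoef(durations, rowvar=False).round(2))  # compare with target_corr
```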


Effects of species and habitat positional errors on the performance and interpretation of species distribution models

DIVERSITY AND DISTRIBUTIONS, Issue 4 2009
Patrick E. Osborne
Abstract Aim: A key assumption in species distribution modelling is that both species and environmental data layers contain no positional errors, yet this will rarely be true. This study assesses the effect of introduced positional errors on the performance and interpretation of species distribution models. Location: Baixo Alentejo region of Portugal. Methods: Data on steppe bird occurrence were collected using a random stratified sampling design on a 1-km2 pixel grid. Environmental data were sourced from satellite imagery and digital maps. Error was deliberately introduced into the species data as shifts in a random direction of 0-1, 2-3, 4-5 and 0-5 pixels. Whole habitat layers were shifted by 1 pixel to cause mis-registration, and the cumulative effect of one to three shifted layers investigated. Distribution models were built for three species using three algorithms with three replicates. Test models were compared with controls without errors. Results: Positional errors in the species data led to a drop in model performance (larger errors having larger effects, typically up to a 10% drop in area under the curve on average), although not enough for models to be rejected. Model interpretation was more severely affected, with inconsistencies in the contributing variables. Errors in the habitat layers had similar although lesser effects. Main conclusions: Models with species positional errors are hard to detect, often statistically good, ecologically plausible and useful for prediction, but interpreting them is dangerous. Mis-registered habitat layers produce smaller effects, probably because shifting entire layers does not break down the correlation structure to the same extent as random shifts in individual species observations. Spatial autocorrelation in the habitat layers may protect against species positional errors to some extent, but the relationship is complex and requires further work. The key recommendation must be that positional errors should be minimised through careful field design and data processing. [source]
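
The error-introduction step (displacing each species record in a random direction by up to a given number of pixels) is simple to reproduce. The sketch below uses made-up coordinates and a single 0-to-max shift band for brevity, whereas the study also used bands such as 2-3 pixels.

```python
import numpy as np

def shift_points(xy, max_shift_pixels, rng):
    """Displace each occurrence record in a uniformly random direction by a
    distance drawn uniformly from [0, max_shift_pixels] (pixel units)."""
    n = len(xy)
    angle = rng.uniform(0, 2 * np.pi, n)
    dist = rng.uniform(0, max_shift_pixels, n)
    return xy + np.column_stack([dist * np.cos(angle), dist * np.sin(angle)])

rng = np.random.default_rng(42)
records = rng.uniform(0, 100, size=(200, 2))     # toy coordinates on a 1-km pixel grid
for max_shift in (1, 3, 5):
    shifted = shift_points(records, max_shift, rng)
    print(max_shift, "max displacement:", np.linalg.norm(shifted - records, axis=1).max().round(2))
```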


Effects of spatially structured vegetation patterns on hillslope erosion in a semiarid Mediterranean environment: a simulation study

EARTH SURFACE PROCESSES AND LANDFORMS, Issue 2 2005
Matthias Boer
Abstract A general trend of decreasing soil loss rates with increasing vegetation cover fraction is widely accepted. Field observations and experimental work, however, show that the form of the cover-erosion function can vary considerably, in particular for low cover conditions that prevail on arid and semiarid hillslopes. In this paper the structured spatial distribution of the vegetation cover and associated soil attributes is proposed as one of the possible causes of variation in cover-erosion relationships, in particular in dryland environments where patchy vegetation covers are common. A simulation approach was used to test the hypothesis that hillslope discharge and soil loss could be affected by variation in the spatial correlation structure of coupled vegetation cover and soil patterns alone. The Limburg Soil Erosion Model (LISEM) was parameterized and verified for a small catchment with discontinuous vegetation cover at Rambla Honda, SE Spain. Using the same parameter sets, LISEM was subsequently used to simulate water and sediment fluxes on 1 ha hypothetical hillslopes with simulated spatial distributions of vegetation and soil parameters. Storms of constant rainfall intensity in the range of 30-70 mm h^-1 and 10-30 min duration were applied. To quantify the effect of the spatial correlation structure of the vegetation and soil patterns, predicted discharge and soil loss rates from hillslopes with spatially structured distributions of vegetation and soil parameters were compared with those from hillslopes with spatially uniform distributions. The results showed that the spatial organization of bare and vegetated surfaces alone can have a substantial impact on predicted storm discharge and erosion. In general, water and sediment yields from hillslopes with spatially structured distributions of vegetation and soil parameters were greater than from identical hillslopes with spatially uniform distributions. Within a storm, the effect of spatially structured vegetation and soil patterns was observed to be highly dynamic, and to depend on rainfall intensity and slope gradient. Copyright © 2005 John Wiley & Sons, Ltd. [source]


Stochastic matrix models for conservation and management: a comparative review of methods

ECOLOGY LETTERS, Issue 3 2001
John Fieberg
Stochastic matrix models are frequently used by conservation biologists to measure the viability of species and to explore various management actions. Models are typically parameterized using two or more sets of estimated transition rates between age/size/stage classes. While standard methods exist for analyzing a single set of transition rates, a variety of methods have been employed to analyze multiple sets of transition rates. We review applications of stochastic matrix models to problems in conservation and use simulation studies to compare the performance of different analytic methods currently in use. We find that model conclusions are likely to be robust to the choice of parametric distribution used to model vital rate fluctuations over time. However, conclusions can be highly sensitive to the within-year correlation structure among vital rates, and therefore we suggest using analytical methods that provide a means of conducting a sensitivity analysis with respect to correlation parameters. Our simulation results also suggest that the precision of population viability estimates can be improved by using matrix models that incorporate environmental covariates in conjunction with experiments to estimate transition rates under a range of environmental conditions. [source]
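
To see why the within-year correlation structure among vital rates matters, one can simulate a small stage-structured model with rates drawn from a Gaussian copula and compare the stochastic log growth rate under independent versus correlated fluctuations. The sketch below is a generic two-stage example with assumed beta and lognormal vital-rate distributions, not tied to any species in the review.

```python
import numpy as np
from scipy import stats

def stochastic_log_growth(rho, years=5000, seed=0):
    """Two-stage model: juvenile survival s0, adult survival s1, fecundity f.
    Year-to-year fluctuations share a common pairwise correlation rho,
    induced by a Gaussian copula."""
    rng = np.random.default_rng(seed)
    corr = np.full((3, 3), rho)
    np.fill_diagonal(corr, 1.0)
    u = stats.norm.cdf(rng.multivariate_normal(np.zeros(3), corr, size=years))
    s0 = stats.beta(20, 60).ppf(u[:, 0])              # mean ~0.25
    s1 = stats.beta(60, 20).ppf(u[:, 1])              # mean ~0.75
    f = stats.lognorm(s=0.3, scale=1.5).ppf(u[:, 2])  # mean ~1.6
    n = np.array([0.5, 0.5])
    log_growth = 0.0
    for t in range(years):
        A = np.array([[0.0, f[t]],
                      [s0[t], s1[t]]])
        n = A @ n
        total = n.sum()
        log_growth += np.log(total)
        n /= total                                    # renormalize to avoid overflow
    return log_growth / years

print("independent vital rates:", round(stochastic_log_growth(0.0), 4))
print("correlated vital rates :", round(stochastic_log_growth(0.8), 4))
```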


Predicting intra-urban variation in air pollution concentrations with complex spatio-temporal dependencies

ENVIRONMETRICS, Issue 6 2010
Adam A. Szpiro
Abstract We describe a methodology for assigning individual estimates of long-term average air pollution concentrations that accounts for a complex spatio-temporal correlation structure and can accommodate spatio-temporally misaligned observations. This methodology has been developed as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air), a prospective cohort study funded by the US EPA to investigate the relationship between chronic exposure to air pollution and cardiovascular disease. Our hierarchical model decomposes the space-time field into a "mean" that includes dependence on covariates and spatially varying seasonal and long-term trends and a "residual" that accounts for spatially correlated deviations from the mean model. The model accommodates complex spatio-temporal patterns by characterizing the temporal trend at each location as a linear combination of empirically derived temporal basis functions, and embedding the spatial fields of coefficients for the basis functions in separate linear regression models with spatially correlated residuals (universal kriging). This approach allows us to implement a scalable single-stage estimation procedure that easily accommodates a significant number of missing observations at some monitoring locations. We apply the model to predict long-term average concentrations of oxides of nitrogen (NOx) from 2005 to 2007 in the Los Angeles area, based on data from 18 EPA Air Quality System regulatory monitors. The cross-validated R2 is 0.67. The MESA Air study is also collecting additional concentration data as part of a supplementary monitoring campaign. We describe the sampling plan and demonstrate in a simulation study that the additional data will contribute to improved predictions of long-term average concentrations. Copyright © 2009 John Wiley & Sons, Ltd. [source]
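
The "empirically derived temporal basis functions" step can be illustrated with a singular value decomposition of a monitor-by-time matrix: the leading right singular vectors serve as seasonal/long-term trend shapes, and each monitor's coefficients on those shapes would then be modelled by universal kriging (not shown here). The sketch below uses a fabricated, complete data matrix; the actual MESA Air procedure also handles missing observations.

```python
import numpy as np

rng = np.random.default_rng(3)
weeks = np.arange(156)                                   # roughly three years of weekly data
season = np.cos(2 * np.pi * weeks / 52)
trend = -0.002 * weeks
monitors = 18
# fabricated log-concentration matrix: 18 monitors x 156 weeks
X = (rng.normal(3.5, 0.3, size=(monitors, 1))            # site-specific level
     + rng.normal(1.0, 0.2, size=(monitors, 1)) * season # site-specific seasonality
     + trend + rng.normal(0, 0.1, size=(monitors, len(weeks))))

# empirical temporal basis functions = leading right singular vectors
Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n_basis = 2
basis = Vt[:n_basis]                                     # (n_basis, weeks)
coef = Xc @ basis.T                                      # per-monitor coefficients
print("variance explained:", (s[:n_basis] ** 2 / (s ** 2).sum()).round(3))
# the columns of 'coef' would next be regressed on land-use covariates with
# spatially correlated residuals (universal kriging) to predict unmonitored sites
```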


Allowing for redundancy and environmental effects in estimates of home range utilization distributions

ENVIRONMETRICS, Issue 1 2005
W. G. S. Hines
Abstract Real location data for radio-tagged animals can be challenging to analyze. They can be somewhat redundant, since successive observations of an animal slowly wandering through its environment may well show very similar locations. The data set can possess trends over time or be irregularly timed, and it can report locations in environments with features that should be incorporated to some degree. Also, the periods of observation may be too short to provide reliable estimates of characteristics such as inter-observation correlation levels that can be used in conventional time-series analyses. Moreover, stationarity (in the sense of the data being generated by a source that provides observations of constant mean, variance and correlation structure) may not be present. This article considers an adaptation of the kernel density estimator for estimating home ranges, an adaptation which allows for these various complications and which works well in the absence of exact (or precise) information about correlation structure and parameters. Modifications to allow for irregularly timed observations, non-stationarity and heterogeneous environments are discussed and illustrated. Copyright © 2004 John Wiley & Sons, Ltd. [source]


Systematic sample design for the estimation of spatial means

ENVIRONMETRICS, Issue 1 2003
Luis Ambrosio Flores
Abstract This article develops a practical approach to undertaking systematic sampling for the estimation of the spatial mean of an attribute in a selected area. A design-based approach is used to estimate population parameters, but it is combined with elements of a model-based approach in order to identify the spatial correlation structure, to evaluate the relative efficiency of the sample mean under simple random and systematic sampling, to estimate sampling error and to assess the sample size needed in order to achieve a desired level of precision. Using two case studies (land use estimation and weed seedbank in soil) it is demonstrated how the practical basis for the design of systematic samples provided in this work should be applied and it is shown that if the spatial correlation is ignored the sampling error of the sample mean and the sample size needed in order to achieve a desired level of precision with systematic sampling are overestimated. Copyright © 2003 John Wiley & Sons, Ltd. [source]


Testing association between disease and multiple SNPs in a candidate gene

GENETIC EPIDEMIOLOGY, Issue 5 2007
W. James Gauderman
Abstract Current technology allows investigators to obtain genotypes at multiple single nucleotide polymorphisms (SNPs) within a candidate locus. Many approaches have been developed for using such data in a test of association with disease, ranging from genotype-based to haplotype-based tests. We develop a new approach that involves two basic steps. In the first step, we use principal components (PCs) analysis to compute combinations of SNPs that capture the underlying correlation structure within the locus. The second step uses the PCs directly in a test of disease association. The PC approach captures linkage-disequilibrium information within a candidate region, but does not require the difficult computing implicit in a haplotype analysis. We demonstrate by simulation that the PC approach is typically as or more powerful than both genotype- and haplotype-based approaches. We also analyze association between respiratory symptoms in children and four SNPs in the Glutathione-S-Transferase P1 locus, based on data from the Children's Health Study. We observe stronger evidence of an association using the PC approach (p = 0.044) than using either a genotype-based (p = 0.13) or haplotype-based (p = 0.052) approach. Genet. Epidemiol. 2007. © 2007 Wiley-Liss, Inc. [source]
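
The two-step PC approach can be sketched directly: compute principal components of the centered genotype matrix, then jointly test the retained PCs for disease association, for example with a likelihood-ratio test in a logistic regression. The code below uses simulated genotypes, not the Children's Health Study data, and the 80% variance cut-off is an arbitrary illustrative choice.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n, n_snps = 1000, 4
corr = 0.6 + 0.4 * np.eye(n_snps)                          # pairwise LD of 0.6
hap1 = (rng.multivariate_normal(np.zeros(n_snps), corr, size=n) > 0.5).astype(int)
hap2 = (rng.multivariate_normal(np.zeros(n_snps), corr, size=n) > 0.5).astype(int)
G = (hap1 + hap2).astype(float)                            # 0/1/2 genotypes in LD
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.5 * G[:, 1]))))   # signal at SNP 2

# Step 1: principal components of the centered genotype matrix
Gc = G - G.mean(axis=0)
U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
explained = s ** 2 / (s ** 2).sum()
k = int(np.searchsorted(np.cumsum(explained), 0.80)) + 1   # PCs capturing ~80% of variance
pcs = Gc @ Vt[:k].T

# Step 2: joint likelihood-ratio test of the k PCs in a logistic regression
full = sm.Logit(y, sm.add_constant(pcs)).fit(disp=0)
null = sm.Logit(y, np.ones((n, 1))).fit(disp=0)
lr = 2 * (full.llf - null.llf)
print("k =", k, " LRT p-value =", stats.chi2.sf(lr, df=k))
```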


Analysis of single-locus tests to detect gene/disease associations

GENETIC EPIDEMIOLOGY, Issue 3 2005
Kathryn Roeder
Abstract A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous Single Nucleotide Polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of 'tag SNPs' for association is unknown. Compared to tests for association based on frequencies of haplotypes, recent evidence suggests tests for association based on linear combinations of the tag SNPs (Hotelling T2 test) are more powerful. Following this logical progression, we wondered whether single-locus tests would prove generally more powerful than the regression-based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, TB, or by permutation of case-control status, TP; a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, TS; and the Hotelling T2 procedure, which we call TR. These procedures are evaluated by simulating data like that from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, TS has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10-20 SNPs per gene), power of TP is generally superior to that for the other procedures, including TR. Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of TP and TR are indistinguishable. Genet. Epidemiol. © 2005 Wiley-Liss, Inc. [source]
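
The TP procedure (the maximum single-locus statistic referred to the permutation distribution obtained by shuffling case-control labels) is straightforward to sketch. The version below uses Armitage trend statistics on simulated correlated genotypes and is only illustrative of the idea, not the paper's simulation design.

```python
import numpy as np

def max_trend_stat(G, y):
    """Maximum over SNPs of the Armitage trend statistic, computed as n * r^2,
    where r is the correlation between genotype score and case status."""
    n = len(y)
    yc = y - y.mean()
    Gc = G - G.mean(axis=0)
    r = (Gc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Gc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return n * (r ** 2).max()

def tp_pvalue(G, y, n_perm=2000, seed=0):
    """Permutation p-value for the max statistic (case/control labels shuffled)."""
    rng = np.random.default_rng(seed)
    obs = max_trend_stat(G, y)
    perm = np.array([max_trend_stat(G, rng.permutation(y)) for _ in range(n_perm)])
    return obs, (1 + (perm >= obs).sum()) / (n_perm + 1)

# toy data: 800 subjects, 12 correlated SNPs, liability signal at SNP 6
rng = np.random.default_rng(11)
corr = 0.5 + 0.5 * np.eye(12)
G = ((rng.multivariate_normal(np.zeros(12), corr, size=800) > 0).astype(int)
     + (rng.multivariate_normal(np.zeros(12), corr, size=800) > 0).astype(int))
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.8 + 0.6 * G[:, 5]))))
print(tp_pvalue(G, y))
```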


A Bayesian approach to the transmission/disequilibrium test for binary traits

GENETIC EPIDEMIOLOGY, Issue 1 2002
Varghese George
Abstract The transmission/disequilibrium test (TDT) for binary traits is a powerful method for detecting linkage between a marker locus and a trait locus in the presence of allelic association. The TDT uses information on the parent-to-offspring transmission status of the associated allele at the marker locus to assess linkage or association in the presence of the other, using one affected offspring from each set of parents. For testing for linkage in the presence of association, more than one offspring per family can be used. However, without incorporating the correlation structure among offspring, it is not possible to correctly assess the association in the presence of linkage. In this presentation, we propose a Bayesian TDT method as a complementary alternative to the classical approach. In the hypothesis testing setup, given two competing hypotheses, the Bayes factor can be used to weigh the evidence in favor of one of them, thus allowing us to decide between the two hypotheses using established criteria. We compare the proposed Bayesian TDT with a competing frequentist-testing method with respect to power and type I error validity. If we know the mode of inheritance of the disease, then the joint and marginal posterior distributions for the recombination fraction and the disequilibrium coefficient can be obtained via standard MCMC methods, which lead naturally to Bayesian credible intervals for both parameters. Genet. Epidemiol. 22:41-51, 2002. © 2002 Wiley-Liss, Inc. [source]


Random fields-union intersection tests for detecting functional connectivity in EEG/MEG imaging

HUMAN BRAIN MAPPING, Issue 8 2009
Felix Carbonell
Abstract Electrophysiological (EEG/MEG) imaging challenges statistics by providing two views of the same underlying spatio-temporal brain activity: a topographic view (EEG/MEG) and tomographic view (EEG/MEG source reconstructions). It is a common practice that statistical parametric mapping (SPM) for these two situations is developed separately. In particular, assessing statistical significance of functional connectivity is a major challenge in these types of studies. This work introduces statistical tests for assessing simultaneously the significance of spatio-temporal correlation structure between ERP/ERF components as well as that of their generating sources. We introduce a greatest root statistic as the multivariate test statistic for detecting functional connectivity between two sets of EEG/MEG measurements at a given time instant. We use some new results in random field theory to solve the multiple comparisons problem resulting from the correlated test statistics at each time instant. In general, our approach using the union-intersection (UI) principle provides a framework for hypothesis testing about any linear combination of sensor data, which allows the analysis of the correlation structure of both topographic and tomographic views. The performance of the proposed method is illustrated with real ERP data obtained from a face recognition experiment. Hum Brain Mapp 2009. © 2009 Wiley-Liss, Inc. [source]
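
At a single time instant, the greatest-root statistic between two sets of sensors reduces to the largest canonical correlation between the two data matrices. The sketch below shows only that reduction on simulated data; it omits the random-field correction over time, which is the paper's main contribution.

```python
import numpy as np

def greatest_root(X, Y):
    """Largest squared canonical correlation between column-centered X (n x p)
    and Y (n x q): the greatest-root (union-intersection) statistic."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    svals = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return svals[0] ** 2          # lies in [0, 1]

rng = np.random.default_rng(5)
n, p, q = 200, 6, 4                      # e.g. trials x sensors in two regions
shared = rng.normal(size=(n, 1))         # common underlying source => connectivity
X = 0.8 * shared + rng.normal(size=(n, p))
Y = 0.8 * shared + rng.normal(size=(n, q))
print("greatest root:", round(greatest_root(X, Y), 3))
# a null distribution could be obtained by permuting the trials of Y; the maximum
# over time points would be handled with random field theory as in the paper
```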


Short-term MPEG-4 video traffic prediction using ANFIS

INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, Issue 6 2005
Adel Abdennour
Multimedia traffic and particularly MPEG-coded video streams are growing to be a major traffic component in high-speed networks. Accurate prediction of such traffic enhances the reliable operation and the quality of service of these networks through a more effective bandwidth allocation and better control strategies. However, MPEG video traffic is characterized by a periodic correlation structure, a highly complex bit rate distribution and very noisy streams. Therefore, it is considered an intractable problem. This paper presents a neuro-fuzzy short-term predictor for MPEG-4-coded videos. The predictor is based on the Adaptive Network Fuzzy Inference System (ANFIS) to perform single-step predictions for the I, P and B frames. Short-term predictions are also examined using smoothed signals of the video sequences. The ANFIS prediction results are evaluated using long entertainment and broadcast video sequences and compared to those obtained using a linear predictor. ANFIS is capable of providing accurate prediction and has the added advantage of being simple to design and to implement. Copyright © 2005 John Wiley & Sons, Ltd. [source]


Evaluating effectiveness of preoperative testing procedure: some notes on modelling strategies in multi-centre surveys

JOURNAL OF EVALUATION IN CLINICAL PRACTICE, Issue 1 2008
Dario Gregori PhD
Abstract Rationale: In technology assessment in health-related fields, the construction of a model for interpreting the economic implications of the introduction of a technology is only a part of the problem. The most important part is often the formulation of a model that can be used for selecting patients to submit to the new cost-saving procedure or medical strategy. The model is usually complicated by the fact that data are often non-homogeneous with respect to some uncontrolled variables and are correlated. The most typical example is the so-called hospital effect in multi-centre studies. Aims and objectives: We show the implications of different choices in modelling strategies when evaluating the usefulness of preoperative chest radiography (POCR), an exam performed before surgery, usually with the aim of detecting unsuspected abnormalities that could influence the anaesthetic management and/or surgical plan. Method: We analyze the data from a multi-centre study including more than 7000 patients. We use about 6000 patients to fit regression models using both a population-averaged and a subject-specific approach. We explore the limitations of these models when used for predictive purposes using a validation set of more than 1000 patients. Results: We show the importance of taking into account the heterogeneity among observations and the correlation structure of the data, and we propose an approach for integrating a population-averaged and a subject-specific approach into a single modelling strategy. We find that the hospital represents an important variable causing heterogeneity that influences the probability of a useful POCR. Conclusions: We find that starting with a marginal model, evaluating the shrinkage effect and eventually moving to a more detailed model for the heterogeneity is preferable. This kind of flexible approach seems to be more informative at various phases of the model-building strategy. [source]


Monitoring of batch processes through state-space models

AICHE JOURNAL, Issue 6 2004
Jay H. Lee
Abstract The development of a state-space framework for monitoring batch processes that can complement the existing multivariate monitoring methods is presented. A subspace identification method will be used to extract the dynamic and batch-to-batch trends of the process and quality variables from historical operation data in the form of a "lifted" state-space stochastic model. A simple monitoring procedure can be formed around the state and residuals of the model using appropriate scalar statistical metrics. The proposed state-space monitoring framework complements the existing multivariate methods like the multi-way PCA method, in that it allows us to build a more complete statistical representation of batch operations and use it with incoming measurements for early detection of not only large, abrupt changes but also subtle changes. In particular, it is shown to be effective for detecting changes in the batch-to-batch correlation structure, slow drifts, and mean shifts. Such information can be useful in adapting the prediction model for batch-to-batch control. The framework allows for the use of on-line process measurements and/or off-line quality measurements. When both types of measurements are used in model building, one can also use the model to predict the quality variables based on incoming on-line measurements and quality measurements of previous batches. © 2004 American Institute of Chemical Engineers AIChE J, 50: 1198-1210, 2004 [source]


Modelling longitudinal semicontinuous emesis volume data with serial correlation in an acupuncture clinical trial

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 4 2005
Paul S. Albert
Summary. In longitudinal studies, we are often interested in modelling repeated assessments of volume over time. Our motivating example is an acupuncture clinical trial in which we compare the effects of active acupuncture, sham acupuncture and standard medical care on chemotherapy-induced nausea in patients being treated for advanced stage breast cancer. An important end point for this study was the daily measurement of the volume of emesis over a 14-day follow-up period. The repeated volume data contained many 0s, had apparent serial correlation and had missing observations, making analysis challenging. The paper proposes a two-part latent process model for analysing the emesis volume data which addresses these challenges. We propose a Monte Carlo EM algorithm for parameter estimation and we use this methodology to show the beneficial effects of acupuncture on reducing the volume of emesis in women being treated for breast cancer with chemotherapy. Through simulations, we demonstrate the importance of correctly modelling the serial correlation for making conditional inference. Further, we show that the correct model for the correlation structure is less important for making correct inference on marginal means. [source]
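
A basic two-part model for semicontinuous volume data combines a logistic model for any emesis versus none with a linear model for log-volume on the positive days. The sketch below is that cross-sectional simplification on fabricated data; it omits the patient-level latent processes, serial correlation and Monte Carlo EM machinery that the paper actually develops.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n_pat, n_days = 60, 14
arm = np.repeat(rng.integers(0, 2, n_pat), n_days)      # 0 = control, 1 = acupuncture
day = np.tile(np.arange(1, n_days + 1), n_pat)
pid = np.repeat(np.arange(n_pat), n_days)
# fabricated semicontinuous outcome: many zeros, lognormal volume when positive
p_any = 1 / (1 + np.exp(-(0.5 - 0.8 * arm - 0.05 * day)))
any_emesis = rng.binomial(1, p_any)
volume = any_emesis * np.exp(rng.normal(4.0 - 0.5 * arm, 0.6))
df = pd.DataFrame(dict(pid=pid, arm=arm, day=day, volume=volume))
df["any_emesis"] = (df.volume > 0).astype(int)

# Part 1: probability of any emesis (logistic regression)
part1 = smf.logit("any_emesis ~ arm + day", data=df).fit(disp=0)
# Part 2: log-volume given that emesis occurred (linear regression)
pos = df[df.volume > 0].assign(logvol=lambda d: np.log(d.volume))
part2 = smf.ols("logvol ~ arm + day", data=pos).fit()
print(part1.params, part2.params, sep="\n")
# the paper links the two parts through correlated latent processes per patient
# and fits them jointly by Monte Carlo EM; this sketch treats days as independent
```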


Effects of correlation and missing data on sample size estimation in longitudinal clinical trials

PHARMACEUTICAL STATISTICS: THE JOURNAL OF APPLIED STATISTICS IN THE PHARMACEUTICAL INDUSTRY, Issue 1 2010
Song Zhang
Abstract In longitudinal clinical trials, a common objective is to compare the rates of changes in an outcome variable between two treatment groups. Generalized estimating equations (GEE) have been widely used to examine whether the rates of changes are significantly different between treatment groups because of their robustness to misspecification of the true correlation structure and to randomly missing data. The sample size formula for repeated outcomes is based on the assumption of missing completely at random and a large-sample approximation. A simulation study is conducted to investigate the performance of the GEE sample size formula with small sample sizes, a damped exponential family of correlation structures and non-ignorable missing data. Copyright © 2008 John Wiley & Sons, Ltd. [source]
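
The sensitivity questions raised here can be checked by direct simulation: generate two-arm longitudinal data with a chosen correlation structure and dropout rate, fit a GEE per replicate, and count rejections of the group-by-time interaction. A compact sketch follows; the sample sizes, effect size, exchangeable correlation and simple MCAR dropout mechanism are all illustrative choices, not those of the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def empirical_power(n_per_arm=60, n_visits=5, slope_diff=0.25, rho=0.5,
                    dropout=0.10, n_sim=200, seed=0):
    """Empirical power of the GEE test for a treatment-by-time interaction with
    exchangeable within-subject correlation and monotone MCAR dropout."""
    rng = np.random.default_rng(seed)
    rejections = 0
    cov = rho + (1 - rho) * np.eye(n_visits)          # exchangeable, unit variance
    for _ in range(n_sim):
        rows = []
        for grp in (0, 1):
            errs = rng.multivariate_normal(np.zeros(n_visits), cov, size=n_per_arm)
            for i in range(n_per_arm):
                keep = 1 + rng.binomial(n_visits - 1, 1 - dropout)   # visits retained
                for t in range(keep):
                    y = 0.1 * t + slope_diff * grp * t + errs[i, t]
                    rows.append((f"{grp}-{i}", grp, t, y))
        df = pd.DataFrame(rows, columns=["id", "grp", "time", "y"])
        fit = smf.gee("y ~ grp * time", groups="id", data=df,
                      cov_struct=sm.cov_struct.Exchangeable()).fit()
        rejections += fit.pvalues["grp:time"] < 0.05
    return rejections / n_sim

print(empirical_power())
```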


Evaluation of the statistical power for multiple tests: a case study

PHARMACEUTICAL STATISTICS: THE JOURNAL OF APPLIED STATISTICS IN THE PHARMACEUTICAL INDUSTRY, Issue 1 2009
Adeline Yeo
Abstract It is challenging to estimate the statistical power when a complicated testing strategy is used to adjust for the type-I error for multiple comparisons in a clinical trial. In this paper, we use the Bonferroni Inequality to estimate the lower bound of the statistical power assuming that test statistics are approximately normally distributed and the correlation structure among test statistics is unknown or only partially known. The method was applied to the design of a clinical study for sample size and statistical power estimation. Copyright © 2008 John Wiley & Sons, Ltd. [source]
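
The flavour of such a bound can be shown with a small calculation: if each of k hypotheses is tested at level alpha/k and the marginal power of test i is power_i, the Bonferroni inequality gives P(all k tests reject) >= 1 - sum_i (1 - power_i), regardless of the unknown correlation between the test statistics. The sketch below is a generic illustration with assumed effect sizes, not the testing strategy of the case study.

```python
from scipy import stats

def marginal_power(effect_size, n_per_arm, alpha_adj):
    """Two-sided z-approximation power for a single two-sample comparison."""
    z_crit = stats.norm.ppf(1 - alpha_adj / 2)
    ncp = effect_size * (n_per_arm / 2) ** 0.5        # noncentrality for equal arms
    return stats.norm.sf(z_crit - ncp) + stats.norm.cdf(-z_crit - ncp)

# hypothetical study: 3 comparisons, overall two-sided alpha = 0.05, Bonferroni split
alpha, effects, n = 0.05, [0.40, 0.35, 0.30], 150
powers = [marginal_power(d, n, alpha / len(effects)) for d in effects]
lower_bound_all = max(0.0, 1 - sum(1 - p for p in powers))
print([round(p, 3) for p in powers], "P(all reject) >=", round(lower_bound_all, 3))
```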


Modified weights based generalized quasilikelihood inferences in incomplete longitudinal binary models

THE CANADIAN JOURNAL OF STATISTICS, Issue 2 2010
Brajendra C. Sutradhar
Abstract In an incomplete longitudinal set up, a small number of repeated responses subject to an appropriate missing mechanism along with a set of covariates are collected from a large number of independent individuals over a small period of time. In this set up, the regression effects of the covariates are routinely estimated by solving certain inverse weights based generalized estimating equations. These inverse weights are introduced to make the estimating equation unbiased so that a consistent estimate of the regression parameter vector may be obtained. In the existing studies, these weights are in general formulated conditional on the past responses. Since the past responses follow a correlation structure, the present study reveals that if the longitudinal data subject to missing mechanism are generated by accommodating the longitudinal correlation structure, the conditional weights based on past correlated responses may yield biased and hence inconsistent regression estimates. The bias appears to get larger as the correlation increases. As a remedy, in this paper the authors proposed a modification to the formulation of the existing weights so that weights are not affected directly or indirectly by the correlations. They have then exploited these modified weights to form a weighted generalized quasi-likelihood estimating equation that yields unbiased and hence consistent estimates for the regression effects irrespective of the magnitude of correlation. The efficiencies of the regression estimates follow due to the use of the true correlation structure as a separate longitudinal weights matrix in the estimating equation. The Canadian Journal of Statistics © 2010 Statistical Society of Canada [source]


The Relative Valuation of Caps and Swaptions: Theory and Empirical Evidence

THE JOURNAL OF FINANCE, Issue 6 2001
Francis A. Longstaff
Although traded as distinct products, caps and swaptions are linked by no-arbitrage relations through the correlation structure of interest rates. Using a string market model, we solve for the correlation matrix implied by swaptions and examine the relative valuation of caps and swaptions. We find that swaption prices are generated by four factors and that implied correlations are lower than historical correlations. Long-dated swaptions appear mispriced and there were major pricing distortions during the 1998 hedge-fund crisis. Cap prices periodically deviate significantly from the no-arbitrage values implied by the swaptions market. [source]


Efficient Association Study Design Via Power-Optimized Tag SNP Selection

ANNALS OF HUMAN GENETICS, Issue 6 2008
B. Han
Summary Discovering statistical correlation between causal genetic variation and clinical traits through association studies is an important method for identifying the genetic basis of human diseases. Since fully resequencing a cohort is prohibitively costly, genetic association studies take advantage of local correlation structure (or linkage disequilibrium) between single nucleotide polymorphisms (SNPs) by selecting a subset of SNPs to be genotyped (tag SNPs). While many current association studies are performed using commercially available high-throughput genotyping products that define a set of tag SNPs, choosing tag SNPs remains an important problem for both custom follow-up studies as well as designing the high-throughput genotyping products themselves. The most widely used tag SNP selection method optimizes the correlation between SNPs (r2). However, tag SNPs chosen based on an r2 criterion do not necessarily maximize the statistical power of an association study. We propose a study design framework that chooses SNPs to maximize power and efficiently measures the power through empirical simulation. Empirical results based on the HapMap data show that our method gains considerable power over a widely used r2-based method, or equivalently reduces the number of tag SNPs required to attain the desired power of a study. Our power-optimized 100k whole genome tag set provides equivalent power to the Affymetrix 500k chip for the CEU population. For the design of custom follow-up studies, our method provides up to twice the power increase using the same number of tag SNPs as r2-based methods. Our method is publicly available via web server at http://design.cs.ucla.edu. [source]


A Bayesian Spatial Multimarker Genetic Random-Effect Model for Fine-Scale Mapping

ANNALS OF HUMAN GENETICS, Issue 5 2008
M.-Y. Tsai
Summary Multiple markers in linkage disequilibrium (LD) are usually used to localize the disease gene location. These markers may contribute to the disease etiology simultaneously. In contrast to the single-locus tests, we propose a genetic random effects model that accounts for the dependence between loci via their spatial structures. In this model, the locus-specific random effects measure not only the genetic disease risk, but also the correlations between markers. In other words, the model incorporates this relation in both mean and covariance structures, and the variance components play important roles. We consider two different settings for the spatial relations. The first is our proposal, relative distance function (RDF), which is intuitive in the sense that markers nearby are likely to correlate with each other. The second setting is a common exponential decay function (EDF). Under each setting, the inference of the genetic parameters is fully Bayesian with Markov chain Monte Carlo (MCMC) sampling. We demonstrate the validity and the utility of the proposed approach with two real datasets and simulation studies. The analyses show that the proposed model with either one of two spatial correlations performs better as compared with the single locus analysis. In addition, under the RDF model, a more precise estimate for the disease locus can be obtained even when the candidate markers are fairly dense. In all simulations, the inference under the true model provides unbiased estimates of the genetic parameters, and the model with the spatial correlation structure does lead to greater confidence interval coverage probabilities. [source]


Optimal designs for parameter estimation of the Ornstein-Uhlenbeck process

APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 5 2009
Maroussa Zagoraiou
Abstract This paper deals with optimal designs for Gaussian random fields with constant trend and exponential correlation structure, widely known as the Ornstein-Uhlenbeck process. Assuming the maximum likelihood approach, we study the optimal design problem for the estimation of the trend µ and the correlation parameter, using a criterion based on the Fisher information matrix. For the problem of trend estimation, we give a new proof of the optimality of the equispaced design for any sample size (see Statist. Probab. Lett. 2008; 78:1388-1396). We also show that for the estimation of the correlation parameter, an optimal design does not exist. Furthermore, we show that the optimal strategy for µ conflicts with the one for the correlation parameter, since the equispaced design is the worst solution for estimating the correlation. Hence, when the inferential purpose concerns both the unknown parameters, we propose the geometric progression design, namely a flexible class of procedures that allows the experimenter to choose a suitable compromise regarding the estimation precision of the two unknown parameters while guaranteeing, at the same time, high efficiency for both. Copyright © 2008 John Wiley & Sons, Ltd. [source]
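
The design comparison for the trend is easy to reproduce numerically: for an OU process with covariance sigma^2 * exp(-theta * |t_i - t_j|), the Fisher information about µ contributed by design points t is 1' Sigma^{-1} 1, which can be evaluated for equispaced versus geometrically spaced points. The sketch below uses assumed values of theta and sigma^2 and a made-up geometric spacing; it is only a numerical illustration, not the paper's proof.

```python
import numpy as np

def trend_information(t, theta=1.0, sigma2=1.0):
    """Fisher information for the constant trend mu of an OU process observed at
    times t, i.e. 1' Sigma^{-1} 1 with Sigma_ij = sigma2 * exp(-theta*|ti - tj|)."""
    t = np.asarray(t, dtype=float)
    Sigma = sigma2 * np.exp(-theta * np.abs(t[:, None] - t[None, :]))
    ones = np.ones(len(t))
    return ones @ np.linalg.solve(Sigma, ones)

n, horizon = 8, 7.0
equispaced = np.linspace(0, horizon, n)
geometric = horizon * (1 - 0.6 ** np.arange(n)) / (1 - 0.6 ** (n - 1))  # spacings shrink geometrically
print("equispaced :", round(trend_information(equispaced), 4))   # optimal for the trend, per the paper
print("geometric  :", round(trend_information(geometric), 4))
```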


Issues in the optimal design of computer simulation experiments

APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, Issue 2 2009
Werner Müller
Abstract Output from computer simulation experiments is often approximated as realizations of correlated random fields. Consequently, the corresponding optimal design questions must cope with the existence and detection of an error correlation structure, issues largely unaccounted for by traditional optimal design theory. Unfortunately, many of the nice features of well-established design techniques, such as additivity of the information matrix, convexity of design criteria, etc., do not carry over to the setting of interest. This may lead to unexpected, counterintuitive, even paradoxical effects in the design as well as the analysis stage of computer simulation experiments. In this paper we intend to give an overview and some simple but illuminating examples of this behaviour. Copyright © 2009 John Wiley & Sons, Ltd. [source]


APPROXIMATING VOLATILITIES BY ASYMMETRIC POWER GARCH FUNCTIONS

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2009
Jeremy Penzer
Summary ARCH/GARCH representations of financial series usually attempt to model the serial correlation structure of squared returns. Although it is undoubtedly true that squared returns are correlated, there is increasing empirical evidence of stronger correlation in the absolute returns than in squared returns. Rather than assuming an explicit form for volatility, we adopt an approximation approach; we approximate a power of the volatility by an asymmetric GARCH function, with the power index chosen so that the approximation is optimal. Asymptotic normality is established for both the quasi-maximum likelihood estimator (qMLE) and the least absolute deviations estimator (LADE) in our approximation setting. A consequence of our approach is a relaxation of the usual stationarity condition for GARCH models. In an application to real financial datasets, the estimated values of the power index are found to be close to one, consistent with the stylized fact that the strongest autocorrelation is found in the absolute returns. A simulation study illustrates that the qMLE is inefficient for models with heavy-tailed errors, whereas the LADE is more robust. [source]


A Note on the Use of Unbiased Estimating Equations to Estimate Correlation in Analysis of Longitudinal Trials

BIOMETRICAL JOURNAL, Issue 1 2009
Wenguang Sun
Abstract Longitudinal trials can yield outcomes that are continuous, binary (yes/no), or are realizations of counts. In this setting we compare three approaches that have been proposed for estimation of the correlation in the framework of generalized estimating equations (GEE): quasi-least squares (QLS), pseudo-likelihood (PL), and an approach we refer to as Wang-Carey (WC). We prove that WC and QLS are identical for the first-order autoregressive AR(1) correlation structure. Using simulations, we then develop guidelines for selection of an appropriate method for analysis of data from a longitudinal trial. In particular, we argue that no method is uniformly superior for analysis of unbalanced and unequally spaced data with a Markov correlation structure. Choice of the best approach will depend on the degree of imbalance and variability in the temporal spacing of measurements, value of the correlation, and type of outcome, e.g. binary or continuous. Finally, we contrast the methods in analysis of a longitudinal study of obesity following renal transplantation in children. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]


The Wilcoxon Signed Rank Test for Paired Comparisons of Clustered Data

BIOMETRICS, Issue 1 2006
Bernard Rosner
Summary The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with at least 20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen (1971, Nonparametric Methods in Multivariate Analysis, New York: John Wiley). Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols. [source]
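
The randomization formulation described above, with the cluster (person) as the unit of randomization, can be sketched as a sign-flip permutation test: compute signed ranks for all subunit differences, but flip signs only at the cluster level when building the reference distribution. This is a Monte Carlo illustration of that formulation on fabricated eye-within-person data, not the closed-form variance adjustment derived in the paper.

```python
import numpy as np
from scipy import stats

def clustered_signed_rank_test(diffs, cluster, n_perm=5000, seed=0):
    """Cluster-level sign-flip randomization version of the signed rank test.
    diffs: paired differences per subunit (e.g. eye); cluster: person identifier."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, float)
    signed_ranks = np.sign(diffs) * stats.rankdata(np.abs(diffs))  # ties get average ranks
    obs = signed_ranks.sum()
    clusters = np.unique(cluster)
    perm_stats = np.empty(n_perm)
    for b in range(n_perm):
        flips = dict(zip(clusters, rng.choice([-1, 1], size=len(clusters))))
        s = np.array([flips[c] for c in cluster])                  # same flip within a cluster
        perm_stats[b] = (s * signed_ranks).sum()
    p = (1 + (np.abs(perm_stats) >= abs(obs)).sum()) / (n_perm + 1)
    return obs, p

# toy example: 25 people, 2 eyes each, pre/post differences with a small shift
rng = np.random.default_rng(1)
person = np.repeat(np.arange(25), 2)
person_effect = np.repeat(rng.normal(0.3, 0.5, 25), 2)   # induces within-person correlation
diffs = person_effect + rng.normal(0, 0.5, 50)
print(clustered_signed_rank_test(diffs, person))
```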


Fitting copulas to bivariate earthquake data: the seismic gap hypothesis revisited

ENVIRONMETRICS, Issue 3 2008
Aristidis K. Nikoloulopoulos
Abstract The seismic gap hypothesis assumes that the intensity of an earthquake and the time elapsed from the previous one are positively related. Previous works on this topic were based on particular assumptions for the joint distribution implying specific type of dependence. We investigate this hypothesis using copulas. Copulas are flexible for modelling the dependence structure far from assuming simple linear correlation structures and, thus, allow for better examination of this controversial aspect of geophysical research. In fact, via copulas, marginal properties and dependence structure can be separated. We propose a model averaging approach in order to allow for model uncertainty and diminish the effect of the choice of a particular copula. This enlarges the range of potential dependence structures that can be investigated. Application to a real data set is provided. Copyright © 2007 John Wiley & Sons, Ltd. [source]