Simulated Data (simulated + data)


Terms modified by Simulated Data

  • simulated data set

Selected Abstracts


    Case-control single-marker and haplotypic association analysis of pedigree data

    GENETIC EPIDEMIOLOGY, Issue 2 2005
    Sharon R. Browning
    Abstract Related individuals collected for use in linkage studies may be used in case-control linkage disequilibrium analysis, provided one takes into account correlations between individuals due to identity-by-descent (IBD) sharing. We account for these correlations by calculating a weight for each individual. The weights are used in constructing a composite likelihood, which is maximized iteratively to form likelihood ratio tests for single-marker and haplotypic associations. The method scales well with increasing pedigree size and complexity, and is applicable to both autosomal and X chromosomes. We apply the approach to an analysis of association between type 2 diabetes and single-nucleotide polymorphism markers in the PPAR-γ gene. Simulated data are used to check validity of the test and examine power. Analysis of related cases has better power than analysis of population-based cases because of the increased frequencies of disease-susceptibility alleles in pedigrees with multiple cases compared to the frequencies of these alleles in population-based cases. Also, utilizing all cases in a pedigree rather than just one per pedigree improves power by increasing the effective sample size. We demonstrate that our method has power at least as great as that of several competing methods, while offering advantages in the ability to handle missing data and perform haplotypic analysis. Genet. Epidemiol. 28:110–122, 2005. © 2004 Wiley-Liss, Inc. [source]


    A randomisation program to compare species-richness values

    INSECT CONSERVATION AND DIVERSITY, Issue 3 2008
    JEAN M. L. RICHARDSON
    Abstract. 1. Comparisons of biodiversity estimates among sites or through time are hampered by a focus on using mean and variance estimates for diversity measures. These estimators depend on both sampling effort and on the abundances of organisms in communities, which makes comparison of communities possible only through the use of rarefaction curves that reduce all samples to the lowest sample size. However, comparing species richness among communities does not demand absolute estimates of species richness and statistical tests of similarity among communities are potentially more straightforward. 2. This paper presents a program that uses randomisation methods to robustly test for differences in species richness among samples. Simulated data are used to show that the analysis has acceptable type I error rates and sufficient power to detect violations of the null hypothesis. An analysis of published bee data collected in 4 years shows how both sample size and hierarchical structure in sample type are incorporated into the analysis. 3. The randomisation program is shown to be very robust to the presence of a dominant species, many rare species, and decreased sample size, giving quantitatively similar conclusions under all conditions. This method of testing for differences in biodiversity provides an important tool for researchers working on questions in community ecology and conservation biology. [source]
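
    The randomisation idea in this abstract can be illustrated with a minimal permutation test (a sketch, not the authors' program): individuals from two samples are pooled and repeatedly reassigned while keeping the sample sizes fixed, and the observed richness difference is compared against the permutation distribution. The site data below are hypothetical, and the hierarchical sampling structure mentioned in the abstract is not handled.

```python
import numpy as np

rng = np.random.default_rng(1)

def richness(labels):
    """Number of distinct species among individually sampled organisms."""
    return len(set(labels))

def randomisation_test(sample_a, sample_b, n_perm=9999):
    """Permutation p-value for a difference in species richness."""
    observed = abs(richness(sample_a) - richness(sample_b))
    pooled = np.concatenate([sample_a, sample_b])
    n_a = len(sample_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(richness(pooled[:n_a]) - richness(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical species labels of individual bees caught at two sites
site1 = rng.integers(0, 30, size=120)   # 120 individuals from a ~30-species pool
site2 = rng.integers(0, 45, size=120)   # 120 individuals from a ~45-species pool
print(randomisation_test(site1, site2))
```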


    Scaling and Testing Multiplicative Combinations in the Expectancy–Value Model of Attitudes

    JOURNAL OF APPLIED SOCIAL PSYCHOLOGY, Issue 9 2008
    Icek Ajzen
    This article examines the multiplicative combination of belief strength by outcome evaluation in the expectancy–value model of attitudes. Because linear transformation of a belief strength measure results in a nonlinear transformation of its product with outcome evaluation, use of unipolar or bipolar scoring must be empirically justified. Also, the claim that the Belief × Evaluation product fails to explain significant variance in attitudes is found to be baseless. In regression analyses, the main effect of belief strength takes account of the outcome's valence, and the main effect of outcome evaluation incorporates the outcome's perceived likelihood. Simulated data showed that multiplication adds substantially to the prediction of attitudes only when belief and evaluation measures cover the full range of potential scores. [source]
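
    The scoring issue can be made concrete with a small sketch using hypothetical data (not from the article): because the expectancy-value prediction is carried by the Belief × Evaluation product, a linear rescoring of the belief item from unipolar (1..7) to bipolar (−3..+3) changes the product nonlinearly and hence its correlation with the attitude measure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

belief_raw = rng.integers(1, 8, n).astype(float)    # belief strength rated 1..7
evaluation = rng.integers(-3, 4, n).astype(float)   # outcome evaluation -3..+3
# attitude generated from the bipolar-scored product plus noise (hypothetical)
attitude = 0.5 * (belief_raw - 4) * evaluation + rng.normal(0, 1, n)

# The expectancy-value prediction uses the Belief x Evaluation product itself,
# so a linear rescoring of belief (unipolar 1..7 -> bipolar -3..+3) changes the
# product nonlinearly and alters its correlation with the attitude measure.
for name, belief in [("unipolar", belief_raw), ("bipolar", belief_raw - 4)]:
    product = belief * evaluation
    r = np.corrcoef(product, attitude)[0, 1]
    print(f"{name} scoring: r(product, attitude) = {r:.2f}")
```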


    Time Deformation, Continuous Euler Processes and Forecasting

    JOURNAL OF TIME SERIES ANALYSIS, Issue 6 2006
    Chu-Ping C. Vijverberg
    Abstract. A continuous Euler model has time-varying coefficients. Through a logarithmic time transformation, a continuous Euler model can be transformed to a continuous autoregressive (AR) model. By using continuous Kalman filtering through the Laplace method, this article explores the data application of a continuous Euler process. This time deformation of an Euler process deforms specific time-variant (non-stationary) behaviour to time-invariant (stationary) data on the deformed time scale. With these time-invariant data on the transformed time scale, one may use traditional tools to conduct parameter estimation and forecasts. The obtained results can then be transformed back to the original time scale. Simulated data and actual data such as bat echolocation and the US residential investment growth are used to demonstrate the usefulness of time deformation in forecasting. The results indicate that fitting a traditional autoregressive moving-average (ARMA) model on an Euler data set without imposing time transformation leads to forecasts that are out of phase while the forecasts of an Euler model stay mostly in phase. [source]
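
    As a rough illustration of the time-deformation idea (a sketch under simplifying assumptions, not the article's continuous Kalman/Laplace machinery): observations taken on the calendar time scale are mapped to an equally spaced grid in logarithmic time, where the series is treated as stationary and an ordinary AR model can be fitted by least squares.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical Euler-type data: a stationary AR(1) process on the log-time
# scale tau, observed at calendar times t = exp(tau) (unequally spaced in t).
tau = np.linspace(0.0, 6.0, 400)
x = np.zeros_like(tau)
for i in range(1, len(tau)):
    x[i] = 0.8 * x[i - 1] + rng.normal(0, 1)
t = np.exp(tau)                     # calendar time stamps of the observations

# Deformation step: interpolate the observed (t, x) pairs onto an equally
# spaced grid in tau = log(t); on that scale the series is treated as stationary.
tau_grid = np.linspace(np.log(t[0]), np.log(t[-1]), 400)
y = np.interp(tau_grid, np.log(t), x)

# Fit an AR(2) model on the deformed time scale by ordinary least squares.
Y = y[2:]
X = np.column_stack([y[1:-1], y[:-2]])
phi, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("AR coefficients on the deformed scale:", np.round(phi, 2))
# Forecasts made on the deformed scale would then be mapped back via t = exp(tau).
```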


    Particle Size Distributions from Static Light Scattering with Regularized Non-Negative Least Squares Constraints

    PARTICLE & PARTICLE SYSTEMS CHARACTERIZATION, Issue 6 2006
    Alejandro R. Roig
    Abstract Simulated data from static light scattering produced by several particle size distributions (PSD) of spherical particles in dilute solution are analyzed with a regularized non-negative least squares method (r-NNLS). Strong fluctuations in broad PSDs obtained from direct application of NNLS are suppressed through an averaging procedure, as introduced long ago for the inversion problem in dynamic light scattering. A positive correlation was obtained between the best PSD from several averaging schemes and the condition number of the respective data transfer matrices. The performance of the method is found to be similar to that of constrained regularization (CONTIN), which also uses NNLS as a starting solution but incorporates a different regularization strategy. [source]
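
    The regularized NNLS step can be sketched with standard tools (an illustration under stated assumptions, not the authors' code): a Tikhonov-style smoothness penalty is appended as extra rows below the kernel matrix and an ordinary NNLS solver is applied to the stacked system. The spherical form-factor kernel and the test PSD below are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def regularized_nnls(A, b, lam, L=None):
    """Solve min ||A x - b||^2 + lam * ||L x||^2 subject to x >= 0
    by stacking the penalty rows under A and calling plain NNLS."""
    n = A.shape[1]
    if L is None:
        # second-difference operator: penalises rough (oscillating) distributions
        L = np.diff(np.eye(n), 2, axis=0)
    A_aug = np.vstack([A, np.sqrt(lam) * L])
    b_aug = np.concatenate([b, np.zeros(L.shape[0])])
    x, _ = nnls(A_aug, b_aug)
    return x

# Hypothetical forward model: scattering intensity as a weighted sum of
# single-sphere form factors P(q, r) over a grid of radii (much simplified).
q = np.linspace(0.01, 0.5, 200)
radii = np.linspace(5, 150, 60)

def sphere_form_factor(q, r):
    qr = np.outer(q, r)
    return (3 * (np.sin(qr) - qr * np.cos(qr)) / qr**3) ** 2

A = sphere_form_factor(q, radii)
true_psd = np.exp(-0.5 * ((radii - 60) / 15) ** 2)   # broad Gaussian PSD
b = A @ true_psd + np.random.default_rng(0).normal(0, 1e-3, len(q))
psd_hat = regularized_nnls(A, b, lam=1e-2)
```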


    Methods to account for spatial autocorrelation in the analysis of species distributional data: a review

    ECOGRAPHY, Issue 5 2007
    Carsten F. Dormann
    Species distributional or trait data based on range map (extent-of-occurrence) or atlas survey data often display spatial autocorrelation, i.e. locations close to each other exhibit more similar values than those further apart. If this pattern remains present in the residuals of a statistical model based on such data, one of the key assumptions of standard statistical analyses, that residuals are independent and identically distributed (i.i.d.), is violated. The violation of the assumption of i.i.d. residuals may bias parameter estimates and can increase type I error rates (falsely rejecting the null hypothesis of no effect). While this is increasingly recognised by researchers analysing species distribution data, there is, to our knowledge, no comprehensive overview of the many available spatial statistical methods to take spatial autocorrelation into account in tests of statistical significance. Here, we describe six different statistical approaches to infer correlates of species' distributions, for both presence/absence (binary response) and species abundance data (Poisson or normally distributed response), while accounting for spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised least squares; (conditional and simultaneous) autoregressive models and generalised estimating equations. A comprehensive comparison of the relative merits of these methods is beyond the scope of this paper. To demonstrate each method's implementation, however, we undertook preliminary tests based on simulated data. These preliminary tests verified that most of the spatial modelling techniques we examined showed good type I error control and precise parameter estimates, at least when confronted with simplistic simulated data containing spatial autocorrelation in the errors. However, we found that for presence/absence data the results and conclusions were very variable between the different methods. This is likely due to the low information content of binary maps. Also, in contrast with previous studies, we found that autocovariate methods consistently underestimated the effects of environmental controls of species distributions. Given their widespread use, in particular for the modelling of species presence/absence data (e.g. climate envelope models), we argue that this warrants further study and caution in their use. To aid other ecologists in making use of the methods described, code to implement them in freely available software is provided in an electronic appendix. [source]
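
    Of the six approaches listed, autocovariate regression is the simplest to sketch (hypothetical data, normally distributed response for brevity): an extra predictor, the distance-weighted mean of the response at neighbouring sites, is added alongside the environmental predictors.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
coords = rng.uniform(0, 100, size=(n, 2))        # site locations
env = rng.normal(size=n)                          # environmental predictor
y = 2.0 * env + rng.normal(size=n)                # species abundance (normal response)

def autocovariate(y, coords, radius=10.0):
    """Inverse-distance-weighted mean of the response at sites within `radius`."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    w = np.where((d > 0) & (d <= radius), 1.0 / d, 0.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        ac = (w * y).sum(1) / w.sum(1)
    return np.nan_to_num(ac)                      # sites with no neighbours get 0

# Ordinary least squares with the autocovariate as an additional term
X = np.column_stack([np.ones(n), env, autocovariate(y, coords)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, env effect, autocovariate effect:", np.round(beta, 2))
```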


    Statistical sensitivity analysis of packed column reactors for contaminated wastewater

    ENVIRONMETRICS, Issue 8 2003
    A. Fassò
    Abstract In this article we consider the statistical sensitivity analysis of heavy metal biosorption in contaminated wastewater packed column reactors. In particular, the model describes the biosorption phenomenon using the Advection Dispersion Reaction equation under rapid local equilibrium. This allows computer simulation with random input parameters chosen from appropriate probability distributions. In order to have a statistical framework for analyzing the simulated data and assessing input importance, we introduce heteroskedastic and multivariate sensitivity analysis, which extends standard sensitivity analysis. Copyright © 2003 John Wiley & Sons, Ltd. [source]


    A stronger latent-variable methodology to actual–ideal discrepancy

    EUROPEAN JOURNAL OF PERSONALITY, Issue 7 2008
    L. Francesca Scalas
    Abstract We introduce a latent actual–ideal discrepancy (LAID) approach based on structural equation models (SEMs) with multiple indicators and empirically weighted variables. In Study 1, we demonstrate with simulated data the superiority of a weighted approach to discrepancy over a classic unweighted one. In Study 2, we evaluate the effects of actual and ideal appearance on physical self-concept and self-esteem. Actual appearance contributes positively to physical self-concept and self-esteem, whereas ideal appearance contributes negatively. In support of a multidimensional perspective, actual- and ideal-appearance effects on self-esteem are substantially, but not completely, mediated by physical self-concept. Whereas this pattern of results generalises across gender and age, multiple-group invariance tests show that the effect of actual appearance on physical self-concept is larger for women than for men. Copyright © 2008 John Wiley & Sons, Ltd. [source]


    WHY DOES A METHOD THAT FAILS CONTINUE TO BE USED? THE ANSWER

    EVOLUTION, Issue 4 2009
    It has been claimed that hundreds of researchers use nested clade phylogeographic analysis (NCPA) based on what the method promises rather than requiring objective validation of the method. The supposed failure of NCPA is based upon the argument that validating it by using positive controls ignored type I error, and that computer simulations have shown a high type I error. The first argument is factually incorrect: the previously published validation analysis fully accounted for both type I and type II errors. The simulations that indicate a 75% type I error rate have serious flaws and only evaluate outdated versions of NCPA. These outdated type I error rates fall precipitously when the 2003 version of single-locus NCPA is used or when the 2002 multilocus version of NCPA is used. It is shown that the tree-wise type I errors in single-locus NCPA can be corrected to the desired nominal level by a simple statistical procedure, and that multilocus NCPA reconstructs a simulated scenario used to discredit NCPA with 100% accuracy. Hence, NCPA is not a failed method at all, but rather has been validated both by actual data and by simulated data in a manner that satisfies the published criteria given by its critics. The critics have come to different conclusions because they have focused on the pre-2002 versions of NCPA and have failed to take into account the extensive developments in NCPA since 2002. Hence, researchers can choose to use NCPA based upon objective critical validation that shows that NCPA delivers what it promises. [source]


    Linkage disequilibrium estimates of contemporary Ne using highly variable genetic markers: a largely untapped resource for applied conservation and evolution

    EVOLUTIONARY APPLICATIONS (ELECTRONIC), Issue 3 2010
    Robin S. Waples
    Abstract Genetic methods are routinely used to estimate contemporary effective population size (Ne) in natural populations, but the vast majority of applications have used only the temporal (two-sample) method. We use simulated data to evaluate how highly polymorphic molecular markers affect precision and bias in the single-sample method based on linkage disequilibrium (LD). Results of this study are as follows: (1) Low-frequency alleles upwardly bias the LD-based estimate of Ne, but a simple rule can reduce bias to … [source]


    Asymptotic tests of association with multiple SNPs in linkage disequilibrium

    GENETIC EPIDEMIOLOGY, Issue 6 2009
    Wei Pan
    Abstract We consider detecting associations between a trait and multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD). To maximize the use of information contained in multiple SNPs while minimizing the cost of large degrees of freedom (DF) in testing multiple parameters, we first theoretically explore the sum test derived under a working assumption of a common association strength between the trait and each SNP, testing on the corresponding parameter with only one DF. Under the scenarios that the association strengths between the trait and the SNPs are close to each other (and in the same direction), as considered by Wang and Elston [Am. J. Hum. Genet. [2007] 80:353–360], we show with simulated data that the sum test was powerful as compared to several existing tests; otherwise, the sum test might have much reduced power. To overcome the limitation of the sum test, based on our theoretical analysis of the sum test, we propose five new tests that are closely related to each other and are shown to consistently perform similarly well across a wide range of scenarios. We point out the close connection of the proposed tests to the Goeman test. Furthermore, we derive the asymptotic distributions of the proposed tests so that P-values can be easily calculated, in contrast to the use of computationally demanding permutations or simulations for the Goeman test. A distinguishing feature of the five new tests is their use of a diagonal working covariance matrix, rather than a full covariance matrix as used in the usual Wald or score test. We recommend the routine use of two of the new tests, along with several other tests, to detect disease associations with multiple linked SNPs. Genet. Epidemiol. 33:497–507, 2009. © 2009 Wiley-Liss, Inc. [source]
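
    The working assumption behind the sum test lends itself to a very small sketch (hypothetical genotypes, simulated here without LD for brevity): if all SNPs are assumed to share a common association parameter, the model collapses to a regression of the trait on the unweighted sum of genotype scores, tested with a single degree of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, k = 1000, 8
maf = rng.uniform(0.1, 0.4, k)
genotypes = rng.binomial(2, maf, size=(n, k)).astype(float)      # 0/1/2 allele counts
trait = 0.2 * genotypes[:, :3].sum(axis=1) + rng.normal(size=n)  # 3 causal SNPs

# Working assumption: a common beta for every SNP, so the k predictors collapse
# to their sum and the association is tested on one parameter (1 DF).
geno_sum = genotypes.sum(axis=1)
res = stats.linregress(geno_sum, trait)
print(f"common-effect estimate {res.slope:.3f}, p-value {res.pvalue:.2e}")
```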


    Quantification and correction of bias in tagging SNPs caused by insufficient sample size and marker density by means of haplotype-dropping

    GENETIC EPIDEMIOLOGY, Issue 1 2008
    Mark M. Iles
    Abstract Tagging single nucleotide polymorphisms (tSNPs) are commonly used to capture genetic diversity cost-effectively. It is important that the efficacy of tSNPs is correctly estimated, otherwise coverage may be inadequate and studies underpowered. Using data simulated under a coalescent model, we show that insufficient sample size can lead to overestimation of tSNP efficacy. Quantifying this we find that even when insufficient marker density is adjusted for, estimates of tSNP efficacy are up to 45% higher than the true values. Even with as many as 100 individuals, estimates of tSNP efficacy may be 9% higher than the true value. We describe a novel method for estimating tSNP efficacy accounting for limited sample size. The method is based on exclusion of haplotypes, incorporating a previous adjustment for insufficient marker density. We show that this method outperforms an existing Bootstrap approach. We compare the efficacy of multimarker and pairwise tSNP selection methods on real data. These confirm our findings with simulated data and suggest that pairwise methods are less sensitive to sample size, but more sensitive to marker density. We conclude that a combination of insufficient sample size and overfitting may cause overestimation of tSNP efficacy and underpowering of studies based on tSNPs. Our novel method corrects much of this bias and is superior to a previous method. However, sample sizes larger than previously suggested may be required for accurate estimation of tSNP efficacy. This has obvious ramifications for tSNP selection both in candidate regions and using HapMap or SNP chips for genomewide studies. Genet. Epidemiol. 31, 2007. © 2007 Wiley-Liss, Inc. [source]


    Approaches to detecting gene × gene interaction in Genetic Analysis Workshop 14 pedigrees

    GENETIC EPIDEMIOLOGY, Issue S1 2005
    Brion S. Maher
    Abstract Whether driven by the general lack of success in finding single-gene contributions to complex disease, by increased knowledge about the potential involvement of specific biological interactions in complex disease, or by recent dramatic increases in computational power, a large number of approaches to detect locus × locus interactions were recently proposed and implemented. The six Genetic Analysis Workshop 14 (GAW14) papers summarized here each applied either existing or refined approaches with the goal of detecting gene × gene, or locus × locus, interactions in the GAW14 data. Five of six papers analyzed the simulated data; the other analyzed the Collaborative Study on the Genetics of Alcoholism data. The analytic strategies implemented for detecting interactions included multifactor dimensionality reduction, conditional linkage analysis, nonparametric linkage correlation, two-locus parametric linkage analysis, and a joint test of linkage and association. Overall, most of the groups found limited success in consistently detecting all of the simulated interactions due, in large part, to the nature of the generating model. Genet. Epidemiol. 29(Suppl. 1):S116–S119, 2005. © 2005 Wiley-Liss, Inc. [source]


    Genetic analysis of phenotypes derived from longitudinal data: Presentation Group 1 of Genetic Analysis Workshop 13

    GENETIC EPIDEMIOLOGY, Issue S1 2003
    Konstantin Strauch
    Abstract The participants of Presentation Group 1 used the GAW13 data to derive new phenotypes, which were then analyzed for linkage and, in one case, for association to the genetic markers. Since the trait measurements ranged over longer time periods, the participants looked at the time dependence of particular traits in addition to the trait itself. The phenotypes analyzed with the Framingham data can be roughly divided into 1) body weight-related traits, which also include a type 2 diabetes progression trait, and 2) traits related to systolic blood pressure. Both trait classes are associated with metabolic syndrome. For traits related to body weight, linkage was consistently identified by at least two participating groups to genetic regions on chromosomes 4, 8, 11, and 18. For systolic blood pressure, or its derivatives, at least two groups obtained linkage for regions on chromosomes 4, 6, 8, 11, 14, 16, and 19. Five of the 13 participating groups focused on the simulated data. Due to the rather sparse grid of microsatellite markers, an association analysis for several traits was not successful. Linkage analysis of hypertension and body mass index using LODs and heterogeneity LODs (HLODs) had low power. For the glucose phenotype, a combination of random coefficient regression models and variance component linkage analysis turned out to be strikingly powerful in the identification of a trait locus simulated on chromosome 5. Haseman-Elston regression methods, applied to the same phenotype, had low power, but the above-mentioned chromosome 5 locus was not included in this analysis. Genet Epidemiol 25 (Suppl. 1):S5–S17, 2003. © 2003 Wiley-Liss, Inc. [source]


    Evaluation of automated brain MR image segmentation and volumetry methods

    HUMAN BRAIN MAPPING, Issue 4 2009
    Frederick Klauschen
    Abstract We compare three widely used brain volumetry methods available in the software packages FSL, SPM5, and FreeSurfer and evaluate their performance using simulated and real MR brain data sets. We analyze the accuracy of gray and white matter volume measurements and their robustness against changes of image quality using the BrainWeb MRI database. These images are based on "gold-standard" reference brain templates. This allows us to assess between-segmenter (same data set, different method) and also within-segmenter (same method, variation of image quality) comparability, for both of which we find pronounced variations in segmentation results for gray and white matter volumes. The calculated volumes deviate by up to >10% from the reference values for gray and white matter, depending on method and image quality. Sensitivity was best for SPM5; volumetric accuracy for gray and white matter was similar in SPM5 and FSL and better than in FreeSurfer. For BrainWeb data of constant image quality, FSL showed the highest stability for white matter (<5%) and FreeSurfer for gray matter (6.2%). Between-segmenter comparisons show discrepancies of up to >20% for the simulated data and 24% on average for the real data sets, whereas within-method performance analysis uncovered volume differences of up to >15%. Since the discrepancies between results reach the same order of magnitude as volume changes observed in disease, these effects limit the usability of the segmentation methods for following volume changes in individual patients over time and should be taken into account during the planning and analysis of brain volume studies. Hum Brain Mapp, 2009. © 2008 Wiley-Liss, Inc. [source]


    Estimating the number of independent components for functional magnetic resonance imaging data

    HUMAN BRAIN MAPPING, Issue 11 2007
    Yi-Ou Li
    Abstract Multivariate analysis methods such as independent component analysis (ICA) have been applied to the analysis of functional magnetic resonance imaging (fMRI) data to study brain function. Because of the high dimensionality and high noise level of the fMRI data, order selection, i.e., estimation of the number of informative components, is critical to reduce over/underfitting in such methods. Dependence among fMRI data samples in the spatial and temporal domain limits the usefulness of the practical formulations of information-theoretic criteria (ITC) for order selection, since they are based on likelihood of independent and identically distributed (i.i.d.) data samples. To address this issue, we propose a subsampling scheme to obtain a set of effectively i.i.d. samples from the dependent data samples and apply the ITC formulas to the effectively i.i.d. sample set for order selection. We apply the proposed method on the simulated data and show that it significantly improves the accuracy of order selection from dependent data. We also perform order selection on fMRI data from a visuomotor task and show that the proposed method alleviates the over-estimation on the number of brain sources due to the intrinsic smoothness and the smooth preprocessing of fMRI data. We use the software package ICASSO (Himberg et al. [2004]: Neuroimage 22:1214–1222) to analyze the independent component (IC) estimates at different orders and show that, when ICA is performed at overestimated orders, the stability of the IC estimates decreases and the estimation of task related brain activations show degradation. Hum Brain Mapp, 2007. © 2007 Wiley-Liss, Inc. [source]
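
    The order-selection step can be sketched with the classic information-theoretic criteria computed from sample covariance eigenvalues (AIC/MDL in the Wax-Kailath form). This is a generic illustration on toy data, with the paper's spatial subsampling reduced to simple thinning; it is not the authors' implementation.

```python
import numpy as np

def itc_order(X, criterion="MDL"):
    """Estimate the number of informative components from a data matrix X
    (samples x dimensions) using classic information-theoretic criteria."""
    N, p = X.shape
    ev = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]  # descending
    scores = []
    for k in range(p - 1):
        tail = ev[k:]
        log_ratio = np.log(tail).mean() - np.log(tail.mean())  # log(geo/arith mean)
        loglik = N * (p - k) * log_ratio
        penalty = k * (2 * p - k)
        if criterion == "AIC":
            scores.append(-2 * loglik + 2 * penalty)
        else:                                                   # MDL
            scores.append(-loglik + 0.5 * penalty * np.log(N))
    return int(np.argmin(scores))

rng = np.random.default_rng(5)
N, p, true_order = 2000, 20, 4
X = rng.normal(size=(N, true_order)) @ rng.normal(size=(true_order, p))
X += 0.3 * rng.normal(size=(N, p))          # noisy mixture of 4 sources

# Crude stand-in for the paper's subsampling: thin the (here already i.i.d.)
# samples; with smooth fMRI data this step is what restores effective i.i.d.-ness.
X_sub = X[::2]
print(itc_order(X_sub, "MDL"), itc_order(X_sub, "AIC"))
```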


    Reproduction of temporal scaling by a rectangular pulses rainfall model

    HYDROLOGICAL PROCESSES, Issue 3 2002
    Jonas Olsson
    Abstract The presence of scaling statistical properties in temporal rainfall has been well established in many empirical investigations during the latest decade. These properties have more and more come to be regarded as a fundamental feature of the rainfall process. How to best use the scaling properties for applied modelling remains to be assessed, however, particularly in the case of continuous rainfall time-series. One therefore is forced to use conventional time-series modelling, e.g. based on point process theory, which does not explicitly take scaling into account. In light of this, there is a need to investigate the degree to which point-process models are able to 'unintentionally' reproduce the empirical scaling properties. In the present study, four 25-year series of 20-min rainfall intensities observed in Arno River basin, Italy, were investigated. A Neyman–Scott rectangular pulses (NSRP) model was fitted to these series, so enabling the generation of synthetic time-series suitable for investigation. A multifractal scaling behaviour was found to characterize the raw data within a range of time-scales between approximately 20 min and 1 week. The main features of this behaviour were surprisingly well reproduced in the simulated data, although some differences were observed, particularly at small scales below the typical duration of a rain cell. This suggests the possibility of a combined use of the NSRP model and a scaling approach, in order to extend the NSRP range of applicability for simulation purposes. Copyright © 2002 John Wiley & Sons, Ltd. [source]
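
    A common way to carry out the scaling check described here is to aggregate the series over a range of time scales and examine how moments of different order scale with the aggregation scale. The sketch below (hypothetical 20-min intensities, not the Arno data) estimates the moment-scaling exponents K(q) from log-log slopes; the same computation would be applied to the observed and the NSRP-simulated series alike.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical 20-min rainfall intensities for one year (intermittent, skewed)
n = 365 * 72
rain = rng.gamma(0.2, 2.0, size=n) * (rng.random(n) < 0.1)

def moment_scaling(x, scales, qs):
    """Return log2(scale) and log2 of the q-th moments at each aggregation scale."""
    out = np.empty((len(scales), len(qs)))
    for i, s in enumerate(scales):
        m = len(x) // s
        agg = x[: m * s].reshape(m, s).mean(axis=1)   # mean intensity at scale s
        out[i] = [np.mean(agg ** q) for q in qs]
    return np.log2(scales), np.log2(out)

scales = np.array([1, 2, 4, 8, 16, 32, 64])           # 20 min up to roughly 1 day
qs = np.array([0.5, 1.0, 2.0, 3.0])
logs, logm = moment_scaling(rain, scales, qs)

# K(q): slope of log-moments versus log-scale; linearity over the fitted range
# indicates scaling, and nonlinearity of K(q) in q indicates multifractality.
K = np.array([np.polyfit(logs, logm[:, j], 1)[0] for j in range(len(qs))])
print(dict(zip(qs, np.round(K, 2))))
```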


    Smoothing Mechanisms in Defined Benefit Pension Accounting Standards: A Simulation Study

    ACCOUNTING PERSPECTIVES, Issue 2 2009
    Cameron Morrill
    ABSTRACT The accounting for defined benefit (DB) pension plans is complex and varies significantly across jurisdictions despite recent international convergence efforts. Pension costs are significant, and many worry that unfavorable accounting treatment could lead companies to terminate DB plans, a result that would have important social implications. A key difference in accounting standards relates to whether and how the effects of fluctuations in market and demographic variables on reported pension cost are "smoothed". Critics argue that smoothing mechanisms lead to incomprehensible accounting information and induce managers to make dysfunctional decisions. Furthermore, the effectiveness of these mechanisms may vary. We use simulated data to test the volatility, representational faithfulness, and predictive ability of pension accounting numbers under Canadian, British, and international standards (IFRS). We find that smoothed pension expense is less volatile, more predictive of future expense, and more closely associated with contemporaneous funding than is "unsmoothed" pension expense. The corridor method and market-related value approaches allowed under Canadian GAAP have virtually no smoothing effect incremental to the amortization of actuarial gains and losses. The pension accrual or deferred asset is highly correlated with the pension plan deficit/surplus. Our findings complement existing, primarily archival, pension accounting research and could provide guidance to standard-setters. [source]


    Prior knowledge processing for initial state of Kalman filter

    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, Issue 3 2010
    E. Suzdaleva
    Abstract The paper deals with a specification of the prior distribution of the initial state for Kalman filter. The subjective prior knowledge, used in state estimation, can be highly uncertain. In practice, incorporation of prior knowledge contributes to a good start of the filter. The present paper proposes a methodology for selection of the initial state distribution, which enables eliciting of prior knowledge from the available expert information. The proposed methodology is based on the use of the conjugate prior distribution for models belonging to the exponential family. The normal state-space model is used for demonstrating the methodology. The paper covers processing of the prior knowledge for state estimation, available in the form of simulated data. Practical experiments demonstrate the processing of prior knowledge from the urban traffic control area, which is the main application of the research. Copyright © 2009 John Wiley & Sons, Ltd. [source]
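
    The practical point, starting the filter from an elicited prior for the initial state rather than an arbitrary default, can be sketched for a scalar linear-Gaussian model (all numbers hypothetical; the paper's conjugate-prior elicitation machinery is not reproduced).

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical scalar state-space model: x_t = a x_{t-1} + w_t,  y_t = x_t + v_t
a, q_var, r_var = 0.9, 0.5, 1.0

# Prior knowledge: a handful of simulated/expert values of the initial state.
# The prior mean and variance of x_0 are elicited from them instead of being
# set to arbitrary defaults such as (0, 1e6).
expert_x0 = np.array([4.8, 5.3, 5.1, 4.6, 5.2])
m, P = expert_x0.mean(), expert_x0.var(ddof=1)

# Simulate data from the model and run a standard Kalman filter.
T = 50
x = np.zeros(T)
x[0] = 5.0
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0, np.sqrt(q_var))
y = x + rng.normal(0, np.sqrt(r_var), T)

est = np.zeros(T)
for t in range(T):
    if t > 0:                        # prediction step
        m, P = a * m, a * a * P + q_var
    K = P / (P + r_var)              # update with observation y_t
    m, P = m + K * (y[t] - m), (1 - K) * P
    est[t] = m

print("filtering RMSE:", np.sqrt(np.mean((est - x) ** 2)))
```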


    An equity-based passenger flow control model with application to Hong Kong-Shenzhen border-crossing

    JOURNAL OF ADVANCED TRANSPORTATION, Issue 2 2002
    Hai Yang
    Cross-border passenger numbers from Hong Kong to Shenzhen on the east Kowloon-Canton Railway (KCR) through the Lo Wu customs reach nearly 200 thousand on special days, such as days during the Chinese Spring Festival. Such heavy passenger demand often exceeds the processing and holding capacity of the Lo Wu customs for many hours a day. Thus, passengers must be metered off at all entrance stations along the KCR line through ticket rationing to keep the number of passengers waiting at Lo Wu within its safe holding capacity. This paper proposes an optimal control strategy and model to deal with this passenger crowding and control problem. Because the maximum passenger checkout rate at Lo Wu is fixed, total passenger waiting time is not affected by the control strategy for given time-dependent arrival rates at each station. An equity-based control strategy is thus proposed to equalize the waiting times of passengers arriving at all stations at the same time. This equity is achieved through optimal allocation of the total quota of tickets to all entrance stations for each train service. The total ticket quota for each train service is determined such that the capacity constraint of the passenger queue at Lo Wu is satisfied. The control problem is formulated as a successive linear programming problem and demonstrated for the KCR system with partially simulated data. [source]


    Fixed or random contemporary groups in genetic evaluation for litter size in pigs using a single trait repeatability animal model

    JOURNAL OF ANIMAL BREEDING AND GENETICS, Issue 1 2003
    D. Babot
    Summary The importance of using fixed or random contemporary groups in the genetic evaluation of litter size in pigs was analysed by using farm and simulated data. Farm data were from four Spanish pig breeding populations, two Landrace (13 084 and 13 619 records) and two Large White (2762 and 8455 records). A simulated population (200 sows and 10 boars) selected for litter size, in which litter size was simulated using a repeatability animal model with random herd-year-season (HYS) effects, was used to obtain simulated data. With farm data, the goodness-of-fit and the predictive ability of a repeatability animal model were calculated under several definitions of the HYS effect. A residual maximum likelihood estimate of the HYS variance in each population was obtained as well. HYS was considered as either fixed or random, with different numbers of individuals per level. Results from farm data showed that the HYS variance was small in relation to the total variance (ranging from 0.01 to 0.04). Treating the HYS effect as fixed reduced the residual variance, but the size of the HYS levels does not by itself explain the goodness-of-fit of the model. The results obtained by simulation showed that the predictive ability of the model is better for random than for fixed HYS models. However, the improvement in predictive ability does not lead to a significant increase in genetic response. Finally, results showed that random HYS models biased the estimates of genetic response when there is an environmental trend.
    German summary (translated): Fixed or random contemporary groups in the genetic evaluation of litter size in pigs using a repeatability animal model. The influence of fixed or random contemporary groups on the genetic evaluation of litter size in pigs was examined using real farm data and simulated data. The farm data came from four Spanish breeding populations, two Landrace populations (13 084 and 13 619 records) and two Large White populations (2762 and 8455 records). For the simulation, a population (200 sows and 10 boars) selected for litter size was simulated under a repeatability model with random herd-year-season classes. Using the farm data, the goodness-of-fit and the predictive ability of the repeatability model were tested under several definitions of the herd-year-season classes. Variance components for the herd-year-season classes were also estimated by REML. The herd-year-season classes were included in the model as a fixed or a random effect with differing numbers of animals per class. The farm-data results showed that the herd-year-season variance accounted for only a small part of the total variance (0.01 to 0.04). Treating the herd-year-season classes as fixed reduced the residual variance, but the size of the classes did not by itself determine the goodness-of-fit of the model. The improvement in predictive ability did not yield a significant increase in genetic response. Finally, models with random herd-year-season classes biased the estimated genetic response when an environmental trend was present. [source]


    Indexing of powder diffraction patterns by iterative use of singular value decomposition

    JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 1 2003
    A. A. Coelho
    A fast method for indexing powder diffraction patterns has been developed for large and small lattices of all symmetries. The method is relatively insensitive to impurity peaks and missing high d-spacings: on simulated data, little effect on successful indexing was observed when one in three d-spacings was randomly removed. Comparison with three of the most popular indexing programs, namely ITO, DICVOL91 and TREOR90, has shown that the present method as implemented in the program TOPAS is more successful at indexing simulated data. Also significant is that the present method performs well on typically noisy data with large diffractometer zero errors. Critical to its success, the present method uses singular value decomposition in an iterative manner for solving linear equations relating hkl values to d-spacings. [source]
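
    The linear-algebra core of this step can be sketched for the simplest (orthorhombic) case, where each assigned reflection supplies one equation 1/d² = h²A + k²B + l²C with A = 1/a² and so on: given trial hkl assignments, the cell parameters follow from an SVD-based least-squares solve, which the real algorithm repeats iteratively as assignments are refined. The cell and reflection list below are hypothetical.

```python
import numpy as np

# Hypothetical orthorhombic cell: a = 5.0, b = 7.0, c = 9.0 Angstrom
A_true, B_true, C_true = 1 / 5.0**2, 1 / 7.0**2, 1 / 9.0**2
hkl = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0],
                [1, 0, 1], [0, 1, 1], [1, 1, 1], [2, 0, 0]])
q = hkl[:, 0]**2 * A_true + hkl[:, 1]**2 * B_true + hkl[:, 2]**2 * C_true
d_obs = 1 / np.sqrt(q) + np.random.default_rng(0).normal(0, 1e-3, len(q))

# Linear system M [A, B, C]^T = 1/d^2, solved via the SVD pseudo-inverse
M = hkl.astype(float) ** 2
rhs = 1 / d_obs**2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
params = Vt.T @ ((U.T @ rhs) / s)
a, b, c = 1 / np.sqrt(params)
print(np.round([a, b, c], 3))        # approximately [5.0, 7.0, 9.0]
```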


    Modelling of small-angle X-ray scattering data using Hermite polynomials

    JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 4 2001
    A. K. Swain
    A new algorithm, called the term-selection algorithm (TSA), is derived to treat small-angle X-ray scattering (SAXS) data by fitting models to the scattering intensity using weighted Hermite polynomials. This algorithm exploits the orthogonal property of the Hermite polynomials and introduces an error-reduction ratio test to select the correct model terms or to determine which polynomials are to be included in the model and to estimate the associated unknown coefficients. With no a priori information about particle sizes, it is possible to evaluate the real-space distribution function as well as three- and one-dimensional correlation functions directly from the models fitted to raw experimental data. The success of this algorithm depends on the choice of a scale factor and the accuracy of orthogonality of the Hermite polynomials over a finite range of SAXS data. An algorithm to select a weighted orthogonal term is therefore derived to overcome the disadvantages of the TSA. This algorithm combines the properties and advantages of both weighted and orthogonal least-squares algorithms and is numerically more robust for the estimation of the parameters of the Hermite polynomial models. The weighting feature of the algorithm provides an additional degree of freedom to control the effects of noise and the orthogonal feature enables the reorthogonalization of the Hermite polynomials with respect to the weighting matrix. This considerably reduces the error in orthogonality of the Hermite polynomials. The performance of the algorithm has been demonstrated considering both simulated data and experimental data from SAXS measurements of dewaxed cotton fibre at different temperatures. [source]


    Interactive curve resolution by using latent projections in polar coordinates

    JOURNAL OF CHEMOMETRICS, Issue 1-2 2007
    J. von Frese
    Abstract The problem of resolving bilinear two-way data into the contributions from the underlying mixture components is of great interest for all hyphenated analytical techniques. The fact that the optimal solution to this problem depends, at least to some extent, on the nature of the data under study has led to numerous different approaches. One of the seminal publications in this area was contributed by Olav M. Kvalheim and Yi-Zeng Liang in 1992. They not only provided the valuable Heuristic Evolving Latent Projections (HELP) method but also illuminated many important aspects of curve resolution in this and numerous subsequent publications. Here we extend their key concept of HELP, that is, the use of latent projective graphs for identifying one-component regions, by using polar coordinates for these analyses, thereby creating a simple, intuitive exploratory tool for directly solving the curve resolution problem for two and three components graphically. Our approach is demonstrated with simulated data, an example from reaction monitoring with broadband ultrafast spectroscopy and one chemometric standard data set. Copyright © 2007 John Wiley & Sons, Ltd. [source]


    Constrained least squares methods for estimating reaction rate constants from spectroscopic data

    JOURNAL OF CHEMOMETRICS, Issue 1 2002
    Sabina Bijlsma
    Abstract Model errors, experimental errors and instrumental noise influence the accuracy of reaction rate constant estimates obtained from spectral data recorded in time during a chemical reaction. In order to improve the accuracy, which can be divided into the precision and bias of reaction rate constant estimates, constraints can be used within the estimation procedure. The impact of different constraints on the accuracy of reaction rate constant estimates has been investigated using classical curve resolution (CCR). Different types of constraints can be used in CCR. For example, if pure spectra of reacting absorbing species are known in advance, this knowledge can be used explicitly. Also, the fact that pure spectra of reacting absorbing species are non-negative is a constraint that can be used in CCR. Experimental data have been obtained from UV-vis spectra taken in time of a biochemical reaction. From the experimental data, reaction rate constants and pure spectra were estimated with and without implementation of constraints in CCR. Because only the precision of reaction rate constant estimates could be investigated using the experimental data, simulations were set up that were similar to the experimental data in order to additionally investigate the bias of reaction rate constant estimates. From the results of the simulated data it is concluded that the use of constraints does not automatically result in an improvement in the accuracy of rate constant estimates. Guidelines for using constraints are given. Copyright © 2002 John Wiley & Sons, Ltd. [source]
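
    A minimal classical-curve-resolution sketch for a first-order A → B reaction (hypothetical spectra and rate constant, not the paper's biochemical system) shows how the non-negativity constraint on pure spectra enters: for each trial rate constant the concentration profiles are known analytically, the spectra are obtained by NNLS, and the rate constant minimising the residual is retained.

```python
import numpy as np
from scipy.optimize import nnls, minimize_scalar

rng = np.random.default_rng(6)
t = np.linspace(0, 60, 40)                     # reaction times (s)
wl = np.linspace(300, 500, 80)                 # wavelengths (nm)

def conc_profiles(k):
    """First-order A -> B with [A]_0 = 1: columns are [A](t) and [B](t)."""
    a = np.exp(-k * t)
    return np.column_stack([a, 1 - a])

# Simulated "measured" spectra D = C(k_true) S^T + noise, with Gaussian pure spectra
k_true = 0.08
S_true = np.column_stack([np.exp(-0.5 * ((wl - 350) / 20) ** 2),
                          np.exp(-0.5 * ((wl - 430) / 25) ** 2)])
D = conc_profiles(k_true) @ S_true.T + rng.normal(0, 5e-3, (len(t), len(wl)))

def residual(k):
    C = conc_profiles(k)
    # non-negativity constraint on the pure spectra: one NNLS per wavelength
    S = np.column_stack([nnls(C, D[:, j])[0] for j in range(D.shape[1])]).T
    return np.linalg.norm(D - C @ S.T)

fit = minimize_scalar(residual, bounds=(1e-3, 1.0), method="bounded")
print("estimated rate constant:", round(fit.x, 3))   # close to 0.08
```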


    Selecting significant factors by the noise addition method in principal component analysis

    JOURNAL OF CHEMOMETRICS, Issue 7 2001
    Brian K. Dable
    Abstract The noise addition method (NAM) is presented as a tool for determining the number of significant factors in a data set. The NAM is compared to residual standard deviation (RSD), the factor indicator function (IND), chi-squared (χ2) and cross-validation (CV) for establishing the number of significant factors in three data sets. The comparison and validation of the NAM are performed through Monte Carlo simulations with noise distributions of varying standard deviation, HPLC/UV-vis chromatograms of a mixture of aromatic hydrocarbons, and FIA of methyl orange. The NAM succeeds in correctly identifying the proper number of significant factors 98% of the time with the simulated data, 99% in the HPLC data sets and 98% with the FIA data. RSD and χ2 fail to choose the proper number of factors in all three data sets. IND identifies the correct number of factors in the simulated data sets but fails with the HPLC and FIA data sets. Both CV methods fail in the HPLC and FIA data sets. CV also fails for the simulated data sets, while the modified CV correctly chooses the proper number of factors an average of 80% of the time. Copyright © 2001 John Wiley & Sons, Ltd. [source]


    Reliability and Attribute-Based Scoring in Cognitive Diagnostic Assessment

    JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 3 2009
    Mark J. Gierl
    The attribute hierarchy method (AHM) is a psychometric procedure for classifying examinees' test item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. Results from an AHM analysis yield information on examinees' cognitive strengths and weaknesses. Hence, the AHM can be used for cognitive diagnostic assessment. The purpose of this study is to introduce and evaluate a new concept for assessing attribute reliability using the ratio of true score variance to observed score variance on items that probe specific cognitive attributes. This reliability procedure is evaluated and illustrated using both simulated data and student response data from a sample of algebra items taken from the March 2005 administration of the SAT. The reliability of diagnostic scores and the implications for practice are also discussed. [source]


    An Application of Item Response Time: The Effort-Moderated IRT Model

    JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 1 2006
    Steven L. Wise
    The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article introduces the effort-moderated IRT model, which incorporates item response time into proficiency estimation and item parameter estimation. In two studies of the effort-moderated model when rapid guessing (i.e., reflecting low examinee effort) was present, one based on real data and the other on simulated data, the effort-moderated model performed better than the standard 3PL model. Specifically, it was found that the effort-moderated model (a) showed better model fit, (b) yielded more accurate item parameter estimates, (c) more accurately estimated test information, and (d) yielded proficiency estimates with higher convergent validity. [source]
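
    The moderation step can be sketched as follows (hypothetical item parameters and thresholds; item-parameter estimation and the model's full likelihood are not reproduced): responses faster than an item's rapid-guessing threshold are treated as non-effortful and excluded when the 3PL likelihood is maximised for proficiency.

```python
import numpy as np

def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function (with the 1.7 scaling constant)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def theta_estimate(resp, rt, a, b, c, rt_threshold, grid=np.linspace(-4, 4, 161)):
    """Maximum-likelihood theta using only responses flagged as solution behaviour
    (response time above the item's rapid-guessing threshold)."""
    keep = rt > rt_threshold
    loglik = np.zeros_like(grid)
    for i, th in enumerate(grid):
        p = p3pl(th, a[keep], b[keep], c[keep])
        loglik[i] = np.sum(resp[keep] * np.log(p) + (1 - resp[keep]) * np.log(1 - p))
    return grid[np.argmax(loglik)]

# Hypothetical 20-item test
rng = np.random.default_rng(9)
a = rng.uniform(0.6, 2.0, 20); b = rng.normal(0, 1, 20); c = np.full(20, 0.2)
rt_threshold = np.full(20, 5.0)          # seconds; item-specific in practice
resp = rng.integers(0, 2, 20).astype(float)
rt = rng.uniform(1, 60, 20)              # some responses fall below the threshold
print(theta_estimate(resp, rt, a, b, c, rt_threshold))
```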


    Item Selection in Computerized Adaptive Testing: Should More Discriminating Items be Used First?

    JOURNAL OF EDUCATIONAL MEASUREMENT, Issue 3 2001
    Kit-Tai Hau
    During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. The traditional method of attaining the highest efficiency in ability estimation is to select items of maximum Fisher information at the currently estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies with simulated data addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with Sympson and Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order (i.e., less discriminating items first), as described in Chang and Ying's (1999) stratified method had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods, particularly for operational item pools when retired items cannot be totally replenished with similar highly discriminating items. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure, which can successfully even out the usage of all items. [source]
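
    The "less discriminating items first" rule can be sketched directly (hypothetical item pool; ability updating and exposure bookkeeping omitted): the pool is split into strata by ascending discrimination, and stage k of the test selects, from stratum k, the unused item whose difficulty is closest to the current ability estimate.

```python
import numpy as np

rng = np.random.default_rng(8)
pool_a = rng.lognormal(0, 0.3, 400)             # item discrimination parameters
pool_b = rng.normal(0, 1, 400)                  # item difficulty parameters

def a_stratified_select(theta, used, stage, n_strata=4):
    """Pick the unused item whose b is closest to theta, from the stage's a-stratum."""
    order = np.argsort(pool_a)                  # ascending discrimination
    strata = np.array_split(order, n_strata)    # stratum 0 = least discriminating
    candidates = [i for i in strata[min(stage, n_strata - 1)] if i not in used]
    return min(candidates, key=lambda i: abs(pool_b[i] - theta))

# 20-item adaptive test: 5 items from each stratum, low-a strata used first
used, theta = set(), 0.0
for item_no in range(20):
    stage = item_no // 5
    item = a_stratified_select(theta, used, stage)
    used.add(item)
    # (administer the item, score it, and update theta here; omitted in this sketch)
print(sorted(round(pool_a[i], 2) for i in used)[:5])   # early items have low a
```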


    A theory of statistical models for Monte Carlo integration

    JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2003
    A. Kong
    Summary. The task of estimating an integral by Monte Carlo methods is formulated as a statistical model using simulated observations as data. The difficulty in this exercise is that we ordinarily have at our disposal all of the information required to compute integrals exactly by calculus or numerical integration, but we choose to ignore some of the information for simplicity or computational feasibility. Our proposal is to use a semiparametric statistical model that makes explicit what information is ignored and what information is retained. The parameter space in this model is a set of measures on the sample space, which is ordinarily an infinite-dimensional object. Nonetheless, from simulated data the baseline measure can be estimated by maximum likelihood, and the required integrals computed by a simple formula previously derived by Vardi and by Lindsay in a closely related model for biased sampling. The same formula was also suggested by Geyer and by Meng and Wong using entirely different arguments. By contrast with Geyer's retrospective likelihood, a correct estimate of simulation error is available directly from the Fisher information. The principal advantage of the semiparametric model is that variance reduction techniques are associated with submodels in which the maximum likelihood estimator in the submodel may have substantially smaller variance than the traditional estimator. The method is applicable to Markov chain and more general Monte Carlo sampling schemes with multiple samplers. [source]
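
    A related but much simpler device, importance sampling against the pooled mixture of several samplers with fully known densities, gives a feel for how draws from multiple samplers combine into a single integral estimate. The sketch below is that simpler technique on a toy integrand; it is not the paper's semiparametric estimator.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)

# Toy target: estimate the integral of q(x) = exp(-x^2/2) over the real line
# (true value sqrt(2*pi)), deliberately ignoring that it is known in closed form.
def q(x):
    return np.exp(-0.5 * x**2)

# Draws from two different samplers with fully known densities.
n_per = 5000
draws = np.concatenate([rng.normal(0.0, 2.0, n_per),    # N(0, 2^2)
                        rng.standard_t(3, n_per)])      # t with 3 df

# Pooled-mixture weighting: each draw is weighted by q(x) / p_mix(x), where
# p_mix is the equal-weight mixture of the two sampler densities.
p_mix = 0.5 * stats.norm.pdf(draws, 0.0, 2.0) + 0.5 * stats.t.pdf(draws, 3)
estimate = np.mean(q(draws) / p_mix)
print(estimate, np.sqrt(2 * np.pi))   # the two numbers should be close
```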