Individual Observations (individual + observation)

Selected Abstracts


Advantages of mixed effects models over traditional ANOVA models in developmental studies: A worked example in a mouse model of fetal alcohol syndrome

DEVELOPMENTAL PSYCHOBIOLOGY, Issue 7 2007
Patricia E. Wainwright
Abstract Developmental studies in animals often violate the assumption of statistical independence of observations due to the hierarchical nature of the data (i.e., pups cluster by litter, and individual observations are correlated over time). Mixed-effects modeling (MEM) provides a robust analytical approach for addressing problems associated with hierarchical data. This article compares the application of MEM to traditional ANOVA models within the context of a developmental study of prenatal ethanol exposure in mice. The results of the MEM analyses supported the ANOVA results in showing that a large proportion of the variability in both behavioral score and brain weight could be explained by ethanol. The MEM analyses also identified significant interactions between ethanol and litter size in relation to behavioral scores and brain weight. In addition, the longitudinal modeling approach using linear MEM allowed us to model weight gain flexibly over time, as well as to provide precise estimates of these effects, which would be difficult in repeated-measures ANOVA. © 2007 Wiley Periodicals, Inc. Dev Psychobiol 49: 664–674, 2007. [source]
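To make the clustering issue concrete, here is a minimal sketch in Python's statsmodels contrasting OLS (the ANOVA-style analysis) with a mixed-effects model that adds a litter-level random intercept. The column names and simulated data are hypothetical, not the study's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_litters, pups = 20, 6
litter = np.repeat(np.arange(n_litters), pups)
ethanol = np.repeat(rng.integers(0, 2, n_litters), pups)          # treatment assigned per litter
litter_size = np.repeat(rng.integers(4, 12, n_litters), pups)
litter_effect = np.repeat(rng.normal(0, 0.5, n_litters), pups)    # shared within-litter variation
brain_wt = (4.0 - 0.6 * ethanol + 0.05 * litter_size
            + litter_effect + rng.normal(0, 0.3, n_litters * pups))
df = pd.DataFrame(dict(brain_wt=brain_wt, ethanol=ethanol,
                       litter_size=litter_size, litter=litter))

# Naive OLS treats every pup as independent; MixedLM adds a litter random intercept.
ols = smf.ols("brain_wt ~ ethanol * litter_size", df).fit()
mem = smf.mixedlm("brain_wt ~ ethanol * litter_size", df, groups=df["litter"]).fit()
print(ols.bse["ethanol"], mem.bse["ethanol"])  # OLS understates the standard error
```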


Comparing weighted and unweighted analyses applied to data with a mix of pooled and individual observations

ENVIRONMENTAL TOXICOLOGY & CHEMISTRY, Issue 5 2010
Sarah G. Anderson
Abstract Smaller organisms may have too little tissue to allow assaying as individuals. To obtain a sufficient sample for assaying, a collection of smaller individual organisms is pooled together to produce a single observation for modeling and analysis. When a dataset contains a mix of pooled and individual organisms, the variances of the observations are not equal. An unweighted regression method is no longer appropriate because it assumes equal precision among the observations. A weighted regression method is more appropriate and yields more precise estimates because it assigns greater weight to the pooled observations. To demonstrate the benefits of a weighted analysis when some observations are pooled, the bias and confidence interval (CI) properties of ordinary least squares and weighted least squares t-based confidence intervals were compared. The slope and intercept estimates were unbiased for both weighted and unweighted analyses. While CIs for the slope and intercept achieved nominal coverage under both methods, the CI lengths were smaller using a weighted analysis, implying that a weighted analysis yields greater precision. Environ. Toxicol. Chem. 2010;29:1168–1171. © 2010 SETAC [source]
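The weighting argument can be sketched as follows, assuming each pooled observation is the mean of k organisms, so its variance is σ²/k and its natural weight is k. All data and names here are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
dose = np.linspace(0, 10, 30)
k = np.where(np.arange(30) < 15, 5, 1)                    # first half pools of 5, rest individuals
y = 2.0 + 0.8 * dose + rng.normal(0, 1.0 / np.sqrt(k))    # pooled means are less noisy
X = sm.add_constant(dose)

unweighted = sm.OLS(y, X).fit()
weighted = sm.WLS(y, X, weights=k).fit()                  # weights proportional to inverse variance
print(unweighted.conf_int()[1], weighted.conf_int()[1])   # slope CI is narrower under WLS
```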


Advanced Statistics: Statistical Methods for Analyzing Cluster and Cluster-randomized Data

ACADEMIC EMERGENCY MEDICINE, Issue 4 2002
Robert L. Wears MD
Abstract. Sometimes interventions in randomized clinical trials are not allocated to individual patients but to patients in groups. This is called cluster allocation, or cluster randomization, and is particularly common in health services research. Similarly, in some types of observational studies, patients (or observations) are found in naturally occurring groups, such as neighborhoods. In either situation, observations within a cluster tend to be more alike than observations selected entirely at random. This violates the assumption of independence that is at the heart of common methods of statistical estimation and hypothesis testing. Failure to account for the dependence between individual observations and the cluster to which they belong can have profound implications for the design and analysis of such studies: p-values will be too small, confidence intervals too narrow, and sample size estimates too small, sometimes to a dramatic degree. This problem is similar to the more familiar "unit of analysis error" that arises when observations repeated on the same subjects are treated as independent. The purpose of this paper is to provide an introduction to the problem of clustered data in clinical research. It provides guidance and examples of methods for analyzing clustered data and calculating sample sizes when planning studies. The article concludes with some general comments on statistical software for cluster data and principles for planning, analyzing, and presenting such studies. [source]
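The inflation described here is commonly summarized by the design effect; a small illustrative calculation (the cluster size and intracluster correlation are made-up numbers):

```python
# Design effect for cluster-randomized data: DEFF = 1 + (m - 1) * ICC,
# where m is the average cluster size and ICC the intracluster correlation.
m, icc = 20, 0.05                 # hypothetical: 20 patients per clinic, modest ICC
deff = 1 + (m - 1) * icc
n_independent = 250               # sample size a conventional calculation would give
n_clustered = n_independent * deff
print(deff, n_clustered)          # 1.95 -> roughly double the naive sample size
```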


The influences of data precision on the calculation of temperature percentile indices

INTERNATIONAL JOURNAL OF CLIMATOLOGY, Issue 3 2009
Xuebin Zhang
Abstract Percentile-based temperature indices are part of the suite of indices developed by the WMO CCl/CLIVAR/JCOMM Expert Team on Climate Change Detection and Indices. They have been used to analyse changes in temperature extremes for various parts of the world. We identify a bias in percentile-based indices that consist of annual counts of threshold exceedance. This bias occurs when there is insufficient precision in temperature data, and it affects the estimation of the means and trends of percentile-based indices. Such imprecision occurs when temperature observations are truncated or rounded prior to being recorded and archived. The impact on the indices depends upon the type of relation (i.e. 'greater than' versus 'greater than or equal to') used to determine the exceedance rate. Provided the loss of precision is not overly severe, the problem can be solved by adding a small random number to artificially restore data precision. While these adjustments do not improve the accuracy of individual observations, the exceedance rates computed from data adjusted in this way have properties, such as long-term mean and trend, similar to those estimated directly from data that are originally of the same precision as the adjusted data. Copyright © 2008 Royal Meteorological Society [source]
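A sketch of the adjustment the authors describe, assuming temperatures were rounded to whole degrees before archiving; adding uniform noise over the rounding interval restores the behaviour of exceedance counts (threshold and data are simulated):

```python
import numpy as np

rng = np.random.default_rng(2)
true_temp = rng.normal(15, 5, 10_000)                        # hypothetical daily temperatures
rounded = np.round(true_temp)                                # archiving discards precision
adjusted = rounded + rng.uniform(-0.5, 0.5, rounded.shape)   # restore sub-degree variation

q90 = np.percentile(true_temp, 90)                           # exceedance threshold
for x, label in [(true_temp, "true"), (rounded, "rounded"), (adjusted, "adjusted")]:
    print(label, np.mean(x > q90))   # ties at integer values bias the rounded rate
```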


An efficient multivariate approach for estimating preference when individual observations are dependent

JOURNAL OF ANIMAL ECOLOGY, Issue 5 2008
Steinar Engen
Summary
1. We discuss aspects of resource selection based on observing a given vector of resource variables for different individuals at discrete time steps. A new technique for estimating preference for habitat characteristics, applicable when there are multiple individual observations, is proposed.
2. We first show how to estimate preference at the population and individual level when only a single site or resource component is observed. A variance component model based on normal scores is used to estimate mean preference for the population as well as the heterogeneity among individuals, defined by the intra-class correlation.
3. Next, a general technique is proposed for time series of observations of a vector with several components, correcting for the effect of correlations between them. The preference for each single component is analysed under the assumption of arbitrarily complex selection of the other components. This approach is based on the theory of conditional distributions in the multi-normal model.
4. The method is demonstrated using a data set of radio-tagged dispersing juvenile goshawks and their site characteristics, and can be used as a general tool in resource or habitat selection analysis. [source]
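Point 3 rests on the conditional distribution of the multivariate normal; the following generic numpy sketch shows that correction (the partitioning and parameter values are illustrative, not the authors' data):

```python
import numpy as np

# For x = (x1, x2) ~ N(mu, Sigma), the distribution of x1 given x2 = a is
# N(mu1 + S12 @ inv(S22) @ (a - mu2),  S11 - S12 @ inv(S22) @ S21).
def conditional_normal(mu, sigma, idx, a):
    """Mean and covariance of components idx, given the remaining components equal a."""
    rest = np.setdiff1d(np.arange(len(mu)), idx)
    s11 = sigma[np.ix_(idx, idx)]
    s12 = sigma[np.ix_(idx, rest)]
    s22 = sigma[np.ix_(rest, rest)]
    gain = s12 @ np.linalg.inv(s22)
    return mu[idx] + gain @ (a - mu[rest]), s11 - gain @ s12.T

mu = np.array([0.0, 0.0, 0.0])
sigma = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]])
m, c = conditional_normal(mu, sigma, np.array([0]), np.array([1.0, -0.5]))
print(m, c)  # one habitat component, correcting for selection of the others
```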


Estimating the digestibility of Sahelian roughages from faecal crude protein concentration of cattle and small ruminants

JOURNAL OF ANIMAL PHYSIOLOGY AND NUTRITION, Issue 9-10 2006
E. Schlecht
Summary Studies on diet selection and feed intake of ruminants in extensive grazing systems often require the use of simple approaches to determine the organic matter digestibility (OMD) of the ingested feed. We therefore evaluated the validity of the one-factorial exponential regression established by Lukas et al. [Journal of Animal Science 83 (2005) 1332], which estimates OMD from the faecal crude protein (FCP) concentration. The equation was applied to two sets of data obtained with free-grazing and pen-fed cattle, sheep and goats ingesting low and high amounts of green and dry vegetation of Sahelian pastures, as well as millet leaves and cowpea hay. Data analysis showed that the livestock species did not influence the precision of estimation of OMD from FCP. For the linear regression between measured and estimated OMD (%) across n = 431 individual observations, a regression coefficient of r² = 0.65 and a residual standard deviation (RSD) of 5.87 were obtained. The precision of estimation was influenced by the data set (p = 0.033), the type of feed (p < 0.001) and the feeding level (p = 0.009), and interactions occurred between type of feed and feeding level (p = 0.021). Adjusting the intercept and the slope of the established exponential function to the present data resulted in a compression of the curve; while r² remained unchanged, the RSD of the regression between measured and estimated OMD was reduced compared with the results obtained from the equation of Lukas et al. (2005). Estimating OMD from treatment means of FCP greatly improved the correlation between measured and estimated OMD for both the established function and the newly fitted equation. However, if anti-nutritional dietary factors increase the concentration of faecal nitrogen of feed or endogenous origin, the approach may considerably overestimate diet digestibility. [source]
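Fitting such a one-factorial exponential regression of OMD on FCP might look like the sketch below; the functional form and every coefficient are assumptions for illustration, not the published Lukas et al. equation.

```python
import numpy as np
from scipy.optimize import curve_fit

def omd_model(fcp, a, b, c):
    # Generic saturating exponential: OMD rises with faecal CP toward an asymptote a.
    return a - b * np.exp(-c * fcp)

rng = np.random.default_rng(3)
fcp = rng.uniform(40, 200, 100)     # hypothetical faecal CP concentrations
omd = omd_model(fcp, 80.0, 60.0, 0.015) + rng.normal(0, 4, fcp.shape)

params, _ = curve_fit(omd_model, fcp, omd, p0=(80, 60, 0.01))
resid = omd - omd_model(fcp, *params)
print(params, resid.std(ddof=3))    # refitted coefficients and residual SD
```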


Non-parametric statistical methods for multivariate calibration model selection and comparison

JOURNAL OF CHEMOMETRICS, Issue 12 2003
Edward V. Thomas
Abstract Model selection is an important issue when constructing multivariate calibration models using methods based on latent variables (e.g. partial least squares regression and principal component regression). It is important to select an appropriate number of latent variables to build an accurate and precise calibration model. Inclusion of too few latent variables can result in a model that is inaccurate over the complete space of interest. Inclusion of too many latent variables can result in a model that produces noisy predictions through incorporation of low-order latent variables that have little or no predictive value. Commonly used metrics for selecting the number of latent variables are based on the predicted error sum of squares (PRESS) obtained via cross-validation. In this paper a new approach for selecting the number of latent variables is proposed, in which the prediction errors of individual observations (obtained from cross-validation) are compared across models incorporating varying numbers of latent variables. Based on these comparisons, non-parametric statistical methods are used to select the simplest model (fewest latent variables) that provides prediction quality indistinguishable from that of more complex models. Unlike methods based on PRESS, this new approach is robust to the effects of anomalous observations. More generally, the same approach can be used to compare the performance of any models applied to the same data set where reference values are available. The proposed methodology is illustrated with an industrial example involving the prediction of gasoline octane numbers from near-infrared spectra. Published in 2004 by John Wiley & Sons, Ltd. [source]
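A sketch of the selection rule: per-observation cross-validated errors for each model size are compared with a Wilcoxon signed-rank test, and the simplest model not distinguishably worse than the best is retained. The error data below are invented stand-ins for a real PLS cross-validation.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(4)
n = 60
# Hypothetical per-observation absolute CV errors for models with 1..6 latent
# variables; prediction stops improving after 3 components.
errors = {k: np.abs(rng.normal(0, max(1.0, 4 - k), n)) for k in range(1, 7)}

best = min(errors, key=lambda k: errors[k].mean())
for k in sorted(errors):
    if k == best:
        break                                     # no simpler adequate model found
    stat, p = wilcoxon(errors[k], errors[best])   # paired, non-parametric comparison
    if p > 0.05:                                  # not distinguishable from the best
        break
print("selected latent variables:", k)
```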


Bayesian measures of model complexity and fit

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2002
David J. Spiegelhalter
Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information-theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the 'hat' matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision-theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis. [source]
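A minimal numpy sketch of pD and DIC for a normal likelihood with known unit variance, computed from stand-in MCMC draws (the data and posterior samples are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(2.0, 1.0, 50)                                    # observed data
theta_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), 4000)   # stand-in posterior draws

def deviance(theta):
    # -2 log-likelihood of N(theta, 1) for the observed data
    return np.sum((y - theta) ** 2) + len(y) * np.log(2 * np.pi)

dbar = np.mean([deviance(t) for t in theta_draws])   # posterior mean deviance
dhat = deviance(theta_draws.mean())                  # deviance at the posterior mean
p_d = dbar - dhat                                    # effective number of parameters (~1 here)
dic = dbar + p_d
print(p_d, dic)
```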


New pulsating white dwarfs in cataclysmic variables

MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY: LETTERS (ELECTRONIC), Issue 1 2006
R. Nilsson
ABSTRACT The number of discovered non-radially pulsating white dwarfs (WDs) in cataclysmic variables (CVs) is increasing rapidly with the aid of the Sloan Digital Sky Survey (SDSS). We performed photometric observations of two additional objects: SDSS J133941.11+484727.5 (SDSS 1339), independently discovered as a pulsator by Gänsicke et al., and SDSS J151413.72+454911.9, which we identified as a CV/ZZ Ceti hybrid. In this Letter we present the results of remote observations of these targets performed with the Nordic Optical Telescope (NOT) during the Nordic–Baltic Research School at Molėtai Observatory, and follow-up observations executed with the NOT in service mode. We also present three candidates we found to be non-pulsating. Our observations show that the main pulsation frequencies agree with those found in previous CV/ZZ Ceti hybrids, but specifically for SDSS 1339 the principal period differs slightly between individual observations and also from the recent independent observation by Gänsicke et al. Analysis of SDSS colour data for the small sample of pulsating and non-pulsating CV/ZZ Ceti hybrids found so far seems to indicate that the r − i colour could be a good marker for the instability strip of this class of pulsating WDs. [source]
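Pulsation frequencies in photometry like this are typically extracted with a periodogram; a sketch using astropy's LombScargle on a simulated, unevenly sampled light curve (all values hypothetical):

```python
import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0, 0.2, 400))    # ~5 h of unevenly sampled time (days)
f_true = 86400 / 600                     # hypothetical 600 s pulsation period, in cycles/day
flux = 1.0 + 0.01 * np.sin(2 * np.pi * f_true * t) + rng.normal(0, 0.005, t.shape)

freq, power = LombScargle(t, flux).autopower()
print("recovered period [s]:", 86400 / freq[np.argmax(power)])
```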


On the exponentially weighted moving variance

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 7 2009
Longcheen Huwang
Abstract MacGregor and Harris (J Quality Technol 25 (1993) 106–118) proposed the exponentially weighted mean squared deviation (EWMS) and the exponentially weighted moving variance (EWMV) charts as ways of monitoring process variability. These two charts are particularly useful for individual observations, where no estimate of variability is available from replicates. However, the control charts derived by using the approximate distributions of the EWMS and EWMV statistics are difficult to interpret in terms of the average run length (ARL). Furthermore, both control charting schemes are biased procedures. In this article, we propose two new control charts by applying a normal approximation to the distributions of the logarithms of the weighted sums of chi-squared random variables that are, respectively, functions of the EWMS and EWMV statistics. These new control charts are easy to interpret in terms of the ARL. On the basis of simulation studies, we demonstrate that the proposed charts are superior to the EWMS and EWMV charts and that both are nearly unbiased for the commonly used smoothing constants. We also compare the performance of the proposed charts with that of the change point (CP) CUSUM chart of Acosta-Mejia (1995). The design of the proposed control charts is discussed, and an example is given to illustrate their applicability. © 2009 Wiley Periodicals, Inc. Naval Research Logistics, 2009 [source]
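A sketch of the two statistics being monitored, following the standard recursions (the target, smoothing constant, and data are illustrative choices, and the starting values are a design decision):

```python
import numpy as np

def ewms_ewmv(x, target, lam=0.1):
    """Exponentially weighted mean squared deviation (EWMS) and moving variance (EWMV).

    EWMS smooths squared deviations from a fixed target; EWMV replaces the target
    with an EWMA of the data, so it tracks variance around a drifting mean.
    """
    s2 = v = np.var(x)                          # starting values (assumed choice)
    m = x[0]
    ewms, ewmv = [], []
    for xt in x:
        s2 = lam * (xt - target) ** 2 + (1 - lam) * s2
        m = lam * xt + (1 - lam) * m            # EWMA estimate of the mean
        v = lam * (xt - m) ** 2 + (1 - lam) * v
        ewms.append(s2)
        ewmv.append(v)
    return np.array(ewms), np.array(ewmv)

rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(10, 1, 100), rng.normal(10, 2, 50)])  # variance shift at t=100
s2, v = ewms_ewmv(x, target=10.0)
print(s2[-1], v[-1])   # both statistics drift upward after the shift
```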


Effectiveness of the training program for workers at construction sites of the high-speed railway line between Torino and Novara: Impact on injury rates

AMERICAN JOURNAL OF INDUSTRIAL MEDICINE, Issue 12 2009
A. Bena MD
Abstract Background There are very few published studies evaluating the impact of safety and health training on injury outcomes in the construction industry. The aim of this study was to assess the impact of a training program on injury rates at a major railway construction project. Methods The population consisted of 2,795 workers involved in a safety training program at the construction sites of the high-speed railway line Torino–Novara. Two types of analyses were carried out to assess the effectiveness of the training program in reducing the number of injuries: (i) a pre–post analysis, which took into account the fact that workers were enrolled at different times and the training intervention did not occur at the same time for all subjects; and (ii) an interrupted time-series model, which corrected for the time trend and accounted for the autocorrelation between individual observations. Results Twenty-nine percent of workers who spent at least 1 day at the construction sites attended at least one training module. Pre–post analysis: at the end of the training program, the incidence of occupational injuries had fallen by 16% after the basic training module and by 25% following the specific modules. Time-series model: training led to a 6% reduction in injury rates, which was not statistically significant. Conclusions The training program had a moderately positive impact on the health of workers. Further studies are being conducted to obtain a more complete assessment of the program's actual effectiveness in reducing the incidence of injuries. Am. J. Ind. Med. 52:965–972, 2009. © 2009 Wiley-Liss, Inc. [source]
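The interrupted time-series analysis can be sketched as a segmented regression with autocorrelation-robust standard errors; the monthly injury rates below are simulated, and the model is a generic formulation rather than the authors' exact specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
months = np.arange(36)
training = (months >= 18).astype(int)                 # program starts at month 18
rate = 12 - 0.05 * months - 1.0 * training + rng.normal(0, 0.8, 36)
df = pd.DataFrame(dict(rate=rate, t=months, training=training,
                       t_after=np.clip(months - 18, 0, None)))

# Level change (training) and slope change (t_after) at the intervention;
# HAC errors allow for serial correlation between monthly observations.
fit = smf.ols("rate ~ t + training + t_after", df).fit(cov_type="HAC",
                                                       cov_kwds={"maxlags": 3})
print(fit.params["training"], fit.pvalues["training"])
```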


The Atkinson Inequality Measure and its Sampling Properties: Bayesian and Classical Approaches

AUSTRALIAN ECONOMIC PAPERS, Issue 3 2004
Duangkamon Chotikapanich
This paper examines several Bayesian methods of obtaining posterior probability density functions of the Atkinson inequality measure and its associated social welfare function, in the context of grouped income distribution data. The methods are compared with asymptotic standard errors. The role of the number of income classes is investigated using a simulated distribution. If only a small number of groups is available in published data, there is a clear gain from generating the posterior probability density function when using an explicit income distribution assumption. Even with a small number of groups, the Bayesian approach gives results that are close to the sample values obtained using the corresponding individual observations. [source]
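For reference, the Atkinson measure itself is simple to compute from individual observations; a numpy sketch for inequality-aversion parameter ε (the income sample is simulated):

```python
import numpy as np

def atkinson(y, eps):
    """Atkinson index A_eps = 1 - EDE / mean, where EDE is the equally
    distributed equivalent income."""
    y = np.asarray(y, dtype=float)
    if eps == 1:
        ede = np.exp(np.mean(np.log(y)))                 # geometric mean
    else:
        ede = np.mean(y ** (1 - eps)) ** (1 / (1 - eps))
    return 1 - ede / y.mean()

rng = np.random.default_rng(9)
incomes = rng.lognormal(mean=10, sigma=0.6, size=5_000)  # hypothetical income sample
print(atkinson(incomes, 0.5), atkinson(incomes, 1), atkinson(incomes, 2))
```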