Variance Estimation (variance + estimation)

Distribution by Scientific Domains

Mathematics and Statistics	86%

Selected Abstracts

ERROR VARIANCE ESTIMATION FOR THE SINGLE-INDEX MODEL

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2010
K. B. Kulasekera
Summary Single-index models provide one way of reducing the dimension in regression analysis. The statistical literature has focused mainly on estimating the index coefficients, the mean function, and their asymptotic properties. For accurate statistical inference it is equally important to estimate the error variance of these models. We examine two estimators of the error variance in a single-index model and compare them with a few competing estimators with respect to their corresponding asymptotic properties. Using a simulation study, we evaluate the finite-sample performance of our estimators against their competitors.

VARIANCE ESTIMATION IN TWO-PHASE SAMPLING

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2009
M.A. Hidiroglou
Summary Two-phase sampling is often used for estimating a population total or mean when the cost per unit of collecting auxiliary variables, x, is much smaller than the cost per unit of measuring a characteristic of interest, y. In the first phase, a large sample s1 is drawn according to a specific sampling design p(s1), and auxiliary data x are observed for the units i ∈ s1. Given the first-phase sample s1, a second-phase sample s2 is selected from s1 according to a specified sampling design p(s2 | s1), and (y, x) is observed for the units i ∈ s2. In some cases, the population totals of some components of x may also be known. Two-phase sampling is used for stratification at the second phase or both phases and for regression estimation. Horvitz–Thompson-type variance estimators are used for variance estimation. However, the Horvitz–Thompson (Horvitz & Thompson, J. Amer. Statist. Assoc. 1952) variance estimator in uni-phase sampling is known to be highly unstable and may take negative values when the units are selected with unequal probabilities. On the other hand, the Sen–Yates–Grundy variance estimator is relatively stable and non-negative for several unequal probability sampling designs with fixed sample sizes. In this paper, we extend the Sen–Yates–Grundy (Sen, J. Ind. Soc. Agric. Statist. 1953; Yates & Grundy, J. Roy. Statist. Soc. Ser. B 1953) variance estimator to two-phase sampling, assuming fixed first-phase sample size and fixed second-phase sample size given the first-phase sample. We apply the new variance estimators to two-phase sampling designs with stratification at the second phase or both phases. We also develop Sen–Yates–Grundy-type variance estimators of the two-phase regression estimators that make use of the first-phase auxiliary data and known population totals of some of the auxiliary variables.
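For orientation, the classical single-phase Sen–Yates–Grundy estimator that the abstract takes as its starting point can be sketched as follows. This is only an illustrative sketch of the textbook fixed-size, single-phase form, not the two-phase extension developed in the paper. The function names (`ht_total`, `syg_variance`, `srswor_probabilities`) and the toy data are assumptions made for this example.

```python
from itertools import combinations


def ht_total(y, pi):
    """Horvitz-Thompson estimate of the population total."""
    return sum(yi / p for yi, p in zip(y, pi))


def syg_variance(y, pi, pi_joint):
    """Sen-Yates-Grundy variance estimate for a fixed-size design.

    y        : observed values for the sampled units
    pi       : first-order inclusion probabilities of those units
    pi_joint : pi_joint[i][j] is the joint inclusion probability
               of sampled units i and j (only i < j is read)
    """
    v = 0.0
    for i, j in combinations(range(len(y)), 2):
        weight = (pi[i] * pi[j] - pi_joint[i][j]) / pi_joint[i][j]
        v += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v


def srswor_probabilities(n, N):
    """Inclusion probabilities under simple random sampling without
    replacement (hypothetical helper used only to build the toy example)."""
    pi = [n / N] * n
    joint = n * (n - 1) / (N * (N - 1))
    pi_joint = [[joint] * n for _ in range(n)]
    return pi, pi_joint


# Toy example: n = 4 units drawn from N = 10 by SRSWOR.
y = [3.0, 5.0, 7.0, 9.0]
pi, pi_joint = srswor_probabilities(4, 10)
print(ht_total(y, pi), syg_variance(y, pi, pi_joint))
```

Under simple random sampling without replacement the estimate reduces to the familiar N²(1 − n/N)s²/n, which gives a quick sanity check on an implementation.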

A Note on Variance Estimation of the Aalen–Johansen Estimator of the Cumulative Incidence Function in Competing Risks, with a View towards Left-Truncated Data

BIOMETRICAL JOURNAL, Issue 1 2010
Arthur Allignol
Abstract The Aalen–Johansen estimator is the standard nonparametric estimator of the cumulative incidence function in competing risks. Estimating its variance in small samples has attracted some interest recently, together with a critique of the usual martingale-based estimators. We show that the preferred estimator equals a Greenwood-type estimator that has been derived as a recursion formula using counting processes and martingales in a more general multistate framework. We also extend previous simulation studies on estimating the variance of the Aalen–Johansen estimator in small samples to left-truncated observation schemes, which may conveniently be handled within the counting processes framework. This investigation is motivated by a real data example on spontaneous abortion in pregnancies exposed to coumarin derivatives, where both competing risks and left-truncation have recently been shown to be crucial methodological issues (Meister and Schaefer (2008), Reproductive Toxicology 26, 31–35). Multistate-type software and data are available online to perform the analyses. The Greenwood-type estimator is recommended for use in practice.
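As a reminder of the quantity whose variance is being estimated, here is a minimal sketch of the Aalen–Johansen point estimate of the cumulative incidence function with delayed entry. It covers the point estimator only; the Greenwood-type variance estimator discussed in the abstract is not reproduced. The record layout `(entry, exit, cause)`, the convention that cause 0 means censored, and the function name are assumptions made for this example.

```python
def aalen_johansen(records, causes=(1, 2)):
    """Return {cause: [(time, CIF)]} from (entry, exit, cause) records.

    cause == 0 marks a censored exit. A subject counts as at risk at
    time t when entry < t <= exit, which is how left truncation enters.
    Events whose cause is not listed in `causes` are ignored.
    """
    event_times = sorted({exit_ for _, exit_, cause in records if cause != 0})
    survival = 1.0  # all-cause Kaplan-Meier value just before the current time
    cif = {k: 0.0 for k in causes}
    curves = {k: [] for k in causes}
    for t in event_times:
        at_risk = sum(1 for entry, exit_, _ in records if entry < t <= exit_)
        if at_risk == 0:
            continue
        total_events = 0
        for k in causes:
            d_k = sum(1 for _, exit_, cause in records
                      if exit_ == t and cause == k)
            # Increment: P(event-free just before t) * cause-k hazard at t.
            cif[k] += survival * d_k / at_risk
            curves[k].append((t, cif[k]))
            total_events += d_k
        survival *= 1.0 - total_events / at_risk
    return curves


# Toy data: the third subject enters the study late (left truncation).
data = [(0.0, 2.0, 1), (1.0, 3.0, 2), (2.5, 4.0, 1)]
for cause, curve in aalen_johansen(data).items():
    print(cause, curve)
```

The risk-set condition `entry < t <= exit` is the only place where left truncation changes the computation; setting every entry time to zero recovers the usual right-censored case.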

Variance estimation for spatially balanced samples of environmental resources

ENVIRONMETRICS, Issue 6 2003
Don L. Stevens Jr
Abstract The spatial distribution of a natural resource is an important consideration in designing an efficient survey or monitoring program for the resource. We review a unified strategy for designing probability samples of discrete, finite resource populations, such as lakes within some geographical region; linear populations, such as a stream network in a drainage basin; and continuous, two-dimensional populations, such as forests. The strategy can be viewed as a generalization of spatial stratification. In this article, we develop a local neighborhood variance estimator based on that perspective, and examine its behavior via simulation. The simulations indicate that the local neighborhood estimator is unbiased and stable. The Horvitz–Thompson variance estimator based on assuming independent random sampling (IRS) may be two times the magnitude of the local neighborhood estimate. An example using data from a generalized random-tessellation stratified design on the Oahe Reservoir resulted in local variance estimates being 22 to 58 percent smaller than Horvitz–Thompson IRS variance estimates. Variables with stronger spatial patterns had greater reductions in variance, as expected. Copyright © 2003 John Wiley & Sons, Ltd.

Variance estimation for two-phase stratified sampling

THE CANADIAN JOURNAL OF STATISTICS, Issue 4 2000
David A. Binder
Abstract The authors consider variance estimation for the generalized regression estimator in a two-phase context when the first-phase sample has been restratified using information gathered from the first-phase sample. Simple computational expressions for variance estimation are provided for the double expansion estimator and the reweighted expansion estimator of Kott & Stukel (1997). These estimators are compared using data from the Canadian Retail Commodity Survey.

Sampling and variance estimation on continuous domains

ENVIRONMETRICS, Issue 6 2006
Cynthia Cooper
Abstract This paper explores fundamental concepts of design- and model-based approaches to sampling and estimation for a response defined on a continuous domain. The paper discusses the concepts in design-based methods as applied in a continuous domain, the meaning of model-based sampling, and the interpretation of the design-based variance of a model-based estimate. A model-assisted variance estimator is examined for circumstances for which a direct design-based estimator may be inadequate or not available. The alternative model-assisted variance estimator is demonstrated in simulations on a realization of a response generated by a process with exponential covariance structure. The empirical results demonstrate that the model-assisted variance estimator is less biased and more efficient than Horvitz–Thompson and Yates–Grundy variance estimators applied to a continuous-domain response. Copyright © 2006 John Wiley & Sons, Ltd.

Gene-dropping vs. empirical variance estimation for allele-sharing linkage statistics

GENETIC EPIDEMIOLOGY, Issue 8 2006
Jeesun Jung
Abstract In this study, we compare the statistical properties of a number of methods for estimating P-values for allele-sharing statistics in non-parametric linkage analysis. Some of the methods are based on the normality assumption, using different variance estimation methods, and others use simulation (gene-dropping) to find empirical distributions of the test statistics. For variance estimation methods, we consider the perfect variance approximation and two empirical variance estimates. The simulation-based methods are gene-dropping with and without conditioning on the observed founder alleles. We also consider the Kong and Cox linear and exponential models and a Monte Carlo method modified from a method for finding genome-wide significance levels. We discuss the analytical properties of these various P-value estimation methods and then present simulation results comparing them. Assuming that the sample sizes are large enough to justify a normality assumption for the linkage statistic, the best P-value estimation method depends to some extent on the (unknown) genetic model and on the types of pedigrees in the sample. If the sample sizes are not large enough to justify a normality assumption, then gene-dropping is the best choice. We discuss the differences between conditional and unconditional gene-dropping. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.

Using data augmentation to correct for non-ignorable non-response when surrogate data are available: an application to the distribution of hourly pay

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 3 2006
Gabriele B. Durrant
Summary The paper develops a data augmentation method to estimate the distribution function of a variable, which is partially observed, under a non-ignorable missing data mechanism, and where surrogate data are available. An application to the estimation of hourly pay distributions using UK Labour Force Survey data provides the main motivation. In addition to considering a standard parametric data augmentation method, we consider the use of hot deck imputation methods as part of the data augmentation procedure to improve the robustness of the method. The method proposed is compared with standard methods that are based on an ignorable missing data mechanism, both in a simulation study and in the Labour Force Survey application. The focus is on reducing bias in point estimation, but variance estimation using multiple imputation is also considered briefly.

Using Difference-Based Methods for Inference in Regression with Fractionally Integrated Processes

JOURNAL OF TIME SERIES ANALYSIS, Issue 6 2007
Wen-Jen Tsay
Abstract This paper suggests a difference-based method for inference in the regression model involving fractionally integrated processes. Under suitable regularity conditions, our method can effectively deal with the inference problems associated with the regression model consisting of nonstationary, stationary and intermediate memory regressors, simultaneously. Although the difference-based method provides a very flexible modelling framework for empirical studies, the implementation of this method is extremely easy, because it completely avoids the difficult problems of choosing a kernel function, a bandwidth parameter, or an autoregressive lag length for the long-run variance estimation. The asymptotic local power of our method is investigated with a sequence of local data-generating processes (DGP) in what Davidson and MacKinnon [Canadian Journal of Economics (1985) Vol. 18, pp. 38–57] call 'regression direction'. The simulation results indicate that the size control of our method is excellent even when the sample size is only 100, and the pattern of power performance is highly consistent with the theoretical finding from the asymptotic local power analysis conducted in this paper.

Spatial averaging of ensemble-based background-error variances

THE QUARTERLY JOURNAL OF THE ROYAL METEOROLOGICAL SOCIETY, Issue 633 2008
Laure Raynaud
Abstract It is common to compute background-error variances from an ensemble of forecasts, in order to calculate either climatological or flow-dependent estimates. However, the finite size of the ensemble induces sampling noise, which degrades the accuracy of the variance estimation. An idealized 1D framework is first considered, to show that the spatial structure of sampling noise is relatively small-scale, and is closely related to the background-error correlations. This motivates an investigation of local spatial averaging, which is here applied to ensemble-based variance fields in this 1D context. It is shown that a manually optimized spatial averaging helps to significantly reduce the sampling noise. This provides estimates which are as accurate as those derived from a much bigger ensemble. The dependencies of this optimization on the error correlation length-scale and on the heterogeneity of the variance and length-scale fields are also illustrated. These results are next confirmed in a more realistic 2D problem, by considering the current operational version of the Arpège background-error covariance matrix. Finally, the possibility of objectively and automatically optimizing the filtering is explored. The idea is to apply the usual linear estimation theory and to use signal/noise ratios in order to calculate an optimal filter. The efficiency of this objective filtering is illustrated in the idealized 1D framework. Copyright © 2008 Royal Meteorological Society
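The basic effect described in this abstract, that local spatial averaging damps the sampling noise of ensemble-based variances, can be illustrated with a toy 1D experiment. This is a sketch under strong simplifying assumptions that are not taken from the paper: errors are spatially uncorrelated, the true variance is uniform, the domain is periodic, and the filter is a plain boxcar with a hand-picked half-width. All names and parameter values are illustrative.

```python
import random
import statistics


def ensemble_variance(members):
    """Per-gridpoint sample variance across ensemble members."""
    n_points = len(members[0])
    return [statistics.variance([m[i] for m in members])
            for i in range(n_points)]


def moving_average(field, half_width):
    """Periodic boxcar average over 2 * half_width + 1 neighbouring points."""
    n = len(field)
    width = 2 * half_width + 1
    return [sum(field[(i + k) % n] for k in range(-half_width, half_width + 1)) / width
            for i in range(n)]


def rmse(field, truth):
    """Root-mean-square error of a field against a uniform true value."""
    return (sum((f - truth) ** 2 for f in field) / len(field)) ** 0.5


random.seed(0)
n_points, n_members, true_variance = 200, 10, 1.0

# Assumption: independent Gaussian errors at every grid point.
members = [[random.gauss(0.0, true_variance ** 0.5) for _ in range(n_points)]
           for _ in range(n_members)]

raw = ensemble_variance(members)
smooth = moving_average(raw, half_width=5)
print("raw RMSE:", rmse(raw, true_variance))
print("filtered RMSE:", rmse(smooth, true_variance))
```

With a heterogeneous true variance field, a wide window would also smear real signal, which is the trade-off behind optimizing the filter width.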

A Point Estimator for the Time Course of Drug Release

BIOMETRICAL JOURNAL, Issue 1 2009
Stephan Koehne-Voss
Abstract Procedures for deconvolution of pharmacokinetic data are routinely used in the pharmaceutical industry to determine drug release and absorption, which is essential in designing optimized drug formulations. Although these procedures are described extensively in the pharmacokinetic literature, they have been studied less from a statistical point of view, and variance estimation has not been addressed. We discuss the statistical properties of a numerical procedure for deconvolution. Based on a point-area deconvolution method, we define an estimator for the function that describes the time course of drug release from a drug formulation. Asymptotic distributions are derived and several methods of variance and interval estimation are compared. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

Variable Selection for Semiparametric Mixed Models in Longitudinal Studies

BIOMETRICS, Issue 1 2010
Xiao Ni
Summary We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: the roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on linear coefficients to achieve model sparsity. Compared to existing approaches based on estimating equations, our procedure provides valid inference for data that are missing at random, and will be more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimation for estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to the existing ones. We then apply the new method to a real data set from a lactation study.

Growth curve models for stochastic modeling and analyzing of natural disinfection of wastewater

ENVIRONMETRICS, Issue 8 2006
Wolfgang Bischoff
Abstract This work is motivated by a study on the natural disinfection of wastewater in a marine environment for ocean outfall systems without chlorination. In the study of the disinfection of wastewater in a marine environment, two natural factors (light intensity and salinity), one controllable factor (the volumetric mixing ratio of seawater to wastewater), and one random effect factor (the existence of predators) were investigated. Our problem and data are modeled by a growth curve model with an unknown random parameter that can be described by a mixed model with the factors mentioned above as covariates. For our model we determine the optimal variance estimations. Finally, we apply our model with these optimal estimated variance components to the data obtained from the real experiments. Copyright © 2006 John Wiley & Sons, Ltd.