Unbiased Estimators



Selected Abstracts


A cost analysis of ranked set sampling to estimate a population mean

ENVIRONMETRICS, Issue 3 2005
Rebecca A. Buchanan
Abstract Ranked set sampling (RSS) can be a useful environmental sampling method when measurement costs are high but ranking costs are low. RSS estimates of the population mean can have higher precision than estimates from a simple random sample (SRS) of the same size, leading to potentially lower sampling costs from RSS than from SRS for a given precision. However, RSS introduces ranking costs not present in SRS; these costs must be considered in determining whether RSS is cost-effective. We use a simple cost model to determine the minimum ratio of measurement to ranking costs (cost ratio) necessary for RSS to be as cost-effective as SRS for data from the normal, exponential, and lognormal distributions. We consider both equal and unequal RSS allocations and two types of estimators of the mean: the typical distribution-free (DF) estimator and the best linear unbiased estimator (BLUE). The minimum cost ratio necessary for RSS to be as cost-effective as SRS depends on the underlying distribution of the data, as well as on the allocation and the type of estimator used. Most minimum necessary cost ratios are in the range of 1-6, and are lower for BLUEs than for DF estimators. The greater the prior knowledge of the distribution underlying the data, the lower the minimum necessary cost ratio and the more attractive RSS is relative to SRS. Copyright © 2005 John Wiley & Sons, Ltd.
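The precision gain that drives the cost comparison can be illustrated with a small simulation. The sketch below is not the authors' cost model; the set size, distribution, and replication count are arbitrary choices. It estimates the relative efficiency of a balanced RSS mean over an SRS mean with the same number of measured units, assuming perfect ranking of standard normal data:

```python
import numpy as np

rng = np.random.default_rng(0)

def rss_sample(rng, dist, m):
    """One balanced RSS cycle: draw m sets of m units, rank each set
    (perfect ranking assumed), and measure the i-th order statistic
    of the i-th set."""
    sets = dist(rng, (m, m))
    sets.sort(axis=1)
    return sets[np.arange(m), np.arange(m)]

def mean_var(sampler, reps=20000):
    """Monte Carlo variance of the sample mean under a sampling scheme."""
    return np.var([sampler().mean() for _ in range(reps)])

m = 3
normal = lambda rng, shape: rng.normal(size=shape)
v_rss = mean_var(lambda: rss_sample(rng, normal, m))
v_srs = mean_var(lambda: rng.normal(size=m))

# Relative efficiency of RSS over SRS for the same number of
# *measured* units; > 1 means RSS needs fewer measurements.
rel_eff = v_srs / v_rss
print(round(rel_eff, 2))
```

With relative efficiency e, RSS needs roughly n/e measurements to match the precision of n SRS measurements, which bounds how large the ranking cost can be before RSS loses its advantage.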


Iterative generalized cross-validation for fusing heteroscedastic data of inverse ill-posed problems

GEOPHYSICAL JOURNAL INTERNATIONAL, Issue 1 2009
Peiliang Xu
SUMMARY The method of generalized cross-validation (GCV) has been widely used to determine the regularization parameter, because the criterion minimizes the average predicted residuals of measured data and depends solely on the data. The data-driven advantage is valid only if the variance-covariance matrix of the data can be represented as the product of a given positive definite matrix and a scalar unknown noise variance. In practice, important geophysical inverse ill-posed problems have often been solved by combining different types of data. The stochastic model of measurements in this case contains a number of different unknown variance components. Although the weighting factors, or equivalently the variance components, have been shown to significantly affect joint inversion results of geophysical ill-posed problems, they have been either assumed to be known or chosen empirically. No solid statistical foundation is yet available to correctly determine the weighting factors of different types of data in joint geophysical inversion. We extend the GCV method to accommodate both the regularization parameter and the variance components. The extended version of GCV essentially consists of two steps: one estimates the variance components with the regularization parameter fixed, and the other determines the regularization parameter by the GCV method with the variance components fixed. We simulate two examples: a purely mathematical integral equation of the first kind modified from the first example of Phillips (1962), and a typical geophysical example of downward continuation to recover the gravity anomalies on the surface of the Earth from satellite measurements. Based on the two simulated examples, we extensively compare the iterative GCV method with existing methods; the comparisons show that the method correctly recovers the unknown variance components and determines the regularization parameter.
In other words, our method lets the data speak for themselves: it decides the correct weighting factors of the different types of geophysical data and determines the regularization parameter. In addition, we derive an unbiased estimator of the noise variance by correcting the biases of the regularized residuals. A simplified formula that saves computation time is also given. The two new estimators of the noise variance are compared with six existing methods through numerical simulations. The simulation results show that the two new estimators perform as well as Wahba's estimator for highly ill-posed problems and outperform the existing methods for moderately ill-posed problems.
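The standard single-parameter GCV criterion that the paper extends can be sketched as follows. This is an illustrative ridge-regression setup with made-up dimensions, not the authors' two-step iterative scheme for variance components:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 30
# Ill-conditioned design: columns with rapidly decaying scales
X = rng.normal(size=(n, p)) @ np.diag(1.0 / np.arange(1, p + 1))
beta = rng.normal(size=p)
y = X @ beta + 0.1 * rng.normal(size=n)

def gcv(lam):
    """GCV score: n * ||(I - A)y||^2 / tr(I - A)^2, with influence
    matrix A(lam) = X (X'X + lam I)^{-1} X'."""
    A = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - A @ y
    return n * (resid @ resid) / np.trace(np.eye(n) - A) ** 2

# Pick the regularization parameter by minimizing GCV over a grid
lams = np.logspace(-8, 2, 60)
lam_star = lams[np.argmin([gcv(l) for l in lams])]
print(lam_star)
```

The criterion depends only on the data, which is the "data-driven advantage" the abstract refers to; it breaks down when the data mix several unknown variance components, which motivates the paper's extension.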


On Estimation in M/G/c/c Queues

INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, Issue 6 2001
Mei Ling Huang
We derive the minimum variance unbiased estimator (MVUE) and the maximum likelihood estimator (MLE) of the stationary probability function (pf) of the number of customers in a collection of independent M/G/c/c subsystems. The offered load and the number of servers in each subsystem are assumed to be unknown. Because it may be impractical or impossible to observe individual server occupancies, we assume that only observations of the total number of customers in the system are available. Both estimators depend on the R distribution (the distribution of the sum of independent right-truncated Poisson random variables) and R numbers.
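By insensitivity, the stationary distribution of the number of busy servers in a single M/G/c/c loss system is a right-truncated Poisson in the offered load, which is the distribution underlying both estimators. A minimal sketch (parameter values are arbitrary):

```python
from math import factorial

def mgcc_pf(a, c):
    """Stationary pf of the number of busy servers in an M/G/c/c
    loss system with offered load a = arrival rate * mean service
    time: a truncated Poisson, p_k ∝ a^k / k! for k = 0..c, which by
    insensitivity depends on the service distribution only through
    its mean."""
    w = [a ** k / factorial(k) for k in range(c + 1)]
    z = sum(w)
    return [x / z for x in w]

pf = mgcc_pf(a=2.0, c=4)
blocking = pf[-1]  # Erlang-B loss probability: all c servers busy
print(round(blocking, 4))
```

The total count over independent subsystems is then a sum of independent right-truncated Poisson variables, i.e. the R distribution mentioned in the abstract.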


A new ranked set sample estimator of variance

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 2 2002
Steven N. MacEachern
Summary. We develop an unbiased estimator of the variance of a population based on a ranked set sample. We show that this new estimator is more efficient than the variance estimator based on a simple random sample of the same size and than the ranked-set-sample estimator proposed by Stokes. A test to determine the effectiveness of the judgment ordering process is also proposed.
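For context, the pooled sample variance applied to RSS data (a Stokes-type estimator, not the unbiased estimator proposed in the paper) carries a small positive O(1/n) bias under perfect ranking, which a simulation can exhibit. Set size, number of cycles, and the normal distribution below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
m, r, reps = 3, 10, 5000  # set size, cycles per sample, Monte Carlo reps

# Balanced RSS from N(0, 1) with perfect ranking: the i-th order
# statistic of the i-th independent set of size m, repeated for r cycles.
sets = rng.normal(size=(reps, r, m, m))
sets.sort(axis=-1)
x = sets[..., np.arange(m), np.arange(m)].reshape(reps, r * m)

est = x.var(axis=1, ddof=1)  # Stokes-type pooled sample variance
bias = est.mean() - 1.0      # true variance is 1
print(round(bias, 4))
```

The bias vanishes as the number of cycles grows; an exactly unbiased estimator, as developed in the paper, removes it for every sample size.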


Prediction Variance and Information Worth of Observations in Time Series

JOURNAL OF TIME SERIES ANALYSIS, Issue 4 2000
Mohsen Pourahmadi
The problem of developing measures of the worth of observations in time series has not received much attention in the literature. Any meaningful measure of worth should naturally depend on the position of the observation as well as on the objective of the analysis, namely parameter estimation or prediction of future values. We introduce a measure that quantifies the worth of a set of observations for the purpose of predicting outcomes of stationary processes. The worth is measured as the change in the information content of the entire past due to the exclusion or inclusion of a set of observations. The information content is quantified by the mutual information, which is the information-theoretic measure of dependency. For Gaussian processes, the measure of worth turns out to be the relative change in the prediction error variance due to the exclusion or inclusion of a set of observations. We provide formulae for computing the predictive worth of a set of observations for Gaussian autoregressive moving-average processes. For non-Gaussian processes, however, a simple function of the process entropy provides a lower bound for the variance of the prediction error, in the same manner that the Fisher information provides a lower bound for the variance of an unbiased estimator via the Cramér-Rao inequality. Statistical estimation of this lower bound requires estimation of the entropy of a stationary time series.
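For a Gaussian process, the worth of a set of observations reduces to the relative change in prediction error variance, which can be computed directly from the covariance matrix via a Schur complement. A sketch for a stationary AR(1) process (parameters chosen for illustration, not from the paper):

```python
import numpy as np

phi, sigma2 = 0.8, 1.0
n = 6  # observe X_1..X_n, predict X_{n+1}
# Stationary AR(1) covariance: Cov(X_s, X_t) = sigma2 * phi^|s-t| / (1 - phi^2)
idx = np.arange(1, n + 2)
G = sigma2 * phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi ** 2)

def pred_var(obs):
    """Conditional variance of X_{n+1} given {X_t : t in obs},
    via the Schur complement of the joint Gaussian covariance."""
    o = np.array(list(obs)) - 1
    S = G[np.ix_(o, o)]
    c = G[-1, o]
    return G[-1, -1] - c @ np.linalg.solve(S, c)

v_full = pred_var(range(1, n + 1))
v_drop_last = pred_var(range(1, n))       # exclude X_n
v_drop_first = pred_var(range(2, n + 1))  # exclude X_1

# Worth = relative change in prediction error variance on exclusion
worth_last = (v_drop_last - v_full) / v_full
worth_first = (v_drop_first - v_full) / v_full
print(worth_last, worth_first)
```

The Markov structure shows up immediately: dropping the earliest observation costs nothing, while dropping the most recent one inflates the prediction error variance by the factor phi^2.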


Observational biases in Lagrangian reconstructions of cosmic velocity fields

MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, Issue 4 2008
G. Lavaux
ABSTRACT Lagrangian reconstruction of large-scale peculiar velocity fields can be strongly affected by observational biases. We develop a thorough analysis of these systematic effects by relying on specially selected mock catalogues. For the purpose of this paper, we use the Monge-Ampère-Kantorovitch (MAK) reconstruction method, although any other Lagrangian reconstruction method should be sensitive to the same problems. We extensively study the uncertainty in the mass-to-light assignment due to incompleteness (missing luminous mass tracers) and the poorly determined relation between mass and luminosity. The impact of redshift distortion corrections is analysed in the context of MAK, and we check the importance of edge and finite-volume effects on the reconstructed velocities. Using three mock catalogues with different average densities, we also study the effect of cosmic variance. In particular, one of them presents the same global features as found in observational catalogues that extend to 80 h⁻¹ Mpc scales. We give recipes, checked using the aforementioned mock catalogues, to handle these particular observational effects, after having introduced them into the mock catalogues so as to quantitatively mimic the most densely sampled currently available galaxy catalogue of the nearby Universe. Once biases have been taken care of, the resulting error in reconstructed velocities is typically about a quarter of the overall velocity dispersion, and without significant bias. We finally model our reconstruction errors to propose an improved Bayesian approach to measure Ωm in an unbiased way by comparing the reconstructed velocities to the measured ones in distance space, even though the latter may be plagued by large errors. We show that, in the context of observational data, it is possible to build a nearly unbiased estimator of Ωm using MAK reconstruction.


A NOTE ON SAMPLING DESIGNS FOR RANDOM PROCESSES WITH NO QUADRATIC MEAN DERIVATIVE

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 3 2006
Bhramar Mukherjee
Summary Several authors have previously discussed the problem of obtaining asymptotically optimal design sequences for estimating the path of a stochastic process using intricate analytical techniques. In this note, an alternative treatment is provided for obtaining asymptotically optimal sampling designs for estimating the path of a second-order stochastic process with known covariance function. A simple estimator is proposed which is asymptotically equivalent to the full-fledged best linear unbiased estimator, and the entire asymptotic analysis is carried out by studying this estimator. The current approach lends an intuitive statistical perspective to the estimation problem.
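For a known covariance, the best linear unbiased estimator is the generalized least squares estimator. A minimal sketch for estimating a constant mean from correlated samples of a process; the covariance kernel and the sampling design below are illustrative assumptions, not the note's construction:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
t = np.linspace(0, 1, n)
# Known covariance at the design points (an Ornstein-Uhlenbeck-type
# kernel, chosen purely for illustration)
Sigma = np.exp(-5.0 * np.abs(t[:, None] - t[None, :]))

mu_true = 2.0
L = np.linalg.cholesky(Sigma)
y = mu_true + L @ rng.normal(size=n)  # one correlated sample path

ones = np.ones(n)
w = np.linalg.solve(Sigma, ones)
blue = (w @ y) / (w @ ones)   # BLUE of the mean: (1' S^-1 y) / (1' S^-1 1)
naive = y.mean()              # ordinary sample mean ignores correlation

var_blue = 1.0 / (w @ ones)            # exact variance of the BLUE
var_naive = ones @ Sigma @ ones / n**2  # exact variance of the naive mean
print(var_blue, var_naive)
```

The BLUE's variance can never exceed the naive mean's, since the BLUE is by definition minimum-variance among linear unbiased estimators.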


Ratio estimators in adaptive cluster sampling

ENVIRONMETRICS, Issue 6 2007
Arthur L. Dryver
Abstract In most surveys data are collected on many items rather than just the one variable of primary interest. Making the most use of the information collected is an issue of both practical and theoretical interest. Ratio estimators of the population mean or total are often more efficient than estimators that ignore the auxiliary information. Unfortunately, while ratio estimation is straightforward under simple random sampling, this is often not the case under more complicated sampling designs, such as adaptive cluster sampling. A serious concern with ratio estimators under many complicated designs is the lack of independence, a necessary assumption. In this article, we propose two new ratio estimators under adaptive cluster sampling, one of which is unbiased for adaptive cluster sampling designs. We compare the efficiencies of the new estimators with those of existing unbiased estimators for adaptive cluster sampling, which do not utilize the auxiliary information, and with conventional ratio estimation under simple random sampling without replacement. Our results show that the proposed estimators can be considered a robust alternative to the conventional ratio estimator, especially when the correlation between the variable of interest and the auxiliary variable is not high enough for the conventional ratio estimator to perform satisfactorily. Copyright © 2007 John Wiley & Sons, Ltd.
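Under simple random sampling, the conventional ratio estimator scales the known auxiliary mean by the sample ratio of the two variables. A small simulation on a made-up population (not the paper's adaptive designs) shows the efficiency gain when the correlation is high:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n, reps = 10000, 50, 4000
x = rng.gamma(4.0, 1.0, N)            # auxiliary variable with known mean
y = 3.0 * x + rng.normal(0, 1.0, N)   # strongly correlated study variable
mu_x, mu_y = x.mean(), y.mean()       # population means (mu_x assumed known)

ratio_est, mean_est = [], []
for _ in range(reps):
    s = rng.choice(N, n, replace=False)  # SRS without replacement
    # Conventional ratio estimator: (ybar / xbar) * mu_x
    ratio_est.append(y[s].mean() / x[s].mean() * mu_x)
    mean_est.append(y[s].mean())         # plain expansion estimator

mse_ratio = np.mean((np.array(ratio_est) - mu_y) ** 2)
mse_mean = np.mean((np.array(mean_est) - mu_y) ** 2)
print(mse_ratio, mse_mean)
```

When the correlation between y and x is weak, the ratio estimator can do worse than the plain mean, which is the regime where the paper's proposed estimators are argued to be more robust.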


How do we tell which estimates of past climate change are correct?

INTERNATIONAL JOURNAL OF CLIMATOLOGY, Issue 10 2009
Steven C. Sherwood
Abstract Estimates of past climate change often involve teasing small signals from imperfect instrumental or proxy records. Success is often evaluated on the basis of the spatial or temporal consistency of the resulting reconstruction, or on the apparent prediction error on small space and time scales. However, inherent methodological trade-offs illustrated here can cause climate signal accuracy to be unrelated, or even inversely related, to such performance measures. This is a form of the classic conflict in statistics between minimum variance and unbiased estimators. Comprehensive statistical simulations based on climate model output are probably the best way to reliably assess whether methods of reconstructing climate from sparse records, such as radiosondes or paleoclimate proxies, actually work on longer time scales. Copyright © 2008 Royal Meteorological Society
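The minimum-variance versus unbiasedness conflict mentioned above can be made concrete with a toy example: a shrunken mean is biased yet can beat the unbiased sample mean in mean squared error when the signal is small relative to the noise (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 0.3, 1.0, 10, 20000  # small true signal, noisy record

# Sampling distribution of the sample mean of n observations
xbar = rng.normal(mu, sigma / np.sqrt(n), reps)
shrunk = 0.5 * xbar  # biased toward zero, but with a quarter of the variance

mse_unbiased = np.mean((xbar - mu) ** 2)
mse_shrunk = np.mean((shrunk - mu) ** 2)
print(mse_unbiased, mse_shrunk)
```

The shrunken estimator "looks better" by a variance-based performance measure while systematically understating the signal, which is exactly the kind of trade-off that makes small-scale prediction error a misleading guide to climate signal accuracy.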


Estimation Optimality of Corrected AIC and Modified Cp in Linear Regression

INTERNATIONAL STATISTICAL REVIEW, Issue 2 2006
Simon L. Davies
Summary Model selection criteria often arise by constructing unbiased or approximately unbiased estimators of measures known as expected overall discrepancies (Linhart & Zucchini, 1986, p. 19). Such measures quantify the disparity between the true model (i.e., the model which generated the observed data) and a fitted candidate model. For linear regression with normally distributed error terms, the "corrected" Akaike information criterion and the "modified" conceptual predictive statistic have been proposed as exactly unbiased estimators of their respective target discrepancies. We expand on previous work to additionally show that these criteria achieve minimum variance within the class of unbiased estimators.
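For reference, a common form of the corrected AIC adds the penalty 2k(k+1)/(n-k-1) to AIC, with k the number of estimated parameters. The sketch below applies it to polynomial order selection; the simulated data and the Gaussian-likelihood form of the criterion are illustrative assumptions, not the paper's derivation:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 25
x = np.linspace(0, 1, n)
y = 1.0 + 2.0 * x + 0.3 * rng.normal(size=n)  # truth is a degree-1 polynomial

def aicc(y, yhat, k):
    """Gaussian-likelihood AICc; k counts the regression coefficients
    plus the error variance. Correction term: 2k(k+1)/(n-k-1)."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

scores = {}
for deg in range(0, 6):
    coef = np.polyfit(x, y, deg)
    scores[deg] = aicc(y, np.polyval(coef, x), deg + 2)

best = min(scores, key=scores.get)
print(best, scores[best])
```

The correction matters precisely in this small-n regime, where plain AIC tends to over-select high-order models.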


Improved unbiased estimators in adaptive cluster sampling

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 1 2005
Arthur L. Dryver
Summary. The usual design-unbiased estimators in adaptive cluster sampling are easy to compute but are not functions of the minimal sufficient statistic and hence can be improved. Improved unbiased estimators obtained by conditioning on sufficient statistics (not necessarily minimal) are described. First, estimators that are as easy to compute as the usual design-unbiased estimators are given. Estimators obtained by conditioning on the minimal sufficient statistic, which are more difficult to compute, are also discussed. The estimators are compared in examples.
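The conditioning device behind such improvements is Rao-Blackwellization. A textbook illustration (i.i.d. Poisson data, not the paper's adaptive-cluster setting): the crude unbiased estimator 1{X1 = 0} of P(X = 0) is replaced by its conditional expectation given the sufficient statistic T = sum of the Xi, which here equals ((n-1)/n)^T, reducing variance while preserving unbiasedness:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 1.0, 10, 40000
X = rng.poisson(lam, size=(reps, n))

crude = (X[:, 0] == 0).astype(float)  # unbiased but crude: 1{X1 = 0}
T = X.sum(axis=1)                     # sufficient statistic for lam
rb = ((n - 1) / n) ** T               # E[crude | T]: Rao-Blackwellized version

theta = np.exp(-lam)                  # true P(X = 0)
print(crude.mean(), rb.mean(), theta)
print(crude.var(), rb.var())
```

Both estimators average to e^(-lam), but the conditioned one has far smaller variance; the paper's estimators play the same game with the sufficient statistics of an adaptive cluster sampling design.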


On Estimating Conditional Mean-Squared Prediction Error in Autoregressive Models

JOURNAL OF TIME SERIES ANALYSIS, Issue 4 2003
CHING-KANG ING
Abstract. Zhang and Shaman considered the problem of estimating the conditional mean-squared prediction error (CMSPE) for a Gaussian autoregressive (AR) process. They used the final prediction error (FPE) of Akaike to estimate CMSPE and proposed that FPE's effectiveness be judged by its asymptotic correlation with CMSPE. However, as pointed out by Kabaila and He, the derivation of this correlation by Zhang and Shaman is incomplete, and the performance of FPE in estimating CMSPE is also poor in Kabaila and He's simulation study. Kabaila and He further proposed an alternative estimator of CMSPE, V, in the stationary AR(1) model, and reported through Monte Carlo simulation that V has a larger normalized correlation with CMSPE. In this paper, we propose a generalization of V, denoted Ṽ, in the higher-order AR model, and obtain the asymptotic correlations of FPE and Ṽ with CMSPE. We show that the limit of the normalized correlation of Ṽ with CMSPE is larger than that of FPE with CMSPE, and hence Kabaila and He's finding is justified theoretically. In addition, the performances of the above estimators of CMSPE are re-examined in terms of mean-squared error (MSE). Our main conclusion is that, from the MSE point of view, Ṽ is the best choice among a family of asymptotically unbiased estimators of CMSPE that includes FPE and V as special cases.
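Akaike's FPE referred to above estimates the mean-squared prediction error of a fitted AR(p) model as the residual variance inflated by (n + p)/(n - p). A minimal AR(1) sketch with simulated data and a least-squares fit (values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n, phi, sigma2 = 200, 0.6, 1.0
e = rng.normal(0, np.sqrt(sigma2), n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]  # simulate a stationary AR(1) path

p = 1
# Least-squares estimate of the AR(1) coefficient
phi_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
resid = x[1:] - phi_hat * x[:-1]
sigma2_hat = np.mean(resid ** 2)

fpe = sigma2_hat * (n + p) / (n - p)  # Akaike's final prediction error
print(round(fpe, 3))
```

The (n + p)/(n - p) inflation accounts for the extra prediction error caused by using estimated rather than true AR coefficients; the paper's point is that better estimators of the *conditional* MSPE exist.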


Youden Index and Optimal Cut-Point Estimated from Observations Affected by a Lower Limit of Detection

BIOMETRICAL JOURNAL, Issue 3 2008
Marcus D. Ruopp
Abstract The receiver operating characteristic (ROC) curve is used to evaluate a biomarker's ability to classify disease status. The Youden Index (J), the maximum potential effectiveness of a biomarker, is a common summary measure of the ROC curve. In biomarker development, levels may be unquantifiable below a limit of detection (LOD) and hence missing from the overall dataset. Disregarding these observations may negatively bias the ROC curve and thus J. Several correction methods have been suggested for mean estimation and testing; however, little has been written about the ROC curve or its summary measures. We adapt non-parametric (empirical) and semi-parametric (ROC-GLM [generalized linear model]) methods and propose parametric methods (maximum likelihood, ML) to estimate J and the optimal cut-point (c*) for a biomarker affected by a LOD. We develop unbiased estimators of J and c* via ML for normally and gamma distributed biomarkers. Alpha-level confidence intervals are proposed using delta and bootstrap methods for the ML, semi-parametric, and non-parametric approaches, respectively. Simulation studies are conducted over a range of distributional scenarios and sample sizes, evaluating the estimators' bias, root-mean-square error, and coverage probability; the average bias was less than one per cent for the ML and GLM methods across scenarios and decreased with increased sample size. An example using polychlorinated biphenyl levels to classify women with and without endometriosis illustrates the potential benefits of these methods. We address the limitations and usefulness of each method in order to give researchers guidance in constructing appropriate estimates of biomarkers' true discriminating capabilities. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
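The empirical (non-parametric) estimate of J and c* maximizes sensitivity plus specificity minus one over candidate cut-points. The sketch below also shows the negative bias incurred by simply discarding sub-LOD observations; the distributions and LOD value are illustrative assumptions, and the paper's correction methods are not implemented here:

```python
import numpy as np

rng = np.random.default_rng(9)
controls = rng.normal(0.0, 1.0, 2000)  # biomarker in non-diseased subjects
cases = rng.normal(1.5, 1.0, 2000)     # biomarker in diseased subjects

def youden(ctrl, case):
    """Empirical Youden index J = max_c {sens(c) + spec(c) - 1}
    and the maximizing cut-point c*."""
    cuts = np.sort(np.concatenate([ctrl, case]))
    j = [(case > c).mean() + (ctrl <= c).mean() - 1 for c in cuts]
    k = int(np.argmax(j))
    return j[k], cuts[k]

j_full, c_full = youden(controls, cases)

# Naive handling of a limit of detection: drop unquantifiable values
lod = 0.0
j_naive, _ = youden(controls[controls >= lod], cases[cases >= lod])
print(round(j_full, 3), round(j_naive, 3))
```

Dropping sub-LOD values removes mostly low-level controls, so the truncated samples overlap more and the estimated J shrinks, which is the negative bias the paper's ML and GLM methods are designed to correct.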


Parameter Estimation for Partially Complete Time and Type of Failure Data

BIOMETRICAL JOURNAL, Issue 2 2004
Debasis Kundu
Abstract The theory of competing risks has been developed to assess a specific risk in the presence of other risk factors. In this paper we consider the parametric estimation of different failure modes under partially complete time and type of failure data, using latent failure times and cause-specific hazard function models. Uniformly minimum variance unbiased estimators and maximum likelihood estimators are obtained when the latent failure times and cause-specific hazard functions are exponentially distributed. We also consider the case when they follow Weibull distributions. One data set is used to illustrate the proposed techniques. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
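When the latent failure times are exponential and both the failure time and the failure mode are observed, the MLE of each cause-specific rate is the number of failures from that cause divided by the total time on test. A simulated sketch (the rates and sample size are arbitrary choices, and the paper's partially-complete-data setting is not modelled):

```python
import numpy as np

rng = np.random.default_rng(10)
lam1, lam2, n = 0.5, 1.0, 5000
t1 = rng.exponential(1 / lam1, n)  # latent failure time, mode 1
t2 = rng.exponential(1 / lam2, n)  # latent failure time, mode 2

t = np.minimum(t1, t2)             # observed failure time
cause = np.where(t1 <= t2, 1, 2)   # observed failure mode

total_time = t.sum()
lam1_hat = (cause == 1).sum() / total_time  # MLE of the mode-1 rate
lam2_hat = (cause == 2).sum() / total_time  # MLE of the mode-2 rate
print(round(lam1_hat, 3), round(lam2_hat, 3))
```

Partially complete data, where the failure mode is missing for some units, complicates this likelihood, which is the situation the paper's estimators address.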


Genetic estimates of contemporary effective population size in an endangered butterfly indicate a possible role for genetic compensation

EVOLUTIONARY APPLICATIONS (ELECTRONIC), Issue 1 2010
Emily V. Saarinen
Abstract The effective population size (Ne) is a critical evolutionary and conservation parameter that can indicate the adaptive potential of populations. Robust estimates of Ne for endangered taxa have previously been hampered by estimators that are sensitive to sample size. We estimated Ne for two remaining populations of the endangered Miami blue butterfly, a formerly widespread taxon in Florida. Our goal was to determine the consistency of various temporal and point estimators in inferring Ne and to determine the utility of this information for understanding the role of genetic stochasticity. We found that recently developed 'unbiased estimators' generally performed better than some older methods, in that the former gave more realistic Ne estimates that were more consistent with what is known about adult population size. Overall, Ne/N ratios based on census point counts were high. We suggest that this pattern may reflect genetic compensation caused by reduced reproductive variance due to the breeding population size not being limited by resources. Assuming Ne and N are not heavily biased, it appears that the lack of gene flow between distant populations may be a greater genetic threat in the short term than the loss of heterozygosity due to inbreeding.