
Kinds of Estimator

  • alternative estimator
  • bayesian estimator
  • chao estimator
  • consistent estimator
  • corresponding estimator
  • density estimator
  • different estimator
  • efficient estimator
  • error estimator
  • kaplan-meier estimator
  • kernel density estimator
  • kernel estimator
  • least square estimator
  • least-square estimator
  • likelihood estimator
  • linear estimator
  • maximum likelihood estimator
  • moment estimator
  • new estimator
  • nonparametric estimator
  • other estimator
  • parameter estimator
  • petersen estimator
  • posteriori error estimator
  • proposed estimator
  • quasi-maximum likelihood estimator
  • ratio estimator
  • regression estimator
  • relatedness estimator
  • resulting estimator
  • robust estimator
  • semiparametric estimator
  • shrinkage estimator
  • simple estimator
  • speed estimator
  • square estimator
  • state estimator
  • unbiased estimator
  • variance estimator
  • volatility estimator
  • zhu error estimator

  • Selected Abstracts


    EVOLUTION, Issue 10 2003
    Michael J. Hickerson
    Abstract We present moments and likelihood methods that estimate a DNA substitution rate from a group of closely related sister species pairs separated at an assumed time, and we test these methods with simulations. The methods also estimate ancestral population size and can test whether there is a significant difference among the ancestral population sizes of the sister species pairs. Estimates presented in the literature often ignore the ancestral coalescent prior to speciation and therefore should be biased upward. The simulations show that both methods yield accurate estimates given sample sizes of five or more species pairs and that better likelihood estimates are obtained if there is no significant difference among ancestral population sizes. The model presented here indicates that the larger than expected variation found in multitaxa datasets can be explained by variation in the ancestral coalescence and the Poisson mutation process. In this context, observed variation can often be accounted for by variation in ancestral population sizes rather than invoking variation in other parameters, such as divergence time or mutation rate. The methods are applied to data from two groups of species pairs (sea urchins and Alpheus snapping shrimp) that are thought to have been separated by the rise of the Isthmus of Panama three million years ago. [source]

    A Statistical Estimator of the Spatial Distribution of the Water-Table Altitude

    GROUND WATER, Issue 1 2003
    Nicasio Sepúlveda
    An algorithm was designed to statistically estimate the areal distribution of water-table altitude. The altitude of the water table was bounded below by the minimum water-table surface and above by the land surface. Using lake elevations and stream stages, and interpolating between lakes and streams, the minimum water-table surface was generated. A multiple linear regression among the minimum water-table altitude, the difference between land-surface and minimum water-table altitudes, and the water-level measurements from surficial aquifer system wells resulted in a consistently high correlation for all groups of physiographic regions in Florida. A simple linear regression between land-surface and water-level measurements resulted in a root-mean-square residual of 4.23 m, with residuals ranging from −8.78 to 41.54 m. A simple linear regression between the minimum water table and the water-level measurements resulted in a root-mean-square residual of 1.45 m, with residuals ranging from −7.39 to 4.10 m. The application of the multiple linear regression presented herein resulted in a root-mean-square residual of 1.05 m, with residuals ranging from −5.24 to 5.63 m. Results from complete and partial F tests rejected the hypothesis of eliminating any of the regressors in the multiple linear regression presented in this study. [source]
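    The multiple linear regression described above can be sketched numerically. This is a hypothetical illustration, not the authors' code: the regressors are assumed to be the minimum water-table altitude and the land-minus-minimum difference, fit by ordinary least squares.

```python
import numpy as np

def fit_water_table(min_wt, land, obs):
    # Hypothetical sketch: regress observed well water levels on the
    # minimum water-table altitude and the land-minus-minimum thickness.
    X = np.column_stack([np.ones_like(min_wt), min_wt, land - min_wt])
    beta, *_ = np.linalg.lstsq(X, obs, rcond=None)
    return beta  # intercept and the two regression coefficients
```

    Predicted water-table altitudes then follow as X @ beta, and root-mean-square residuals of the kind the paper reports would be computed from obs - X @ beta.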

    The impact of detection and treatment on lifetime medical costs for patients with precancerous polyps and colorectal cancer

    HEALTH ECONOMICS, Issue 12 2009
    David H. Howard
    Abstract Understanding the costs associated with early detection of disease is important for determining the fiscal implications of government-funded screening programs. We estimate the lifetime medical costs for patients with screen-detected versus undetected polyps and early-stage colorectal cancer. Typically, cost-effectiveness studies of screening account only for the direct costs of screening and cancer care. Our estimates include costs for unrelated conditions. We applied the Kaplan–Meier Smoothing Estimator to estimate lifetime costs for beneficiaries with screen-detected polyps and cancer. Phase-specific costs and survival probabilities were calculated from the Surveillance, Epidemiology, and End Results-Medicare database for Medicare beneficiaries aged ≥65. We estimate costs from the point of detection onward; therefore, our results do not include the costs associated with screening. We used a modified version of the model to estimate what lifetime costs for these patients would have been if the polyps or cancer had remained undetected, based on assumptions about the 'lead time' for polyps and early-stage cancer. For younger patients, polyp removal is cost saving. Treatment of early-stage cancer is cost increasing. Copyright © 2009 John Wiley & Sons, Ltd. [source]

    Sensitivity analysis on stochastic equilibrium transportation networks using genetic algorithm

    Halim Ceylan
    Abstract This study deals with the sensitivity analysis of equilibrium transportation networks using a genetic algorithm approach and a bi-level iterative sensitivity algorithm. An integrated Genetic Algorithm-TRANSYT and Path Flow Estimator (GATPFE) is developed for signalized road networks at various levels of perceived travel time, in order to test the sensitivity of perceived travel time error in urban stochastic road networks. The level of information provided to drivers affects the signal timing parameters and hence the Stochastic User Equilibrium (SUE) link flows. As the information on the road system increases, road users try to avoid conflicting links, so the stochastic equilibrium assignment tends toward user equilibrium. The GATPFE is used to solve the bi-level problem, where Area Traffic Control (ATC) is the upper level and the SUE assignment is the lower level. The GATPFE is tested on a six-junction network taken from the literature. The results show that the integrated GATPFE can be applied to carry out sensitivity analysis for equilibrium network design problems at various levels of information, while simultaneously optimizing the signal timings (i.e. network common cycle time, signal stages and offsets between junctions). [source]

    Peak morphological diversity in an ecotone unveiled in the chukar partridge by a novel Estimator in a Dependent Sample (EDS)

    Salit Kark
    Summary 1. Areas of environmental transition (i.e. ecotones) have recently been shown to play an important role in the maintenance of genetic diversity, divergence and in speciation processes. We test the hypothesis that ecotone populations maintain high phenotypic diversity compared to other populations across the distribution range. 2. Focusing on the chukar partridge (Alectoris chukar Gray), we study trends in morphological diversity across a steep ecotone within the species' native range in Israel and Sinai. Using 35 traits and 23 ratios between traits, we apply a novel weighted average statistic that we term 'Estimator in a Dependent Sample' (EDS). This estimator enables us to compare levels of diversity across populations using multiple correlated traits and is especially useful when sample sizes are small. 3. We provide a program for calculating the EDS and a bootstrapping procedure to describe its confidence interval and standard deviation. This estimator can be applied widely in a range of studies using multiple correlated traits in evolutionary biology, ecology, morphology, behaviour, palaeontology, developmental biology and genetics. 4. Our results indicate that within-population diversity peaks in chukar populations located in the Mediterranean-desert ecotone in Israel. However, had we not included the ecotone region in our study, we would have drawn different conclusions regarding patterns of morphological diversity across the range. We suggest that ecotones should be given higher priority in future research and conservation planning, potentially serving as within-species diversity hotspots. [source]
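    The EDS weighting itself is not specified in the abstract, so the bootstrap step for its confidence interval can only be sketched generically. Below is a minimal percentile bootstrap, with `stat` standing in for the EDS computation (an assumption, not the authors' procedure).

```python
import numpy as np

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    # Percentile bootstrap confidence interval for any statistic:
    # resample with replacement, recompute, take empirical quantiles.
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = np.array([stat(rng.choice(data, n, replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

    The bootstrap standard deviation the authors mention would simply be reps.std() over the same replicates.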

    Applying the Liu-Agresti Estimator of the Cumulative Common Odds Ratio to DIF Detection in Polytomous Items

    Randall D. Penfield
    Liu and Agresti (1996) proposed a Mantel–Haenszel-type (1959) estimator of a common odds ratio for several 2 × J tables, where the J columns are ordinal levels of a response variable. This article applies the Liu-Agresti estimator to the case of assessing differential item functioning (DIF) in items having an ordinal response variable. A simulation study was conducted to investigate the accuracy of the Liu-Agresti estimator in relation to other statistical DIF detection procedures. The results of the simulation study indicate that the Liu-Agresti estimator is a viable alternative to other DIF detection statistics. [source]
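    The Liu-Agresti estimator generalizes the Mantel–Haenszel common odds ratio from several 2 × 2 tables to 2 × J ordinal tables. As a reference point, here is a sketch of the classical 2 × 2 Mantel–Haenszel estimator; the (a, b, c, d) cell layout per stratum is an assumed convention, not taken from the article.

```python
def mantel_haenszel_or(tables):
    # Common odds ratio across strata; each table is (a, b, c, d) with
    # rows = group and columns = response.
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den
```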

    Delaunay Tessellation Field Estimator analysis of the PSCz local Universe: density field and cosmic flow

    Emilio Romano-Díaz
    ABSTRACT We apply the Delaunay Tessellation Field Estimator (DTFE) to reconstruct and analyse the matter distribution and cosmic velocity flows in the local Universe on the basis of the PSCz galaxy survey. The prime objective of this study is the production of optimal-resolution 3D maps of the volume-weighted velocity and density fields throughout the nearby Universe, the basis for a detailed study of the structure and dynamics of the cosmic web at each level probed by the underlying galaxy sample. Fully volume-covering 3D maps of the density and (volume-weighted) velocity fields in the cosmic vicinity, out to a distance of 150 h⁻¹ Mpc, are presented. Based on the Voronoi and Delaunay tessellations defined by the spatial galaxy sample, DTFE involves the estimation of density values on the basis of the volume of the related Delaunay tetrahedra and the subsequent use of the Delaunay tessellation as a natural multidimensional (linear) interpolation grid for the corresponding density and velocity fields throughout the sample volume. The linearized model of the spatial galaxy distribution and the corresponding peculiar velocities of the PSCz galaxy sample, produced by Branchini et al., forms the input sample for the DTFE study. The DTFE maps reproduce the high-density supercluster regions in optimal detail, both their internal structure and their elongated or flattened shapes. The corresponding velocity flows trace the bulk and shear flows marking the region extending from the Pisces–Perseus supercluster, via the Local Supercluster, towards the Hydra–Centaurus and the Shapley concentration. The most outstanding and unique feature of the DTFE maps is the sharply defined radial outflow regions in and around underdense voids, marking the dynamical importance of voids in the local Universe. The maximum expansion rate of voids defines a sharp cut-off in the DTFE velocity divergence probability distribution function. We found that on the basis of this cut-off DTFE manages to consistently reproduce the value of Ωm ≈ 0.35 underlying the linearized velocity data set. [source]

    The Bias of the RSR Estimator and the Accuracy of Some Alternatives

    William N. Goetzmann
    This paper analyzes the implications of cross-sectional heteroskedasticity in the repeat sales regression (RSR). RSR estimators are essentially geometric averages of individual asset returns because of the logarithmic transformation of price relatives. We show that the cross-sectional variance of asset returns affects the magnitude of the bias in the average return estimate for each period, while reducing the bias for the surrounding periods. It is not easy to use an approximation method to correct the bias problem. We suggest an unbiased maximum likelihood alternative to the RSR that directly estimates index returns, which we term MLRSR. The unbiased MLRSR estimators are analogous to the RSR estimators but are arithmetic averages of individual asset returns. Simulations show that these estimators are robust to time-varying cross-sectional variance and that the MLRSR may be more accurate than RSR and some alternative methods. [source]
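    The geometric-versus-arithmetic distinction above is what drives the bias: averaging log price relatives yields a geometric mean return, which Jensen's inequality pushes below the arithmetic mean whenever returns disperse cross-sectionally. A minimal numeric illustration (not the MLRSR itself):

```python
import math

def geometric_mean(returns):
    # What log-based averaging (as in RSR) effectively computes.
    logs = [math.log(1.0 + r) for r in returns]
    return math.exp(sum(logs) / len(logs)) - 1.0

def arithmetic_mean(returns):
    # What an arithmetic-average estimator (as in MLRSR) targets.
    return sum(returns) / len(returns)
```

    With symmetric returns of +10% and -10%, the arithmetic mean is zero while the geometric mean is negative, showing the downward pull of cross-sectional variance.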

    Heterogeneity in dynamic discrete choice models

    Martin Browning
    Summary. We consider dynamic discrete choice models with heterogeneity in both the levels parameter and the state dependence parameter. We first present an empirical analysis that motivates the theoretical analysis which follows. The theoretical analysis considers a simple two-state, first-order Markov chain model without covariates in which both transition probabilities are heterogeneous. Using such a model we are able to derive exact small-sample results for bias and mean squared error (MSE). We discuss the maximum likelihood approach and derive two novel estimators. The first is a bias-corrected version of the Maximum Likelihood Estimator (MLE), while the second, which we term MIMSE, minimizes the integrated mean square error. The MIMSE estimator is always well defined, has a closed-form expression and inherits the desirable large-sample properties of the MLE. Our main finding is that in almost all short panel contexts the MIMSE significantly outperforms the other two estimators in terms of MSE. A final section extends the MIMSE estimator to allow for exogenous covariates. [source]

    A Note on Variance Estimation of the Aalen–Johansen Estimator of the Cumulative Incidence Function in Competing Risks, with a View towards Left-Truncated Data

    Arthur Allignol
    Abstract The Aalen–Johansen estimator is the standard nonparametric estimator of the cumulative incidence function in competing risks. Estimating its variance in small samples has attracted some interest recently, together with a critique of the usual martingale-based estimators. We show that the preferred estimator equals a Greenwood-type estimator that has been derived as a recursion formula using counting processes and martingales in a more general multistate framework. We also extend previous simulation studies on estimating the variance of the Aalen–Johansen estimator in small samples to left-truncated observation schemes, which may conveniently be handled within the counting processes framework. This investigation is motivated by a real data example on spontaneous abortion in pregnancies exposed to coumarin derivatives, where both competing risks and left-truncation have recently been shown to be crucial methodological issues (Meister and Schaefer, 2008, Reproductive Toxicology 26, 31–35). Multistate-type software and data are available online to perform the analyses. The Greenwood-type estimator is recommended for use in practice. [source]
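    For orientation, a minimal sketch of the Aalen–Johansen cumulative incidence point estimate itself, assuming untied event times and right-censoring only (label 0 = censored); the variance estimation that is the note's actual subject is not shown.

```python
import numpy as np

def cuminc(times, events, cause=1):
    # Aalen-Johansen cumulative incidence for `cause`, with competing
    # events coded by other positive labels and 0 meaning censored.
    order = np.argsort(times)
    t, e = np.asarray(times)[order], np.asarray(events)[order]
    n = len(t)
    surv = 1.0  # overall Kaplan-Meier survival just before each time
    ci = 0.0
    out = []
    for i in range(n):
        at_risk = n - i
        if e[i] == cause:
            ci += surv * (1.0 / at_risk)  # jump of the cause-specific CIF
        if e[i] != 0:
            surv *= 1.0 - 1.0 / at_risk   # any event depletes survival
        out.append((t[i], ci))
    return out
```

    With no competing events or censoring the final cumulative incidence reaches 1, and a competing event caps it below 1, which is the defining property of the estimator.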

    A Point Estimator for the Time Course of Drug Release

    Stephan Koehne-Voss
    Abstract Procedures for deconvolution of pharmacokinetic data are routinely used in the pharmaceutical industry to determine drug release and absorption, which is essential in designing optimized drug formulations. Although these procedures are described extensively in the pharmacokinetic literature, they have been studied less from a statistical point of view, and variance estimation has not been addressed. We discuss the statistical properties of a numerical procedure for deconvolution. Based on a point-area deconvolution method, we define an estimator for the function that describes the time course of drug release from a drug formulation. Asymptotic distributions are derived and several methods of variance and interval estimation are compared. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]

    An 'Unconditional-like' Structure for the Conditional Estimator of Odds Ratio from 2 × 2 Tables

    James A. Hanley
    Abstract In the estimation of the odds ratio (OR), the conditional maximum-likelihood estimate (cMLE) is preferred to the more readily computed unconditional one (uMLE). However, the exact cMLE does not have a closed form to help divine it from the uMLE or to understand in what circumstances the difference between the two is appreciable. Here, the cMLE is shown to have the same 'ratio of cross-products' structure as its unconditional counterpart, but with two of the cell frequencies augmented so as to shrink the unconditional estimator towards unity. The augmentation involves a factor, similar to the finite population correction, derived from the minimum of the marginal totals. (© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]

    A Semiparametric Estimator of the Crossing Point in the Two-Sample Linear Shift Function: Application to Crossing Lifetime Distributions

    Chi-tsung Wu
    Abstract Let X and Y be two random variables with continuous distribution functions F and G. Consider two independent samples of observations, X1, …, Xm from F and Y1, …, Yn from G. Moreover, suppose there exists a unique x* such that F(x) > G(x) for x < x* and F(x) < G(x) for x > x*, or vice versa. A semiparametric model with a linear shift function (Doksum, 1974) that is equivalent to a location-scale model (Hsieh, 1995) will be assumed, and an empirical process approach (Hsieh, 1995) is used to estimate the parameters of the shift function. Then, the estimated shift function is set to zero, and the solution is defined to be an estimate of the crossing point x*. An approximate confidence band of the linear shift function at the crossing point x* is also presented, which is inverted to yield an approximate confidence interval for the crossing point. Finally, the lifetimes of guinea pigs in days observed in a treatment-control experiment in Bjerkedal (1960) are used to demonstrate our procedure for estimating the crossing point. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
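    The paper's estimator uses Hsieh's empirical process approach; as a simplified stand-in for the same idea, one can fit the linear shift function by matching sample quantiles and then solve the fitted shift for zero. This is purely illustrative, and the helper name `crossing_point` is an assumption.

```python
import numpy as np

def crossing_point(x, y):
    # Fit a linear shift (location-scale) model via matched quantiles,
    # i.e. regress q_Y(p) - q_X(p) on q_X(p), then solve a + b*t = 0.
    p = np.linspace(0.05, 0.95, 19)
    qx = np.quantile(x, p)
    qy = np.quantile(y, p)
    b, a = np.polyfit(qx, qy - qx, 1)  # shift(t) ~ a + b * t
    return -a / b
```

    For X ~ N(0, 1) and Y ~ N(1, 3), the true shift is 1 + 2t, so the distribution functions cross at t = -0.5.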

    A Generalized Estimator of the Attributable Benefit of an Optimal Treatment Regime

    BIOMETRICS, Issue 2 2010
    Jason Brinkley
    Summary For many diseases with several treatment options, there is often no consensus on the best treatment to give individual patients. In such cases, it may be necessary to define a strategy for treatment assignment; that is, an algorithm that dictates the treatment an individual should receive based on their measured characteristics. Such a strategy or algorithm is also referred to as a treatment regime. The optimal treatment regime is the strategy that would provide the most public health benefit by preventing as many poor outcomes as possible. Using a measure that is a generalization of attributable risk (AR) and notions of potential outcomes, we derive an estimator for the proportion of events that could have been prevented had the optimal treatment regime been implemented. Traditional AR studies look at the added risk that can be attributed to exposure to some contaminant; here we instead study the benefit that can be attributed to using the optimal treatment strategy. We show how regression models can be used to estimate the optimal treatment strategy and the attributable benefit of that strategy. We also derive the large-sample properties of this estimator. As a motivating example, we apply our methods to an observational study of 3856 patients treated at the Duke University Medical Center with prior coronary artery bypass graft surgery and further heart-related problems requiring a catheterization. The patients may be treated with either medical therapy alone or a combination of medical therapy and percutaneous coronary intervention, without a general consensus on which is the best treatment for individual patients. [source]

    Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade Off between Bias and Efficiency

    BIOMETRICS, Issue 3 2008
    Bhramar Mukherjee
    Summary Standard prospective logistic regression analysis of case–control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern "retrospective" methods, including the "case-only" approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel empirical Bayes-type shrinkage estimator to analyze case–control data that can relax the gene-environment independence assumption in a data-adaptive fashion. In the special case involving a binary gene and a binary exposure, the method leads to an estimator of the interaction log odds ratio parameter in a simple closed form that corresponds to a weighted average of the standard case-only and case–control estimators. We also describe a general approach for deriving the new shrinkage estimator and its variance within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005, Biometrika 92, 399–418). Both simulated and real data examples suggest that the proposed estimator strikes a balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study. [source]
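    The paper's data-adaptive weights depend on the estimated gene-environment association, which the abstract does not spell out. As a loose illustration only, here is a generic inverse-variance weighted average of the two component estimators; this is not the authors' formula.

```python
def shrinkage_estimate(theta_cc, se_cc, theta_co, se_co):
    # Generic shrinkage sketch: weight the case-control estimate more
    # when it is precise, falling back toward the case-only estimate.
    w = se_co**2 / (se_cc**2 + se_co**2)
    return w * theta_cc + (1.0 - w) * theta_co
```

    With equal standard errors the result is the simple midpoint; as the case-control estimate becomes very precise, the combined estimate collapses onto it.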

    Applications and Extensions of Chao's Moment Estimator for the Size of a Closed Population

    BIOMETRICS, Issue 4 2007
    Louis-Paul Rivest
    Summary This article revisits Chao's (1989, Biometrics 45, 427–438) lower bound estimator for the size of a closed population in a mark–recapture experiment where the capture probabilities vary between animals (model Mh). First, an extension of the lower bound to models featuring a time effect and heterogeneity in capture probabilities (Mth) is proposed. The biases of these lower bounds are shown to be a function of the heterogeneity parameter for several loglinear models for Mth. Small-sample bias reduction techniques for Chao's lower bound estimator are also derived. The application of the loglinear model underlying Chao's estimator when heterogeneity has been detected in the primary periods of a robust design is then investigated. A test for the null hypothesis that Chao's loglinear model provides unbiased abundance estimators is provided. The strategy of systematically using Chao's loglinear model in the primary periods of a robust design where heterogeneity has been detected is investigated in a Monte Carlo experiment, evaluating its impact on the estimation of population sizes and survival rates. [source]
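    Chao's (1989) lower bound in its most familiar form uses only the singleton and doubleton capture frequencies, N-hat = S_obs + f1^2 / (2 f2). A minimal sketch follows; the fallback used when f2 = 0 is one common bias-corrected variant and is included as an assumption, not taken from this article.

```python
from collections import Counter

def chao_lower_bound(capture_counts):
    # capture_counts: number of times each observed animal was caught.
    f = Counter(capture_counts)
    f1, f2 = f.get(1, 0), f.get(2, 0)
    s_obs = len(capture_counts)
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0  # bias-corrected fallback
```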

    Weighted Normality-Based Estimator in Correcting Correlation Coefficient Estimation Between Incomplete Nutrient Measurements

    BIOMETRICS, Issue 1 2000
    C. Y. Wang
    Summary. Consider the problem of estimating the correlation between two nutrient measurements, such as the percent energy from fat obtained from a food frequency questionnaire (FFQ) and that from repeated food records or 24-hour recalls. Under a classical additive model for repeated food records, it is known that there is an attenuation effect on the correlation estimation if the sample average of repeated food records for each subject is used to estimate the underlying long-term average. This paper considers the case in which the selection probability of a subject for participation in the calibration study, in which repeated food records are measured, depends on the corresponding FFQ value, and the repeated longitudinal measurement errors have an autoregressive structure. This paper investigates a normality-based estimator and compares it with a simple method of moments. Both methods are consistent if the first two moments of nutrient measurements exist. Furthermore, joint estimating equations are applied to estimate the correlation coefficient and related nuisance parameters simultaneously. This approach provides a simple sandwich formula for the covariance estimation of the estimator. Finite sample performance is examined via a simulation study, and the proposed weighted normality-based estimator performs well under various distributional assumptions. The methods are applied to real data from a dietary assessment study. [source]
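    Under the classical additive error model mentioned above, the correlation with a k-record average is attenuated by a known factor. A sketch of that textbook correction follows; it is not the paper's weighted normality-based estimator, and the variance-component arguments are assumptions for illustration.

```python
import math

def deattenuated_corr(r_obs, var_within, var_between, k):
    # Classical measurement-error correction: the observed correlation
    # with the mean of k replicate records is attenuated by this factor.
    atten = math.sqrt(var_between / (var_between + var_within / k))
    return r_obs / atten
```

    More replicates per subject shrink the within-person noise term, so there is less attenuation to undo.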

    Cross Section and Panel Data Estimators for Nonseparable Models with Endogenous Regressors

    ECONOMETRICA, Issue 4 2005
    Joseph G. Altonji
    We propose two new methods for estimating models with nonseparable errors and endogenous regressors. The first method estimates a local average response. One estimates the response of the conditional mean of the dependent variable to a change in the explanatory variable while conditioning on an external variable and then undoes the conditioning. The second method estimates the nonseparable function and the joint distribution of the observable and unobservable explanatory variables. An external variable is used to impose an equality restriction, at two points of support, on the conditional distribution of the unobservable random term given the regressor and the external variable. Our methods apply to cross sections, but our lead examples involve panel data cases in which the choice of the external variable is guided by the assumption that the distribution of the unobservable variables is exchangeable in the values of the endogenous variable for members of a group. [source]

    Maximum likelihood estimators of population parameters from doubly left-censored samples

    ENVIRONMETRICS, Issue 8 2006
    Abou El-Makarim A. Aboueissa
    Abstract Left-censored data often arise in environmental contexts with one or more detection limits (DLs). Estimators of the parameters are derived for left-censored data having two detection limits, DL1 and DL2, assuming an underlying normal distribution. Two different approaches for calculating the maximum likelihood estimates (MLEs) are given and examined. These methods also apply to lognormally distributed environmental data with two distinct detection limits. The performance of the new estimators is compared utilizing many simulated data sets. Examples are given illustrating the use of these methods utilizing a computer program given in the Appendix. Copyright © 2006 John Wiley & Sons, Ltd. [source]
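    The doubly left-censored normal likelihood is straightforward to write down: fully observed values contribute density terms, and each value censored at a detection limit contributes the normal CDF at that limit. A numerical-maximization sketch (one of several possible approaches, not necessarily the paper's):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_normal_mle(obs, n_cens1, dl1, n_cens2, dl2):
    # obs: fully observed values; n_censK values are known only to lie
    # below detection limit dlK. Maximize the censored log-likelihood.
    def nll(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)  # keeps sigma positive
        ll = norm.logpdf(obs, mu, sigma).sum()
        ll += n_cens1 * norm.logcdf((dl1 - mu) / sigma)
        ll += n_cens2 * norm.logcdf((dl2 - mu) / sigma)
        return -ll
    start = np.array([np.mean(obs), np.log(np.std(obs) + 1e-6)])
    res = minimize(nll, start, method="Nelder-Mead")
    mu, log_sigma = res.x
    return mu, np.exp(log_sigma)
```

    For lognormal data, the same routine applies after log-transforming the observations and detection limits.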

    Radar-guided interpolation of climatological precipitation data

    Arthur T. DeGaetano
    Abstract A refined approach for interpolating daily precipitation accumulations is presented, which combines radar-based information to characterize the spatial distribution and gross accumulation of precipitation with observed daily rain-gauge data to adjust for spatially varying errors in the radar estimates. Considering the rain-gauge observations to be true values at each measurement location, daily radar errors are calculated at these points. These errors are then interpolated back to the radar grid, providing a spatially varying daily adjustment that can be applied across the radar domain. In contrast to similar techniques that are employed at hourly intervals to adjust radar-rainfall estimates operationally, this refined approach is intended to provide high-spatial-resolution precipitation data for climatological purposes, such as drought and environmental monitoring, retrospective impact analyses, and (when time series of sufficient length become available) assessment of temporal precipitation variations at high spatial resolution. Compared to the Multisensor Precipitation Estimators (MPEs) used operationally, the refined method yields lower cross-validated interpolation errors regardless of season or daily precipitation amount. Comparisons between cross-validated radar estimates aggregated to monthly totals with operational (non-cross-validated) Parameter-elevation Regressions on Independent Slopes Model (PRISM) precipitation estimates are also favourable. The new method provides a radar-based alternative to similar climatologies based on the spatial interpolation of gauge data alone (e.g. PRISM). Copyright © 2008 Royal Meteorological Society [source]

    Basic ingredients of free energy calculations: A review

    Clara D. Christ
    Abstract Methods to compute free energy differences between different states of a molecular system are reviewed with the aim of identifying their basic ingredients and their utility when applied in practice to biomolecular systems. A free energy calculation is comprised of three basic components: (i) a suitable model or Hamiltonian, (ii) a sampling protocol with which one can generate a representative ensemble of molecular configurations, and (iii) an estimator of the free energy difference itself. Alternative sampling protocols can be distinguished according to whether one or more states are to be sampled. In cases where only a single state is considered, six alternative techniques could be distinguished: (i) changing the dynamics, (ii) deforming the energy surface, (iii) extending the dimensionality, (iv) perturbing the forces, (v) reducing the number of degrees of freedom, and (vi) multi-copy approaches. In cases where multiple states are to be sampled, the three primary techniques are staging, importance sampling, and adiabatic decoupling. Estimators of the free energy can be classified as global methods that either count the number of times a given state is sampled or use energy differences. Or, they can be classified as local methods that either make use of the force or are based on transition probabilities. Finally, this overview of the available techniques and how they can be best used in a practical context is aimed at helping the reader choose the most appropriate combination of approaches for the biomolecular system, Hamiltonian and free energy difference of interest. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010 [source]
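    Among the global, energy-difference estimators the review classifies is Zwanzig's exponential-averaging (free energy perturbation) formula, dF = -kT ln < exp(-dU/kT) >_0, where the average runs over configurations sampled in the reference state. A minimal sketch:

```python
import math

def fep_delta_f(delta_u, kT=1.0):
    # Zwanzig / free energy perturbation estimator:
    # dF = -kT * ln( mean over samples of exp(-dU / kT) ).
    n = len(delta_u)
    avg = sum(math.exp(-du / kT) for du in delta_u) / n
    return -kT * math.log(avg)
```

    By Jensen's inequality the estimate never exceeds the mean energy difference, which is why poor phase-space overlap makes the plain estimator unreliable and motivates the staging and importance-sampling protocols discussed above.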

    Improved unbiased estimators in adaptive cluster sampling

    Arthur L. Dryver
    Summary. The usual design-unbiased estimators in adaptive cluster sampling are easy to compute but are not functions of the minimal sufficient statistic and hence can be improved. Improved unbiased estimators obtained by conditioning on sufficient statistics, not necessarily minimal, are described. First, estimators that are as easy to compute as the usual design-unbiased estimators are given. Estimators obtained by conditioning on the minimal sufficient statistic, which are more difficult to compute, are also discussed. Estimators are compared in examples. [source]

    Asymptotic bias in the linear mixed effects model under non-ignorable missing data mechanisms

    Chandan Saha
    Summary. In longitudinal studies, missingness of data is often an unavoidable problem. Estimators from the linear mixed effects model assume that missing data are missing at random. However, estimators are biased when this assumption is not met. In the paper, theoretical results for the asymptotic bias are established under non-ignorable drop-out, drop-in and other missing data patterns. The asymptotic bias is large when the drop-out subjects have only one or no observation, especially for slope-related parameters of the linear mixed effects model. In the drop-in case, intercept-related parameter estimators show substantial asymptotic bias when subjects enter late in the study. Eight other missing data patterns are considered and these produce asymptotic biases of a variety of magnitudes. [source]

    Receiver operating characteristic surfaces in the presence of verification bias

    Yueh-Yun Chi
    Summary. In diagnostic medicine, the receiver operating characteristic (ROC) surface is one of the established tools for assessing the accuracy of a diagnostic test in discriminating three disease states, and the volume under the ROC surface has served as a summary index for diagnostic accuracy. In practice, the selection for definitive disease examination may be based on initial test measurements and induces verification bias in the assessment. We propose a non-parametric likelihood-based approach to construct the empirical ROC surface in the presence of differential verification, and to estimate the volume under the ROC surface. Estimators of the standard deviation are derived by both the Fisher information and the jackknife method, and their relative accuracy is evaluated in an extensive simulation study. The methodology is further extended to incorporate discrete baseline covariates in the selection process, and to compare the accuracy of a pair of diagnostic tests. We apply the proposed method to compare the diagnostic accuracy between mini-mental state examination and clinical evaluation of dementia, in discriminating between three disease states of Alzheimer's disease. [source]
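    With complete verification, the empirical volume under the ROC surface is simply the proportion of triples, one subject from each disease class, that the test ranks in the correct order. The paper's contribution is the likelihood-based correction when verification is selective; the sketch below shows only the fully verified version.

```python
def vus(c1, c2, c3):
    """Empirical volume under the ROC surface: the proportion of
    triples, one measurement from each of the three disease classes,
    ordered correctly (chance level is 1/6)."""
    ok = sum(1 for x in c1 for y in c2 for z in c3 if x < y < z)
    return ok / (len(c1) * len(c2) * len(c3))

v = vus([1, 2], [3, 4], [5, 6])   # perfectly separated classes -> 1.0
```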

    Computer Algebra Derivation of the Bias of Linear Estimators of Autoregressive Models

    Y. Zhang
    Abstract. A symbolic method which can be used to obtain the asymptotic bias and variance coefficients to order O(1/n) for estimators in stationary time series is discussed. Using this method, the large-sample bias of the Burg estimator in the AR(p) model for p = 1, 2, 3 is shown to be equal to that of the least squares estimators in both the known and unknown mean cases. Previous researchers had only been able to obtain simulation results for the Burg estimator's bias, because the problem is intractable without computer algebra. The asymptotic bias coefficient to O(1/n) of the Yule-Walker as well as least squares estimates is also derived in AR(3) models. Our asymptotic results show that for the AR(3), just as in the AR(2), the Yule-Walker estimates have a large bias when the parameters are near the nonstationary boundary. The least squares and Burg estimates are much better in this situation. Simulation results confirm our findings. [source]
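    For the AR(1) case, the two estimators compared in the paper take a particularly simple form; note that the Burg estimate is bounded in [-1, 1] by construction, whereas the conditional least squares estimate is not. A sketch, with invented series values:

```python
def burg_ar1(x):
    """Burg estimate of the AR(1) coefficient for a mean-corrected
    series; bounded in [-1, 1] since |2ab| <= a^2 + b^2."""
    num = 2.0 * sum(x[t - 1] * x[t] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 + x[t] ** 2 for t in range(1, len(x)))
    return num / den

def ls_ar1(x):
    """Conditional least squares estimate of the AR(1) coefficient."""
    num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

series = [1.0, 0.8, 0.6, 0.5]      # invented toy data
phi_burg = burg_ar1(series)        # 3.16 / 3.25, approximately 0.972
phi_ls = ls_ar1(series)            # 1.58 / 2.00 = 0.79
```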

    MSM Estimators of European Options on Assets with Jumps

    João Amaro de Matos
    This paper shows that, under some regularity conditions, the method of simulated moments estimator of European option pricing models developed by Bossaerts and Hillion (1993) can be extended to the case where the price of the underlying asset follows a Lévy process, which allows for jumps, with no loss of asymptotic properties, while still allowing for a joint test of the model. [source]

    Comparing Accuracy in an Unpaired Post-market Device Study with Incomplete Disease Assessment

    Todd A. Alonzo
    Abstract The sensitivity and specificity of a new medical device are often compared to those of an existing device by calculating ratios of sensitivities and specificities. Although it would be ideal for all study subjects to receive the gold standard so that true disease status is known for everyone, this is often not feasible or ethical. This paper proposes two unpaired designs in which each subject is administered only one of the devices and the device result dictates which subjects receive disease verification. Estimators of the ratio of accuracy and corresponding confidence intervals are proposed for these designs, as well as sample size formulae. Simulation studies are performed to investigate the small-sample bias of the estimators and the performance of the variance estimators and sample size formulae. The sample size formulae are applied to the design of a cervical cancer study to compare the accuracy of a new device with the conventional Pap smear. [source]
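    For orientation, in the fully verified case the ratio of two independently estimated sensitivities and its delta-method confidence interval on the log scale look as follows. The paper's estimators additionally correct for selective verification, which this sketch does not attempt; the counts are invented.

```python
import math

def ratio_ci(tp1, n1, tp2, n2, z=1.96):
    """Ratio of two independently estimated sensitivities with a
    delta-method confidence interval computed on the log scale
    (unpaired design, complete disease verification assumed)."""
    p1, p2 = tp1 / n1, tp2 / n2
    r = p1 / p2
    # Var(log p_hat) is approximately (1 - p) / (n p) = (1 - p) / tp
    se_log = math.sqrt((1 - p1) / tp1 + (1 - p2) / tp2)
    half = z * se_log
    return r, (r * math.exp(-half), r * math.exp(half))

# new device: 90/100 diseased detected; comparator: 80/100
r, (lo, hi) = ratio_ci(90, 100, 80, 100)   # r = 1.125
```

Working on the log scale keeps the interval respectful of the ratio's skewed sampling distribution and guarantees positive endpoints.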

    Efficiency of Functional Regression Estimators for Combining Multiple Laser Scans of cDNA Microarrays

    C. A. Glasbey
    Abstract The first stage in the analysis of cDNA microarray data is estimation of the level of expression of each gene, from laser scans of hybridised microarrays. Typically, data from a single scan are used although, if multiple scans are available, there is the opportunity to reduce sampling error by using all of them. Combining multiple laser scans can be formulated as multivariate functional regression through the origin. Maximum likelihood estimation fails, but many alternative estimators exist, one of which maximises the likelihood of a Gaussian structural regression model. We show by simulation that, surprisingly, this estimator is efficient for our problem, even though the distribution of gene expression values is far from Gaussian. Further, it performs well if errors have a heavier-tailed distribution or the model includes intercept terms, but not necessarily in other regions of parameter space. Finally, we show that by combining multiple laser scans we increase the power to detect differential expression of genes. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]
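    Functional regression through the origin can be sketched as a rank-one fit: each scan j measures roughly a gain b_j times the true expression x_i of gene i, and both are recovered from the leading singular vector of the gene-by-scan matrix. This shows only the structural-model idea in its simplest form, not the paper's full Gaussian maximum likelihood estimator (no saturation or variance modelling).

```python
def combine_scans(Y, iters=200):
    """Combine multiple laser scans of the same array: estimate the
    per-scan gain vector as the leading right singular vector of the
    gene-by-scan matrix Y (power iteration on Y^T Y), then project
    each gene's measurements onto it."""
    m = len(Y[0])
    b = [1.0] * m                                   # initial gain vector
    for _ in range(iters):
        # b <- normalize(Y^T (Y b))
        yb = [sum(row[j] * b[j] for j in range(m)) for row in Y]
        b = [sum(Y[i][j] * yb[i] for i in range(len(Y))) for j in range(m)]
        norm = sum(v * v for v in b) ** 0.5
        b = [v / norm for v in b]
    combined = [sum(row[j] * b[j] for j in range(m)) for row in Y]
    return combined, b

# two scans of three genes, the second scan at exactly twice the gain
combined, gains = combine_scans([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```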

    Real-Time OD Estimation Using Automatic Vehicle Identification and Traffic Count Data

    Michael P. Dixon
    Origin-destination (OD) matrices are a key input to many advanced traffic management operations strategies. To examine the possibility of estimating OD matrices in real time, two constrained OD estimators, based on generalized least squares and Kalman filtering, were developed and tested. A one-at-a-time processing method was introduced to provide an efficient, organized framework for incorporating observations from multiple data sources in real time. The estimators were tested under different conditions based on the type of prior OD information available, the type of assignment available, and the type of link volume model used. The performance of the Kalman filter estimators was also compared to that of the generalized least squares estimator to provide insight regarding their performance characteristics relative to one another for given scenarios. Automatic vehicle identification (AVI) tag counts were used so that observed and estimated OD parameters could be compared. While the approach was motivated using AVI data, the methodology can be generalized to any situation where traffic counts are available and origin volumes can be estimated reliably. AVI data were used primarily in three ways: prior observed OD information was incorporated as measurements; a deterministic link volume component made use of OD data extracted from the latest time interval in which all trips had been completed; and link choice proportions were estimated from link travel time data. It was found that utilizing prior observed OD data along with link counts improves estimator accuracy relative to OD estimation based exclusively on link counts. [source]
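    The one-at-a-time idea can be sketched as a sequence of scalar Kalman measurement updates, folding in one link count at a time; here h holds the assignment (link choice) proportions mapping OD flows to the observed link. The constraints and the time-update step of the paper's estimators are omitted, and the numbers are purely illustrative.

```python
def kf_update_one(x, P, h, y, r):
    """Fold a single link-count measurement y = h.x + noise (variance r)
    into the OD-flow state estimate x with covariance P. Processing one
    scalar measurement at a time avoids any matrix inversion."""
    n = len(x)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    s = sum(h[i] * Ph[i] for i in range(n)) + r     # innovation variance
    k = [v / s for v in Ph]                         # Kalman gain
    resid = y - sum(h[i] * x[i] for i in range(n))  # innovation
    x_new = [x[i] + k[i] * resid for i in range(n)]
    P_new = [[P[i][j] - k[i] * Ph[j] for j in range(n)] for i in range(n)]
    return x_new, P_new

# two OD pairs that both use the observed link (h = [1, 1]),
# prior flows of 10 each, observed link count 24
x, P = kf_update_one([10.0, 10.0], [[4.0, 0.0], [0.0, 4.0]],
                     [1.0, 1.0], 24.0, 2.0)
```

Each additional link count (or prior OD observation treated as a measurement) is folded in by another call, which is what makes the framework natural for real-time, multi-source data.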

    Significance of Specimen Databases from Taxonomic Revisions for Estimating and Mapping the Global Species Diversity of Invertebrates and Repatriating Reliable Specimen Data

    We argue that the millions of specimen records published in thousands of taxonomic revisions over past decades are a critically important and cost-effective source of information for incorporating invertebrates into research and conservation decisions. More specifically, we demonstrate for a specimen database assembled during a revision of the robber-fly genus Euscelidia (Asilidae, Diptera) how nonparametric species richness estimators (Chao1, incidence-based coverage estimator, second-order jackknife) can be used to (1) estimate global species diversity, (2) direct future collecting to areas that are undersampled and/or likely to be rich in new species, and (3) assess whether the plant-based global biodiversity hotspots of Myers et al. (2000) contain a significant proportion of invertebrates. During the revision of Euscelidia, the number of known species more than doubled, but estimation of species richness revealed that the true diversity of the genus was likely twice as high. The same techniques applied to subsamples of the data indicated that much of the unknown diversity will be found in the Oriental region. Assessing the validity of biodiversity hotspots for invertebrates is a formidable challenge because it is difficult to decide whether species are hotspot endemics, and lists of observed species dramatically underestimate true diversity. Lastly, conservation biologists need a specimen database analogous to GenBank for collecting specimen records. Such a database has a three-fold advantage over information obtained from digitized museum collections: (1) it is shown for Euscelidia that a large proportion of unrevised museum specimens are misidentified; (2) only the specimen lists in revisionary studies cover a wide variety of private and public collections; and (3) obtaining specimen records from revisions is cost-effective. [source]
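    Of the three richness estimators named, Chao1 is the simplest to state: it augments the observed species count by a term driven by the numbers of singleton and doubleton species. A sketch of the bias-corrected form, with invented example counts (the incidence-based coverage estimator and second-order jackknife are not shown):

```python
def chao1(counts):
    """Bias-corrected Chao1 lower bound for total species richness,
    from abundance data (individuals observed per species)."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)   # singleton species
    f2 = sum(1 for c in counts if c == 2)   # doubleton species
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))

est = chao1([1, 1, 1, 2, 3, 5])   # 6 species observed -> 7.5 estimated
```

Because the correction term grows with the number of singletons, heavily undersampled faunas (many species seen once) yield estimates far above the observed count, which is exactly the pattern reported for Euscelidia.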