Variance Estimator (variance + estimator)


Selected Abstracts


Generalized Method of Moments With Many Weak Moment Conditions

ECONOMETRICA, Issue 3 2009
Whitney K. Newey
Using many moment conditions can improve efficiency but makes the usual generalized method of moments (GMM) inferences inaccurate. Two-step GMM is biased. Generalized empirical likelihood (GEL) has smaller bias, but the usual standard errors are too small in instrumental variable settings. In this paper we give a new variance estimator for GEL that addresses this problem. It is consistent under the usual asymptotics and, under many weak moment asymptotics, is larger than usual and is consistent. We also show that the Kleibergen (2005) Lagrange multiplier and conditional likelihood ratio statistics are valid under many weak moments. In addition, we introduce a jackknife GMM estimator, but find that GEL is asymptotically more efficient under many weak moments. In Monte Carlo examples we find that t-statistics based on the new variance estimator have nearly correct size in a wide range of cases.


Sampling and variance estimation on continuous domains

ENVIRONMETRICS, Issue 6 2006
Cynthia Cooper
Abstract. This paper explores fundamental concepts of design- and model-based approaches to sampling and estimation for a response defined on a continuous domain. The paper discusses the concepts in design-based methods as applied in a continuous domain, the meaning of model-based sampling, and the interpretation of the design-based variance of a model-based estimate. A model-assisted variance estimator is examined for circumstances in which a direct design-based estimator may be inadequate or not available. The alternative model-assisted variance estimator is demonstrated in simulations on a realization of a response generated by a process with exponential covariance structure. The empirical results demonstrate that the model-assisted variance estimator is less biased and more efficient than Horvitz–Thompson and Yates–Grundy variance estimators applied to a continuous-domain response. Copyright © 2006 John Wiley & Sons, Ltd.


Variance estimation for spatially balanced samples of environmental resources

ENVIRONMETRICS, Issue 6 2003
Don L. Stevens Jr
Abstract. The spatial distribution of a natural resource is an important consideration in designing an efficient survey or monitoring program for the resource. We review a unified strategy for designing probability samples of discrete, finite resource populations, such as lakes within some geographical region; linear populations, such as a stream network in a drainage basin; and continuous, two-dimensional populations, such as forests. The strategy can be viewed as a generalization of spatial stratification. In this article, we develop a local neighborhood variance estimator based on that perspective, and examine its behavior via simulation. The simulations indicate that the local neighborhood estimator is unbiased and stable. The Horvitz–Thompson variance estimator based on assuming independent random sampling (IRS) may be two times the magnitude of the local neighborhood estimate. An example using data from a generalized random-tessellation stratified design on the Oahe Reservoir resulted in local variance estimates being 22 to 58 percent smaller than Horvitz–Thompson IRS variance estimates. Variables with stronger spatial patterns had greater reductions in variance, as expected. Copyright © 2003 John Wiley & Sons, Ltd.


A measure of disclosure risk for microdata

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2002
C. J. Skinner
Summary. Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two 'similar' established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data.


Analysis of longitudinal multiple-source binary data using generalized estimating equations

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 1 2004
Liam M. O'Brien
Summary. We present a multivariate logistic regression model for the joint analysis of longitudinal multiple-source binary data. Longitudinal multiple-source binary data arise when repeated binary measurements are obtained from two or more sources, with each source providing a measure of the same underlying variable. Since the number of responses on each subject is relatively large, the empirical variance estimator performs poorly and cannot be relied on in this setting. Two methods for obtaining a parsimonious within-subject association structure are considered. An additional complication arises with estimation, since maximum likelihood estimation may not be feasible without making unrealistically strong assumptions about third- and higher-order moments. To circumvent this, we propose the use of a generalized estimating equations approach. Finally, we present an analysis of multiple-informant data obtained longitudinally from a psychiatric interventional trial that motivated the model developed in the paper.


Improving robust model selection tests for dynamic models

THE ECONOMETRICS JOURNAL, Issue 2 2010
Hwan-sik Choi
Summary. We propose an improved model selection test for dynamic models using a new asymptotic approximation to the sampling distribution of a new test statistic. The model selection test is applicable to dynamic models with very general selection criteria and estimation methods. Since our test statistic does not assume the exact form of a true model, the test is essentially non-parametric once competing models are estimated. For the unknown serial correlation in data, we use a Heteroscedasticity/Autocorrelation-Consistent (HAC) variance estimator, and the sampling distribution of the test statistic is approximated by the fixed-b asymptotic approximation. The asymptotic approximation depends on kernel functions and bandwidth parameters used in HAC estimators. We compare the finite sample performance of the new test with the bootstrap methods as well as with the standard normal approximations, and show that the fixed-b asymptotics and the bootstrap methods are markedly superior to the standard normal approximation for a moderate sample size for time series data. An empirical application for foreign exchange rate forecasting models is presented, and the result shows that the normal approximation to the distribution of the test statistic considered appears to overstate the data's ability to distinguish between two competing models.
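
To make the role of the kernel and bandwidth in a HAC variance estimator concrete, the following is a minimal sketch, assuming a Bartlett kernel with the common Newey–West weighting 1 - k/(M + 1). The function name, the weighting convention, and the toy series are illustrative assumptions and are not taken from the paper, whose test statistic and fixed-b critical values are not reproduced here.

```python
def bartlett_hac_variance(x, bandwidth):
    """Long-run variance estimate of a series using Bartlett kernel weights.

    Hypothetical helper for illustration: `bandwidth` is the number of
    autocovariance lags M that receive non-zero weight (0 <= M < len(x)).
    """
    n = len(x)
    if not 0 <= bandwidth < n:
        raise ValueError("bandwidth must satisfy 0 <= bandwidth < len(x)")
    mean = sum(x) / n
    d = [v - mean for v in x]

    def autocov(k):
        # Sample autocovariance at lag k, using divisor n
        return sum(d[t] * d[t - k] for t in range(k, n)) / n

    lrv = autocov(0)
    for k in range(1, bandwidth + 1):
        weight = 1.0 - k / (bandwidth + 1)  # Bartlett (triangular) weight
        lrv += 2.0 * weight * autocov(k)
    return lrv


series = [1.0, 2.0, 4.0, 3.0, 5.0]  # toy data, not from the paper
v0 = bartlett_hac_variance(series, 0)  # no lags: plain variance (divisor n)
v1 = bartlett_hac_variance(series, 1)  # one lag with weight 0.5
```

In the fixed-b framework the ratio of bandwidth to sample size is held fixed rather than sent to zero, which is why the reference distribution depends on the kernel and bandwidth actually used.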


Statistical Methods for the Analysis of Genetic Association Studies

ANNALS OF HUMAN GENETICS, Issue 2 2006
G. Y. Zou
Summary. This paper applies a retrospective logistic regression model (Prentice, 1976) using a sandwich variance estimator (White, 1982; Zeger et al., 1985) to genetic association studies in which alleles are treated as dependent variables. The validity of switching the positions of allele and trait variables in the regression model is ensured by the invariance property of the odds ratio. The approach is shown to be able to accommodate many commonly seen designs, matched or unmatched alike, having either binary or quantitative traits. The resultant score statistic has potentially higher power than those that have previously appeared in the genetics literature. As a regression model in general, this approach may also be applied to incorporate covariates. Numerical examples implemented with standard software are presented.


VARIANCE ESTIMATION IN TWO-PHASE SAMPLING

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2009
M.A. Hidiroglou
Summary. Two-phase sampling is often used for estimating a population total or mean when the cost per unit of collecting auxiliary variables, x, is much smaller than the cost per unit of measuring a characteristic of interest, y. In the first phase, a large sample s1 is drawn according to a specific sampling design p(s1), and auxiliary data x are observed for the units i ∈ s1. Given the first-phase sample s1, a second-phase sample s2 is selected from s1 according to a specified sampling design {p(s2 | s1)}, and (y, x) is observed for the units i ∈ s2. In some cases, the population totals of some components of x may also be known. Two-phase sampling is used for stratification at the second phase or both phases and for regression estimation. Horvitz–Thompson-type variance estimators are used for variance estimation. However, the Horvitz–Thompson (Horvitz & Thompson, J. Amer. Statist. Assoc. 1952) variance estimator in uni-phase sampling is known to be highly unstable and may take negative values when the units are selected with unequal probabilities. On the other hand, the Sen–Yates–Grundy variance estimator is relatively stable and non-negative for several unequal probability sampling designs with fixed sample sizes. In this paper, we extend the Sen–Yates–Grundy (Sen, J. Ind. Soc. Agric. Statist. 1953; Yates & Grundy, J. Roy. Statist. Soc. Ser. B 1953) variance estimator to two-phase sampling, assuming fixed first-phase sample size and fixed second-phase sample size given the first-phase sample. We apply the new variance estimators to two-phase sampling designs with stratification at the second phase or both phases. We also develop Sen–Yates–Grundy-type variance estimators of the two-phase regression estimators that make use of the first-phase auxiliary data and known population totals of some of the auxiliary variables.
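
As background for the comparison above, here is a minimal sketch of the uni-phase Horvitz–Thompson and Sen–Yates–Grundy variance estimators for an estimated total under a fixed-size design. It does not implement the paper's two-phase extension. The function names and toy data are hypothetical, and simple random sampling without replacement is assumed only because both estimators then reduce to the textbook form N(N - n)s²/n, which gives a convenient check.

```python
from itertools import combinations


def ht_variance(y, pi, pij):
    """Horvitz-Thompson variance estimate for the estimated total.

    y: sampled values; pi: first-order inclusion probabilities;
    pij: matrix of joint inclusion probabilities for sampled units.
    """
    n = len(y)
    v = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(n))
    for i, j in combinations(range(n), 2):
        # Each unordered pair appears twice in the double sum over i != j
        v += 2 * (pij[i][j] - pi[i] * pi[j]) / pij[i][j] \
            * (y[i] / pi[i]) * (y[j] / pi[j])
    return v


def syg_variance(y, pi, pij):
    """Sen-Yates-Grundy variance estimate (fixed sample size designs)."""
    v = 0.0
    for i, j in combinations(range(len(y)), 2):
        # Sum over i < j equals half the sum over i != j
        v += (pi[i] * pi[j] - pij[i][j]) / pij[i][j] \
            * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v


# Toy example (assumed): SRS without replacement, n = 4 from N = 10
N, n = 10, 4
y = [3.0, 5.0, 2.0, 8.0]
pi = [n / N] * n
joint = n * (n - 1) / (N * (N - 1))
pij = [[joint] * n for _ in range(n)]

v_ht = ht_variance(y, pi, pij)
v_syg = syg_variance(y, pi, pij)
```

Each term of the Sen–Yates–Grundy form is a squared difference times the factor (π_i π_j - π_ij)/π_ij, so the estimate is non-negative whenever π_ij ≤ π_i π_j for all pairs; the Horvitz–Thompson form has no such guarantee under unequal probabilities.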


Cox Regression in Nested Case–Control Studies with Auxiliary Covariates

BIOMETRICS, Issue 2 2010
Mengling Liu
Summary. Nested case–control (NCC) design is a popular sampling method in large epidemiological studies because it is a cost-effective way to investigate the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to NCC data and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling and show that it can be well adapted to NCC designs where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite sample performance of our proposed estimators and to compare the efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.


Estimating the Encounter Rate Variance in Distance Sampling

BIOMETRICS, Issue 1 2009
Rachel M. Fewster
Summary. The dominant source of variance in line transect sampling is usually the encounter rate variance. Systematic survey designs are often used to reduce the true variability among different realizations of the design, but estimating the variance is difficult and estimators typically approximate the variance by treating the design as a simple random sample of lines. We explore the properties of different encounter rate variance estimators under random and systematic designs. We show that a design-based variance estimator improves upon the model-based estimator of Buckland et al. (2001, Introduction to Distance Sampling. Oxford: Oxford University Press, p. 79) when transects are positioned at random. However, if populations exhibit strong spatial trends, both estimators can have substantial positive bias under systematic designs. We show that poststratification is effective in reducing this bias.


A Semiparametric Estimate of Treatment Effects with Censored Data

BIOMETRICS, Issue 3 2001
Ronghui Xu
Summary. A semiparametric estimate of an average regression effect with right-censored failure time data has recently been proposed under the Cox-type model where the regression effect β(t) is allowed to vary with time. In this article, we derive a simple algebraic relationship between this average regression effect and a measurement of group differences in K-sample transformation models when the random error belongs to the G^ρ family of Harrington and Fleming (1982, Biometrika 69, 553–566), the latter being equivalent to the conditional regression effect in a gamma frailty model. The models considered here are suitable for the attenuating hazard ratios that often arise in practice. The results reveal an interesting connection among the above three classes of models as alternatives to the proportional hazards assumption and add to our understanding of the behavior of the partial likelihood estimate under nonproportional hazards. The algebraic relationship provides a simple estimator under the transformation model. We develop a variance estimator based on the empirical influence function that is much easier to compute than the previously suggested resampling methods. When there is truncation in the right tail of the failure times, we propose a method of bias correction to improve the coverage properties of the confidence intervals. The estimate, its estimated variance, and the bias correction term can all be calculated with minor modifications to standard software for proportional hazards regression.


Nonparametric Estimation in a Cure Model with Random Cure Times

BIOMETRICS, Issue 1 2001
Rebecca A. Betensky
Summary. Acute respiratory distress syndrome (ARDS) is a life-threatening acute condition that sometimes follows pneumonia or surgery. Patients who recover and leave the hospital are considered to have been cured at the time they leave the hospital. These data differ from typical data in which cure is a possibility: death times are not observed for patients who are cured and cure times are observed and vary among patients. Here we apply a competing risks model to these data and show it to be equivalent to a mixture model, the more common approach for cure data. Further, we derive an estimator for the variance of the cumulative incidence function from the competing risks model, and thus for the cure rate, based on elementary calculations. We compare our variance estimator to Gray's (1988, Annals of Statistics 16, 1140–1154) estimator, which is based on counting process theory. We find our estimator to be slightly more accurate in small samples. We apply these results to data from an ARDS clinical trial.


Catch Estimation in the Presence of Declining Catch Rate Due to Gear Saturation

BIOMETRICS, Issue 1 2001
Philip C. Dauk
Summary. One strategy for estimating total catch is to employ two separate surveys that independently estimate total fishing effort and catch rate with the estimator for total catch formed by their product. Survey designs for estimating catch rate often involve interviewing the fishermen during their fishing episodes. Such roving designs result in incomplete episode data and characteristically have employed a model in which the catch rate is assumed to be constant over time. This article extends the problem to that of estimating total catch in the presence of a declining catch rate due, e.g., to gear saturation. Using a gill net fishery as an example, a mean-of-ratios type of estimator for the catch rate together with its variance estimator are developed. Their performance is examined using simulations, with special attention given to effects of restrictions on the roving survey window. Finally, data from a Fraser River gill net fishery are used to illustrate the use of the proposed estimator and to compare results with those from an estimator based on a constant catch rate.


Capture–Recapture When Time and Behavioral Response Affect Capture Probabilities

BIOMETRICS, Issue 2 2000
Anne Chao
Summary. We consider a capture–recapture model in which capture probabilities vary with time and with behavioral response. Two inference procedures are developed under the assumption that recapture probabilities bear a constant relationship to initial capture probabilities. These two procedures are the maximum likelihood method (both unconditional and conditional types are discussed) and an approach based on optimal estimating functions. The population size estimators derived from the two procedures are shown to be asymptotically equivalent when population size is large enough. The performance and relative merits of various population size estimators for finite cases are discussed. The bootstrap method is suggested for constructing a variance estimator and confidence interval. An example of the deer mouse analyzed in Otis et al. (1978, Wildlife Monographs 62, 93) is given for illustration.


Estimation of rate ratio and relative difference in matched-pairs under inverse sampling

ENVIRONMETRICS, Issue 6 2001
Kung-Jong Lui
Abstract. To increase the efficiency of a study and to eliminate the effects of some nuisance confounders, we may consider employing a matched-pair design. Under the commonly assumed quadrinomial sampling, in which the total number of matched pairs is fixed, we note that the maximum likelihood estimator (MLE) of the rate ratio (RR) has an infinitely large bias and no finite variance, and so does the MLE of the relative difference (RD). To avoid this theoretical concern, this paper suggests the use of inverse sampling and notes that the MLEs of these parameters, which are actually of the same forms as those under quadrinomial sampling, are also the uniformly minimum variance unbiased estimators (UMVUEs) under the proposed sampling. This paper further derives the exact variances of these MLEs and the corresponding UMVUEs of these variances. Finally, this paper includes a discussion of interval estimation of the RR and RD using these results. Copyright © 2001 John Wiley & Sons, Ltd.


The estimation of sibling genetic risk parameters revisited

GENETIC EPIDEMIOLOGY, Issue 4 2004
Guohua Zou
Abstract. This report points out that some sibling genetic risk parameters can be regarded as the ratios of the characteristic values in the ascertainment subpopulation. Based on this observation, we reconsider Olson and Cordell's ([2000] Genet. Epidemiol. 18:217–235) and Cordell and Olson's ([2000] Genet. Epidemiol. 18:307–321) estimators, and re-derive these estimators. Furthermore, we provide the closed-form variance estimators. Simulation results suggest that our proposed estimators perform very well, and single ascertainment may be better than complete ascertainment for estimating these genetic parameters. © 2004 Wiley-Liss, Inc.


Combining standardized time series area and Cramér–von Mises variance estimators

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 4 2007
David Goldsman
Abstract. We propose three related estimators for the variance parameter arising from a steady-state simulation process. All are based on combinations of standardized-time-series area and Cramér–von Mises (CvM) estimators. The first is a straightforward linear combination of the area and CvM estimators; the second resembles a Durbin–Watson statistic; and the third is related to a jackknifed version of the first. The main derivations yield analytical expressions for the bias and variance of the new estimators. These results show that the new estimators often perform better than the pure area, pure CvM, and benchmark nonoverlapping and overlapping batch means estimators, especially in terms of variance and mean squared error. We also give exact and Monte Carlo examples illustrating our findings. © 2007 Wiley Periodicals, Inc. Naval Research Logistics, 2007


Exact expected values of variance estimators for simulation

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 4 2007
Tûba Aktaran-Kalaycı
Abstract. We formulate exact expressions for the expected values of selected estimators of the variance parameter (that is, the sum of covariances at all lags) of a steady-state simulation output process. Given in terms of the autocovariance function of the process, these expressions are derived for variance estimators based on the simulation analysis methods of nonoverlapping batch means, overlapping batch means, and standardized time series. Comparing estimator performance in a first-order autoregressive process and the M/M/1 queue-waiting-time process, we find that certain standardized time series estimators outperform their competitors as the sample size becomes large. © 2007 Wiley Periodicals, Inc. Naval Research Logistics, 2007
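
For readers unfamiliar with the batch means approach named above, the following is a minimal sketch of the standard nonoverlapping batch means estimator of the variance parameter: the batch size times the sample variance of the batch means. The function name and toy data are hypothetical; the paper's exact expected-value expressions, and the overlapping batch means and standardized time series estimators, are not reproduced here.

```python
def nbm_variance_parameter(data, batch_size):
    """Nonoverlapping batch means estimate of the variance parameter.

    Hypothetical helper for illustration. Observations beyond the last
    complete batch are ignored.
    """
    m = batch_size
    b = len(data) // m  # number of complete batches
    if b < 2:
        raise ValueError("need at least two complete batches")
    batch_means = [sum(data[i * m:(i + 1) * m]) / m for i in range(b)]
    grand_mean = sum(batch_means) / b
    # m times the sample variance of the batch means
    return m * sum((bm - grand_mean) ** 2 for bm in batch_means) / (b - 1)


output = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy data, not simulation output
estimate = nbm_variance_parameter(output, batch_size=2)
```

Larger batches make the batch means closer to uncorrelated, which reduces bias in the estimate, but leave fewer batches and so increase its variance.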


Replicated batch means for steady-state simulations

NAVAL RESEARCH LOGISTICS: AN INTERNATIONAL JOURNAL, Issue 6 2006
Nilay Tanık Argon
Abstract. This paper studies a new steady-state simulation output analysis method called replicated batch means in which a small number of replications are conducted and the observations in these replications are grouped into batches. This paper also introduces and compares methods for selecting the initial state of each replication. More specifically, we show that confidence intervals constructed by the replicated batch means method are valid for large batch sizes and derive expressions for the expected values and variances of the steady-state mean and variance estimators for stationary processes and large sample sizes. We then use these expressions, analytical examples, and numerical experiments to compare the replicated batch means method with the standard batch means and multiple replications methods. The numerical results, which are obtained from an AR(1) process and a small, nearly-decomposable Markov chain, show that the multiple replications method often gives confidence intervals with poorer coverage than the standard and replicated batch means methods and that the replicated batch means method, implemented with good choices of initialization method and number of replications, provides confidence interval coverages that range from being comparable with to being noticeably better than coverages obtained by the standard batch means method. © 2006 Wiley Periodicals, Inc. Naval Research Logistics, 2006


Comparing Accuracy in an Unpaired Post-market Device Study with Incomplete Disease Assessment

BIOMETRICAL JOURNAL, Issue 3 2009
Todd A. Alonzo
Abstract. The sensitivity and specificity of a new medical device are often compared relative to that of an existing device by calculating ratios of sensitivities and specificities. Although it would be ideal for all study subjects to receive the gold standard so true disease status was known for all subjects, it is often not feasible or ethical to obtain disease status for everyone. This paper proposes two unpaired designs where each subject is only administered one of the devices and device results dictate which subjects are to receive disease verification. Estimators of the ratio of accuracy and corresponding confidence intervals are proposed for these designs as well as sample size formulae. Simulation studies are performed to investigate the small sample bias of the estimators and the performance of the variance estimators and sample size formulae. The sample size formulae are applied to the design of a cervical cancer study to compare the accuracy of a new device with the conventional Pap smear.


Using Regression Models to Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models

BIOMETRICS, Issue 3 2009
Michael Rosenblum
Summary. Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, because commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this article, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such hypothesis tests have correct type I error for large samples. Our main result is that for a surprisingly large class of commonly used regression models, standard regression-based hypothesis tests (but using robust variance estimators) are guaranteed to have correct type I error for large samples, even when the models are incorrectly specified. To the best of our knowledge, this robustness of such model-based hypothesis tests to incorrectly specified models was previously unknown for Poisson regression models and for other commonly used models we consider. Our results have practical implications for understanding the reliability of commonly used, model-based tests for analyzing randomized trials.
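
To illustrate what a robust variance estimator looks like in the simplest case, here is a minimal sketch of the heteroscedasticity-robust (sandwich, HC0) variance for the slope in a one-covariate linear regression with intercept, alongside the usual model-based variance. This is an assumed textbook special case chosen for illustration; the function name and toy data are hypothetical and the paper's general class of models is not covered.

```python
def ols_slope_variances(x, y):
    """Return (slope, model_based_variance, hc0_robust_variance).

    Fits y = a + b*x by ordinary least squares. Hypothetical helper
    for illustration; requires at least three observations.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Model-based variance: assumes a common error variance
    model_var = sum(e ** 2 for e in resid) / (n - 2) / sxx
    # HC0 sandwich variance: weights each squared residual by its leverage term
    robust_var = sum((xi - x_bar) ** 2 * e ** 2
                     for xi, e in zip(x, resid)) / sxx ** 2
    return slope, model_var, robust_var


slope, model_var, robust_var = ols_slope_variances([0.0, 1.0, 2.0],
                                                   [0.0, 1.0, 4.0])
```

A test of no treatment effect would compare the slope divided by the square root of the robust variance with a standard normal reference; the sandwich form does not require the working model's variance assumption to hold.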


On a Supplemented Case–Control Design

BIOMETRICS, Issue 2 2005
Ruth M. Pfeiffer
Summary. The supplemented case–control design consists of a case–control sample and of an additional sample of disease-free subjects who arise from a given stratum of one of the measured exposures in the case–control study. The supplemental data might, for example, arise from a population survey conducted independently of the case–control study. This design improves precision of estimates of main effects and especially of joint exposures, particularly when joint exposures are uncommon and the prevalence of one of the exposures is low. We first present a pseudo-likelihood estimator (PLE) that is easy to compute. We further adapt two-phase design methods to find maximum likelihood estimates (MLEs) for the log odds ratios for this design and derive asymptotic variance estimators that appropriately account for the differences in sampling schemes of this design from that of the traditional two-phase design. As an illustration of our design we present a study that was conducted to assess the influence of joint exposure to hepatitis-B virus (HBV) and hepatitis-C virus (HCV) infection on the risk of hepatocellular carcinoma in data from Qidong County, Jiangsu Province, China.