
## Sample Performance (sample + performance)
## Selected Abstracts

## Forecasting and Finite Sample Performance of Short Rate Models: International Evidence
INTERNATIONAL REVIEW OF FINANCE, Issue 3-4 2005. Sirimon Treepongkaruna

Abstract. This paper evaluates the forecasting and finite sample performance of short-term interest rate models in a number of countries. Specifically, we run a series of in-sample and out-of-sample tests for both the conditional mean and volatility of one-factor short rate models, and compare the results to the random walk model. Overall, we find that the out-of-sample forecasting performance of one-factor short rate models is poor, stemming from the inability of the models to accommodate jumps and discontinuities in the time series data. In addition, we perform a series of Monte Carlo analyses similar to Chapman and Pearson to document the finite sample performance of the short rate models when ,3 is not restricted to be equal to one. Our results indicate the potential dangers of over-parameterization and highlight the limitations of short-term interest rate models.

## Panel Data Discrete Choice Models with Lagged Dependent Variables
ECONOMETRICA, Issue 4 2000. Bo E. Honoré

In this paper, we consider identification and estimation in panel data discrete choice models when the explanatory variable set includes strictly exogenous variables, lags of the endogenous dependent variable as well as unobservable individual-specific effects. For the binary logit model with the dependent variable lagged only once, Chamberlain (1993) gave conditions under which the model is not identified. We present a stronger set of conditions under which the parameters of the model are identified. The identification result suggests estimators of the model, and we show that these are consistent and asymptotically normal, although their rate of convergence is slower than the inverse of the square root of the sample size.
We also consider identification in the semiparametric case where the logit assumption is relaxed. We propose an estimator in the spirit of the conditional maximum score estimator (Manski (1987)) and we show that it is consistent. In addition, we discuss an extension of the identification result to multinomial discrete choice models, and to the case where the dependent variable is lagged twice. Finally, we present some Monte Carlo evidence on the small sample performance of the proposed estimators for the binary response model.

## Successful amplification of degraded DNA for use with high-throughput SNP genotyping platforms
HUMAN MUTATION, Issue 12 2008. Simon Mead

Abstract. Highly accurate and high-throughput SNP genotyping platforms are increasingly popular but the performance of suboptimal DNA samples remains unclear. The aim of our study was to determine the best platform, amplification technique, and loading concentration to maximize genotype accuracy and call rate using degraded samples. We amplified high-molecular weight genomic DNA samples recently extracted from whole blood and degraded DNA samples extracted from 50-year-old patient sera. Two whole-genome amplification (WGA) methodologies were used: an isothermal multiple displacement amplification method (MDA) and a fragmentation-PCR-based method (GenomePlex® [GPLEX]; Sigma-Aldrich, St. Louis, MO). Duplicate runs were performed on genome-wide dense SNP arrays (Nsp-Mendel; Affymetrix) and custom SNP platforms based on molecular inversion probes (Targeted Genotyping [TG]; Affymetrix) and BeadArray technology (Golden Gate [GG]; Illumina). Miscalls and no-calls on Mendel arrays were correlated with each other, with confidence scores from the Bayesian calling algorithm, and with average probe intensity. Degraded DNA amplified with MDA gave low call rates and concordance across all platforms at standard loading concentrations.
The call rate with MDA on GG was improved when a 5× concentration of amplified DNA was used. The GPLEX amplification gave high call rate and concordance for degraded DNA at standard and higher loading concentrations on both TG and GG platforms. Based on these analyses, after standard filtering for SNP and sample performance, we were able to achieve a mean call rate of 99.7% and concordance 99.7% using degraded samples amplified by GPLEX on GG technology at 2× loading concentration. These findings may be useful for investigators planning case-control association studies with patient samples of suboptimal quality. Hum Mutat 0, 1–7, 2008. © 2008 Wiley-Liss, Inc.

## A semiparametric model for binary response and continuous outcomes under index heteroscedasticity
JOURNAL OF APPLIED ECONOMETRICS, Issue 5 2009. Roger Klein

This paper formulates a likelihood-based estimator for a double-index, semiparametric binary response equation.
A novel feature of this estimator is that it is based on density estimation under local smoothing. While the proofs differ from those based on alternative density estimators, the finite sample performance of the estimator is significantly improved. As binary responses often appear as endogenous regressors in continuous outcome equations, we also develop an optimal instrumental variables estimator in this context. For this purpose, we specialize the double-index model for binary response to one with heteroscedasticity that depends on an index different from that underlying the 'mean response'. We show that such (multiplicative) heteroscedasticity, whose form is not parametrically specified, effectively induces exclusion restrictions on the outcomes equation. The estimator developed exploits such identifying information. We provide simulation evidence on the favorable performance of the estimators and illustrate their use through an empirical application on the determinants, and effect, of attendance at a government-financed school. Copyright © 2009 John Wiley & Sons, Ltd.

## A self-normalized approach to confidence interval construction in time series
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2010. Xiaofeng Shao

Summary. We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed by using recursive estimates.
Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by the practitioner. Monte Carlo simulations are conducted to compare the finite sample performance of the new method with those delivered by the normal approximation and the block bootstrap approach.

## Detecting changes in the mean of functional observations
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 5 2009. István Berkes

Summary. Principal component analysis has become a fundamental tool of functional data analysis. It represents the functional data as $X_i(t) = \mu(t) + \sum_{1 \le l < \infty} \eta_{i,l} v_l(t)$, where $\mu$ is the common mean, $v_l$ are the eigenfunctions of the covariance operator and the $\eta_{i,l}$ are the scores. Inferential procedures assume that the mean function $\mu(t)$ is the same for all values of $i$. If, in fact, the observations do not come from one population, but rather their mean changes at some point(s), the results of principal component analysis are confounded by the change(s). It is therefore important to develop a methodology to test the assumption of a common functional mean. We develop such a test using quantities which can be readily computed in the R package fda. The null distribution of the test statistic is asymptotically pivotal with a well-known asymptotic distribution. The asymptotic test has excellent finite sample performance. Its application is illustrated on temperature data from England.
## Sure independence screening for ultrahigh dimensional feature space
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 5 2008. Jianqing Fan

Summary. Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log(p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log(p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
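
The correlation-learning step described in the sure independence screening abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names `pearson` and `sure_independence_screen`, the simulated data, and the cut-off `d = 10` are assumptions chosen for the example. The only idea taken from the abstract is to rank features by the absolute marginal correlation with the response and keep a number `d` that is below the sample size.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / (sxx * syy) ** 0.5


def sure_independence_screen(columns, y, d):
    """Return the sorted indices of the d features with largest |correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Hypothetical data: n = 50 observations, p = 500 features, two of them active.
rng = random.Random(0)
n, p, d = 50, 500, 10
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3.0 * columns[0][i] - 3.0 * columns[1][i] + rng.gauss(0, 1) for i in range(n)]

selected = sure_independence_screen(columns, y, d)
print(selected)
```

After screening, the `d` retained columns would be passed to a penalized method such as the lasso or smoothly clipped absolute deviation, as the abstract describes; that second stage is not shown here.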
## Mixture cure survival models with dependent censoring
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2007. Yi Li

Summary. The paper is motivated by cure detection among the prostate cancer patients in the National Institutes of Health surveillance epidemiology and end results programme, wherein the main end point (e.g. deaths from prostate cancer) and the censoring causes (e.g. deaths from heart diseases) may be dependent. Although many researchers have studied the mixture survival model to analyse survival data with non-negligible cure fractions, none has studied the mixture cure model in the presence of dependent censoring. To account for such dependence, we propose a more general cure model that allows for dependent censoring. We derive the cure models from the perspective of competing risks and model the dependence between the censoring time and the survival time by using a class of Archimedean copula models. Within this framework, we consider the parameter estimation, the cure detection and the two-sample comparison of latency distributions in the presence of dependent censoring when a proportion of patients is deemed cured. Large sample results by using martingale theory are obtained. We examine the finite sample performance of the proposed methods via simulation and apply them to analyse the surveillance epidemiology and end results prostate cancer data.

## Estimation of integrated squared density derivatives from a contaminated sample
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2002. A. Delaigle

Summary. We propose a kernel estimator of integrated squared density derivatives, from a sample that has been contaminated by random noise. We derive asymptotic expressions for the bias and the variance of the estimator and show that the squared bias term dominates the variance term. This coincides with results that are available for non-contaminated observations.
We then discuss the selection of the bandwidth parameter when estimating integrated squared density derivatives based on contaminated data. We propose a data-driven bandwidth selection procedure of the plug-in type and investigate its finite sample performance via a simulation study.

## Generalized least squares with misspecified serial correlation structures
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2001. Sergio G. Koreisha

Summary. The regression literature contains hundreds of studies on serially correlated disturbances. Most of these studies assume that the structure of the error covariance matrix Ω is known or can be estimated consistently from data. Surprisingly, few studies investigate the properties of estimated generalized least squares (GLS) procedures when the structure of Ω is incorrectly identified and the parameters are inefficiently estimated. We compare the finite sample efficiencies of ordinary least squares (OLS), GLS and incorrect GLS (IGLS) estimators. We also prove new theorems establishing theoretical efficiency bounds for IGLS relative to GLS and OLS. Results from an exhaustive simulation study are used to evaluate the finite sample performance and to demonstrate the robustness of IGLS estimates vis-à-vis OLS and GLS estimates constructed for models with known and estimated (but correctly identified) Ω. Some of our conclusions for finite samples differ from established asymptotic results.

## Estimation of the location and exponent of the spectral singularity of a long memory process
JOURNAL OF TIME SERIES ANALYSIS, Issue 1 2004. Javier Hidalgo

Abstract. We consider the estimation of the location of the pole and memory parameter ,0 and d of a covariance stationary process with spectral density We investigate optimal rates of convergence for the estimators of ,0 and d, and the consequence that the lack of knowledge of ,0 has on the estimation of the memory parameter d.
We present estimators which achieve the optimal rates. A small Monte-Carlo study is included to illustrate the finite sample performance of our estimators.

## Testing for Multicointegration in Panel Data with Common Factors
OXFORD BULLETIN OF ECONOMICS & STATISTICS, Issue 2006. Vanessa Berenguer-Rico

Abstract. This paper addresses the concept of multicointegration in a panel data framework and builds upon the panel data cointegration procedures developed in Pedroni [Econometric Theory (2004), Vol. 20, pp. 597–625]. When individuals are either cross-section independent, or cross-section dependence can be removed by cross-section demeaning, our approach can be applied to the wider framework of mixed I(2) and I(1) stochastic processes. The paper also deals with the issue of cross-section dependence using approximate common-factor models. Finite sample performance is investigated through Monte Carlo simulations. Finally, we illustrate the use of the procedure investigating an inventories, sales and production relationship for a panel of US industries.

## Specification and estimation of social interaction models with network structures
THE ECONOMETRICS JOURNAL, Issue 2 2010. Lung-fei Lee

Summary. This paper considers the specification and estimation of social interaction models with network structures and the presence of endogenous, contextual and correlated effects. With macro group settings, group-specific fixed effects are also incorporated in the model. The network structure provides information on the identification of the various interaction effects. We propose a quasi-maximum likelihood approach for the estimation of the model. We derive the asymptotic distribution of the proposed estimator, and provide Monte Carlo evidence on its small sample performance.
## Improving robust model selection tests for dynamic models
THE ECONOMETRICS JOURNAL, Issue 2 2010. Hwan-sik Choi

Summary. We propose an improved model selection test for dynamic models using a new asymptotic approximation to the sampling distribution of a new test statistic. The model selection test is applicable to dynamic models with very general selection criteria and estimation methods. Since our test statistic does not assume the exact form of a true model, the test is essentially non-parametric once competing models are estimated. For the unknown serial correlation in data, we use a Heteroscedasticity/Autocorrelation-Consistent (HAC) variance estimator, and the sampling distribution of the test statistic is approximated by the fixed-b asymptotic approximation. The asymptotic approximation depends on kernel functions and bandwidth parameters used in HAC estimators. We compare the finite sample performance of the new test with the bootstrap methods as well as with the standard normal approximations, and show that the fixed-b asymptotics and the bootstrap methods are markedly superior to the standard normal approximation for a moderate sample size for time series data. An empirical application for foreign exchange rate forecasting models is presented, and the result shows that the normal approximation to the distribution of the test statistic considered appears to overstate the data's ability to distinguish between two competing models.

## Testing Equality between Two Diagnostic Procedures in Paired-Sample Ordinal Data
BIOMETRICAL JOURNAL, Issue 6 2004. Kung-Jong Lui

Abstract. When a new diagnostic procedure is developed, it is important to assess whether the diagnostic accuracy of the new procedure is different from that of the standard procedure. For paired-sample ordinal data, this paper develops two test statistics for testing equality of the diagnostic accuracy between two procedures without assuming any parametric models.
One is derived on the basis of the probability of correctly identifying the case for a randomly selected pair of a case and a non-case over all possible cutoff points, and the other is derived on the basis of the sensitivity and specificity directly. To illustrate the practical use of the proposed test procedures, this paper includes an example regarding the use of digitized and plain films for screening breast cancer. This paper also applies Monte Carlo simulation to evaluate the finite sample performance of the two statistics developed here and notes that they can perform well in a variety of situations. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

## Detecting Genomic Aberrations Using Products in a Multiscale Analysis
BIOMETRICS, Issue 3 2010. Xuesong Yu

Summary. Genomic instability, such as copy-number losses and gains, occurs in many genetic diseases. Recent technology developments enable researchers to measure copy numbers at tens of thousands of markers simultaneously. In this article, we propose a nonparametric approach for detecting the locations of copy-number changes and provide a measure of significance for each change point. The proposed test is based on seeking scale-based changes in the sequence of copy numbers, which is ordered by the marker locations along the chromosome. The method leads to a natural way to estimate the null distribution for the test of a change point and adjusted p-values for the significance of a change point using a step-down maxT permutation algorithm to control the family-wise error rate. A simulation study investigates the finite sample performance of the proposed method and compares it with a more standard sequential testing method. The method is illustrated using two real data sets.
## Cox Regression in Nested Case–Control Studies with Auxiliary Covariates
BIOMETRICS, Issue 2 2010. Mengling Liu

Summary. Nested case–control (NCC) design is a popular sampling method in large epidemiological studies for its cost effectiveness to investigate the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to NCC data and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling and show that it can be well adapted to NCC designs where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite sample performance of our proposed estimators and to compare the efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.

## Incorporating Correlation for Multivariate Failure Time Data When Cluster Size Is Large
BIOMETRICS, Issue 2 2010. L. Xue

Summary. We propose a new estimation method for multivariate failure time data using the quadratic inference function (QIF) approach.
The proposed method efficiently incorporates within-cluster correlations. Therefore, it is more efficient than those that ignore within-cluster correlation. Furthermore, the proposed method is easy to implement. Unlike the weighted estimating equations in Cai and Prentice (1995, Biometrika 82, 151–164), it is not necessary to explicitly estimate the correlation parameters. This simplification is particularly useful in analyzing data with large cluster size where it is difficult to estimate intracluster correlation. Under certain regularity conditions, we show the consistency and asymptotic normality of the proposed QIF estimators. A chi-squared test is also developed for hypothesis testing. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed methods. We also illustrate the proposed methods by analyzing primary biliary cirrhosis (PBC) data.

## Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data
BIOMETRICS, Issue 1 2010. Yisheng Li

Summary. We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure.
We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.

## Variable Selection in the Cox Regression Model with Covariates Missing at Random
BIOMETRICS, Issue 1 2010. Ramon I. Garcia

Summary. We consider variable selection in the Cox regression model (Cox, 1975, Biometrika 62, 269–276) with covariates missing at random. We investigate the smoothly clipped absolute deviation penalty and adaptive least absolute shrinkage and selection operator (LASSO) penalty, and propose a unified model selection and estimation procedure. A computationally attractive algorithm is developed, which simultaneously optimizes the penalized likelihood function and penalty parameters. We also optimize a model selection criterion, called the ICQ statistic (Ibrahim, Zhu, and Tang, 2008, Journal of the American Statistical Association 103, 1648–1658), to estimate the penalty parameters and show that it consistently selects all important covariates. Simulations are performed to evaluate the finite sample performance of the penalty estimates. Also, two lung cancer data sets are analyzed to demonstrate the proposed methodology.
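
The two penalties named in the variable selection abstract can be compared numerically. The sketch below is only an illustration under stated assumptions: it uses the standard smoothly clipped absolute deviation (SCAD) form of Fan and Li (2001) with their suggested constant `a = 3.7`, and a LASSO penalty with an optional per-coefficient `weight` standing in for the adaptive variant. The function names are hypothetical and nothing here reproduces the authors' algorithm for the Cox model.

```python
def lasso_penalty(theta, lam, weight=1.0):
    """(Adaptive) LASSO penalty for one coefficient: lam * weight * |theta|."""
    return lam * weight * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient (piecewise form of Fan and Li, 2001)."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: identical to the LASSO penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic blend that flattens out.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


for theta in (0.5, 2.0, 10.0):
    print(theta, lasso_penalty(theta, 1.0), scad_penalty(theta, 1.0))
```

The LASSO penalty keeps growing with the size of the coefficient, while SCAD levels off beyond `a * lam`, which is why SCAD penalizes large coefficients less heavily.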
## Structural Nested Mean Models for Assessing Time-Varying Effect Moderation
BIOMETRICS, Issue 1 2010. Daniel Almirall

Summary. This article considers the problem of assessing causal effect moderation in longitudinal settings in which treatment (or exposure) is time varying and so are the covariates said to moderate its effect. Intermediate causal effects that describe time-varying causal effects of treatment conditional on past covariate history are introduced and considered as part of Robins' structural nested mean model. Two estimators of the intermediate causal effects, and their standard errors, are presented and discussed: The first is a proposed two-stage regression estimator. The second is Robins' G-estimator. The results of a small simulation study that begins to shed light on the small versus large sample performance of the estimators, and on the bias–variance trade-off between the two estimators, are presented. The methodology is illustrated using longitudinal data from a depression study.

## Diagnosis of Random-Effect Model Misspecification in Generalized Linear Mixed Models for Binary Response
BIOMETRICS, Issue 2 2009. Xianzheng Huang

Summary. Generalized linear mixed models (GLMMs) are widely used in the analysis of clustered data. However, the validity of likelihood-based inference in such analyses can be greatly affected by the assumed model for the random effects. We propose a diagnostic method for random-effect model misspecification in GLMMs for clustered binary response. We provide a theoretical justification of the proposed method and investigate its finite sample performance via simulation. The proposed method is applied to data from a longitudinal respiratory infection study.

## Doubly Robust Estimation in Missing Data and Causal Inference Models
BIOMETRICS, Issue 4 2005. Heejung Bang

Summary. The goal of this article is to construct doubly robust (DR) estimators in ignorable missing data and causal inference models.
In a missing data model, an estimator is DR if it remains consistent when either (but not necessarily both) a model for the missingness mechanism or a model for the distribution of the complete data is correctly specified. Because with observational data one can never be sure that either a missingness model or a complete data model is correct, perhaps the best that can be hoped for is to find a DR estimator. DR estimators, in contrast to standard likelihood-based or (nonaugmented) inverse probability-weighted estimators, give the analyst two chances, instead of only one, to make a valid inference. In a causal inference model, an estimator is DR if it remains consistent when either a model for the treatment assignment mechanism or a model for the distribution of the counterfactual data is correctly specified. Because with observational data one can never be sure that a model for the treatment assignment mechanism or a model for the counterfactual data is correct, inference based on DR estimators should improve upon previous approaches. Indeed, we present the results of simulation studies which demonstrate that the finite sample performance of DR estimators is as impressive as theory would predict. The proposed method is applied to a cardiovascular clinical trial.

## Weighted Normality-Based Estimator in Correcting Correlation Coefficient Estimation Between Incomplete Nutrient Measurements
BIOMETRICS, Issue 1 2000. C. Y. Wang

Summary. Consider the problem of estimating the correlation between two nutrient measurements, such as the percent energy from fat obtained from a food frequency questionnaire (FFQ) and that from repeated food records or 24-hour recalls. Under a classical additive model for repeated food records, it is known that there is an attenuation effect on the correlation estimation if the sample average of repeated food records for each subject is used to estimate the underlying long-term average.
This paper considers the case in which the selection probability of a subject for participation in the calibration study, in which repeated food records are measured, depends on the corresponding FFQ value, and the repeated longitudinal measurement errors have an autoregressive structure. This paper investigates a normality-based estimator and compares it with a simple method of moments. Both methods are consistent if the first two moments of nutrient measurements exist. Furthermore, joint estimating equations are applied to estimate the correlation coefficient and related nuisance parameters simultaneously. This approach provides a simple sandwich formula for the covariance estimation of the estimator. Finite sample performance is examined via a simulation study, and the proposed weighted normality-based estimator performs well under various distributional assumptions. The methods are applied to real data from a dietary assessment study.

## Regression analysis based on semicompeting risks data
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 1 2008. Jin-Jian Hsieh

Summary. Semicompeting risks data are commonly seen in biomedical applications in which a terminal event censors a non-terminal event. Possible dependent censoring complicates statistical analysis. We consider regression analysis based on a non-terminal event, say disease progression, which is subject to censoring by death. The methodology proposed is developed for discrete covariates under two types of assumption. First, separate copula models are assumed for each covariate group and then a flexible regression model is imposed on the progression time which is of major interest. Model checking procedures are also proposed to help to choose a best-fitted model.
Under a two-sample setting, Lin and co-workers proposed a competing method which requires an additional marginal assumption on the terminal event and implicitly assumes that the dependence structures in the two groups are the same. Using simulations, we compare the two approaches on the basis of their finite sample performances and robustness properties under model misspecification. The method proposed is applied to a bone marrow transplant data set.

## Regularized Estimation for the Accelerated Failure Time Model
BIOMETRICS, Issue 2 2009. T. Cai

Summary. In the presence of high-dimensional predictors, it is challenging to develop reliable regression models that can be used to accurately predict future outcomes. Further complications arise when the outcome of interest is an event time, which is often not fully observed due to censoring. In this article, we develop robust prediction models for event time outcomes by regularizing Gehan's estimator for the accelerated failure time (AFT) model (Tsiatis, 1996, Annals of Statistics 18, 305–328) with least absolute shrinkage and selection operator (LASSO) penalty. Unlike existing methods based on the inverse probability weighting and the Buckley and James estimator (Buckley and James, 1979, Biometrika 66, 429–436), the proposed approach does not require additional assumptions about the censoring and always yields a solution that is convergent. Furthermore, the proposed estimator leads to a stable regression model for prediction even if the AFT model fails to hold. To facilitate the adaptive selection of the tuning parameter, we detail an efficient numerical algorithm for obtaining the entire regularization path. The proposed procedures are applied to a breast cancer dataset to derive a reliable regression model for predicting patient survival based on a set of clinical prognostic factors and gene signatures. Finite sample performances of the procedures are evaluated through a simulation study.
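
Nearly every abstract on this page reports a simulation study of finite sample performance. As a generic illustration of what such a study measures, the sketch below estimates how often a normal-approximation 95% confidence interval for a mean covers the true value at several sample sizes. It is not taken from any of the papers listed: the exponential data-generating process, the sample sizes, the number of replications, and the function name `coverage` are all assumptions made for the example.

```python
import random
from statistics import fmean, stdev


def coverage(n, reps, rng, true_mean=1.0, z=1.96):
    """Fraction of replications in which mean +/- z * s / sqrt(n) covers true_mean."""
    hits = 0
    for _ in range(reps):
        # Skewed data, so the normal approximation is imperfect for small n.
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        centre = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / reps


rng = random.Random(2024)
results = {n: coverage(n, 2000, rng) for n in (10, 40, 160)}
for n, rate in results.items():
    print(f"n={n:4d}  empirical coverage={rate:.3f}  (nominal 0.95)")
```

The gap between the empirical and nominal coverage at each `n` is one simple summary of finite sample performance; the studies above report analogous quantities such as bias, efficiency, size and power for their own estimators and tests.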