Model Misspecification (model + misspecification)

Selected Abstracts


MODEL MISSPECIFICATION: WHY AGGREGATION OF OFFENSES IN FEDERAL SENTENCING EQUATIONS IS PROBLEMATIC

CRIMINOLOGY, Issue 4 2003
CELESTA A. ALBONETTI
This paper addresses two concerns that arise from Steffensmeier and Demuth's (2001) analysis of federal sentencing and their misrepresentation of my analyses of sentence severity (Albonetti, 1997). My primary concern is to alert researchers to the importance of controlling for the guidelines offense that drives the sentencing process under the Federal Sentencing Guidelines. My second concern is to correct Steffensmeier and Demuth's (2001) errors in interpretation of my earlier findings of the effect of guidelines offense severity on length of imprisonment.


On Latent-Variable Model Misspecification in Structural Measurement Error Models for Binary Response

BIOMETRICS, Issue 3 2009
Xianzheng Huang
Summary We consider structural measurement error models for a binary response. We show that likelihood-based estimators obtained from fitting structural measurement error models with pooled binary responses can be far more robust to covariate measurement error in the presence of latent-variable model misspecification than the corresponding estimators from individual responses. Furthermore, despite the loss in information, pooling can provide improved parameter estimators in terms of mean-squared error. Based on these and other findings, we create a new diagnostic method to detect latent-variable model misspecification in structural measurement error models with individual binary response. We use simulation and data from the Framingham Heart Study to illustrate our methods.


Diagnosis of Random-Effect Model Misspecification in Generalized Linear Mixed Models for Binary Response

BIOMETRICS, Issue 2 2009
Xianzheng Huang
Summary Generalized linear mixed models (GLMMs) are widely used in the analysis of clustered data. However, the validity of likelihood-based inference in such analyses can be greatly affected by the assumed model for the random effects. We propose a diagnostic method for random-effect model misspecification in GLMMs for clustered binary response. We provide a theoretical justification of the proposed method and investigate its finite sample performance via simulation. The proposed method is applied to data from a longitudinal respiratory infection study.


A score for Bayesian genome screening

GENETIC EPIDEMIOLOGY, Issue 3 2003
E. Warwick Daw
Abstract Bayesian Monte Carlo Markov chain (MCMC) techniques have shown promise in dissecting complex genetic traits. The methods introduced by Heath ([1997], Am. J. Hum. Genet. 61:748–760), and implemented in the program Loki, have been able to localize genes for complex traits in both real and simulated data sets. Loki estimates the posterior probability of quantitative trait loci (QTL) at locations on a chromosome in an iterative MCMC process. Unfortunately, interpretation of the results and assessment of their significance have been difficult. Here, we introduce a score, the log of the posterior placement probability ratio (LOP), for assessing oligogenic QTL detection and localization. The LOP is the log of the posterior probability of linkage to the real chromosome divided by the posterior probability of linkage to an unlinked pseudochromosome, with marker informativeness similar to the marker data on the real chromosome. Since the LOP cannot be calculated exactly, we estimate it in simultaneous MCMC on both real and pseudochromosomes. We investigate empirically the distributional properties of the LOP in the presence and absence of trait genes. The LOP is not subject to trait model misspecification in the way a lod score may be, and we show that the LOP can detect linkage for loci of small effect when the lod score cannot. We show how, in the absence of linkage, an empirical distribution of the LOP may be estimated by simulation and used to provide an assessment of linkage detection significance. Genet Epidemiol 24:181–190, 2003. © 2003 Wiley-Liss, Inc.
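
The verbal definition of the LOP in the abstract above can be written compactly as a formula. This is only a sketch: the symbols (D for the observed data, "real" and "pseudo" for the two chromosomes) are notation assumed here rather than taken from the paper, and the base of the logarithm is left unspecified, as it is in the abstract.

```latex
\mathrm{LOP}
  = \log \frac{P(\text{QTL linked to the real chromosome} \mid D)}
              {P(\text{QTL linked to the unlinked pseudochromosome} \mid D)}
```

Under this reading, a LOP near zero means the real chromosome attracts no more posterior placement than an unlinked chromosome with similar marker informativeness, and large positive values favor linkage.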


Robustness of alternative non-linearity tests for SETAR models

JOURNAL OF FORECASTING, Issue 3 2004
Wai-Sum Chan
Abstract In recent years there has been a growing interest in exploiting potential forecast gains from the non-linear structure of self-exciting threshold autoregressive (SETAR) models. Statistical tests have been proposed in the literature to help analysts check for the presence of SETAR-type non-linearities in an observed time series. It is important to study the power and robustness properties of these tests since erroneous test results might lead to misspecified prediction problems. In this paper we investigate the robustness properties of several commonly used non-linearity tests. Both the robustness with respect to outlying observations and the robustness with respect to model specification are considered. The power comparison of these testing procedures is carried out using Monte Carlo simulation. The results indicate that all of the existing tests are not robust to outliers and model misspecification. Finally, an empirical application applies the statistical tests to stock market returns of the four little dragons (Hong Kong, South Korea, Singapore and Taiwan) in East Asia. The non-linearity tests fail to provide consistent conclusions most of the time. The results in this article stress the need for a more robust test for SETAR-type non-linearity in time series analysis and forecasting. Copyright © 2004 John Wiley & Sons, Ltd.


Hierarchical related regression for combining aggregate and individual data in studies of socio-economic disease risk factors

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 1 2008
Christopher Jackson
Summary. To obtain information about the contribution of individual and area level factors to population health, it is desirable to use both data collected on areas, such as censuses, and on individuals, e.g. survey and cohort data. Recently developed models allow us to carry out simultaneous regressions on related data at the individual and aggregate levels. These can reduce 'ecological bias' that is caused by confounding, model misspecification or lack of information and increase power compared with analysing the data sets singly. We use these methods in an application investigating individual and area level sociodemographic predictors of the risk of hospital admissions for heart and circulatory disease in London. We discuss the practical issues that are encountered in this kind of data synthesis and demonstrate that this modelling framework is sufficiently flexible to incorporate a wide range of sources of data and to answer substantive questions. Our analysis shows that the variations that are observed are mainly attributable to individual level factors rather than the contextual effect of deprivation.


Regression analysis based on semicompeting risks data

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 1 2008
Jin-Jian Hsieh
Summary. Semicompeting risks data are commonly seen in biomedical applications in which a terminal event censors a non-terminal event. Possible dependent censoring complicates statistical analysis. We consider regression analysis based on a non-terminal event, say disease progression, which is subject to censoring by death. The methodology proposed is developed for discrete covariates under two types of assumption. First, separate copula models are assumed for each covariate group and then a flexible regression model is imposed on the progression time which is of major interest. Model checking procedures are also proposed to help to choose a best-fitted model. Under a two-sample setting, Lin and co-workers proposed a competing method which requires an additional marginal assumption on the terminal event and implicitly assumes that the dependence structures in the two groups are the same. Using simulations, we compare the two approaches on the basis of their finite sample performances and robustness properties under model misspecification. The method proposed is applied to a bone marrow transplant data set.


Choice of parametric models in survival analysis: applications to monotherapy for epilepsy and cerebral palsy

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), Issue 2 2003
G. P. S. Kwong
Summary. In the analysis of medical survival data, semiparametric proportional hazards models are widely used. When the proportional hazards assumption is not tenable, these models will not be suitable. Other models for covariate effects can be useful. In particular, we consider accelerated life models, in which the effect of covariates is to scale the quantiles of the base-line distribution. Solomon and Hutton have suggested that there is some robustness to misspecification of survival regression models. They showed that the relative importance of covariates is preserved under misspecification with assumptions of small coefficients and orthogonal transformation of covariates. We elucidate these results by applications to data from five trials which compare two common anti-epileptic drugs (carbamazepine versus sodium valproate monotherapy for epilepsy) and to survival of a cohort of people with cerebral palsy. Results on the robustness against model misspecification depend on the assumptions of small coefficients and on the underlying distribution of the data. These results hold in cerebral palsy but do not hold in epilepsy data which have early high hazard rates. The orthogonality of coefficients is not important. However, the choice of model is important for an estimation of the magnitude of effects, particularly if the base-line shape parameter indicates high initial hazard rates.
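
The statement that covariates "scale the quantiles of the base-line distribution" can be made concrete with one common parameterization of the accelerated life model. The notation below (base-line survival time T_0, covariate vector x, coefficient vector β, quantile functions Q) is assumed for illustration and is not taken from the paper; some authors use the opposite sign for β.

```latex
T = \exp(\beta^{\top} x)\, T_0
\quad\Longrightarrow\quad
Q_T(p \mid x) = \exp(\beta^{\top} x)\, Q_{0}(p), \qquad 0 < p < 1
```

In this form every quantile of the survival time, including the median, is multiplied by the same factor, whereas a proportional hazards model multiplies the hazard function instead.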


for misspecified regression models

THE CANADIAN JOURNAL OF STATISTICS, Issue 4 2003
Peilin Shi
Abstract The authors propose minimax robust designs for regression models whose response function is possibly misspecified. These designs, which minimize the maximum of the mean squared error matrix, can control the bias caused by model misspecification and provide the desired efficiency through one parameter. The authors call on a nonsmooth optimization technique to derive these designs analytically. Their results extend those of Heo, Schmuland & Wiens (2001). The authors also discuss several examples for approximately polynomial regression.


Negative Market Volatility Risk Premium: Evidence from the LIFFE Equity Index Options

ASIA-PACIFIC JOURNAL OF FINANCIAL STUDIES, Issue 5 2009
Bing-Huei Lin
Abstract We provide non-parametric empirical evidence regarding negative volatility risk premium using LIFFE equity index options. In addition, we incorporate the moment-adjusted option delta hedge ratio to mitigate the effect of model misspecification. From the results, we observe several interesting phenomena. First, the delta-hedged gains are negative. Second, with a correction for model misspecification, higher-order moments measures show less significance and the volatility risk premium still plays a key role in affecting delta-hedged gains. All empirical evidence supports the existence of negative volatility risk premium in LIFFE equity index options.


Longitudinal Studies of Binary Response Data Following Case–Control and Stratified Case–Control Sampling: Design and Analysis

BIOMETRICS, Issue 2 2010
Jonathan S. Schildcrout
Summary We discuss design and analysis of longitudinal studies after case–control sampling, wherein interest is in the relationship between a longitudinal binary response that is related to the sampling (case–control) variable, and a set of covariates. We propose a semiparametric modeling framework based on a marginal longitudinal binary response model and an ancillary model for subjects' case–control status. In this approach, the analyst must posit the population prevalence of being a case, which is then used to compute an offset term in the ancillary model. Parameter estimates from this model are used to compute offsets for the longitudinal response model. Examining the impact of population prevalence and ancillary model misspecification, we show that time-invariant covariate parameter estimates, other than the intercept, are reasonably robust, but intercept and time-varying covariate parameter estimates can be sensitive to such misspecification. We study design and analysis issues impacting study efficiency, namely: choice of sampling variable and the strength of its relationship to the response, sample stratification, choice of working covariance weighting, and degree of flexibility of the ancillary model. The research is motivated by a longitudinal study following case–control sampling of the time course of attention deficit hyperactivity disorder (ADHD) symptoms.


A Global Sensitivity Test for Evaluating Statistical Hypotheses with Nonidentifiable Models

BIOMETRICS, Issue 2 2010
D. Todem
Summary We consider the problem of evaluating a statistical hypothesis when some model characteristics are nonidentifiable from observed data. Such a scenario is common in meta-analysis for assessing publication bias and in longitudinal studies for evaluating a covariate effect when dropouts are likely to be nonignorable. One possible approach to this problem is to fix a minimal set of sensitivity parameters conditional upon which hypothesized parameters are identifiable. Here, we extend this idea and show how to evaluate the hypothesis of interest using an infimum statistic over the whole support of the sensitivity parameter. We characterize the limiting distribution of the statistic as a process in the sensitivity parameter, which involves a careful theoretical analysis of its behavior under model misspecification. In practice, we suggest a nonparametric bootstrap procedure to implement this infimum test as well as to construct confidence bands for simultaneous pointwise tests across all values of the sensitivity parameter, adjusting for multiple testing. The methodology's practical utility is illustrated in an analysis of a longitudinal psychiatric study.


Latent-Model Robustness in Joint Models for a Primary Endpoint and a Longitudinal Process

BIOMETRICS, Issue 3 2009
Xianzheng Huang
Summary Joint modeling of a primary response and a longitudinal process via shared random effects is widely used in many areas of application. Likelihood-based inference on joint models requires model specification of the random effects. Inappropriate model specification of random effects can compromise inference. We present methods to diagnose random effect model misspecification of the type that leads to biased inference on joint models. The methods are illustrated via application to simulated data, and by application to data from a study of bone mineral density in perimenopausal women and data from an HIV clinical trial.


Resampling-Based Multiple Testing Methods with Covariate Adjustment: Application to Investigation of Antiretroviral Drug Susceptibility

BIOMETRICS, Issue 2 2008
Yang Yang
Summary Identifying genetic mutations that cause clinical resistance to antiretroviral drugs requires adjustment for potential confounders, such as the number of active drugs in an HIV-infected patient's regimen other than the one of interest. Motivated by this problem, we investigated resampling-based methods to test equal mean response across multiple groups defined by HIV genotype, after adjustment for covariates. We consider construction of test statistics and their null distributions under two types of model: parametric and semiparametric. The covariate function is explicitly specified in the parametric but not in the semiparametric approach. The parametric approach is more precise when models are correctly specified, but suffers from bias when they are not; the semiparametric approach is more robust to model misspecification, but may be less efficient. To help preserve type I error while also improving power in both approaches, we propose resampling approaches based on matching of observations with similar covariate values. Matching reduces the impact of model misspecification as well as imprecision in estimation. These methods are evaluated via simulation studies and applied to a data set that combines results from a variety of clinical studies of salvage regimens. Our focus is on relating HIV genotype to viral susceptibility to abacavir after adjustment for the number of active antiretroviral drugs (excluding abacavir) in the patient's regimen.


Marginalized Models for Moderate to Long Series of Longitudinal Binary Response Data

BIOMETRICS, Issue 2 2007
Jonathan S. Schildcrout
Summary Marginalized models (Heagerty, 1999, Biometrics 55, 688–698) permit likelihood-based inference when interest lies in marginal regression models for longitudinal binary response data. Two such models are the marginalized transition and marginalized latent variable models. The former captures within-subject serial dependence among repeated measurements with transition model terms while the latter assumes exchangeable or nondiminishing response dependence using random intercepts. In this article, we extend the class of marginalized models by proposing a single unifying model that describes both serial and long-range dependence. This model will be particularly useful in longitudinal analyses with a moderate to large number of repeated measurements per subject, where both serial and exchangeable forms of response correlation can be identified. We describe maximum likelihood and Bayesian approaches toward parameter estimation and inference, and we study the large sample operating characteristics under two types of dependence model misspecification. Data from the Madras Longitudinal Schizophrenia Study (Thara et al., 1994, Acta Psychiatrica Scandinavica 90, 329–336) are analyzed.


Model-Checking Techniques Based on Cumulative Residuals

BIOMETRICS, Issue 1 2002
D. Y. Lin
Summary. Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.
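
The idea in the abstract above, cumulating residuals over a covariate and comparing the observed process with simulated zero-mean Gaussian realizations, can be sketched for the simplest case. The code below is an illustrative simplification restricted to ordinary least squares with a single covariate, not the authors' implementation; the function name `cumulative_residual_check`, the supremum summary, and the multiplier-style simulation with a correction for the estimated coefficients are assumptions made for this sketch.

```python
import numpy as np


def cumulative_residual_check(x, y, n_sim=500, seed=0):
    """Cumulative-residual check of a straight-line fit of y on x (sketch).

    Returns the supremum of the absolute observed cumulative-residual
    process and a simulation-based p-value for that supremum.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size

    # Fit the assumed model y = b0 + b1 * x by least squares.
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # ind[k, i] = 1 if x_i <= k-th smallest x, so ind @ resid cumulates
    # the residuals over the ordered covariate values.
    xs = np.sort(x)
    ind = (x[None, :] <= xs[:, None]).astype(float)
    observed = ind @ resid / np.sqrt(n)

    # Subtract the part of each indicator explained by the design matrix,
    # which accounts for the coefficients having been estimated.
    proj = (ind @ X) @ np.linalg.solve(X.T @ X, X.T)
    weights = ind - proj

    # Simulated realizations: residuals perturbed by standard normal draws.
    g = rng.standard_normal((n_sim, n))
    sims = (g * resid) @ weights.T / np.sqrt(n)

    obs_sup = float(np.abs(observed).max())
    sim_sup = np.abs(sims).max(axis=1)
    p_value = float((sim_sup >= obs_sup).mean())
    return obs_sup, p_value
```

A small p-value suggests that the trend in the cumulative residuals is larger than natural variation would produce under the fitted straight line, for example when the true relationship is curved. Plotting `observed` against a handful of rows of `sims` would give the graphical version of the same comparison.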