Regression Parameters (regression + parameter)

Distribution by Scientific Domains

Mathematics and Statistics	51%
Medical Sciences	11%
Chemistry	11%
Humanities and Social Sciences	7%
Business, Economics, Finance and Accounting	7%

Selected Abstracts

Regression modelling of correlated data in ecology: subject-specific and population averaged response patterns

JOURNAL OF APPLIED ECOLOGY, Issue 5 2009
John Fieberg
Summary 1. Statistical methods that assume independence among observations result in optimistic estimates of uncertainty when applied to correlated data, which are ubiquitous in applied ecological research. Mixed effects models offer a potential solution and rely on the assumption that latent or unobserved characteristics of individuals (i.e. random effects) induce correlation among repeated measurements. However, careful consideration must be given to the interpretation of parameters when using a nonlinear link function (e.g. logit). Mixed model regression parameters reflect the change in the expected response within an individual associated with a change in that individual's covariates [i.e. a subject-specific (SS) interpretation], which may not address a relevant scientific question. In particular, a SS interpretation is not natural for covariates that do not vary within individuals (e.g. gender). 2. An alternative approach combines the solution to an unbiased estimating equation with robust measures of uncertainty to make inferences regarding predictor-outcome relationships. Regression parameters describe changes in the average response among groups of individuals differing in their covariates [i.e. a population-averaged (PA) interpretation]. 3. We compare these two approaches [mixed models and generalized estimating equations (GEE)] with illustrative examples from a 3-year study of mallard (Anas platyrhynchos) nest structures. We observe that PA and SS responses differ when modelling binary data, with PA parameters behaving like attenuated versions of SS parameters. Differences between SS and PA parameters increase with the size of among-subject heterogeneity captured by the random effects variance component. Lastly, we illustrate how PA inferences can be derived (post hoc) from fitted generalized and nonlinear-mixed models. 4. Synthesis and applications. Mixed effects models and GEE offer two viable approaches to modelling correlated data. 
The preferred method should depend primarily on the research question (i.e. desired parameter interpretation), although operating characteristics of the associated estimation procedures should also be considered. Many applied questions in ecology, wildlife management and conservation biology (including the current illustrative examples) focus on population performance measures (e.g. mean survival or nest success rates) as a function of general landscape features, for which the PA model interpretation, not the more commonly used SS model interpretation, may be more natural. [source]
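
The attenuation of PA parameters relative to SS parameters can be illustrated numerically. The sketch below is not taken from the paper: it assumes a random-intercept logistic model with a normally distributed random effect, hypothetical parameter values, and hypothetical function names. It averages the subject-specific response curve over the random effect and compares the resulting population-averaged log-odds ratio with the commonly cited closed-form approximation beta_PA ≈ beta_SS / sqrt(1 + c^2 sigma^2), with c = 16 sqrt(3) / (15 pi).

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(eta, sigma, n=2001, width=8.0):
    """Average expit(eta + b) over b ~ N(0, sigma^2) using a midpoint rule."""
    if sigma == 0:
        return expit(eta)
    lo = -width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n):
        b = lo + (i + 0.5) * h
        dens = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += expit(eta + b) * dens * h
    return total

def pa_slope(beta0, beta_ss, sigma):
    """Population-averaged log-odds ratio for a change in x from 0 to 1."""
    return (logit(marginal_prob(beta0 + beta_ss, sigma))
            - logit(marginal_prob(beta0, sigma)))

def approx_pa_slope(beta_ss, sigma):
    """Closed-form attenuation approximation (an assumption of this sketch)."""
    c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
    return beta_ss / math.sqrt(1.0 + c ** 2 * sigma ** 2)

if __name__ == "__main__":
    # Hypothetical values: SS slope of 1 and increasing random-effect SD.
    for sigma in (0.0, 1.0, 2.0):
        print(sigma, round(pa_slope(0.0, 1.0, sigma), 3),
              round(approx_pa_slope(1.0, sigma), 3))
```

With no among-subject heterogeneity (sigma = 0) the two interpretations coincide; as sigma grows, the PA slope shrinks toward zero, which matches the qualitative pattern the abstract describes.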

Regression residuals and test statistics: Assessing naive outlier deletion

THE CANADIAN JOURNAL OF STATISTICS, Issue 2 2000
Debbie J. Dupuis
Abstract The authors derive the joint distributions of a studentized deleted residual and various regression quantities, calculated with all the data or with one case deleted. They show that the correlation between the studentized deleted residual and the deleted test statistic has an interesting interpretation in terms of well-known regression quantities. These results allow them to examine the effect of applying some naive outlier deletion methods before making inferences about a regression parameter. Les auteurs déterminent la loi conjointe d'un résidu studentisé retranché et de certaines variables classiques en régression, calculées avec ou sans l'observation retranchée. Ils montrent que la corrélation entre le résidu studentisé retranché et la statistique de test obtenue sans cette observation possède une interprétation intéressante en fonction de statistiques bien connues. Ces résultats leur permettent d'étudier comment l'application de certaines techniques naïves de détection de valeurs aberrantes peut affecter l'inférence concernant un paramètre de régression. [source]

A multivariate logistic regression equation to screen for dysglycaemia: development and validation

DIABETIC MEDICINE, Issue 5 2005
B. P. Tabaei
Abstract Aims To develop and validate an empirical equation to screen for dysglycaemia [impaired fasting glucose (IFG), impaired glucose tolerance (IGT) and undiagnosed diabetes]. Methods A predictive equation was developed using multiple logistic regression analysis and data collected from 1032 Egyptian subjects with no history of diabetes. The equation incorporated age, sex, body mass index (BMI), post-prandial time (self-reported number of hours since last food or drink other than water), systolic blood pressure, high-density lipoprotein (HDL) cholesterol and random capillary plasma glucose as independent covariates for prediction of dysglycaemia based on fasting plasma glucose (FPG) ≥ 6.1 mmol/l and/or plasma glucose 2 h after a 75-g oral glucose load (2-h PG) ≥ 7.8 mmol/l. The equation was validated using a cross-validation procedure. Its performance was also compared with static plasma glucose cut-points for dysglycaemia screening. Results The predictive equation was calculated with the following logistic regression parameters: P = 1/(1 + e^(−X)), where X = −8.3390 + 0.0214 (age in years) + 0.6764 (if female) + 0.0335 (BMI in kg/m2) + 0.0934 (post-prandial time in hours) + 0.0141 (systolic blood pressure in mmHg) − 0.0110 (HDL in mmol/l) + 0.0243 (random capillary plasma glucose in mmol/l). The cut-point for the prediction of dysglycaemia was defined as a probability ≥ 0.38. The equation's sensitivity was 55%, specificity 90% and positive predictive value (PPV) 65%. When applied to a new sample, the equation's sensitivity was 53%, specificity 89% and PPV 63%. Conclusions This multivariate logistic equation improves on currently recommended methods of screening for dysglycaemia and can be easily implemented in a clinical setting using readily available clinical and non-fasting laboratory data and an inexpensive hand-held programmable calculator. [source]
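
The equation quoted in the abstract can be evaluated with a few lines of code. The sketch below is illustrative only and is not a clinical tool: the coefficients, the units and the 0.38 cut-point are copied from the abstract as listed here, the function and argument names are hypothetical, and the input values in the example are invented. The units should be checked against the original paper before any use, since the abstract as reproduced on this page may not state them reliably.

```python
import math

# Coefficients as quoted in the abstract above.
INTERCEPT = -8.3390
CUT_POINT = 0.38

def linear_predictor(age_years, female, bmi, postprandial_hours,
                     systolic_bp, hdl, random_glucose):
    """X in the abstract's equation; argument names are hypothetical."""
    return (INTERCEPT
            + 0.0214 * age_years
            + 0.6764 * (1 if female else 0)
            + 0.0335 * bmi
            + 0.0934 * postprandial_hours
            + 0.0141 * systolic_bp
            - 0.0110 * hdl
            + 0.0243 * random_glucose)

def dysglycaemia_probability(**covariates):
    """P = 1 / (1 + exp(-X))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(**covariates)))

def screens_positive(**covariates):
    """Apply the probability cut-point stated in the abstract."""
    return dysglycaemia_probability(**covariates) >= CUT_POINT

if __name__ == "__main__":
    # Invented example subject, for illustration only.
    subject = dict(age_years=50, female=True, bmi=30.0, postprandial_hours=2.0,
                   systolic_bp=130.0, hdl=1.2, random_glucose=6.0)
    print(dysglycaemia_probability(**subject), screens_positive(**subject))
```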

Sample Splitting and Threshold Estimation

ECONOMETRICA, Issue 3 2000
Bruce E. Hansen
Threshold models have a wide variety of applications in economics. Direct applications include models of separating and multiple equilibria. Other applications include empirical sample splitting when the sample split is based on a continuously-distributed variable such as firm size. In addition, threshold models may be used as a parsimonious strategy for nonparametric function estimation. For example, the threshold autoregressive model (TAR) is popular in the nonlinear time series literature. Threshold models also emerge as special cases of more complex statistical frameworks, such as mixture models, switching models, Markov switching models, and smooth transition threshold models. It may be important to understand the statistical properties of threshold models as a preliminary step in the development of statistical tools to handle these more complicated structures. Despite the large number of potential applications, the statistical theory of threshold estimation is undeveloped. It is known that threshold estimates are super-consistent, but a distribution theory useful for testing and inference has yet to be provided. This paper develops a statistical theory for threshold estimation in the regression context. We allow for either cross-section or time series observations. Least squares estimation of the regression parameters is considered. An asymptotic distribution theory for the regression estimates (the threshold and the regression slopes) is developed. It is found that the distribution of the threshold estimate is nonstandard. A method to construct asymptotic confidence intervals is developed by inverting the likelihood ratio statistic. It is shown that this yields asymptotically conservative confidence regions. Monte Carlo simulations are presented to assess the accuracy of the asymptotic approximations. The empirical relevance of the theory is illustrated through an application to the multiple equilibria growth model of Durlauf and Johnson (1995). [source]
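
The least squares step can be sketched as a grid search: for each candidate threshold, split the sample, fit a separate regression in each regime, and keep the threshold with the smallest total sum of squared errors. The code below is a minimal illustration under simplifying assumptions (one regressor plus an intercept in each regime, candidate thresholds taken from the observed values of the threshold variable, hypothetical function names). It covers point estimation only and does not attempt the paper's likelihood-ratio confidence intervals.

```python
def ols_sse(xs, ys):
    """Simple linear regression; returns (intercept, slope, SSE)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse

def fit_threshold(x, y, q, min_obs=5):
    """Pick the split q <= gamma vs q > gamma that minimises the total SSE."""
    best = None
    for gamma in sorted(set(q)):
        low = [(xi, yi) for xi, yi, qi in zip(x, y, q) if qi <= gamma]
        high = [(xi, yi) for xi, yi, qi in zip(x, y, q) if qi > gamma]
        if len(low) < min_obs or len(high) < min_obs:
            continue  # trimming: require enough observations in each regime
        a1, b1, s1 = ols_sse(*zip(*low))
        a2, b2, s2 = ols_sse(*zip(*high))
        if best is None or s1 + s2 < best["sse"]:
            best = {"gamma": gamma, "low": (a1, b1), "high": (a2, b2),
                    "sse": s1 + s2}
    return best

if __name__ == "__main__":
    # Synthetic, noise-free data with a regime change at q = 19.
    q = list(range(40))
    x = [i % 7 for i in range(40)]
    y = [1 + 2 * xi if qi <= 19 else 5 - xi for xi, qi in zip(x, q)]
    print(fit_threshold(x, y, q))
```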

Assessment of short-term association between health outcomes and ozone concentrations using a Markov regression model

ENVIRONMETRICS, Issue 3 2003
Abdelkrim Zeghnoun
Abstract Longitudinal binary data are often used in panel studies where short-term associations between air pollutants and respiratory health outcomes are investigated. A Markov regression model in which the transition probabilities depend on the covariates, as well as the past responses, was used to study the short-term association between daily ozone (O3) concentrations and respiratory health outcomes in a panel of schoolchildren in Armentières, Northern France. The results suggest that there was a small but statistically significant association between O3 and children's cough episodes. A 10 µg/m3 increase in O3 concentrations was associated with a 13.9% increase in cough symptoms (95% CI = 1.2–28.1%). The use of a Markov regression model can be useful as it permits one to address easily both the regression objective and the stochastic dependence between successive observations. However, it is important to verify the sensitivity of the Markov regression parameters to the time-dependence structure. In this study, it was found that, although what happened on the previous day was a strong predictor of what happened on the current day, this did not contradict the O3-respiratory symptom associations. Compared to the Markov regression model, the signs of the parameter estimates of marginal and random-intercept models remain the same. The magnitudes of the O3 effects were also essentially the same in the three models, whose confidence intervals overlapped. Copyright © 2003 John Wiley & Sons, Ltd. [source]

Confidence intervals for the calibration estimator with environmental applications

ENVIRONMETRICS, Issue 1 2002
I. Müller
Abstract The article investigates different estimation techniques in the simple linear controlled calibration model and provides different types of confidence limits for the calibration estimator. In particular, M-estimation and bootstrapping techniques are implemented to obtain estimates of regression parameters during the training stage. Moreover, bootstrap is used to construct several types of confidence intervals that are compared to the classical approach based on the assumption of normality. For some of these intervals, the second order asymptotic properties can be established by means of Edgeworth expansions. Two data sets, one on space debris and the other on bacteriological counts in water samples, are used to illustrate the method's environmental applications. Copyright © 2002 John Wiley & Sons, Ltd. [source]

Description of growth by simple versus complex models for Baltic Sea spring spawning herring

JOURNAL OF APPLIED ICHTHYOLOGY, Issue 1 2001
J. Gröger
The objective was to find a length-growth model to help differentiate between herring stocks (Clupea harengus L.) when their length-growth shows systematically different patterns. The most essential model restriction was that it should react robustly against variations in the underlying age range which varies not only over time but also between the different herring stocks. Because of the limited age range, significance tests as well as confidence intervals of the model parameters should allow a small sample restriction. Thus, parameter estimation should be of an analytical rather than asymptotic nature and the model should contain a minimum set of parameters. The article studies the comparative characteristics of a simple non-asymptotic two-parameter growth model (allometric length-growth function, abbreviated as ALG model) in contrast to higher parametric and more complex growth models (logistic and von Bertalanffy growth functions, abbreviated as LGF and VBG models). An advantage of the ALG model is that it can be easily linearized and the growth coefficients can be directly derived as regression parameters. The intrinsic ALG model linearity makes it easy to test restrictions (normality, homoscedasticity and serial uncorrelation of the error term) and to formulate analytic confidence intervals. The ALG model features were exemplified and validated by a 1995 Baltic spring spawning herring (BSSH) data set that included a 12-year age range. The model performance was compared with that of the logistic and the von Bertalanffy length-growth curves for different age ranges and by means of various parameter estimation techniques. In all cases the ALG model performed better and all ALG model restrictions (no autocorrelation, homoscedasticity, and normality of the error term) were fulfilled. Furthermore, all findings seemed to indicate a pseudo-asymptotic growth for BSSH. 
The proposed model was explicitly derived for herring length-growth; the results thus should not be generalized interspecifically without additional proof. [source]
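
The linearization mentioned in the abstract can be sketched as follows. The abstract does not state the functional form, so the code assumes the usual two-parameter power function L = a * t^b, which becomes linear after taking logarithms: ln L = ln a + b ln t. The function name and the synthetic data are hypothetical and are not taken from the BSSH data set.

```python
import math

def fit_allometric(ages, lengths):
    """Fit L = a * t**b by ordinary least squares on the log-log scale."""
    lx = [math.log(t) for t in ages]
    ly = [math.log(length) for length in lengths]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope = growth exponent
    a = math.exp(my - b * mx)     # back-transform the intercept
    return a, b

if __name__ == "__main__":
    # Synthetic 12-year age range with a = 10 and b = 0.5 (invented values).
    ages = list(range(1, 13))
    lengths = [10.0 * t ** 0.5 for t in ages]
    print(fit_allometric(ages, lengths))
```

Because the fitted model is an ordinary linear regression on the transformed scale, standard residual checks and exact small-sample confidence intervals for the slope apply there, which is the practical advantage the abstract points to.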

Species richness, rarity and endemicity on Italian offshore islands: complementary signals from island-focused and species-focused analyses

JOURNAL OF BIOGEOGRAPHY, Issue 4 2008
Leonardo Dapporto
Abstract Aims To investigate the relative explanatory power of source faunas and geographical variables for butterfly incidence, frequency, richness, rarity, and endemicity on offshore islands. Location The western Italian offshore islands (Italy and Malta). Methods Thirty-one islands were examined. Data were taken from our own field surveys and from the literature. Two approaches were undertaken, described as island-focused and species-focused, respectively. Offshore islands were allocated to their neighbouring source landmasses (Italian Peninsula, Sicily and Sardinia-Corsica) and compared with each other for faunal attributes, source and island geography. Generalized linear and stepwise multiple regression models were then used to determine the relationships of island species richness, rarity and endemicity with potential geographical predictors and source richness, rarity, and endemicity (island-focused). Species frequency and incidence were assessed in relation to geographical and source predictors using stepwise linear and logistic regression, and inter-island associations were examined using K-Means clustering and non-metric scaling (species-focused). Results The analysis reveals firm evidence for the influence of the nearest large landmass sources on island species assemblages, richness, rarity and endemicity. A clear distinction in faunal affinities occurs between the Sardinian islands and islands lying offshore from the Italian mainland and Sicily. Islands neighbouring these three distinct sources differ significantly in richness, rarity and endemicity. Source richness, rarity, and endemicity have explanatory power for island richness, rarity, and endemicity, respectively, and together with island geography account for a substantial part of the variation in island faunas (richness 59%, rarity 60% and endemicity 64%). 
Source dominates the logistic regression parameters predicting the incidence of island species [13 (38%) of 34 species that could be analysed]; three ecological factors (source frequency, flight period and maximal altitude at which species live) explained 75% of the variation in the occurrence of species on the islands. Species found more frequently on islands occurred more frequently at sources, had longer flight periods, and occurred at lower altitudes at the sources. The incidence of most species on islands (84%) is correctly predicted by the same three variables. Main conclusions The Italian region of the Mediterranean Sea has a rich butterfly fauna comprising endemics and rare species as well as more cosmopolitan species. Analysis of island records benefited from the use of two distinct approaches, namely island-focused and species-focused, that sift distinct elements in island and source faunas. Clear contemporary signals appear in island-source relationships as well as historical signals. Differences among faunas relating to sources within the same region caution against assuming that contemporary (ecological) and historical (evolutionary) influences affect faunas of islands in different parts of the same region to the same extent. The implications of source-island relationships for the conservation of butterflies within the Italian region are considered, particularly for the long-term persistence of species. [source]

A robust PCR method for high-dimensional regressors

JOURNAL OF CHEMOMETRICS, Issue 8-9 2003
Mia Hubert
Abstract We consider the multivariate calibration model which assumes that the concentrations of several constituents of a sample are linearly related to its spectrum. Principal component regression (PCR) is widely used for the estimation of the regression parameters in this model. In the classical approach it combines principal component analysis (PCA) on the regressors with least squares regression. However, both stages yield very unreliable results when the data set contains outlying observations. We present a robust PCR (RPCR) method which also consists of two parts. First we apply a robust PCA method for high-dimensional data on the regressors, then we regress the response variables on the scores using a robust regression method. A robust RMSECV value and a robust R2 value are proposed as exploratory tools to select the number of principal components. The prediction error is also estimated in a robust way. Moreover, we introduce several diagnostic plots which are helpful to visualize and classify the outliers. The robustness of RPCR is demonstrated through simulations and the analysis of a real data set. Copyright © 2003 John Wiley & Sons, Ltd. [source]

SN2 reaction of a sulfonate ester in the presence of alkyltriphenylphosphonium bromides and mixed cationic-cationic systems

JOURNAL OF PHYSICAL ORGANIC CHEMISTRY, Issue 5 2006
Michael M. Mohareb
Abstract The effects of alkyltriphenylphosphonium bromides (CnTPB, n = 10, 12, 14, 16) on the rates of SN2 reactions of methyl 4-nitrobenzenesulfonate and bromide ion have been studied. Observed first-order rate constants are significantly higher than those found for other cationic surfactants for the same reaction. The results have been analyzed by the pseudophase model of micellar kinetics and show true micellar catalysis in the sense that second-order micellar rate constants are higher than the second-order rate constants in water. An attempt has also been made to investigate mixed cationic-cationic surfactant systems with respect to observed rates and pseudophase regression parameters. In addition, modeling of some cationic head groups has illustrated possible differences in head group charges and counterion interactions that may prove kinetically relevant. Copyright © 2006 John Wiley & Sons, Ltd. [source]

A unified approach to regression analysis under double-sampling designs

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 3 2000
Yi-Hau Chen
We propose a unified approach to the estimation of regression parameters under double-sampling designs, in which a primary sample consisting of data on the rough or proxy measures for the response and/or explanatory variables as well as a validation subsample consisting of data on the exact measurements are available. We assume that the validation sample is a simple random subsample from the primary sample. Our proposal utilizes a specific parametric model to extract the partial information contained in the primary sample. The resulting estimator is consistent even if such a model is misspecified, and it achieves higher asymptotic efficiency than the estimator based only on the validation data. Specific cases are discussed to illustrate the application of the estimator proposed. [source]

QRS Amplitude and Shape Variability in Magnetocardiograms

PACING AND CLINICAL ELECTROPHYSIOLOGY, Issue 2 2000
MARKUS HUCK
In magnetocardiography, averaging of QRS complexes is often used to improve the signal-to-noise ratio. However, averaging of QRS complexes ignores the variation in amplitude and shape of the signals caused, for example, by respiration. This may lead to suppression of signal portions within the QRS complexes. Furthermore, for inverse source reconstructions of dipoles and of current density distributions, errors in the spatial arrangement may occur. To overcome these problems we developed a method for separating and selectively averaging QRS complexes with different shapes and amplitudes. The method is based on a spline interpolation of the QRS complex averaged by a standard procedure. This spline function is then fitted to each QRS complex in the raw data by means of nonlinear regression (Levenberg-Marquardt method). Five regression parameters are applied: a linear amplitude scaling, two parameters describing the baseline drift, a time scaling parameter, and a time shift parameter. We found that both amplitude and shape of the QRS complex are influenced by respiration. The baseline shows a weaker influence of the respiration. The regression parameters of two neighboring measurement channels correlate linearly. Thus, selective averaging of a larger number of sensors can be performed simultaneously. [source]

Assessment of Different Sperm Quality Parameters to Predict in vitro Fertility of Bulls

REPRODUCTION IN DOMESTIC ANIMALS, Issue 3 2002
S Tanghe
Contents Frozen-thawed semen from six bulls with high (> 60%) and low (20–35%) in vitro fertility was used for studying the predictive value of simple sperm quality tests with respect to in vitro fertilization (IVF) outcome as assessed by pronucleus (PN) formation ability. Sperm quality parameters, such as sperm concentration, motility, progressive motility, live-dead sperm ratio, morphology, membrane integrity, mitochondrial activity and acrosomal status were analysed using both conventional and automatic techniques at three time points during the IVF process, namely after sperm thawing, Percoll differential gradient centrifugation and IVF. Associations between the sperm quality parameters before and after IVF, and PN formation ability were assessed by using linear regression analyses. The percentages of motility, progressive motility and normal morphology determined after sperm thawing, and the percentage of live spermatozoa assessed after Percoll preparation by using nigrosin-eosin (N-E) staining showed a good correlation with PN formation ability, but the regression parameters were borderline not significant. These parameters formed the most reliable basis for predicting IVF outcome. After IVF, the percentage of live spermatozoa determined by using N-E staining was the only sperm quality parameter showing a significant association with the PN formation ability of a given bull. This sperm quality test can be used as a non-invasive method to estimate the PN formation ability of oocytes which are further cultured to assess embryonic development. [source]

Rough Terrain: Spatial Variation in Campaign Contributing and Volunteerism

AMERICAN JOURNAL OF POLITICAL SCIENCE, Issue 1 2010
Wendy K. Tam Cho
We examine spatial patterns of mass political participation in the form of volunteering and donating to a major statewide election campaign. While these forms of participation are predictably associated with the political and socioeconomic characteristics of the precincts in which the participants reside, we find that these statistical relationships are spatially nonstationary. High-income neighborhoods, for example, are associated with stronger effects on participation at some locations more than at others. By using geographically weighted regression (GWR) to specify local regression parameters, we are able to capture the heterogeneity of contextual processes that generate the geographically uneven flow of volunteers and contributors into a political campaign. Since spatial nonstationarity may well be a rule rather than an exception in the study of many political phenomena, social scientific analyses should be mindful that relationships may vary by location. [source]

Estimation of regression parameters in missing data problems

THE CANADIAN JOURNAL OF STATISTICS, Issue 2 2006
Donald L. Mcleish
Abstract Let Y be a response variable, possibly multivariate, with a density function f(y|x, v; β) conditional on vectors x and v of covariates and a vector β of unknown parameters. The authors consider the problem of estimating β when the values taken by the covariate vector v are available for all observations while some of those taken by the covariate x are missing at random. They compare the profile estimator to several alternatives, both in terms of bias and standard deviation, when the response and covariates are discrete or continuous. Estimation des paramètres de régression en l'absence de certaines données Soit Y une variable réponse uni- ou multi-dimensionnelle et soit f(y|x, v; β) sa densité étant donné des vecteurs x et v de covariables et un vecteur β de paramètres inconnus. Les auteurs s'intéressent à l'estimation de β lorsque la valeur de v est disponible pour toutes les observations, mais que certaines valeurs de x sont manquantes au hasard. Ils comparent l'estimateur profil à diverses autres solutions, tant en terme de biais que d'écart-type, selon que la variable réponse et les covariables sont discrètes ou continues. [source]

Change-point monitoring in linear models

THE ECONOMETRICS JOURNAL, Issue 3 2006
Alexander Aue
Summary We consider a linear regression model with errors modelled by martingale difference sequences, which include heteroskedastic augmented GARCH processes. We develop asymptotic theory for two monitoring schemes aimed at detecting a change in the regression parameters. The first method is based on the CUSUM of the residuals and was studied earlier in the context of independent identically distributed errors. The second method is new and is based on the squares of prediction errors. Both methods use a training sample of size m. We show that, as m → ∞, both methods have correct asymptotic size and detect a change with probability approaching unity. The methods are illustrated and compared in a small simulation study. [source]
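
A residual-CUSUM monitoring scheme of the kind described can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: it uses a single regressor, a boundary of the form c * sigma * sqrt(m) * (1 + k/m) * (k/(m + k))^gamma that is common in the sequential monitoring literature, and a hypothetical critical value c = 2.5 that would in practice have to come from the asymptotic theory. All function names and data are invented.

```python
import math

def fit_line(x, y):
    """OLS on the training sample; returns (intercept, slope, residual SD)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return intercept, slope, sigma

def cusum_monitor(x_train, y_train, x_new, y_new, c=2.5, gamma=0.0):
    """Return the first monitoring index k at which the CUSUM of prediction
    residuals crosses the boundary, or None if it never does.
    c and gamma are hypothetical tuning values for this sketch."""
    m = len(x_train)
    intercept, slope, sigma = fit_line(x_train, y_train)
    q = 0.0
    for k, (xi, yi) in enumerate(zip(x_new, y_new), start=1):
        q += yi - intercept - slope * xi  # residual from the training fit
        boundary = (c * sigma * math.sqrt(m) * (1 + k / m)
                    * (k / (m + k)) ** gamma)
        if abs(q) > boundary:
            return k
    return None
```

The coefficients are estimated once from the training sample of size m and then held fixed, so a sustained shift in the regression parameters makes the cumulated residuals drift until they cross the boundary.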

COVARIATE-ADJUSTED REGRESSION FOR LONGITUDINAL DATA INCORPORATING CORRELATION BETWEEN REPEATED MEASUREMENTS

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 3 2009
Danh V. Nguyen
Summary We propose an estimation method that incorporates the correlation/covariance structure between repeated measurements in covariate-adjusted regression models for distorted longitudinal data. In this distorted data setting, neither the longitudinal response nor (possibly time-varying) predictors are directly observable. The unobserved response and predictors are assumed to be distorted/contaminated by unknown functions of a common observable confounder. The proposed estimation methodology adjusts for the distortion effects both in estimation of the covariance structure and in the regression parameters using generalized least squares. The finite-sample performance of the proposed estimators is studied numerically by means of simulations. The consistency and convergence rates of the proposed estimators are also established. The proposed method is illustrated with an application to data from a longitudinal study of cognitive and social development in children. [source]

ROBUST ESTIMATION IN PARAMETRIC TIME SERIES MODELS UNDER LONG- AND SHORT-RANGE-DEPENDENT STRUCTURES

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2009
Jiti Gao
Summary This paper studies the asymptotic behaviour of an M-estimator of regression parameters in the linear model when the design variables are either stationary short-range dependent (SRD), α-mixing or long-range dependent (LRD), and the errors are LRD. The weak consistency and the asymptotic distributions of the M-estimator are established. We present some simulated examples to illustrate the efficiency of the proposed M-estimation method. [source]

Grapevine dormant pruning weight prediction using remotely sensed data

AUSTRALIAN JOURNAL OF GRAPE AND WINE RESEARCH, Issue 3 2003
S.Z. DOBROWSKI
Abstract Aerial image analysis was utilised to predict dormant pruning weights between two growing seasons. We utilised an existing in-row spacing trial in order to examine the relationship between dormant pruning weights and remotely sensed data. The experimental vineyard had a constant between-row spacing (2.44 m) and five different in-row spacings (0.91, 1.52, 2.13, 2.74 and 3.35 m) resulting in spatial variation in canopy volume and dormant pruning weights (kg/metre of row). It was shown that the ratio vegetation index (NIR/R) was linearly correlated with field-wide measurements of pruning weight density (dormant pruning weight per metre of canopy) for both the 1998 and 1999 growing seasons (r2= 0.68 and 0.88, respectively). Additionally, it was shown that the regression parameters remained consistent between the two growing seasons allowing for an inter-annual comparison such that the vegetation index vs canopy parameter relationship determined for the 1998 growing season was used to predict field-wide pruning weight densities in the 1999 growing season prior to harvest. [source]

Cox Regression in Nested Case-Control Studies with Auxiliary Covariates

BIOMETRICS, Issue 2 2010
Mengling Liu
Summary Nested case-control (NCC) design is a popular sampling method in large epidemiological studies for its cost effectiveness to investigate the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to NCC data and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling and show that it can be well adapted to NCC designs where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite sample performance of our proposed estimators and to compare the efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor. [source]

Semiparametric Regression in Size-Biased Sampling

BIOMETRICS, Issue 1 2010
Ying Qing Chen
Summary Size-biased sampling arises when a positive-valued outcome variable is sampled with selection probability proportional to its size. In this article, we propose a semiparametric linear regression model to analyze size-biased outcomes. In our proposed model, the regression parameters of covariates are of major interest, while the distribution of random errors is unspecified. Under the proposed model, we discover that regression parameters are invariant regardless of size-biased sampling. Following this invariance property, we develop a simple estimation procedure for inferences. Our proposed methods are evaluated in simulation studies and applied to two real data analyses. [source]

Marginal Hazards Regression for Retrospective Studies within Cohort with Possibly Correlated Failure Time Data

BIOMETRICS, Issue 2 2009
Sangwook Kang
Summary A retrospective dental study was conducted to evaluate the degree to which pulpal involvement affects tooth survival. Due to the clustering of teeth, the survival times within each subject could be correlated and thus the conventional method for case-control studies cannot be directly applied. In this article, we propose a marginal model approach for this type of correlated case-control within cohort data. Weighted estimating equations are proposed for the estimation of the regression parameters. Different types of weights are also considered for improving the efficiency. Asymptotic properties of the proposed estimators are investigated and their finite sample properties are assessed via simulation studies. The proposed method is applied to the aforementioned dental study. [source]

Median Regression Models for Longitudinal Data with Dropouts

BIOMETRICS, Issue 2 2009
Grace Y. Yi
Summary Recently, median regression models have received increasing attention. When continuous responses follow a distribution that is quite different from a normal distribution, usual mean regression models may fail to produce efficient estimators whereas median regression models may perform satisfactorily. In this article, we discuss using median regression models to deal with longitudinal data with dropouts. Weighted estimating equations are proposed to estimate the median regression parameters for incomplete longitudinal data, where the weights are determined by modeling the dropout process. Consistency and the asymptotic distribution of the resultant estimators are established. The proposed method is used to analyze a longitudinal data set arising from a controlled trial of HIV disease (Volberding et al., 1990, The New England Journal of Medicine 322, 941–949). Simulation studies are conducted to assess the performance of the proposed method under various situations. An extension to estimation of the association parameters is outlined. [source]

Polynomial Spline Estimation and Inference of Proportional Hazards Regression Models with Flexible Relative Risk Form

BIOMETRICS, Issue 3 2006
Jianhua Z. Huang
Summary The Cox proportional hazards model usually assumes an exponential form for the dependence of the hazard function on covariate variables. However, in practice this assumption may be violated and other relative risk forms may be more appropriate. In this article, we consider the proportional hazards model with an unknown relative risk form. Issues in model interpretation are addressed. We propose a method to estimate the relative risk form and the regression parameters simultaneously by first approximating the logarithm of the relative risk form by a spline, and then employing the maximum partial likelihood estimation. An iterative alternating optimization procedure is developed for efficient implementation. Statistical inference of the regression coefficients and of the relative risk form based on parametric asymptotic theory is discussed. The proposed methods are illustrated using simulation and an application to the Veteran's Administration lung cancer data. [source]

Bayesian Analysis for Generalized Linear Models with Nonignorably Missing Covariates

BIOMETRICS, Issue 3 2005
Lan Huang
Summary We propose Bayesian methods for estimating parameters in generalized linear models (GLMs) with nonignorably missing covariate data. We show that when improper uniform priors are used for the regression coefficients, φ, of the multinomial selection model for the missing data mechanism, the resulting joint posterior will always be improper if (i) all missing covariates are discrete and an intercept is included in the selection model for the missing data mechanism, or (ii) at least one of the covariates is continuous and unbounded. This impropriety will result regardless of whether proper or improper priors are specified for the regression parameters, β, of the GLM or the parameters, α, of the covariate distribution. To overcome this problem, we propose a novel class of proper priors for the regression coefficients, φ, in the selection model for the missing data mechanism. These priors are robust and computationally attractive in the sense that inferences about β are not sensitive to the choice of the hyperparameters of the prior for φ and they facilitate a Gibbs sampling scheme that leads to accelerated convergence. In addition, we extend the model assessment criterion of Chen, Dey, and Ibrahim (2004a, Biometrika 91, 45–63), called the weighted L measure, to GLMs and missing data problems as well as extend the deviance information criterion (DIC) of Spiegelhalter et al. (2002, Journal of the Royal Statistical Society B 64, 583–639) for assessing whether the missing data mechanism is ignorable or nonignorable. A novel Markov chain Monte Carlo sampling algorithm is also developed for carrying out posterior computation. Several simulations are given to investigate the performance of the proposed Bayesian criteria as well as the sensitivity of the prior specification. Real datasets from a melanoma cancer clinical trial and a liver cancer study are presented to further illustrate the proposed methods. [source]

Regression Analysis of Doubly Censored Failure Time Data Using the Additive Hazards Model

BIOMETRICS, Issue 3 2004
Liuquan Sun
Summary Doubly censored failure time data arise when the survival time of interest is the elapsed time between two related events and observations on occurrences of both events could be censored. Regression analysis of doubly censored data has recently attracted considerable attention and for this a few methods have been proposed (Kim et al., 1993, Biometrics 49, 13–22; Sun et al., 1999, Biometrics 55, 909–914; Pan, 2001, Biometrics 57, 1245–1250). However, all of the methods are based on the proportional hazards model and it is well known that the proportional hazards model may not fit failure time data well sometimes. This article investigates regression analysis of such data using the additive hazards model and an estimating equation approach is proposed for inference about regression parameters of interest. The proposed method can be easily implemented and the properties of the proposed estimates of regression parameters are established. The method is applied to a set of doubly censored data from an AIDS cohort study. [source]
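
For orientation, the two hazard models contrasted in this abstract are commonly written in the following textbook forms. The notation (baseline hazard \lambda_0, covariate vector Z, regression parameters \beta) is assumed here for illustration and is not taken from the abstract itself.

```latex
\begin{align*}
\text{Proportional hazards:} \quad & \lambda(t \mid Z) = \lambda_0(t)\,\exp\!\left(\beta^{\top} Z\right) \\
\text{Additive hazards:}     \quad & \lambda(t \mid Z) = \lambda_0(t) + \beta^{\top} Z
\end{align*}
```

In the first form covariates act multiplicatively on the baseline hazard, so \beta is read on the log hazard-ratio scale; in the second they act additively, so \beta is read as a difference in hazard.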