Model Selection Procedure

Selected Abstracts


Analysis of multilocus models of association

GENETIC EPIDEMIOLOGY, Issue 1 2003
B. Devlin
Abstract It is increasingly recognized that multiple genetic variants, within the same or different genes, combine to affect liability for many common diseases. Indeed, the variants may interact among themselves and with environmental factors. Thus realistic genetic/statistical models can include an extremely large number of parameters, and it is by no means obvious how to find the variants contributing to liability. For models of multiple candidate genes and their interactions, we prove that statistical inference can be based on controlling the false discovery rate (FDR), defined as the expected proportion of false rejections among all rejections. Controlling the FDR automatically controls the overall error rate in the special case that all the null hypotheses are true, as do more standard methods such as Bonferroni correction. However, when some null hypotheses are false, the goals of Bonferroni and FDR differ, and FDR will have better power. Model selection procedures, such as forward stepwise regression, are often used to choose important predictors for complex models. By analysis of simulations of such models, we compare a computationally efficient form of forward stepwise regression against the FDR methods. We show that model selection includes numerous genetic variants having no impact on the trait, whereas FDR maintains a false-positive rate very close to the nominal rate. With good control over false positives and better power than Bonferroni, the FDR-based methods we introduce present a viable means of evaluating complex, multivariate genetic models. Naturally, as for any method seeking to explore complex genetic models, the power of the methods is limited by sample size and model complexity. Genet Epidemiol 25:36–47, 2003. © 2003 Wiley-Liss, Inc. [source]
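FDR control of the kind the abstract describes is most commonly implemented with the Benjamini–Hochberg step-up procedure. The sketch below shows that standard procedure (not necessarily the exact variant used in the paper), with illustrative p-values:

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the null hypotheses
    whose sorted p-values fall at or below the largest passing threshold
    (rank/m)*q, controlling the FDR at level q. Returns rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])

# Six hypothetical tests of variant-trait association.
pvals = [0.001, 0.009, 0.039, 0.041, 0.27, 0.60]
rejected = benjamini_hochberg(pvals)
```

Here BH rejects the two smallest p-values, whereas a Bonferroni cutoff of 0.05/6 ≈ 0.0083 would reject only the first, illustrating the power difference the abstract mentions.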


Why do primary care doctors diagnose depression when diagnostic criteria are not met?

INTERNATIONAL JOURNAL OF METHODS IN PSYCHIATRIC RESEARCH, Issue 3 2000
Michael Höfler
Abstract This study examines predictors of false positive depression diagnoses by primary care doctors in a sample of primary care attendees, taking the patients' diagnostic status from a self-report measure (Depression Screening Questionnaire, DSQ) as a yardstick against which to measure doctors' correct and false positive recognition rates. In a nationwide study, primary care patients aged 15–99 in 633 doctors' offices completed a self-report packet that included the DSQ, a questionnaire that assesses depression symptoms on a three-point scale to provide diagnoses of depression according to the criteria of DSM-IV and ICD-10. Doctors completed an evaluation form for each patient seen, reporting the patient's depression status, clinical severity, and treatment choices. Predictor analyses are based on 16,909 patient-doctor records. Covariates examined included depression symptoms, the total DSQ score, number and persistence of depression items endorsed, patient's prior treatment, history of depression, age and gender. According to the DSQ, 11.3% of patients received a diagnosis of ICD-10 depression, 58.9% of whom were correctly identified by the doctor as definite threshold cases and 26.2% as definite subthreshold cases. However, an additional 11.7% of patients not meeting the minimum DSQ threshold were rated by their doctors as definitely having depression (the false positive rate). Specific DSQ depression items endorsed, a higher DSQ total score, more two-week depression symptoms endorsed, female gender, higher age, and prior treatment were all associated with an elevated rate of false positive diagnoses. The probability of false positive diagnoses was affected more by doctors ignoring the 'duration of symptoms' criterion than by doctors not following the 'number of symptoms' criterion for an ICD or DSM diagnosis of depression.
A model selection procedure revealed that it is sufficient to regress the 'false positive diagnoses' indicator on the DSQ total score; the symptoms of depressed mood, loss of interest, and suicidal ideation; higher age; and prior treatment. Further, the total DSQ score was less important in prediction when there was prior treatment. The predictive value of this model was quite good, with an area under the ROC curve of 0.86. When primary care doctors use depression screening instruments, they are oversensitive to the diagnosis of depression. This is due to not strictly obeying the two-week duration required by the diagnostic criteria of ICD-10 and DSM-IV. False positive rates are further increased in particular by the doctor's knowledge of a patient's prior treatment history as well as the presence of a few specific depression symptoms. Copyright © 2000 Whurr Publishers Ltd. [source]
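The reported area under the ROC curve of 0.86 can be read as the probability that a randomly chosen true case receives a higher predicted score than a randomly chosen non-case. A minimal sketch of that Mann–Whitney equivalence, on hypothetical scores rather than the study's data:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve computed as the Mann-Whitney probability
    that a random positive case scores above a random negative case
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true case status (1 = depressed).
auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```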


Estimation of gonad volume, fecundity, and reproductive stage of shovelnose sturgeon using sonography and endoscopy with application to the endangered pallid sturgeon

JOURNAL OF APPLIED ICHTHYOLOGY, Issue 4 2007
J. L. Bryan
Summary Most species of sturgeon are declining in the Mississippi River Basin of North America, including pallid (Scaphirhynchus albus F. and R.) and shovelnose sturgeons (S. platorynchus R.). Understanding the reproductive cycle of sturgeon in the Mississippi River Basin is important in evaluating the status and viability of sturgeon populations. We used non-invasive, non-lethal methods for examining the internal reproductive organs of shovelnose and pallid sturgeon: ultrasound to measure egg diameter, fecundity, and gonad volume, and endoscopy to visually examine the gonad. The ultrasound measured gonad volume accurately but underestimated egg diameter by 52%. After correcting for this measurement error, the ultrasound measured gonad volume accurately overall, although its estimates exceeded the true gonad volume for stages I and II. The ultrasound underestimated the fecundity of shovelnose sturgeon by 5%; ultrasound fecundity was lower than true fecundity for stage III and during August. Using the endoscope, we distinguished seven egg color categories, and with a model selection procedure, the presence of four of these categories correctly predicted the reproductive stage (± one stage) of shovelnose sturgeon 95% of the time. For pallid sturgeon, the ultrasound overestimated the density of eggs by 49%, and the endoscope was able to view eggs in 50% of the fish. Individually, the ultrasound and endoscope can be used to assess certain reproductive characteristics in sturgeon, and the two methods can be complementary depending on the parameter measured. These methods can be used to track gonad characteristics through time, including measuring the Gonadosomatic Index in individuals and/or populations, which is useful when associating gonad characteristics with environmental spawning triggers or with repeated examinations of individual fish throughout the reproductive cycle. [source]


Selection of Value-at-Risk models

JOURNAL OF FORECASTING, Issue 4 2003
Mandira Sarma
Abstract Value-at-Risk (VaR) is widely used as a tool for measuring the market risk of asset portfolios. However, alternative VaR implementations are known to yield fairly different VaR forecasts, so every use of VaR requires choosing among alternative forecasting models. This paper undertakes two case studies in model selection, for the S&P 500 index and India's NSE-50 index, at the 95% and 99% levels. We employ a two-stage model selection procedure. In the first stage we test a class of models for statistical accuracy; if multiple models survive these tests, a second stage filters the survivors using subjective loss functions. This two-stage model selection procedure proves useful in choosing a VaR model, while only incompletely addressing the problem. These case studies give some evidence about the strengths and limitations of present knowledge on estimation and testing for VaR. Copyright © 2003 John Wiley & Sons, Ltd. [source]
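The two-stage idea can be caricatured in code. The specific backtest (a simple binomial z-test on the exceedance rate) and the loss function below are illustrative assumptions, not the paper's exact choices:

```python
import math

def exceedances(returns, var_forecasts):
    """Count days on which the realized loss exceeds the VaR forecast."""
    return sum(1 for r, v in zip(returns, var_forecasts) if r < -v)

def passes_coverage_test(returns, var_forecasts, p=0.05, z_crit=1.96):
    """Stage 1 (illustrative backtest): a binomial z-test that the
    observed exceedance rate is consistent with the nominal rate p."""
    n = len(returns)
    rate = exceedances(returns, var_forecasts) / n
    se = math.sqrt(p * (1 - p) / n)
    return abs(rate - p) <= z_crit * se

def firm_loss(returns, var_forecasts):
    """Stage 2 (illustrative subjective loss): penalize breaches
    quadratically and reserved capital linearly."""
    loss = 0.0
    for r, v in zip(returns, var_forecasts):
        loss += (-r - v) ** 2 if r < -v else 0.01 * v
    return loss

def select_var_model(returns, candidates):
    """Keep statistically adequate candidates, then minimize the loss."""
    survivors = {name: f for name, f in candidates.items()
                 if passes_coverage_test(returns, f)}
    if not survivors:
        return None
    return min(survivors, key=lambda name: firm_loss(returns, survivors[name]))

# Hypothetical daily returns: 200 days, 10 of them with large losses.
returns = ([-0.005] * 19 + [-0.03]) * 10
candidates = {"calibrated": [0.01] * 200,
              "conservative": [0.05] * 200,
              "aggressive": [0.001] * 200}
best = select_var_model(returns, candidates)
```

Note that the over-conservative model fails the coverage test here too: an exceedance rate of zero is also statistically inconsistent with a nominal 5% rate.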


A NOVEL METHOD OF FITTING SPATIO-TEMPORAL MODELS TO DATA, WITH APPLICATIONS TO THE DYNAMICS OF MOUNTAIN PINE BEETLES

NATURAL RESOURCE MODELING, Issue 4 2008
JUSTIN HEAVILIN
Abstract We develop a modular landscape model for the mountain pine beetle (Dendroctonus ponderosae Hopkins) infestation of a stage-structured forest of lodgepole pine (Pinus contorta Douglas). Beetle attack dynamics are modeled using response functions, and beetle movement using dispersal kernels. This modeling technique yields four candidate models, which discriminate between four broad possibilities at the landscape scale: whether or not beetles are subject to an Allee effect, and whether host selection is random or directed. We fit the models to aerial damage survey data from the Sawtooth National Recreation Area using estimating functions, which allows for more rapid and complete parameter determination. We then introduce a novel model selection procedure based on facial recognition technology to complement traditional nonspatial selection metrics. Together these allow us to select a best model and draw inferences regarding the behavior of the beetle in outbreak conditions. [source]


PARTIALLY LINEAR MODEL SELECTION BY THE BOOTSTRAP

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2009
Samuel Müller
Summary We propose a new approach to the selection of partially linear models based on the conditional expected prediction squared loss function, which is estimated using the bootstrap. Because of the different speeds of convergence of the linear and the nonlinear parts, a key idea is to select each part separately. In the first step, we select the nonlinear components using an 'm-out-of-n' residual bootstrap that ensures good properties for the nonparametric bootstrap estimator. The second step selects the linear components from the remaining explanatory variables, and the non-zero parameters are selected based on a two-level residual bootstrap. We show that the model selection procedure is consistent under some conditions, and our simulations suggest that it selects the true model more often than the other selection procedures considered. [source]
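The 'm-out-of-n' idea, drawing only m < n observations per bootstrap replicate, can be sketched generically. This is just the resampling building block with illustrative data, not the paper's residual-bootstrap selection criterion:

```python
import random
import statistics

def m_out_of_n_bootstrap(data, statistic, m, n_boot=500, seed=1):
    """m-out-of-n bootstrap: each replicate resamples only m < n
    observations with replacement. Taking m much smaller than n can
    restore good asymptotic behavior in settings where the ordinary
    n-out-of-n bootstrap fails."""
    rng = random.Random(seed)
    return [statistic([rng.choice(data) for _ in range(m)])
            for _ in range(n_boot)]

# Example: bootstrap distribution of the sample mean, m = 25 out of n = 100.
data = [i / 100 for i in range(100)]          # values 0.00 .. 0.99
reps = m_out_of_n_bootstrap(data, statistics.mean, m=25)
```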


Partly Functional Temporal Process Regression with Semiparametric Profile Estimating Functions

BIOMETRICS, Issue 2 2009
Jun Yan
Summary Marginal mean models of temporal processes in event time data analysis are gaining attention for their milder assumptions relative to traditional intensity models. Recent work on fully functional temporal process regression (TPR) offers great flexibility by allowing all regression coefficients to be nonparametrically time varying. The existing estimation procedure, however, prevents successive goodness-of-fit tests for covariate coefficients when comparing a sequence of nested models. This article proposes a partly functional TPR model in line with marginal mean models: some covariate effects are time independent while others are left completely unspecified in time. This class of models is very rich, including the fully functional model and the semiparametric model as special cases. To estimate the parameters, we propose semiparametric profile estimating equations, which are solved via an iterative algorithm, starting at a consistent estimate from a fully functional model in the existing work. No smoothing is needed, in contrast to other varying-coefficient methods. The weak convergence of the resultant estimators is developed using empirical process theory. Successive tests of time-varying effects and a backward model selection procedure can then be carried out. The practical usefulness of the methodology is demonstrated through a simulation study and a real example of recurrent exacerbation among cystic fibrosis patients. [source]


Exploratory Bayesian Model Selection for Serial Genetics Data

BIOMETRICS, Issue 2 2005
Jing X. Zhao
Summary Characterizing the process by which molecular and cellular level changes occur over time will have broad implications for clinical decision making and help further our knowledge of disease etiology across many complex diseases. However, this presents an analytic challenge due to the large number of potentially relevant biomarkers and the complex, uncharacterized relationships among them. We propose an exploratory Bayesian model selection procedure that searches for model simplicity through independence testing of multiple discrete biomarkers measured over time. Bayes factor calculations are used to identify and compare models that are best supported by the data. For large model spaces, i.e., a large number of multi-leveled biomarkers, we propose a Markov chain Monte Carlo (MCMC) stochastic search algorithm for finding promising models. We apply our procedure to explore the extent to which HIV-1 genetic changes occur independently over time. [source]
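For a single pair of discrete biomarkers, a Bayes factor for independence has a closed conjugate form under symmetric Dirichlet priors. The sketch below uses that standard Dirichlet-multinomial form purely as an illustration of Bayes-factor-based independence testing; the paper's prior specification and its MCMC search over larger model spaces are not reproduced:

```python
from math import lgamma, exp

def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log marginal likelihood kernel of counts under a symmetric
    Dirichlet(alpha) prior on the category probabilities. The common
    multinomial coefficient is dropped; it cancels in Bayes factors."""
    k, n = len(counts), sum(counts)
    return (lgamma(k * alpha) - k * lgamma(alpha)
            + sum(lgamma(alpha + c) for c in counts)
            - lgamma(k * alpha + n))

def bayes_factor_independence(table, alpha=1.0):
    """Bayes factor of 'rows independent of columns' against the
    saturated model for a two-way table of two discrete biomarkers.
    Under independence the likelihood factorizes, so the marginal
    likelihood is a product over the row and column margins.
    BF > 1 favors independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    flat = [x for r in table for x in r]
    log_bf = (log_dirichlet_multinomial(rows, alpha)
              + log_dirichlet_multinomial(cols, alpha)
              - log_dirichlet_multinomial(flat, alpha))
    return exp(log_bf)

# Hypothetical 2x2 tables of two binary biomarkers.
bf_indep = bayes_factor_independence([[25, 25], [25, 25]])   # balanced table
bf_dep = bayes_factor_independence([[50, 0], [0, 50]])       # perfectly linked
```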


Forecasting daily high ozone concentrations by classification trees

ENVIRONMETRICS, Issue 2 2004
F. Bruno
Abstract This article proposes the use of classification trees (CART) as a suitable technique for forecasting the daily exceedance of ozone standards established by Italian law. A model is formulated for predicting, one and two days beforehand, the most probable class of the maximum daily urban ozone concentration in the city of Bologna. The standard employed is the so-called 'warning level' (180 µg/m³). Forecasted meteorological variables are used as predictors. The pollution data show a considerable imbalance between the sizes of the two classes of events: the first class includes days on which the observed maximum exceeds the established standard, while the second contains days on which it does not. Because of this imbalance, model selection procedures using cross-validation usually lead to overpruning. We overcome this drawback by means of techniques that replicate observations, modifying their inclusion probabilities in the cross-validation sets. Copyright © 2004 John Wiley & Sons, Ltd. [source]
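The replication fix, raising the inclusion probability of rare exceedance days when building cross-validation sets, can be sketched as follows. The fold construction and the replication weight are illustrative assumptions, not the article's exact scheme:

```python
import random

def rebalanced_cv_splits(y, n_folds=5, minority_weight=5, seed=0):
    """Build cross-validation splits in which rare-class observations
    (label 1, e.g. ozone-exceedance days) are replicated in the training
    sets, raising their effective inclusion probability so that
    cost-complexity pruning is not dominated by the majority class."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        # Replicate each rare-class training index minority_weight times.
        train = [i for i in train
                 for _ in range(minority_weight if y[i] == 1 else 1)]
        splits.append((train, test))
    return splits

# 100 hypothetical days, 5 of them exceedance days (label 1).
y = [1] * 5 + [0] * 95
splits = rebalanced_cv_splits(y)
```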


UPPER BOUNDS ON THE MINIMUM COVERAGE PROBABILITY OF CONFIDENCE INTERVALS IN REGRESSION AFTER MODEL SELECTION

AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 3 2009
Paul Kabaila
Summary We consider a linear regression model, with the parameter of interest a specified linear combination of the components of the regression parameter vector. We suppose that, as a first step, a data-based model selection procedure (e.g. preliminary hypothesis tests or minimizing the Akaike information criterion, AIC) is used to select a model. It is common statistical practice to then construct a confidence interval for the parameter of interest based on the assumption that the selected model had been given to us a priori. This assumption is false, and it can lead to a confidence interval with poor coverage properties. We provide an easily computed finite-sample upper bound (calculated by repeated numerical evaluation of a double integral) on the minimum coverage probability of this confidence interval. This bound applies for model selection by any of the following methods: minimum AIC, minimum Bayesian information criterion (BIC), maximum adjusted R², minimum Mallows' Cp, and t-tests. The importance of this upper bound is that it delineates general categories of design matrices and model selection procedures for which this confidence interval has poor coverage properties. This upper bound is shown to be a finite-sample analogue of an earlier large-sample upper bound due to Kabaila and Leeb. [source]
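The coverage degradation being bounded here can be seen in a small Monte Carlo sketch: a naive 95% interval for β1 is constructed after a 5%-level t-test decides whether to keep a correlated second regressor. The design, sample size, correlation, and known unit error variance are all illustrative assumptions, not the paper's setup:

```python
import math
import random

def ols_2(x1, x2, y):
    """OLS for y ~ b1*x1 + b2*x2 (no intercept, error variance known = 1):
    solve the 2x2 normal equations; return estimates and standard errors."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    c1 = sum(u * w for u, w in zip(x1, y))
    c2 = sum(v * w for v, w in zip(x2, y))
    det = a * d - b * b
    b1 = (d * c1 - b * c2) / det
    b2 = (a * c2 - b * c1) / det
    return b1, b2, math.sqrt(d / det), math.sqrt(a / det)

def coverage_after_selection(beta2, n_sims=2000, seed=7):
    """Monte Carlo coverage of the naive 95% CI for beta1 (true value 1)
    when x2 is first kept or dropped by a 5%-level t-test on beta2."""
    rng = random.Random(seed)
    n, rho, beta1 = 40, 0.9, 1.0
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = z1
    x2 = [rho * u + math.sqrt(1 - rho * rho) * v for u, v in zip(z1, z2)]
    sxx = sum(u * u for u in x1)
    hits = 0
    for _ in range(n_sims):
        y = [beta1 * u + beta2 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]
        b1f, b2f, se1f, se2f = ols_2(x1, x2, y)
        if abs(b2f / se2f) > 1.96:       # keep x2: full-model interval
            est, se = b1f, se1f
        else:                             # drop x2: reduced-model interval
            est = sum(u * w for u, w in zip(x1, y)) / sxx
            se = 1.0 / math.sqrt(sxx)
        if abs(est - beta1) <= 1.96 * se:
            hits += 1
    return hits / n_sims

# Coverage near the borderline of the pretest versus far from it.
low = coverage_after_selection(0.7)
high = coverage_after_selection(3.0)
```

Near the value of β2 where the pretest is borderline, coverage falls far below the nominal 95%; when β2 is large enough that x2 is essentially always retained, the nominal level is restored.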