Model Choice


Selected Abstracts


The Impact of Incomplete Linkage Disequilibrium and Genetic Model Choice on the Analysis and Interpretation of Genome-wide Association Studies

ANNALS OF HUMAN GENETICS, Issue 4 2010
Mark M. Iles
Summary When conducting a genetic association study, it has previously been observed that a multiplicative risk model tends to fit better at a disease-associated marker locus than at the ungenotyped causative locus. This suggests that, while overall risk decreases as linkage disequilibrium breaks down, non-multiplicative components are more affected. This effect is investigated here, in particular the practical consequences it has for testing trait/marker associations and for estimating the mode of inheritance and risk once an associated locus has been found. The extreme significance levels required for genome-wide association studies define a restricted range of detectable allele frequencies and effect sizes. For such parameters there is little to be gained by using a test that models the correct mode of inheritance rather than the multiplicative one; thus the Cochran–Armitage trend test, which assumes a multiplicative model, is preferable to a more general test as it uses fewer degrees of freedom. Equally, when estimating risk, it is likely that a multiplicative risk model will provide a good fit to the data, regardless of the underlying mode of inheritance at the true susceptibility locus. This may lead to problems in interpreting risk estimates. [source]
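
The trend test at the centre of this argument is simple to compute. Below is a minimal sketch of the Cochran–Armitage trend test with the additive scores (0, 1, 2) that correspond to the multiplicative-odds model discussed above; the genotype counts are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k genotype table.
    Scores (0, 1, 2) encode the additive dose model underlying the
    multiplicative-odds assumption discussed in the abstract."""
    r = np.asarray(cases, float)       # case counts per genotype
    s = np.asarray(controls, float)    # control counts per genotype
    t = np.asarray(scores, float)
    c = r + s                          # genotype (column) totals
    R, S, N = r.sum(), s.sum(), c.sum()
    T = np.sum(t * (r * S - s * R))    # trend statistic
    iu = np.triu_indices(len(t), 1)    # index pairs i < j for the variance
    var = (R * S / N) * (np.sum(t**2 * c * (N - c))
                         - 2 * np.sum(np.outer(t, t)[iu] * np.outer(c, c)[iu]))
    z = T / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))      # two-sided p-value

# Hypothetical genotype counts (AA, Aa, aa) at a marker locus.
z, p = cochran_armitage_trend(cases=[120, 240, 140], controls=[180, 250, 90])
print(f"Z = {z:.2f}, p = {p:.1e}")
```

The single degree of freedom is visible here: only the scalar trend statistic T is tested, instead of the two free genotype effects a general 2-df test would estimate.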


Variable Selection and Model Choice in Geoadditive Regression Models

BIOMETRICS, Issue 2 2009
Thomas Kneib
Summary Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection by means of a boosting algorithm working within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom, to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection. [source]
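
The selection mechanism the abstract describes can be illustrated with the simplest possible base learners. The sketch below is a hedged illustration rather than the paper's algorithm: it runs componentwise L2-boosting with univariate linear base learners, so covariates whose component is never updated are effectively deselected. The paper's version replaces these learners with penalized splines and their tensor products.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """At each step, fit the current residuals with each covariate
    separately and update only the best-fitting component by a small
    step nu.  Covariates that are never picked keep a zero coefficient,
    which is how boosting couples estimation with variable selection."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    col_ss = np.einsum("ij,ij->j", X, X)          # column sums of squares
    for _ in range(n_iter):
        betas = X.T @ resid / col_ss              # univariate OLS fits
        rss = [np.sum((resid - X[:, j] * betas[j]) ** 2) for j in range(p)]
        j = int(np.argmin(rss))                   # best-fitting component
        coef[j] += nu * betas[j]
        resid -= nu * X[:, j] * betas[j]
    return intercept, coef

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(300)  # only x0, x3 matter
_, coef = componentwise_l2_boost(X, y)
print(np.round(coef, 2))   # clearly non-zero mainly at positions 0 and 3
```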


Model choice in time series studies of air pollution and mortality

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES A (STATISTICS IN SOCIETY), Issue 2 2006
Roger D. Peng
Summary Multicity time series studies of particulate matter and mortality and morbidity have provided evidence that daily variation in air pollution levels is associated with daily variation in mortality counts. These findings served as key epidemiological evidence for the recent review of the US national ambient air quality standards for particulate matter. As a result, methodological issues concerning time series analysis of the relationship between air pollution and health have attracted the attention of the scientific community, and critics have raised concerns about the adequacy of current model formulations. Time series data on pollution and mortality are generally analysed by using log-linear, Poisson regression models for overdispersed counts, with the daily number of deaths as outcome, the (possibly lagged) daily level of pollution as a linear predictor and smooth functions of weather variables and calendar time used to adjust for time-varying confounders. Investigators around the world have used different approaches to adjust for confounding, making it difficult to compare results across studies. To date, the statistical properties of these different approaches have not been comprehensively compared. To address these issues, we quantify and characterize model uncertainty and model choice in adjusting for seasonal and long-term trends in time series models of air pollution and mortality. First, we conduct a simulation study to compare and describe the properties of statistical methods that are commonly used for confounding adjustment. We generate data under several confounding scenarios and systematically compare the performance of the various methods with respect to the mean-squared error of the estimated air pollution coefficient. We find that the bias in the estimates generally decreases with more aggressive smoothing and that model selection methods which optimize prediction may not be suitable for obtaining an estimate with small bias. Second, we apply and compare the modelling approaches with the National Morbidity, Mortality, and Air Pollution Study database, which comprises daily time series of several pollutants, weather variables and mortality counts covering the period 1987–2000 for the largest 100 cities in the USA. When applying these approaches to adjusting for seasonal and long-term trends we find that the Study's estimates for the national average effect of PM10 at lag 1 on mortality vary over approximately a twofold range, with 95% posterior intervals always excluding zero risk. [source]
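
The core model formulation can be sketched in a few lines. The following is a hedged illustration on simulated data (not the NMMAPS database) of a log-linear Poisson fit with estimated overdispersion and a natural cubic spline of calendar time, refit at several degrees of freedom to mimic the smoothing comparison the paper describes; `pm10_lag1`, `temp`, and `time` are hypothetical column names.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated stand-in for one city's daily series.
rng = np.random.default_rng(0)
n = 5 * 365
t = np.arange(n)
data = pd.DataFrame({
    "time": t,
    "pm10_lag1": rng.gamma(4.0, 8.0, n),
    "temp": 15 + 10 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 2, n),
})
log_mu = (3.0 + 0.0004 * data["pm10_lag1"] - 0.01 * data["temp"]
          + 0.1 * np.sin(2 * np.pi * t / 365))
data["deaths"] = rng.poisson(np.exp(log_mu))

# Log-linear Poisson regression with a quasi-likelihood scale for
# overdispersion; cr() is a natural cubic regression spline whose df
# sets how aggressively seasonality and long-term trend are removed.
for df_time in (10, 30, 60):    # 2, 6 and 12 df per year of data
    fit = smf.glm(f"deaths ~ pm10_lag1 + cr(temp, df=4) + cr(time, df={df_time})",
                  data=data, family=sm.families.Poisson()).fit(scale="X2")
    print(f"df = {df_time:3d}  beta_pm10 = {fit.params['pm10_lag1']:.6f}")
```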


Estimation of the Dominating Frequency for Stationary and Nonstationary Fractional Autoregressive Models

JOURNAL OF TIME SERIES ANALYSIS, Issue 5 2000
Jan Beran
This paper was motivated by the investigation of certain physiological series for premature infants. The question was whether the series exhibit periodic fluctuations with a certain dominating period. The observed series are nonstationary and/or have long-range dependence. The assumed model is a Gaussian process Xt whose mth difference Yt = (1 − B)^m Xt is stationary with a spectral density f that may have a pole (or a zero) at the origin. The problem addressed in this paper is the estimation of the frequency λmax where f achieves its largest local maximum in the open interval (0, π). The process Xt is assumed to belong to a class of parametric models, characterized by a parameter vector θ, defined in Beran (1995). An estimator of λmax is proposed and its asymptotic distribution is derived, with θ estimated by maximum likelihood. In particular, m and a fractional differencing parameter that models long memory are estimated from the data. Model choice is also incorporated. Thus, within the proposed framework, a data-driven procedure is obtained that can be applied in situations where the primary interest is in estimating a dominating frequency. A simulation study illustrates the finite sample properties of the method. In particular, for short series, estimation of λmax is difficult if the local maximum occurs close to the origin. The results are illustrated by two of the data examples that motivated this research. [source]
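
A crude analogue of the estimator can be sketched with a smoothed periodogram. The function below is an illustration under simplifying assumptions, not Beran's likelihood-based procedure: it differences the series m times and locates the largest local maximum of the smoothed periodogram away from the origin (frequencies here are in cycles per sample rather than radians).

```python
import numpy as np
from scipy.signal import find_peaks, periodogram

def dominating_frequency(x, m=1, smooth=11, f_min=0.01):
    """Difference the series m times to remove nonstationarity, smooth
    the periodogram, and return the frequency (cycles/sample) of the
    largest *local* maximum away from the origin, where a long-memory
    pole or zero would otherwise dominate."""
    y = np.diff(x, n=m)
    freqs, pgram = periodogram(y, window="hann")
    smoothed = np.convolve(pgram, np.ones(smooth) / smooth, mode="same")
    peaks, _ = find_peaks(smoothed)
    peaks = peaks[freqs[peaks] > f_min]            # exclude the origin
    return freqs[peaks[np.argmax(smoothed[peaks])]]

rng = np.random.default_rng(2)
t = np.arange(4096)
# Random walk plus a weak cycle of period ~40 samples.
x = np.cumsum(rng.standard_normal(t.size)) + 3 * np.sin(2 * np.pi * t / 40)
print(dominating_frequency(x))    # close to 1/40 = 0.025
```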


Validating the Relationship Qualities of Influence and Persuasion With the Family Social Relations Model

HUMAN COMMUNICATION RESEARCH, Issue 1 2003
Rachel Oakley Hsiung
Influence and persuasion are related, fundamental constructs in interpersonal communication. Both display the relationship qualities of interdependence, bidirectionality, reciprocity, and multiple levels of analysis. Yet, empirical validation of these relationship qualities is lacking, largely due to an absence of appropriate methods and statistical procedures. This article uses the family social relations model (SRM) to test for the personal relationship qualities of influence and persuasion in the family decision-making context of buying a new car. New relationship measures of influence and persuasion were developed because, historically, measures have been at the individual level. The sample size of 110 families proved sufficient for stable parameter estimates. The results uncovered patterns in the relationship qualities of influence and persuasion across 3 decisions families make when buying a new car (i.e., how much to spend, car model choice, final decision). The findings confirm that both influence and persuasion are truly relational. The novel use of the model across decisions allowed the patterns of relationships among family members to be compared, and demonstrated the importance of the relationship qualities of influence and persuasion in decision making. Predictions were also examined across decisions to check the consistency of the hypotheses. The results provide further insight into the meaning of influence and persuasion, and of SRM terms. [source]
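
The levels-of-analysis idea behind the SRM can be shown with a toy decomposition. The sketch below uses plain row/column mean contrasts on a hypothetical round-robin matrix of influence ratings; the full SRM applies small-sample corrections and variance partitioning that are omitted here.

```python
import numpy as np

def srm_effects(x):
    """Simplified actor/partner/relationship split of a round-robin
    matrix x[i, j] = rating by family member i of member j (diagonal
    unused).  Row-mean contrasts give actor effects, column-mean
    contrasts give partner effects, and the residual is the dyadic
    relationship effect -- the multiple levels the article validates."""
    n = x.shape[0]
    mask = ~np.eye(n, dtype=bool)
    grand = x[mask].mean()
    actor = np.array([x[i, mask[i]].mean() for i in range(n)]) - grand
    partner = np.array([x[mask[:, j], j].mean() for j in range(n)]) - grand
    relationship = x - grand - actor[:, None] - partner[None, :]
    return actor, partner, np.where(mask, relationship, np.nan)

# Hypothetical 4-member family: x[i, j] = how much i says j influenced
# the "how much to spend" decision, on a 1-7 scale.
x = np.array([[np.nan, 5, 3, 2],
              [6, np.nan, 4, 2],
              [5, 6, np.nan, 3],
              [4, 5, 2, np.nan]])
actor, partner, rel = srm_effects(x)
print("partner effects (being influential):", np.round(partner, 2))
```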


Different methods for modelling the areal infiltration of a grass field under heavy precipitation

HYDROLOGICAL PROCESSES, Issue 7 2002
Bruno Merz
Abstract The areal infiltration behaviour of a grass field is studied using a data set of 78 sprinkler infiltration experiments. The analysis of the experimental data shows a distinct event dependency: once runoff begins, the final infiltration rate increases with increasing rainfall intensity. This behaviour is attributed to the effects of small-scale variability: increasing rainfall intensity increases the ponded area and therefore the portion of the plot which infiltrates at maximum rate. To describe the areal infiltration behaviour of the grass field, the study uses two different model structures and investigates different approaches for considering subgrid variability. It is found that the effective parameter approach is not suited for this purpose. A good representation of the observed behaviour is obtained by using a distribution function approach or a parameterization approach. However, it is not clear how the parameters can be derived for these two approaches without a large measurement campaign. The data analysis and the simulations show the great importance of considering the effects of spatial variability for the infiltration process. This may be significant even at a small scale for a comparatively homogeneous area. The consideration of heterogeneity seems to be more important than the choice of the model type. Furthermore, similar results may be obtained with different modelling approaches. Even the relatively detailed data set does not seem to permit a clear model choice. In view of these results it is questionable to use very complex and detailed simulation models given the approximate nature of the problem. Although the principal processes may be well understood, there is a lack of models that represent these processes and, more importantly, a lack of techniques to measure and parameterize them. Copyright © 2002 John Wiley & Sons, Ltd. [source]
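
The distribution function approach mentioned above has a compact formulation: if point infiltration capacities c vary over the plot with distribution F, the areal infiltration rate under rainfall intensity P is E[min(c, P)], which increases with P just as the experiments show. The sketch below evaluates this expectation for an illustrative log-normal capacity distribution; the parameters are assumptions, not values fitted to the paper's sprinkler data.

```python
import numpy as np
from scipy import stats

def areal_infiltration(rain, capacity_dist, n_grid=10_000):
    """Distribution-function approach: each point of the plot
    infiltrates min(c, rain), where c is its local capacity, and the
    areal rate is the expectation E[min(c, rain)] over the capacity
    distribution, evaluated here by quantile quadrature."""
    q = np.linspace(0.5 / n_grid, 1 - 0.5 / n_grid, n_grid)
    c = capacity_dist.ppf(q)
    return np.minimum(c, rain).mean()

# Illustrative log-normal spread of point capacities, median 20 mm/h.
F = stats.lognorm(s=0.8, scale=20.0)
for p in (10, 30, 60, 100):    # rainfall intensities in mm/h
    print(f"rain {p:3d} mm/h -> areal infiltration {areal_infiltration(p, F):5.1f} mm/h")
# The areal rate rises with rainfall intensity: heavier rain ponds more
# of the plot and engages more high-capacity points at their limit --
# the event dependency reported in the abstract.
```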


A Measure of Representativeness of a Sample for Inferential Purposes

INTERNATIONAL STATISTICAL REVIEW, Issue 2 2006
Salvatore Bertino
Summary After defining the concept of representativeness of a random sample, the author proposes a measure of how well the observed sample represents its parent distribution. This measure is called the Representativeness Index. The same measure, seen as a function of a sample and of a distribution, is called the Representativeness Function. For a given sample it provides the value of the index for the different distributions under examination, and for a given distribution it provides a measure of the representativeness of its possible samples. The Representativeness Function can be used in an inferential framework just as the likelihood function is, since it gives to any distribution the "experimental support" provided by the observed sample. The measure is distribution-free and is shown to be a transformation of the well-known Cramér–von Mises statistic. Using the properties of that statistic, criteria for deriving set estimators and tests of hypotheses are introduced. The use of the Representativeness Function in many standard statistical problems is outlined through examples. The quality of the inferential decisions can be assessed with the usual techniques (MSE, power function, coverage probabilities). The most interesting examples turn out to be those of situations that are "non-regular", such as the estimation of parameters involved in the support of the parent distribution, or less explored ones (model choice). [source]
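
Because the index is a transformation of the Cramér–von Mises statistic, its model-choice use can be mimicked directly with scipy. The sketch below is a hedged illustration rather than the author's exact index: it scores several candidate parent distributions against one sample, and the candidate with the smallest statistic is the one the sample "represents" best.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.gamma(shape=3.0, scale=2.0, size=200)

# Candidate parent distributions; the exponential and normal are tuned
# to roughly the same mean/spread so only the shape distinguishes them.
candidates = {
    "gamma(3, 2)":  stats.gamma(a=3.0, scale=2.0).cdf,
    "expon(6)":     stats.expon(scale=6.0).cdf,
    "norm(6, 3.5)": stats.norm(loc=6.0, scale=3.5).cdf,
}
for name, cdf in candidates.items():
    res = stats.cramervonmises(sample, cdf)   # smaller = better supported
    print(f"{name:14s} CvM = {res.statistic:.3f}  p = {res.pvalue:.3f}")
```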


How do changes in monetary policy affect bank lending?

JOURNAL OF APPLIED ECONOMETRICS, Issue 3 2006
An analysis of Austrian bank data
Using a panel of Austrian bank data we show that the lending decisions of the smallest banks are more sensitive to interest rate changes, and that for all banks, sensitivity changes over time. We propose to estimate the groups of banks that display similar lending reactions by means of a group indicator which, after estimation, indicates each bank's classification. Additionally, we estimate a state indicator that indicates the periods during which the lending reaction differs from what we normally observe. Bayesian methods are used for estimation; a sensitivity analysis and a forecast evaluation confirm our model choice. Copyright © 2006 John Wiley & Sons, Ltd. [source]
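
The grouping idea can be imitated with standard tools. The sketch below is a simplified stand-in for the paper's Bayesian group indicator: it simulates a hypothetical panel, estimates each bank's lending sensitivity by OLS, and recovers the two groups with an EM fit of a two-component Gaussian mixture over those sensitivities.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical panel: small banks (group 0) react more strongly to
# interest-rate changes than large banks (group 1).
n_banks, T = 60, 80
true_group = (np.arange(n_banks) >= 30).astype(int)
beta_true = np.where(true_group == 0, -1.5, -0.4)
dr = rng.normal(0, 1, T)                       # interest-rate changes
loans = beta_true[:, None] * dr + rng.normal(0, 1, (n_banks, T))

beta_hat = loans @ dr / (dr @ dr)              # per-bank OLS sensitivity

# EM for a two-component Gaussian mixture over the sensitivities
# (normalizing constants cancel in the responsibilities).
mu, sig, w = np.array([-2.0, 0.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(100):
    dens = w * np.exp(-0.5 * ((beta_hat[:, None] - mu) / sig) ** 2) / sig
    resp = dens / dens.sum(1, keepdims=True)   # E-step: group probabilities
    w = resp.mean(0)                           # M-step updates
    mu = (resp * beta_hat[:, None]).sum(0) / resp.sum(0)
    sig = np.sqrt((resp * (beta_hat[:, None] - mu) ** 2).sum(0) / resp.sum(0))

print("group means:", np.round(mu, 2))
print("classification accuracy:", (resp.argmax(1) == true_group).mean())
```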


A light-tailed conditionally heteroscedastic model with applications to river flows

JOURNAL OF TIME SERIES ANALYSIS, Issue 1 2008
Péter Elek
Abstract A conditionally heteroscedastic model, different from the more commonly used autoregressive moving average–generalized autoregressive conditionally heteroscedastic (ARMA–GARCH) processes, is established and analysed here. The time-dependent variance of innovations passing through an ARMA filter is conditioned on the lagged values of the generated process, rather than on the lagged innovations, and is defined to be asymptotically proportional to those past values. Designed this way, the model incorporates certain feedback from the modelled process, the innovation is no longer of GARCH type, and all moments of the modelled process are finite provided the same is true for the generating noise. The article gives the condition of stationarity, and proves consistency and asymptotic normality of the Gaussian quasi-maximum likelihood estimator of the variance parameters, even though the estimated parameters of the linear filter contain an error. An analysis of six diurnal water discharge series observed along the Danube and Tisza rivers in Hungary demonstrates the usefulness of such a model. The effect of lagged river discharge on the variance of innovations turns out to be highly significant, and nonparametric estimation confirms its approximate linearity. Simulations from the new model preserve well the probability distribution, the high quantiles, the tail behaviour and the high-level clustering of the original series, further justifying the model choice. [source]
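
The feedback mechanism the abstract describes is easy to simulate. The sketch below is a minimal illustration, not the paper's exact specification: an AR(1) filter whose innovation variance is a + b|X_{t-1}|, i.e. tied to the lagged level of the generated process rather than to lagged innovations as in GARCH.

```python
import numpy as np

def simulate_lchm(n, phi=0.8, a=0.1, b=0.3, burn=500, seed=5):
    """Light-tailed conditionally heteroscedastic simulation: the
    innovation variance grows linearly (not quadratically) in the
    lagged process level, so with Gaussian noise all moments stay
    finite, while high flows still breed volatile flows."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        sigma = np.sqrt(a + b * abs(x[t - 1]))
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
    return x[burn:]

x = simulate_lchm(5000)
# Squared innovations cluster where the level is high: the feedback
# from the process level into the innovation variance.
print(np.corrcoef(np.abs(x[:-1]), (x[1:] - 0.8 * x[:-1]) ** 2)[0, 1])
```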


Model Selection for Monetary Policy Analysis: How Important is Empirical Validity?

OXFORD BULLETIN OF ECONOMICS & STATISTICS, Issue 1 2009
Q. Farooq Akram
Abstract We investigate the economic significance of trading off empirical validity of models against other desirable model properties. Our investigation is based on three alternative econometric systems of the supply side, in a model that can be used to discuss optimal monetary policy in Norway. Our results caution against compromising empirical validity when selecting a model for policy analysis. We also find large costs from basing policies on the robust model, or on a suite of models, even when it contains the valid model. This confirms an important role for econometric modelling and evaluation in model choice for policy analysis. [source]


Models for Bounded Systems with Continuous Dynamics

BIOMETRICS, Issue 3 2009
Amanda R. Cangelosi
Summary Models for natural nonlinear processes, such as population dynamics, have been given much attention in applied mathematics. For example, species competition has been extensively modeled by differential equations. Often, the scientist has preferred to model the underlying dynamical processes (i.e., theoretical mechanisms) in continuous time. It is of both scientific and mathematical interest to implement such models in a statistical framework to quantify uncertainty associated with the models in the presence of observations. That is, given discrete observations arising from the underlying continuous process, the unobserved process can be formally described while accounting for multiple sources of uncertainty (e.g., measurement error, model choice, and inherent stochasticity of process parameters). In addition to continuity, natural processes are often bounded; specifically, they tend to have nonnegative support. Various techniques have been implemented to accommodate nonnegative processes, but such techniques are often limited or overly compromising. This article offers an alternative to common differential modeling practices by using a bias-corrected truncated normal distribution to model the observations and latent process, both having bounded support. Parameters of an underlying continuous process are characterized in a Bayesian hierarchical context, utilizing a fourth-order Runge–Kutta approximation. [source]
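
The two ingredients named in the abstract, a fourth-order Runge–Kutta solution of the dynamics and truncated normal observations with nonnegative support, can be sketched together. The example below uses logistic growth as a stand-in for the paper's dynamics and omits the bias correction; all parameters are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def rk4_logistic(x0, r, K, dt, n_steps):
    """Fourth-order Runge-Kutta solution of logistic growth
    dx/dt = r x (1 - x/K), standing in for the underlying
    continuous-time dynamics."""
    f = lambda x: r * x * (1 - x / K)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        k1 = f(x); k2 = f(x + dt * k1 / 2)
        k3 = f(x + dt * k2 / 2); k4 = f(x + dt * k3)
        xs.append(x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return np.array(xs)

def loglik(obs, x0, r, K, sigma, dt=0.1):
    """Observations modelled as normals truncated to nonnegative
    support around the RK4 path -- the bounded-support idea of the
    article (its bias correction of the truncated mean is omitted)."""
    path = rk4_logistic(x0, r, K, dt, len(obs) - 1)
    a = (0.0 - path) / sigma                   # standardized lower bound
    return truncnorm.logpdf(obs, a, np.inf, loc=path, scale=sigma).sum()

true_path = rk4_logistic(x0=2.0, r=0.8, K=50.0, dt=0.1, n_steps=99)
rng = np.random.default_rng(6)
obs = np.maximum(true_path + rng.normal(0, 3, 100), 0.0)  # crudely bounded demo data
# The true growth rate should score higher than a misspecified one.
print(loglik(obs, 2.0, 0.8, 50.0, 3.0), loglik(obs, 2.0, 0.3, 50.0, 3.0))
```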

