Model Complexity (model + complexity)
Selected Abstracts

Model complexity versus scatter in fatigue
FATIGUE & FRACTURE OF ENGINEERING MATERIALS AND STRUCTURES, Issue 11 2004
T. SVENSSON
ABSTRACT: Fatigue assessment in industry is often based on simple empirical models, such as the Wöhler curve or the Paris law. In contrast, fatigue research to a great extent works with very complex models, far from engineering practice. One explanation for this discrepancy is that the scatter in service fatigue obscures many of the subtle phenomena that can be studied in a laboratory. Here we use a statistical theory for stepwise regression to investigate the role of scatter in the choice of model complexity in fatigue. The results suggest that the amount of complexity used in different design concepts reflects the appreciated knowledge about input parameters. The analysis also points out that even qualitative knowledge about the neglected complexity may be important in order to avoid systematic errors.

CHALLENGES IN MODELING HYDROLOGIC AND WATER QUALITY PROCESSES IN RIPARIAN ZONES
JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION, Issue 1 2006
Shreeram Inamdar
ABSTRACT: This paper presents key challenges in modeling water quality processes of riparian ecosystems: How can the spatial and temporal extent of water and solute mixing in the riparian zone be modeled? What level of model complexity is justified? How can processes at the riparian scale be quantified? How can the impact of riparian ecosystems be determined at the watershed scale? Flexible models need to be introduced that can simulate varying levels of hillslope-riparian mixing dictated by topography, upland and riparian depths, and moisture conditions. Model simulations need to account for storm-event peak flow conditions, when upland solute loadings may either bypass or overwhelm the riparian zone. Model complexity should be dictated by the level of detail in measured data. Model algorithms need to be developed using new macro-scale and meso-scale experiments that capture process dynamics at the hillslope or landscape scales. Monte Carlo simulations should be an integral part of model simulations, and rigorous tests that go beyond simple time-series and point-output comparisons need to be introduced. The impact of riparian zones on watershed-scale water quality can be assessed by performing simulations for representative hillslope-riparian scenarios.

Analysis of Single-Molecule Fluorescence Spectroscopic Data with a Markov-Modulated Poisson Process
CHEMPHYSCHEM, Issue 14 2009
Mark Jäger
Abstract: We present a photon-by-photon analysis framework for the evaluation of data from single-molecule fluorescence spectroscopy (SMFS) experiments using a Markov-modulated Poisson process (MMPP). An MMPP combines a discrete (and hidden) Markov process with an additional Poisson process reflecting the observation of individual photons. The algorithmic framework is used to automatically analyze the dynamics of the complex formation and dissociation of Cu2+ ions with the bidentate ligand 2,2'-bipyridine-4,4'-dicarboxylic acid in aqueous media. The process of association and dissociation of Cu2+ ions is monitored with SMFS. The dcbpy-DNA conjugate can exist in two or more distinct states which influence the photon emission rates. The advantage of a photon-by-photon analysis is that no information is lost in preprocessing steps. Different model complexities are investigated in order to best describe the recorded data and to determine transition rates on a photon-by-photon basis. The main strength of the method is that it can detect intermittent phenomena that are masked by binning and that are difficult to find using correlation techniques when they are short-lived.
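A Markov-modulated Poisson process is easiest to picture from its generative side: a hidden Markov chain switches between states, and each state emits photons at its own Poisson rate. The sketch below simulates photon arrival times from a two-state MMPP; the rates, switching constants and the two-state restriction are arbitrary placeholders for illustration, not values or the inference algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustration values (not from the paper):
emission_rates = np.array([5e3, 2e4])   # photon rates per second in each hidden state
switch_rates = np.array([50.0, 200.0])  # switching rates out of each hidden state (per second)

def simulate_mmpp(t_max, state=0):
    """Simulate photon arrival times from a 2-state Markov-modulated Poisson process."""
    t, photons, states = 0.0, [], []
    while t < t_max:
        dwell = rng.exponential(1.0 / switch_rates[state])  # time spent in the current state
        t_end = min(t + dwell, t_max)
        # photon arrivals are a homogeneous Poisson process within the dwell period
        n = rng.poisson(emission_rates[state] * (t_end - t))
        photons.append(np.sort(rng.uniform(t, t_end, size=n)))
        states.append(np.full(n, state))
        t = t_end
        state = 1 - state                                   # jump to the other state
    return np.concatenate(photons), np.concatenate(states)

arrival_times, true_states = simulate_mmpp(t_max=0.5)
print(len(arrival_times), "photons;", np.mean(true_states == 1), "emitted in the bright state")
```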
Testing Conditional Asset Pricing Models Using a Markov Chain Monte Carlo Approach
EUROPEAN FINANCIAL MANAGEMENT, Issue 3 2008
Manuel Ammann (G12)
Abstract: We use Markov Chain Monte Carlo (MCMC) methods for the parameter estimation and the testing of conditional asset pricing models. In contrast to traditional approaches, it is truly conditional because the assumption that time variation in betas is driven by a set of conditioning variables is not necessary. Moreover, the approach has exact finite sample properties and accounts for errors-in-variables. Using S&P 500 panel data, we analyse the empirical performance of the CAPM and the Fama and French (1993) three-factor model. We find that time variation of betas in the CAPM and time variation of the coefficients for the size factor (SMB) and the distress factor (HML) in the three-factor model improve the empirical performance. Therefore, our findings are consistent with time variation of firm-specific exposure to market risk, systematic credit risk and systematic size effects. However, a Bayesian model comparison trading off goodness of fit and model complexity indicates that the conditional CAPM performs best, followed by the conditional three-factor model, the unconditional CAPM, and the unconditional three-factor model.
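The abstract does not spell out the likelihood or priors used, so the following is only a generic illustration of MCMC parameter estimation for an unconditional CAPM-style regression: a random-walk Metropolis sampler over alpha, beta and the residual scale, run on synthetic excess returns. All data, priors and proposal scales are placeholders, and the conditional specifications tested in the paper are considerably richer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data standing in for excess returns (placeholders, not S&P 500 data)
n = 500
market = rng.normal(0.005, 0.04, n)                    # market excess returns
y = 0.001 + 1.2 * market + rng.normal(0, 0.02, n)      # asset excess returns

def log_post(theta):
    """Unnormalised log-posterior: Gaussian likelihood, vague N(0, 10^2) priors, flat prior on log sigma."""
    alpha, beta, log_sigma = theta
    sigma = np.exp(log_sigma)
    resid = y - alpha - beta * market
    loglik = -n * log_sigma - 0.5 * np.sum(resid**2) / sigma**2
    logprior = -0.5 * (alpha**2 + beta**2) / 100.0
    return loglik + logprior

theta = np.zeros(3)
step = np.array([1e-3, 5e-2, 5e-2])                    # random-walk proposal scales
samples, lp = [], log_post(theta)
for it in range(20000):
    prop = theta + step * rng.normal(size=3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:           # Metropolis accept/reject
        theta, lp = prop, lp_prop
    if it >= 5000:                                     # discard burn-in
        samples.append(theta.copy())

samples = np.array(samples)
print("posterior mean alpha, beta:", samples[:, :2].mean(axis=0))
```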
Multistage designs in the genomic era: Providing balance in complex disease studies
GENETIC EPIDEMIOLOGY, Issue S1 2007
Marie-Pierre Dubé
Abstract: In this summary paper, we describe the contributions included in the Multistage Design group (Group 14) at the Genetic Analysis Workshop 15, which was held during November 12-14, 2006. Our group contrasted and compared different approaches to reducing complexity in a genetic study through implementation of staged designs. Most groups used the simulated dataset (problem 3), which provided ample opportunities for evaluating various staged designs. A wide range of multistage designs that targeted different aspects of complexity were explored. We categorized these approaches as reducing phenotypic complexity, model complexity, analytic complexity or genetic complexity. In general we learned that: (1) when staged designs are carefully planned and implemented, the power loss compared to a single-stage analysis can be minimized and study cost is greatly reduced; (2) a joint analysis of the results from each stage is generally more powerful than treating the second stage as a replication analysis. Genet. Epidemiol. 31 (Suppl. 1):S118-S123, 2007. © 2007 Wiley-Liss, Inc.

Analysis of multilocus models of association
GENETIC EPIDEMIOLOGY, Issue 1 2003
B. Devlin
Abstract: It is increasingly recognized that multiple genetic variants, within the same or different genes, combine to affect liability for many common diseases. Indeed, the variants may interact among themselves and with environmental factors. Thus realistic genetic/statistical models can include an extremely large number of parameters, and it is by no means obvious how to find the variants contributing to liability. For models of multiple candidate genes and their interactions, we prove that statistical inference can be based on controlling the false discovery rate (FDR), which is defined as the expected number of false rejections divided by the number of rejections. Controlling the FDR automatically controls the overall error rate in the special case that all the null hypotheses are true. So do more standard methods such as Bonferroni correction. However, when some null hypotheses are false, the goals of Bonferroni and FDR differ, and FDR will have better power. Model selection procedures, such as forward stepwise regression, are often used to choose important predictors for complex models. By analysis of simulations of such models, we compare a computationally efficient form of forward stepwise regression against the FDR methods. We show that model selection includes numerous genetic variants having no impact on the trait, whereas FDR maintains a false-positive rate very close to the nominal rate. With good control over false positives and better power than Bonferroni, the FDR-based methods we introduce present a viable means of evaluating complex, multivariate genetic models. Naturally, as for any method seeking to explore complex genetic models, the power of the methods is limited by sample size and model complexity. Genet Epidemiol 25:36-47, 2003. © 2003 Wiley-Liss, Inc.
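The FDR-controlling inference referred to above is, in its most familiar form, the Benjamini-Hochberg step-up rule applied to the p-values of the candidate effects. A minimal sketch, with placeholder p-values and a nominal FDR level of 0.05, also shows how much stricter the Bonferroni cut-off is:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of rejected hypotheses under the BH step-up procedure."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m            # BH critical values q*i/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])                # largest i with p_(i) <= q*i/m
        reject[order[:k + 1]] = True
    return reject

# Placeholder p-values for, say, 10 candidate variant effects
pvals = [0.001, 0.004, 0.008, 0.012, 0.021, 0.025, 0.06, 0.21, 0.44, 0.87]
print("BH rejections:        ", benjamini_hochberg(pvals, q=0.05))
print("Bonferroni rejections:", np.asarray(pvals) <= 0.05 / len(pvals))
```

With these placeholder p-values the BH rule rejects more hypotheses than Bonferroni, which is the power advantage the abstract points to when some null hypotheses are false.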
Towards a simple dynamic process conceptualization in rainfall-runoff models using multi-criteria calibration and tracers in temperate, upland catchments
HYDROLOGICAL PROCESSES, Issue 3 2010
C. Birkel
Abstract: Empirically based understanding of streamflow generation dynamics in a montane headwater catchment formed the basis for the development of simple, low-parameterized rainfall-runoff models. This study was based in the Girnock catchment in the Cairngorm Mountains of Scotland, where runoff generation is dominated by overland flow from peaty soils in valley-bottom areas that are characterized by dynamic expansion and contraction of saturation zones. A stepwise procedure was used to select the level of model complexity that could be supported by field data. This facilitated the assessment of the way the dynamic process representation improved model performance. Model performance was evaluated using a multi-criteria calibration procedure which applied a time series of hydrochemical tracers as an additional objective function. Flow simulations comparing a static against the dynamic saturation area model (SAM) substantially improved several evaluation criteria. Multi-criteria evaluation using ensembles of performance measures provided a much more comprehensive assessment of the model performance than single efficiency statistics, which alone could be misleading. Simulation of conservative source area tracers (Gran alkalinity) as part of the calibration procedure showed that a simple two-storage model is the minimum complexity needed to capture the dominant processes governing catchment response. Additionally, calibration was improved by the integration of tracers into the flow model, which constrained model uncertainty and improved the hydrodynamics of simulations in a way that plausibly captured the contribution of different source areas to streamflow. This approach contributes to the quest for low-parameter models that can achieve process-based simulation of hydrological response. Copyright © 2009 John Wiley & Sons, Ltd.
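The multi-criteria idea above, scoring every candidate parameter set against both the hydrometric record and the tracer record, can be sketched independently of any particular rainfall-runoff model. The toy example below uses a hypothetical two-parameter model, synthetic flow and alkalinity "observations", Nash-Sutcliffe efficiency as the performance measure for both series, and simple Monte Carlo sampling; none of it reproduces the Girnock model or data.

```python
import numpy as np

rng = np.random.default_rng(2)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, below 0 is worse than the observed mean."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Placeholder observations (standing in for measured flow and Gran alkalinity)
t = np.arange(365)
q_obs = 1.0 + 0.8 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.1, t.size)
alk_obs = 200 - 80 * (q_obs - q_obs.min()) / (q_obs.max() - q_obs.min())  # dilution-like pattern

def toy_model(a, b):
    """Hypothetical 2-parameter model returning simulated flow and alkalinity."""
    q_sim = a + b * np.sin(2 * np.pi * t / 365)
    alk_sim = 200 - 80 * (q_sim - q_sim.min()) / (q_sim.max() - q_sim.min() + 1e-12)
    return q_sim, alk_sim

# Monte Carlo sampling of candidate parameter sets, scored on BOTH criteria
candidates = rng.uniform([0.5, 0.2], [1.5, 1.2], size=(2000, 2))
scores = []
for a, b in candidates:
    q_sim, alk_sim = toy_model(a, b)
    scores.append((nse(q_obs, q_sim), nse(alk_obs, alk_sim)))
scores = np.array(scores)

# Combine the two criteria as Euclidean distance to the ideal point (1, 1)
dist = np.sqrt(((1.0 - scores) ** 2).sum(axis=1))
best = np.argmin(dist)
print("best parameters:", candidates[best], "NSE(flow), NSE(alkalinity):", scores[best])
```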
Evaluation of model complexity and space-time resolution on the prediction of long-term soil salinity dynamics, western San Joaquin Valley, California
HYDROLOGICAL PROCESSES, Issue 13 2006
G. Schoups
Abstract: The numerical simulation of long-term, large-scale (field to regional) variably saturated subsurface flow and transport remains a computational challenge, even with today's computing power. Therefore, it is appropriate to develop and use simplified models that focus on the main processes operating at the pertinent time and space scales, as long as the error introduced by the simpler model is small relative to the uncertainties associated with the spatial and temporal variation of boundary conditions and parameter values. This study investigates the effects of various model simplifications on the prediction of long-term soil salinity and salt transport in irrigated soils. Average root-zone salinity and cumulative annual drainage salt load were predicted for a 10-year period using a one-dimensional numerical flow and transport model (i.e. UNSATCHEM) that accounts for solute advection, dispersion and diffusion, and complex salt chemistry. The model uses daily values for rainfall, irrigation, and potential evapotranspiration rates. Model simulations consist of benchmark scenarios for different hypothetical cases that include shallow and deep water tables, different leaching fractions and soil gypsum content, and shallow groundwater salinity, with and without soil chemical reactions. These hypothetical benchmark simulations are compared with the results of various model simplifications that considered (i) annual average boundary conditions, (ii) coarser spatial discretization, and (iii) reducing the complexity of the salt-soil reaction system. Based on the 10-year simulation results, we conclude that salt transport modelling does not require daily boundary conditions, a fine spatial resolution, or complex salt chemistry. Instead, if the focus is on long-term salinity, then a simplified modelling approach can be used, with annually averaged boundary conditions, a coarse spatial discretization, and inclusion of soil chemistry that only accounts for cation exchange and gypsum dissolution-precipitation. We also demonstrate that prediction errors due to these model simplifications may be small when compared with the effects of parameter uncertainty on model predictions. The proposed model simplifications lead to larger time steps and reduced computer simulation times by a factor of 1000. Copyright © 2006 John Wiley & Sons, Ltd.

Downward approach to hydrological prediction
HYDROLOGICAL PROCESSES, Issue 11 2003
Murugesu Sivapalan
Abstract: This paper presents an overview of the 'downward approach' to hydrologic prediction and attempts to provide a context for the papers appearing in this special issue. The downward approach is seen as a necessary counterpoint to the mechanistic 'reductionist' approach that dominates current hydrological model development. It provides a systematic framework for learning from data, including the testing of hypotheses at every step of analysis. It can also be applied in a hierarchical manner: starting from exploring first-order controls in the modelling of catchment response, the model complexity can then be increased in response to deficiencies in reproducing observations at different levels. The remaining contributions of this special issue present a number of applications of the downward approach, including development of parsimonious water balance models with changing time scales by learning from signatures extracted from observed streamflow data at different time scales, regionalization of model parameters, parameterization of effects of sub-grid variability, and standardized statistical approaches to analyse data and to develop model structures. This review demonstrates that the downward approach is not a rigid methodology, but represents a generic framework. It needs to play an increasing role in the future development of hydrological models at the catchment scale. Copyright © 2003 John Wiley & Sons, Ltd.

Nutrient fluxes at the river basin scale. II: The balance between data availability and model complexity
HYDROLOGICAL PROCESSES, Issue 5 2001
Abstract: In order to model complex environmental systems, one needs to find a balance between the model complexity and the quality of the data available to run and validate the model. This paper describes a method to find this balance. Four models of different complexity were applied to describe the transfer of nitrogen and phosphorus from pollution sources to river outlets in two large European river basins (Rhine and Elbe). A comparison of the predictive capability of these four models tells us something about the added value of the added model complexity. We also quantified the errors in the data that were used to run and validate the models and analysed to what extent the model validation errors could be attributed to data errors, and to what extent to shortcomings of the model. We conclude that although the addition of more process description is interesting from a theoretical point of view, it does not necessarily improve the predictive capability. Although our analysis is based on an extensive pollution-sources and river-load database, it appeared that the information content of this database was sufficient only to support models of a limited complexity. Our analysis also illustrates that for a proper justification of a model's degree of complexity one should compare the model to simplified versions of the model. Copyright © 2001 John Wiley & Sons, Ltd.
Wrapped input selection using multilayer perceptrons for repeat-purchase modeling in direct marketing
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 2 2001
Stijn Viaene
Abstract: In this paper, we try to validate existing theory on, and develop additional insight into, repeat-purchase behavior in a direct marketing setting by means of an illuminating case study. The case involves the detection and qualification of the most relevant RFM (Recency, Frequency and Monetary) variables, using a neural network wrapper as our input pruning method. Results indicate that elimination of redundant and/or irrelevant inputs by means of the discussed input selection method allows us to significantly reduce model complexity without degrading the predictive generalization ability. It is precisely this issue that will enable us to infer some interesting marketing conclusions concerning the relative importance of the RFM predictor categories and their operationalizations. The empirical findings highlight the importance of a combined use of RFM variables in predicting repeat-purchase behavior. However, the study also reveals the dominant role of the frequency category. Results indicate that a model including only frequency variables still yields satisfactory classification accuracy compared to the optimally reduced model. Copyright © 2001 John Wiley & Sons, Ltd.
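A neural network wrapper of the kind described above scores candidate input subsets by the cross-validated performance of a network trained on just those inputs. The backward-elimination sketch below is one way such a wrapper can look, assuming scikit-learn for the MLP and the cross-validation; the synthetic RFM-style features, the 0.01 tolerance and the network size are illustrative choices, not the study's setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Synthetic stand-ins for RFM-style predictors (recency, frequency, monetary operationalizations)
n = 600
X = rng.normal(size=(n, 6))
feature_names = ["recency", "freq_total", "freq_recent", "monetary_avg", "monetary_max", "noise"]
# Repeat purchase driven mainly by the frequency variables in this toy example
logit = 1.5 * X[:, 1] + 1.0 * X[:, 2] + 0.3 * X[:, 0]
y = (logit + rng.normal(0, 1, n) > 0).astype(int)

def cv_score(cols):
    """Wrapper objective: cross-validated accuracy of a small MLP on the selected columns."""
    net = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0))
    return cross_val_score(net, X[:, cols], y, cv=5).mean()

selected = list(range(X.shape[1]))
while len(selected) > 1:
    base = cv_score(selected)
    # try dropping each input in turn; keep the drop that hurts accuracy least
    drops = [(cv_score([c for c in selected if c != d]), d) for d in selected]
    best_score, least_useful = max(drops)
    if best_score < base - 0.01:          # stop once pruning starts to cost accuracy
        break
    selected.remove(least_useful)

print("selected inputs:", [feature_names[c] for c in selected])
```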
On models of fractal networks
INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, Issue 5 2009
Walter Arrighetti
Abstract: A couple of iterative models for the theoretical study of fractal networks whose topologies are generated via iterated function systems is presented: a lumped-parameter, impedor-oriented one and a two-port-network-oriented one. With the former, the voltage and current patterns give a detailed understanding of the electromagnetic fields' self-similar distribution throughout the network; on the other hand, model complexity increases exponentially with the prefractal iteration order. The latter 'black-box' model only controls port-oriented global parameters, which are the ones commonly used in the integration of different electronic systems, and its complexity is independent of the prefractal order. Sierpinski gasket and carpet topologies are reported as examples. Copyright © 2008 John Wiley & Sons, Ltd.

Toward better scoring metrics for pseudo-independent models
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 8 2004
Y. Xiang
Abstract: Learning belief networks from data is NP-hard in general. A common method used in heuristic learning is the single-link lookahead search. When the problem domain is pseudo-independent (PI), the method cannot discover the underlying probabilistic model. In learning these models, parameterization of PI models is necessary in order to explicitly trade off model accuracy against model complexity. Understanding of PI models also provides a new dimension of trade-off in learning, even when the underlying model may not be PI. In this work, we adopt a hypercube perspective to analyze PI models and derive an improved result for computing the maximum number of parameters needed to specify a full PI model. We also present results on parameterization of a subclass of partial PI models. © 2004 Wiley Periodicals, Inc. Int J Int Syst 19:749-768, 2004.

Prospects and challenges for parametric models in historical biogeographical inference
JOURNAL OF BIOGEOGRAPHY, Issue 7 2009
Richard H. Ree
Abstract: In historical biogeography, phylogenetic trees have long been used as tools for addressing a wide range of inference problems, from explaining common distribution patterns of species to reconstructing ancestral geographic ranges on branches of the tree of life. However, the potential utility of phylogenies for this purpose has yet to be fully realized, due in part to a lack of explicit conceptual links between processes underlying the evolution of geographic ranges and processes of phylogenetic tree growth. We suggest that statistical approaches that use parametric models to forge such links will stimulate integration and propel hypothesis-driven biogeographical inquiry in new directions. We highlight here two such approaches and describe how they represent early steps towards a more general framework for model-based historical biogeography that is based on likelihood as an optimality criterion, rather than the traditional reliance on parsimony. The development of this framework will not be without significant challenges, particularly in balancing model complexity with statistical power, and these will be most apparent in studies of regions with many component areas and complex geological histories, such as the Mediterranean Basin.

A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part I: Search algorithm, simulations, theory
JOURNAL OF CHEMOMETRICS, Issue 7 2002
Abstract: Variable selection is an extensively studied problem in chemometrics and in the area of quantitative structure-activity relationships (QSARs). Many search algorithms have been compared so far. Less well studied is the influence of different objective functions on the prediction quality of the selected models. This paper investigates the performance of different cross-validation techniques as the objective function for variable selection in latent variable regression. The results are compared in terms of predictive ability, model size (number of variables) and model complexity (number of latent variables). It will be shown that leave-multiple-out cross-validation with a large percentage of data left out performs best. Since leave-multiple-out cross-validation is computationally expensive, a very efficient tabu search algorithm is introduced to lower the computational burden. The tabu search algorithm needs no user-defined operational parameters and optimizes the variable subset and the number of latent variables simultaneously. Copyright © 2002 John Wiley & Sons, Ltd.
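The contrast between leave-one-out and leave-multiple-out cross-validation as objective functions can be reproduced schematically with off-the-shelf tools: repeated random splits that hold out a large share of the objects stand in for LMO-CV, and the root-mean-square error of cross-validation serves as the figure of merit. The data below are synthetic, scikit-learn's PLS implementation is assumed, and the tabu search over variable subsets is not reproduced.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, ShuffleSplit, cross_val_score

rng = np.random.default_rng(4)

# Synthetic stand-in for a small calibration set:
# 60 objects, 50 collinear variables, only the first 3 actually informative
n, p = 60, 50
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 1.5]) + rng.normal(0, 0.5, n)

def rmsecv(model, cv):
    """Root-mean-square error of cross-validation, the objective to minimise."""
    mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error").mean()
    return np.sqrt(mse)

loo = LeaveOneOut()
# leave-multiple-out: 50 random splits with 40% of the objects left out each time
lmo = ShuffleSplit(n_splits=50, test_size=0.4, random_state=0)

for n_lv in (1, 2, 3, 5, 8):
    pls = PLSRegression(n_components=n_lv)
    print(f"{n_lv} LVs:  RMSECV(LOO) = {rmsecv(pls, loo):.3f}   RMSECV(LMO) = {rmsecv(pls, lmo):.3f}")
```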
A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part II
JOURNAL OF CHEMOMETRICS, Issue 7 2002
Abstract: Leave-multiple-out cross-validation (LMO-CV) is compared to leave-one-out cross-validation (LOO-CV) as objective function in variable selection for four real data sets. Two data sets stem from NIR spectroscopy and two from quantitative structure-activity relationships. In all four cases, LMO-CV outperforms LOO-CV with respect to prediction quality, model complexity (number of latent variables) and model size (number of variables). The number of objects left out in LMO-CV has an important effect on the final results. It controls both the number of latent variables in the final model and the prediction quality. The results of variable selection need to be validated carefully with a validation step that is independent of the variable selection. This step needs to be done because the internal figures of merit (i.e. anything that is derived from the objective function value) do not correlate well with the external predictivity of the selected models. This is most obvious for LOO-CV: LOO-CV without further constraints always shows the best internal figures of merit and the worst prediction quality. Copyright © 2002 John Wiley & Sons, Ltd.

A structured model for the simulation of bioreactors under transient conditions
AICHE JOURNAL, Issue 11 2009
Jérôme Morchain
Abstract: Modeling the transient behavior of continuous culture is of primary importance for the scale-up of biological processes. Spatial heterogeneities increase with the reactor size and micro-organisms have to cope with a fluctuating environment along their trajectories within the bioreactor. In this article, a structured model for bioreactions expressed in terms of biological extensive variables is proposed. A biological variable is introduced to calculate the growth rate of the population. The value is updated on the basis of the difference between the composition in the liquid and biotic phase. The structured model is able to predict the transient behavior of different continuous cultures subject to various drastic perturbations. This performance is obtained with a minimum increase in the standard unstructured model complexity (one additional time constant). In the final part, the consequences of decoupling the growth rate from the substrate uptake rate are discussed. © 2009 American Institute of Chemical Engineers AIChE J, 2009

Bayesian measures of model complexity and fit
JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2002
David J. Spiegelhalter
Summary: We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the 'hat' matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision-theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
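Given MCMC output, the quantities defined in the Spiegelhalter et al. abstract reduce to a few lines of arithmetic: pD is the posterior mean deviance minus the deviance evaluated at the posterior means, and DIC adds pD to the posterior mean deviance. The sketch below applies this to an i.i.d. normal likelihood, with simulated "posterior draws" standing in for real sampler output.

```python
import numpy as np

rng = np.random.default_rng(5)

# Placeholder data and fake 'posterior draws' for a normal mean/scale model
y = rng.normal(2.0, 1.0, size=100)
mu_draws = rng.normal(y.mean(), y.std() / np.sqrt(len(y)), size=4000)      # stand-in for MCMC output
sigma_draws = np.abs(rng.normal(y.std(), 0.07, size=4000))

def deviance(mu, sigma):
    """Deviance D(theta) = -2 log p(y | theta) for an i.i.d. normal likelihood."""
    return -2.0 * np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2))

D_draws = np.array([deviance(m, s) for m, s in zip(mu_draws, sigma_draws)])
D_bar = D_draws.mean()                                     # posterior mean deviance (fit)
D_at_mean = deviance(mu_draws.mean(), sigma_draws.mean())  # deviance at the posterior means
p_D = D_bar - D_at_mean                                    # effective number of parameters
DIC = D_bar + p_D                                          # = D_at_mean + 2 * p_D

print(f"D_bar = {D_bar:.1f}, pD = {p_D:.2f}, DIC = {DIC:.1f}")
```

With draws from a real sampler rather than these placeholders, pD would come out close to the number of effective parameters, here about 2.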
Process vs resource-oriented Petri net modeling of automated manufacturing systems
ASIAN JOURNAL OF CONTROL, Issue 3 2010
NaiQi Wu
Abstract: Since the 1980s, Petri nets (PN) have been widely used to model automated manufacturing systems (AMS) for analysis, performance evaluation, simulation, and control. They are mostly based on process-oriented modeling methods and are thus termed process-oriented PN (POPN) in this paper. The recent study of deadlock avoidance problems in AMS led to another type of PN called resource-oriented PN (ROPN). This paper, for the first time, compares these two modeling methods and the resultant models in terms of modeling power, model complexity for analysis and control, and some critical properties. POPN models the part production processes straightforwardly, while ROPN is more compact and effective for deadlock resolution. The relations between these two models are investigated. Several examples are used to illustrate them. Copyright © 2010 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society

VARIATIONAL BAYESIAN ANALYSIS FOR HIDDEN MARKOV MODELS
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, Issue 2 2009
C. A. McGrory
Summary: The variational approach to Bayesian inference enables simultaneous estimation of model parameters and model complexity. An interesting feature of this approach is that it also leads to an automatic choice of model complexity. Empirical results from the analysis of hidden Markov models with Gaussian observation densities illustrate this. If the variational algorithm is initialized with a large number of hidden states, redundant states are eliminated as the method converges to a solution, thereby leading to a selection of the number of hidden states. In addition, through the use of a variational approximation, the deviance information criterion for Bayesian model selection can be extended to the hidden Markov model framework. Calculation of the deviance information criterion provides a further tool for model selection, which can be used in conjunction with the variational approach.
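The pruning behaviour described in the last abstract, initializing the variational algorithm with a deliberately generous number of states and letting redundant ones empty out, can be tried with the variational Gaussian HMM implementation that recent releases of the hmmlearn package provide. The class name, its availability and all settings below are assumptions rather than anything taken from the paper, and the one-dimensional data are simulated from a three-state model purely for illustration.

```python
import numpy as np
from hmmlearn.vhmm import VariationalGaussianHMM   # assumed to be available in recent hmmlearn releases

rng = np.random.default_rng(6)

# Simulate a 1-D series that visits three well-separated Gaussian regimes (placeholder data)
means, n_per_run = [-3.0, 0.0, 3.0], 30
runs = rng.integers(0, 3, size=60)
X = np.concatenate([rng.normal(means[s], 0.5, n_per_run) for s in runs]).reshape(-1, 1)

# Deliberately over-specify the number of hidden states; the variational fit
# should leave the redundant ones essentially unoccupied as it converges.
model = VariationalGaussianHMM(n_components=10, covariance_type="full",
                               n_iter=300, random_state=0)
model.fit(X)

occupied = np.unique(model.predict(X))
print("states actually used:", occupied.size, "out of", model.n_components)
```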