Computational Challenges

Selected Abstracts


Computational challenges in combinatorial library design for protein engineering

AICHE JOURNAL, Issue 2 2004
Gregory L. Moore
[source]


Evaluation of model complexity and space-time resolution on the prediction of long-term soil salinity dynamics, western San Joaquin Valley, California

HYDROLOGICAL PROCESSES, Issue 13 2006
G. Schoups
Abstract The numerical simulation of long-term large-scale (field to regional) variably saturated subsurface flow and transport remains a computational challenge, even with today's computing power. Therefore, it is appropriate to develop and use simplified models that focus on the main processes operating at the pertinent time and space scales, as long as the error introduced by the simpler model is small relative to the uncertainties associated with the spatial and temporal variation of boundary conditions and parameter values. This study investigates the effects of various model simplifications on the prediction of long-term soil salinity and salt transport in irrigated soils. Average root-zone salinity and cumulative annual drainage salt load were predicted for a 10-year period using a one-dimensional numerical flow and transport model (i.e. UNSATCHEM) that accounts for solute advection, dispersion and diffusion, and complex salt chemistry. The model uses daily values for rainfall, irrigation, and potential evapotranspiration rates. Model simulations consist of benchmark scenarios for different hypothetical cases that include shallow and deep water tables, different leaching fractions and soil gypsum content, and shallow groundwater salinity, with and without soil chemical reactions. These hypothetical benchmark simulations are compared with the results of various model simplifications that considered (i) annual average boundary conditions, (ii) coarser spatial discretization, and (iii) reduced complexity of the salt-soil reaction system. Based on the 10-year simulation results, we conclude that salt transport modelling does not require daily boundary conditions, a fine spatial resolution, or complex salt chemistry. Instead, if the focus is on long-term salinity, then a simplified modelling approach can be adopted, using annually averaged boundary conditions, a coarse spatial discretization, and soil chemistry that accounts only for cation exchange and gypsum dissolution-precipitation. We also demonstrate that prediction errors due to these model simplifications may be small when compared with the effects of parameter uncertainty on model predictions. The proposed model simplifications lead to larger time steps and reduce computer simulation times by a factor of 1000. Copyright © 2006 John Wiley & Sons, Ltd. [source]
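To make the kind of simplification concrete, the sketch below (hypothetical Python, not the UNSATCHEM code) collapses daily boundary forcing into the annually averaged rates a simplified long-term model would run on; the variable names and synthetic data are assumptions for illustration only.

```python
import numpy as np

def annual_average_forcing(daily_rain, daily_irrigation, daily_pet, days_per_year=365):
    """Collapse daily forcing rates (e.g. mm/day) into one mean rate per year."""
    n_years = len(daily_rain) // days_per_year
    def yearly_mean(series):
        trimmed = np.asarray(series)[: n_years * days_per_year]
        return trimmed.reshape(n_years, days_per_year).mean(axis=1)
    return (yearly_mean(daily_rain),
            yearly_mean(daily_irrigation),
            yearly_mean(daily_pet))

# Example: 10 years of synthetic daily forcing -> 10 annual mean rates, letting a
# long-term salinity model take yearly rather than daily time steps.
rng = np.random.default_rng(0)
rain, irrig, pet = (rng.gamma(2.0, 1.5, size=3650) for _ in range(3))
rain_y, irrig_y, pet_y = annual_average_forcing(rain, irrig, pet)
```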


Rotamer optimization for protein design through MAP estimation and problem-size reduction

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 12 2009
Eun-Jong Hong
Abstract The search for the global minimum energy conformation (GMEC) of protein side chains is an important computational challenge in protein structure prediction and design. Using rotamer models, the problem is formulated as an NP-hard optimization problem. Dead-end elimination (DEE) methods combined with systematic A* search (DEE/A*) have proven useful, but may not be strong enough as we attempt to solve protein design problems where a large number of similar rotamers are eligible and the network of interactions between residues is dense. In this work, we present an exact solution method, named BroMAP (branch-and-bound rotamer optimization using MAP estimation), for such protein design problems. The design goal of BroMAP is to be able to expand smaller search trees than conventional branch-and-bound methods while performing only a moderate amount of computation in each node, thereby reducing the total running time. To achieve that, BroMAP attempts reduction of the problem size within each node through DEE and elimination by lower bounds from approximate maximum-a-posteriori (MAP) estimation. The lower bounds are also exploited in branching and subproblem selection for fast discovery of strong upper bounds. Our computational results show that BroMAP tends to be faster than DEE/A* for large protein design cases. BroMAP also solved cases that were not solved by DEE/A* within the maximum allowed time, and did not incur significant disadvantage for cases where DEE/A* performed well. Therefore, BroMAP is particularly applicable to large protein design problems where DEE/A* struggles and can also substitute for DEE/A* in general GMEC search. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009 [source]
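As a rough illustration of the pruning idea behind branch-and-bound GMEC search, the sketch below discards any subproblem whose lower bound already exceeds the best complete assignment found so far and expands subproblems in order of their bounds. The energy and bounding functions are hypothetical placeholders, not the DEE/MAP machinery of the paper.

```python
import heapq
import itertools

def branch_and_bound_gmec(n_residues, rotamers, energy, lower_bound):
    """Minimum-energy rotamer assignment by best-first branch-and-bound.

    rotamers[i]             -> iterable of rotamer choices at residue i
    energy(assignment)      -> exact energy of a complete assignment {res: rot}
    lower_bound(assignment) -> admissible lower bound on any completion of a partial one
    """
    best_energy, best_assignment = float("inf"), None
    tie = itertools.count()                      # tie-breaker so the heap never compares dicts
    queue = [(lower_bound({}), next(tie), {})]
    while queue:
        bound, _, partial = heapq.heappop(queue)
        if bound >= best_energy:                 # prune: no completion can beat the incumbent
            continue
        if len(partial) == n_residues:           # leaf: complete assignment
            e = energy(partial)
            if e < best_energy:
                best_energy, best_assignment = e, partial
            continue
        res = len(partial)                       # branch on the next residue position
        for rot in rotamers[res]:
            child = {**partial, res: rot}
            b = lower_bound(child)
            if b < best_energy:                  # keep only subproblems that might improve
                heapq.heappush(queue, (b, next(tie), child))
    return best_energy, best_assignment
```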


Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 10 2008
Vladimir Hnizdo
Abstract Changes in the configurational entropies of molecules make important contributions to the free energies of reaction for processes such as protein-folding, noncovalent association, and conformational change. However, obtaining entropy from molecular simulations represents a long-standing computational challenge. Here, two recently introduced approaches, the nearest-neighbor (NN) method and the mutual-information expansion (MIE), are combined to furnish an efficient and accurate method of extracting the configurational entropy from a molecular simulation to a given order of correlations among the internal degrees of freedom. The resulting method takes advantage of the strengths of each approach. The NN method is entirely nonparametric (i.e., it makes no assumptions about the underlying probability distribution), its estimates are asymptotically unbiased and consistent, and it makes optimum use of a limited number of available data samples. The MIE, a systematic expansion of entropy in mutual information terms of increasing order, provides a well-characterized approximation for lowering the dimensionality of the numerical problem of calculating the entropy of a high-dimensional system. The combination of these two methods enables well-converged estimates of the configurational entropy that capture many-body correlations of higher order than is possible with the simple histogramming originally used in the MIE method. The combined method is tested here on two simple systems: an idealized system represented by an analytical distribution of six circular variables, where the full joint entropy and all the MIE terms are exactly known, and the R,S stereoisomer of tartaric acid, a molecule with seven internal-rotation degrees of freedom for which the full entropy of internal rotation has already been estimated by the NN method. For these two systems, all the expansion terms of the full MIE of the entropy are estimated by the NN method and, for comparison, the MIE approximations up to third order are also estimated by simple histogramming. The results indicate that the truncation of the MIE at the two-body level can be an accurate, computationally nondemanding approximation to the configurational entropy of anharmonic internal degrees of freedom. If needed, higher-order correlations can be estimated reliably by the NN method without excessive demands on the molecular-simulation sample size and computing time. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2008 [source]
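A minimal sketch of the two-body truncation described above: the entropy is approximated as the sum of one-dimensional marginal entropies minus all pairwise mutual informations. The histogram estimator below is only a stand-in for the nearest-neighbor estimator the paper plugs into these terms, and it ignores the differential-entropy bin-width correction.

```python
import numpy as np
from itertools import combinations

def histogram_entropy(samples, bins=30):
    """Stand-in entropy estimator (nats) by simple histogramming of (n, d) samples."""
    hist, _ = np.histogramdd(samples, bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mie_second_order(samples, entropy=histogram_entropy):
    """Two-body MIE estimate: S2 = sum_i S_i - sum_{i<j} [S_i + S_j - S_ij]."""
    n_dof = samples.shape[1]
    s1 = [entropy(samples[:, [i]]) for i in range(n_dof)]
    total = sum(s1)
    for i, j in combinations(range(n_dof), 2):
        s_ij = entropy(samples[:, [i, j]])
        total -= s1[i] + s1[j] - s_ij        # subtract the pairwise mutual information
    return total
```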


Radiation damage in macromolecular crystallography: what is it and why should we care?

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 4 2010
Elspeth F. Garman
Radiation damage inflicted during diffraction data collection in macromolecular crystallography has re-emerged in the last decade as a major experimental and computational challenge, as even for crystals held at 100 K it can result in severe data-quality degradation and the appearance in solved structures of artefacts which affect biological interpretations. Here, the observable symptoms and basic physical processes involved in radiation damage are described and the concept of absorbed dose as the basic metric against which to monitor the experimentally observed changes is outlined. Investigations into radiation damage in macromolecular crystallography are ongoing and the number of studies is rapidly increasing. The current literature on the subject is compiled as a resource for the interested researcher. [source]
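The absorbed-dose metric mentioned above is simply energy deposited per unit mass (gray = J/kg). The back-of-the-envelope sketch below, with illustrative numbers only, shows the kind of estimate involved; dedicated tools such as RADDOSE account for beam profile, photoelectron escape, and dose inhomogeneity that are ignored here.

```python
def absorbed_dose_gray(flux_ph_per_s, exposure_s, photon_energy_kev,
                       absorbed_fraction, crystal_volume_m3, density_kg_m3=1200.0):
    """Rough dose estimate: energy deposited (J) divided by crystal mass (kg)."""
    joules_per_kev = 1.602e-16
    energy_deposited_j = (flux_ph_per_s * exposure_s * absorbed_fraction
                          * photon_energy_kev * joules_per_kev)
    crystal_mass_kg = crystal_volume_m3 * density_kg_m3
    return energy_deposited_j / crystal_mass_kg

# Illustrative numbers: 1e12 ph/s, 60 s exposure, 12.4 keV photons, 2% of photons
# absorbed, a (100 um)^3 crystal -> roughly 2 MGy, to be compared against commonly
# quoted experimental dose limits of a few tens of MGy.
dose = absorbed_dose_gray(1e12, 60.0, 12.4, 0.02, (100e-6) ** 3)
```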


A Constructive Graphical Model Approach for Knowledge-Based Systems: A Vehicle Monitoring Case Study

COMPUTATIONAL INTELLIGENCE, Issue 3 2003
Y. Xiang
Graphical models have been widely applied to uncertain reasoning in knowledge-based systems. For many of the problems tackled, a single graphical model is constructed before individual cases are presented and the model is used to reason about each new case. In this work, we consider a class of problems whose solution requires inference over a very large number of models that are impractical to construct a priori. We conduct a case study in the domain of vehicle monitoring and then generalize the approach taken. We show that the previously held negative belief about the applicability of graphical models to such problems is unjustified. We propose a set of techniques based on domain decomposition, model separation, model approximation, model compilation, and re-analysis to meet the computational challenges imposed by the combinatorial explosion. Experimental results on vehicle monitoring demonstrated good near-real-time performance. [source]


Comparison of single-nucleotide polymorphisms and microsatellite markers for linkage analysis in the COGA and simulated data sets for Genetic Analysis Workshop 14: Presentation Groups 1, 2, and 3

GENETIC EPIDEMIOLOGY, Issue S1 2005
Marsha A. Wilcox
Abstract The papers in presentation groups 1-3 of Genetic Analysis Workshop 14 (GAW14) compared microsatellite (MS) markers and single-nucleotide polymorphism (SNP) markers for a variety of factors, using multiple methods in both data sets provided to GAW participants. Group 1 focused on data provided from the Collaborative Study on the Genetics of Alcoholism (COGA). Group 2 focused on data simulated for the workshop. Group 3 contained analyses of both data sets. Issues examined included: information content, signal strength, localization of the signal, use of haplotype blocks, population structure, power, type I error, control of type I error, the effect of linkage disequilibrium, and computational challenges. There were several broad resulting observations. 1) Information content was higher for dense SNP marker panels than for MS panels, and dense SNP marker sets appeared to provide slightly higher linkage scores and slightly higher power to detect linkage than MS markers. 2) Dense SNP panels also gave higher type I errors, suggesting that increased test thresholds may be needed to maintain the correct error rate. 3) Dense SNP panels provided better trait localization, but only in the COGA data, in which the MS markers were relatively loosely spaced. 4) The strength of linkage signals did not vary with the density of SNP panels, once the marker density was at least 1 SNP/cM. 5) Analyses with SNPs were computationally challenging, and identified areas where improvements in analysis tools will be necessary to make analysis practical for widespread use. Genet. Epidemiol. 29 (Suppl. 1): S7-S28, 2005. © 2005 Wiley-Liss, Inc. [source]


A-scalability and an integrated computational technology and framework for non-linear structural dynamics. Part 1: Theoretical developments, parallel formulations

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, Issue 15 2003
Abstract For large-scale problems and large processor counts, achieving accuracy and efficiency with reduced solution times, and attaining optimal parallel scalability over the entire transient duration of the simulation, pose many computational challenges for general non-linear structural dynamics problems. For transient analysis, explicit time operators readily inherit algorithmic scalability and consequently enable parallel scalability. However, the key issues concerning parallel simulations via implicit time operators within the framework and encompassing the class of linear multistep methods include the totality of the following considerations to foster the proposed notion of A-scalability: (a) selection of robust scalable optimal time discretized operators that foster stabilized non-linear dynamic implicit computations both in terms of convergence and the number of non-linear iterations for completion of large-scale analysis of the highly non-linear dynamic responses, (b) selecting an appropriate scalable spatial domain decomposition method for solving the resulting linearized system of equations during the implicit phase of the non-linear computations, (c) scalable implementation models and solver technology for the interface and coarse problems for attaining parallel scalability of the computations, and (d) scalable parallel graph partitioning techniques. These latter issues related to parallel implicit formulations are of interest and focus in this paper. The former involving parallel explicit formulations are also a natural subset of the present framework and have been addressed previously in Reference 1 (Advances in Engineering Software 2000; 31: 639-647). In the present context, although a particular aspect or a solver related to the spatial domain decomposition may be designed to be numerically scalable, the totality of the aforementioned issues simultaneously plays an important and integral role in attaining A-scalability of the parallel formulations for the entire transient duration of the simulation, which is desirable for transient problems. As such, the theoretical developments of the parallel formulations are first detailed in Part 1 of this paper, and the subsequent practical applications and performance results of general non-linear structural dynamics problems are described in Part 2 of this paper to foster the proposed notion of A-scalability. Copyright © 2003 John Wiley & Sons, Ltd. [source]
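For orientation, a single implicit (Newmark) time step for a linear structural system M u'' + K u = f is sketched below, with the linearized solve hidden behind a `dd_solve` callable standing in for the scalable domain-decomposition solver called for in item (b). This is a textbook illustration under stated assumptions, not the authors' formulation.

```python
import numpy as np

def newmark_step(M, K, f_next, u, v, a, dt, dd_solve, beta=0.25, gamma=0.5):
    """Advance (u, v, a) by one implicit Newmark step of size dt for M u'' + K u = f."""
    # Effective stiffness and right-hand side of the implicit displacement update
    K_eff = K + M / (beta * dt**2)
    rhs = f_next + M @ (u / (beta * dt**2) + v / (beta * dt) + (1.0 / (2.0 * beta) - 1.0) * a)
    u_next = dd_solve(K_eff, rhs)            # a parallel domain-decomposition solve in practice
    a_next = (u_next - u) / (beta * dt**2) - v / (beta * dt) - (1.0 / (2.0 * beta) - 1.0) * a
    v_next = v + dt * ((1.0 - gamma) * a + gamma * a_next)
    return u_next, v_next, a_next

# Serial stand-in for the distributed interface/coarse-problem solver:
dd_solve = lambda A, b: np.linalg.solve(A, b)
```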


Identifying and reducing error in cluster-expansion approximations of protein energies

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 16 2010
Seungsoo Hahn
Abstract Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010 [source]
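A minimal sketch, not the Cluster Expansion Version 1.0 package, of the fitting step the abstract describes: one-hot "point" and "pair" cluster features are built for each training sequence and the expansion coefficients are obtained by least squares; prediction error on held-out sequences would then be monitored by cross-validation.

```python
import numpy as np
from itertools import combinations

def design_matrix(sequences, alphabet="ACDEFGHIKLMNPQRSTVWY", pairs=None):
    """One-hot cluster features: a constant, per-position residue identity, and residue pairs."""
    aa = {a: k for k, a in enumerate(alphabet)}
    n_pos, n_aa = len(sequences[0]), len(alphabet)
    pairs = list(combinations(range(n_pos), 2)) if pairs is None else pairs
    rows = []
    for seq in sequences:
        row = [1.0]                                   # constant (zeroth-order) cluster
        for i in range(n_pos):                        # point clusters
            onehot = [0.0] * n_aa
            onehot[aa[seq[i]]] = 1.0
            row += onehot
        for i, j in pairs:                            # pair clusters
            onehot = [0.0] * (n_aa * n_aa)
            onehot[aa[seq[i]] * n_aa + aa[seq[j]]] = 1.0
            row += onehot
        rows.append(row)
    return np.array(rows)

def fit_cluster_expansion(train_seqs, train_energies):
    """Least-squares fit of expansion coefficients; predict via design_matrix(seqs) @ coef."""
    X = design_matrix(train_seqs)
    coef, *_ = np.linalg.lstsq(X, np.asarray(train_energies, dtype=float), rcond=None)
    return coef
```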


Maximum likelihood estimation in semiparametric regression models with censored data

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2007
D. Zeng
Summary. Semiparametric regression models play a central role in formulating the effects of covariates on potentially censored failure times and in the joint modelling of incomplete repeated measures and failure times in longitudinal studies. The presence of infinite dimensional parameters poses considerable theoretical and computational challenges in the statistical analysis of such models. We present several classes of semiparametric regression models, which extend the existing models in important directions. We construct appropriate likelihood functions involving both finite dimensional and infinite dimensional parameters. The maximum likelihood estimators are consistent and asymptotically normal with efficient variances. We develop simple and stable numerical techniques to implement the corresponding inference procedures. Extensive simulation experiments demonstrate that the inferential and computational methods proposed perform well in practical settings. Applications to three medical studies yield important new insights. We conclude that there is no reason, theoretical or numerical, not to use maximum likelihood estimation for semiparametric regression models. We discuss several areas that need further research. [source]
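As a hedged illustration of the simplest special case of such models, the Cox proportional hazards model, the infinite dimensional parameter is the baseline cumulative hazard, whose nonparametric maximum likelihood jumps at the event times (given a regression coefficient beta) are the Breslow estimates sketched below. This is illustrative only, not the general transformation-model machinery of the paper.

```python
import numpy as np

def breslow_jumps(times, events, x, beta):
    """Baseline cumulative-hazard jumps at event times (no tied event times assumed)."""
    times, events = np.asarray(times, dtype=float), np.asarray(events, dtype=int)
    risk_scores = np.exp(np.asarray(x) @ np.asarray(beta))
    jumps = {}
    for t in np.sort(times[events == 1]):
        at_risk = times >= t                 # subjects still under observation at time t
        jumps[float(t)] = 1.0 / risk_scores[at_risk].sum()
    return jumps
```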


Inference in molecular population genetics

JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES B (STATISTICAL METHODOLOGY), Issue 4 2000
Matthew Stephens
Full likelihood-based inference for modern population genetics data presents methodological and computational challenges. The problem is of considerable practical importance and has attracted recent attention, with the development of algorithms based on importance sampling (IS) and Markov chain Monte Carlo (MCMC) sampling. Here we introduce a new IS algorithm. The optimal proposal distribution for these problems can be characterized, and we exploit a detailed analysis of genealogical processes to develop a practicable approximation to it. We compare the new method with existing algorithms on a variety of genetic examples. Our approach substantially outperforms existing IS algorithms, with efficiency typically improved by several orders of magnitude. The new method also compares favourably with existing MCMC methods in some problems, and less favourably in others, suggesting that both IS and MCMC methods have a continuing role to play in this area. We offer insights into the relative advantages of each approach, and we discuss diagnostics in the IS framework. [source]
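A generic importance-sampling sketch of the kind of estimator discussed above (not the paper's genealogical proposal): the likelihood is written as an expectation over latent histories H, L = E_q[p(data, H) / q(H)], and approximated by weighted draws from the proposal q; the closer q is to the optimal proposal p(H | data), the smaller the weight variance. All function arguments are hypothetical placeholders.

```python
import numpy as np

def is_likelihood(sample_from_q, q_density, joint_density, n_draws=10_000):
    """Importance-sampling estimate of the likelihood, plus an effective-sample-size diagnostic."""
    draws = [sample_from_q() for _ in range(n_draws)]
    weights = np.array([joint_density(h) / q_density(h) for h in draws])
    estimate = weights.mean()                        # Monte Carlo likelihood estimate
    ess = weights.sum() ** 2 / np.sum(weights ** 2)  # near n_draws when q is close to optimal
    return estimate, ess
```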