Large Databases (large + databases)

Selected Abstracts


Interactive Visualization with Programmable Graphics Hardware

COMPUTER GRAPHICS FORUM, Issue 3 2002
Thomas Ertl
One of the main scientific goals of visualization is the development of algorithms and appropriate data models which facilitate interactive visual analysis and direct manipulation of the increasingly large data sets which result from simulations running on massively parallel computer systems, from measurements employing fast high-resolution sensors, or from large databases and hierarchical information spaces. This task can only be achieved with the optimization of all stages of the visualization pipeline: filtering, compression, and feature extraction of the raw data sets, adaptive visualization mappings which allow the users to choose between speed and accuracy, and exploiting new graphics hardware features for fast and high-quality rendering. The recent introduction of advanced programmability in widely available graphics hardware has already led to impressive progress in the area of volume visualization. However, besides the acceleration of the final rendering, flexible graphics hardware is also increasingly being used for the mapping and filtering stages of the visualization pipeline, thus giving rise to new levels of interactivity in visualization applications. The talk will present recent results of applying programmable graphics hardware in various visualization algorithms covering volume data, flow data, terrains, NPR rendering, and distributed and remote applications.


Meeting Real-Time Traffic Flow Forecasting Requirements with Imprecise Computations

COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, Issue 3 2003
Brian L. Smith
This article explores the ability of imprecise computations to address real-time computational requirements in infrastructure control and management systems. The research in this area focuses on the development of nonparametric regression as a means to forecast traffic flow rates for transportation management systems. Nonparametric regression is a forecasting technique based on nearest neighbor searching, in which forecasts are derived from past observations that are similar to current conditions. A key concern regarding nonparametric regression is the significant time required to search for nearest neighbors in large databases. The results presented in this article indicate that approximate nearest neighbors, which are imprecise computations as applied to nonparametric regression, may be used to adequately speed the execution time of nonparametric regression, with acceptable degradations in forecast accuracy. The article concludes with a demonstration of the use of genetic algorithms as a design aid for real-time algorithms employing imprecise computations.
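
The nearest-neighbor forecasting idea described in this abstract can be illustrated with a minimal sketch. The function name `knn_forecast`, the two-value state vector, the Euclidean distance, and the sample flow values are all assumptions made for illustration; the abstract does not specify them. The sketch uses an exact search, which is the slow step the article proposes to replace with approximate nearest neighbors.

```python
import math


def knn_forecast(history, current_state, k=3):
    """Forecast the next flow rate as the mean outcome of the k most similar past states.

    history: list of (state_vector, next_flow) pairs (hypothetical layout).
    current_state: tuple of recent flow measurements describing current conditions.
    """
    def distance(a, b):
        # Euclidean distance between two state vectors (an assumed similarity measure).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Exact nearest-neighbor search: sort the whole history by distance to the query.
    neighbors = sorted(history, key=lambda rec: distance(rec[0], current_state))[:k]
    return sum(flow for _, flow in neighbors) / len(neighbors)


# Made-up observations: (flows at the two previous intervals) -> flow at the next interval.
history = [
    ((100.0, 110.0), 120.0),
    ((200.0, 210.0), 220.0),
    ((105.0, 115.0), 125.0),
    ((300.0, 310.0), 320.0),
]
forecast = knn_forecast(history, (102.0, 112.0), k=2)
print(forecast)
```

Sorting the full history costs time proportional to the database size for every forecast, which is why an approximate search structure becomes attractive for large databases.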


An artificial neural network based approach for online string matching/filtering of large databases

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 4 2010
Tatiana Tambouratzis
A novel online approach to exact string matching and filtering of large databases is presented. String matching/filtering is based on artificial neural networks and operates in two stages: initially, a self-organizing map retrieves the cluster of database strings that are most similar to the query string; subsequently, a harmony theory network compares the retrieved strings with the query string and determines whether an exact match exists. The similarity measure is configured to the specific characteristics of the database so as to expose overall string similarity rather than character coincidence at homologous string locations. The experimental results demonstrate foolproof, fast, and practically database-size independent operation that is especially robust to database modifications. The proposed approach is put forward for general-purpose (directory, catalogue, glossary search) as well as Internet-oriented (e-mail blocking, URL, username classification) applications. © 2010 Wiley Periodicals, Inc.


Identifying native-like protein structures using physics-based potentials

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 1 2002
Brian N. Dominy
Abstract As the field of structural genomics matures, new methods will be required that can accurately and rapidly distinguish reliable structure predictions from those that are more dubious. We present a method based on the CHARMM gas phase implicit hydrogen force field in conjunction with a generalized Born implicit solvation term that allows one to make such discrimination. We begin by analyzing pairs of threaded structures from the EMBL database, and find that it is possible to identify the misfolded structures with over 90% accuracy. Further, we find that misfolded states are generally favored by the solvation term due to the mispairing of favorable intramolecular ionic contacts. We also examine 29 sets of 29 misfolded globin sequences from Levitt's "Decoys 'R' Us" database generated using a sequence homology-based method. Again, we find that discrimination is possible with approximately 90% accuracy. Also, even in these less distorted structures, mispairing of ionic contacts results in a more favorable solvation energy for misfolded states. This is also found to be the case for collapsed, partially folded conformations of CspA and protein G taken from folding free energy calculations. We also find that the inclusion of the generalized Born solvation term, in postprocess energy evaluation, improves the correlation between structural similarity and energy in the globin database. This significantly improves the reliability of the hypothesis that more energetically favorable structures are also more similar to the native conformation. Additionally, we examine seven extensive collections of misfolded structures created by Park and Levitt using a four-state reduced model also contained in the "Decoys 'R' Us" database.
Results from these large databases confirm those obtained in the EMBL and misfolded globin databases concerning predictive accuracy, the energetic advantage of misfolded proteins regarding the solvation component, and the improved correlation between energy and structural similarity due to implicit solvation. Z-scores computed for these databases are improved by including the generalized Born implicit solvation term, and are found to be comparable to trained and knowledge-based scoring functions. Finally, we briefly explore the dynamic behavior of a misfolded protein relative to properly folded conformations. We demonstrate that the misfolded conformation diverges quickly from its initial structure while the properly folded states remain stable. Proteins in this study are shown to be more stable than their misfolded counterparts and readily identified based on energetic as well as dynamic criteria. In summary, we demonstrate the utility of physics-based force fields in identifying native-like conformations in a variety of preconstructed structural databases. The details of this discrimination are shown to be dependent on the construction of the structural database. © 2002 Wiley Periodicals, Inc. J Comput Chem 23: 147–160, 2002
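
The Z-scores mentioned in this abstract are not defined in the text. A common convention in decoy discrimination, assumed here, is the number of standard deviations by which the native structure's energy lies below the mean decoy energy. The sketch below uses made-up energy values and a hypothetical function name, and is not the authors' procedure.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution.

    A strongly negative value means the native structure is well separated
    (lower in energy) from the misfolded decoys. The population standard
    deviation is an assumed choice.
    """
    mean = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    return (native_energy - mean) / spread


# Illustrative energies in arbitrary units (not taken from the paper).
decoys = [-90.0, -100.0, -110.0]
z = native_z_score(-130.0, decoys)
print(round(z, 3))
```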


Detecting dyads of related individuals in large collections of DNA-profiles by controlling the false discovery rate

MOLECULAR ECOLOGY RESOURCES, Issue 4 2010
H. J. SKAUG
Abstract The search for pairs (dyads) of related individuals in large databases of DNA-profiles has become an increasingly important inference tool in ecology. However, the many, partly dependent, pairwise comparisons introduce statistical issues. We show that the false discovery rate (FDR) procedure is well suited to control for the proportion of false positives, i.e. dyads consisting of unrelated individuals, which under normal circumstances would have been labelled as related individuals. We verify the behaviour of the standard FDR procedure by simulation, demonstrating that the FDR procedure works satisfactorily in spite of the many dependent pairwise comparisons involved in an exhaustive database screening. A computer program that implements this method is available online. In addition, we propose to implement a second stage in the procedure, in which additional independent genetic markers are used to identify the false positives. We demonstrate the application of the approach in an analysis of a DNA database consisting of 3300 individual minke whales (Balaenoptera acutorostrata) each typed at ten microsatellite loci. Applying the standard procedure with an FDR of 50% led to the identification of 74 putative dyads of 1st- or 2nd-order relatives. However, introducing the second step, which involved additional genotypes at 15 microsatellite loci, revealed that only 21 of the putative dyads can be claimed with high certainty to be true dyads.
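
The abstract does not say which FDR procedure is meant by "standard". Assuming it is the Benjamini-Hochberg step-up procedure, the screening step can be sketched as follows. The function name and the p-values are illustrative only, and this is not the authors' program.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of the tests declared significant at the given FDR level.

    Step-up rule: find the largest rank i (1-based, p-values in ascending order)
    with p_(i) <= i * fdr / m, then reject every hypothesis ranked up to i.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# One p-value per pairwise comparison (dyad); the values are made up.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p_values, fdr=0.05)
print(rejected)
```

In a database screening, each index would correspond to one pair of DNA profiles, and the rejected set would be the list of putative dyads passed on to the second, marker-based stage.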


Structuring Chemical Space: Similarity-Based Characterization of the PubChem Database

MOLECULAR INFORMATICS, Issue 1-2 2010
Giovanni Cincilla
Abstract The ensemble of conceivable molecules is referred to as the Chemical Space. In this article we describe a hierarchical version of the Affinity Propagation (AP) clustering algorithm and apply it to analyze the LINGO-based similarity matrix of a 500 000-molecule subset of the PubChem database, which contains more than 19 million compounds. The combination of two highly efficient methods, namely the AP clustering algorithm and LINGO-based molecular similarity calculations, allows the unbiased analysis of large databases. Hierarchical clustering generates a numerical diagonalization of the similarity matrix. The target-independent, intrinsic structure of the database, derived without any previous information on the physical or biological properties of the compounds, maps together molecules experimentally shown to bind the same biological target or to have similar physical properties.


Semi-automated risk estimation using large databases: quinolones and Clostridium difficile associated diarrhea

PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, Issue 6 2010
Robertino M. Mera
Abstract Purpose: The availability of large databases with person-time information and appropriate statistical methods allow for relatively rapid pharmacovigilance analyses. A semi-automated method was used to investigate the effect of fluoroquinolones on the incidence of C. difficile associated diarrhea (CDAD). Methods: Two US databases, an electronic medical record (EMR) and a large medical claims database for the period 2006–2007, were evaluated using a semi-automated methodology. The raw EMR and claims datasets were subject to a normalization procedure that aligns the drug exposures and conditions using ontologies: Snowmed for medications and MedDRA for conditions. A retrospective cohort design was used together with matching by means of the propensity score. The association between exposure and outcome was evaluated using a Poisson regression model after taking into account potential confounders. Results: A comparison between quinolones as the target cohort and macrolides as the comparison cohort produced a total of 564,797 subjects exposed to a quinolone in the claims data and 233,090 subjects in the EMR. They were matched with replacement within six strata of the propensity score. Among the matched cohorts there were a total of 488 and 158 outcomes in the claims and the EMR respectively. Quinolones were found to be twice more likely to be significantly associated with CDAD than macrolides adjusting for risk factors (IRR 2.75, 95% CI 2.18–3.48). Conclusions: A semi-automated method was successfully applied to two observational databases and was able to rapidly identify a potential for increased risk of developing CDAD with quinolones. Copyright © 2010 John Wiley & Sons, Ltd.


Popitam: Towards new heuristic strategies to improve protein identification from tandem mass spectrometry data

PROTEINS: STRUCTURE, FUNCTION AND BIOINFORMATICS, Issue 6 2003
Patricia Hernandez
Abstract In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current identification methods from spectrometric data have difficulties in handling modifications or mutations in the source peptide. Moreover, they have low performance when run on large databases (such as genomic databases), or with low quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences issued from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves the parsing of the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called Ant Colony Optimization algorithm.


The breast cancer experience of rural women: a literature review

PSYCHO-ONCOLOGY, Issue 10 2007
B. Ann Bettencourt
Abstract This report is a review of studies that focus on rural breast cancer survivorship. It includes a total of 14 studies using large databases and 27 other studies using qualitative and quantitative methods. In our review of this literature, we identified four broad themes, including access to treatment and treatment type, medical providers and health information, psychosocial adjustment and coping, and social support and psychological support services. We review the findings of the rural breast cancer survivorship studies within each of these broad themes. A few of the findings of the review include that rural and urban women receive different primary treatments for breast cancer, that rural women may have greater difficulty negotiating their traditional gender roles during and after treatment, that rural women desire greater health-related information about their breast cancer, and that rural women have less access to mental health therapy. The review discusses the implications of these findings as well as the weakness in the literature. Copyright © 2007 John Wiley & Sons, Ltd.


Image-based crystal detection: a machine-learning approach

ACTA CRYSTALLOGRAPHICA SECTION D, Issue 12 2008
Roy Liu
The ability of computers to learn from and annotate large databases of crystallization-trial images provides not only the ability to reduce the workload of crystallization studies, but also an opportunity to annotate crystallization trials as part of a framework for improving screening methods. Here, a system is presented that scores sets of images based on the likelihood of containing crystalline material as perceived by a machine-learning algorithm. The system can be incorporated into existing crystallization-analysis pipelines, whereby specialists examine images as they normally would with the exception that the images appear in rank order according to a simple real-valued score. Promising results are shown for 319,112 images associated with 150 structures solved by the Joint Center for Structural Genomics pipeline during the 2006–2007 year. Overall, the algorithm achieves a mean receiver operating characteristic score of 0.919 and a 78% reduction in human effort per set when considering an absolute score cutoff for screening images, while incurring a loss of five out of 150 structures.
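
The "receiver operating characteristic score" reported in this abstract is presumably the area under the ROC curve, although the abstract does not say so explicitly. Under that assumption, the sketch below computes it as the fraction of (crystal, non-crystal) image pairs that the score orders correctly, and shows the rank-ordered review list. The scores, labels, and function name are illustrative and do not come from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of positive/negative pairs in which the positive
    example receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores for six images; label 1 means the image contains a crystal.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)

# Images presented to the specialist in descending score order.
review_order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(auc, review_order)
```

The quadratic pairwise loop is kept for clarity; for hundreds of thousands of images a rank-based formulation would be the more practical choice.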


Calculation of ligand-nucleic acid binding free energies with the generalized-Born model in DOCK

BIOPOLYMERS, Issue 2 2004
Xinshan Kang
Abstract The calculation of ligand-nucleic acid binding free energies is investigated by including solvation effects computed with the generalized-Born model. Modifications of the solvation module in DOCK, including introduction of all-atom parameters and revision of coefficients in front of different terms, are shown to improve calculations involving nucleic acids. This computing scheme is capable of calculating binding energies, with reasonable accuracy, for a wide variety of DNA-ligand complexes, RNA-ligand complexes, and even for the formation of double-stranded DNA. This implementation of GB/SA is also shown to be capable of discriminating strong ligands from poor ligands for a series of RNA aptamers without sacrificing the high efficiency of the previous implementation. These results validate this approach to screening large databases against nucleic acid targets. © 2003 Wiley Periodicals, Inc. Biopolymers 73:192–204, 2004


Birth Centers in Australia: A National Population-Based Study of Perinatal Mortality Associated with Giving Birth in a Birth Center

BIRTH, Issue 3 2007
Sally K Tracy DMid
ABSTRACT: Background: Perinatal mortality is a rare outcome among babies born at term in developed countries after normal uncomplicated pregnancies; consequently, the numbers involved in large databases of routinely collected statistics provide a meaningful evaluation of these uncommon events. The National Perinatal Data Collection records the place of birth and information on the outcomes of pregnancy and childbirth for all women who give birth each year in Australia. Our objective was to describe the perinatal mortality associated with giving birth in "alongside hospital" birth centers in Australia during 1999 to 2002 using nationally collected data. Methods: This population-based study included all 1,001,249 women who gave birth in Australia during 1999 to 2002. Of these women, 21,800 (2.18%) gave birth in a birth center. Selected perinatal outcomes (including stillbirths and neonatal deaths) were described for the 4-year study period separately for first-time mothers and for women having a second or subsequent birth. A further comparison was made between deaths of low-risk term babies born in hospitals compared with deaths of term babies born in birth centers. Results: The total perinatal death rate attributed to birth centers was significantly lower than that attributed to hospitals (1.51/1,000 vs 10.03/1,000). The perinatal mortality rate among term births to primiparas in birth centers compared with term births among low-risk primiparas in hospitals was 1.4 versus 1.9 per 1,000; the perinatal mortality rate among term births to multiparas in birth centers compared with term births among low-risk multiparas in hospitals was 0.6 versus 1.6 per 1,000. Conclusions: This study using Australian national data showed that the overall rate of perinatal mortality was lower in alongside hospital birth centers than in hospitals irrespective of the mother's parity. (BIRTH 34:3 September 2007)


Methods for Computer-Aided Chemical Biology.

CHEMICAL BIOLOGY & DRUG DESIGN, Issue 6 2008
Bayesian Classification, Part 3: Analysis of Structure–Selectivity Relationships through Single- or Dual-Step Selectivity Searching
The identification of small molecules that are selective for individual targets within target families is an important task in chemical biology. We aim at the development of computational approaches for the study of structure–selectivity relationships and prediction of target-selective ligands. In previous studies, we have introduced the concept of selectivity searching. Here we study compound selectivity on the basis of 18 selectivity sets that are designed to contain target-selective molecules and compounds that are comparably active against related targets. These sets consist of a total of 432 compounds and focus on eight targets belonging to four target families. This compound source has enabled us to evaluate different computational approaches to search for target-selective compounds in large databases. These investigations have revealed a preferred search strategy to enrich database selection sets with target-selective compounds. The selectivity sets reported here are made publicly available to support the development of other computational tools for applications in chemical biology and medicinal chemistry.