Knowledge Discovery (knowledge + discovery)


Selected Abstracts


KDDML-G: a grid-enabled knowledge discovery system

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 13 2007
Andrea Romei
Abstract KDDML-G is a middleware language and system for knowledge discovery on the grid. The challenge that motivated the development of a grid-enabled version of the 'standalone' KDDML (Knowledge Discovery in Databases Markup Language) environment was, on the one hand, to exploit the parallelism offered by the grid environment and, on the other, to overcome the problem of data immovability, a frequent restriction on real-world data collections that principally serves a privacy-preserving purpose. The latter problem is addressed by moving the code and 'mining' the data 'in place', that is, by adapting the computation to the availability and location of the data. Copyright © 2007 John Wiley & Sons, Ltd. [source]
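The "move the code, not the data" strategy described in the abstract can be sketched as below. This is an illustrative model only, not the KDDML-G API: the `Node` and `submit` names are invented, and a trivial frequency counter stands in for a real mining algorithm. The key property is that the dataset never leaves its hosting node; only the shipped function and the (small) extracted model cross node boundaries.

```python
class Node:
    """A grid node that hosts an immovable dataset (hypothetical sketch)."""
    def __init__(self, name, dataset):
        self.name = name
        self._dataset = dataset  # never leaves this node

    def run(self, mining_fn):
        # Execute the shipped code locally; return only the extracted model.
        return mining_fn(self._dataset)


def submit(nodes, dataset_name, mining_fn):
    """Route the mining task to the node that holds the named dataset."""
    for node in nodes:
        if node.name == dataset_name:
            return node.run(mining_fn)
    raise KeyError(f"no node hosts {dataset_name!r}")


def frequency_miner(records):
    """Toy miner: per-item frequencies, standing in for a real algorithm."""
    counts = {}
    for item in records:
        counts[item] = counts.get(item, 0) + 1
    return counts


nodes = [Node("hospital_a", ["x", "y", "x"]), Node("hospital_b", ["z"])]
model = submit(nodes, "hospital_a", frequency_miner)
```

Because only `frequency_miner` travels to `hospital_a` and only `model` travels back, the pattern respects data immovability while still allowing parallel dispatch to many nodes.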


Knowledge Discovery in Databases Using Formal Concept Analysis

BULLETIN OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE & TECHNOLOGY (ELECTRONIC), Issue 1 2000
Uta Priss, Assistant Professor
No abstract is available for this article. [source]


The development of dentist practice profiles and management

JOURNAL OF EVALUATION IN CLINICAL PRACTICE, Issue 1 2009
Chinho Lin PhD
Abstract Rationale and objectives: With today's large computerized payment systems and the increase in the number of claims, unusual dental practice patterns that cover up fraud are becoming widespread and sophisticated. Clustering the characteristics of dental practice patterns is an essential task for improving the quality of care and cost containment. This study aims at providing an easy, efficient and practical alternative approach to developing patterns of dental practice profiles. This will help the third-party payer to recognize and describe novel or unusual patterns of dental practice and thus adopt various strategies in order to prevent fraudulent claims and overcharges. Methodology: Knowledge discovery (or data mining) was used to cluster the dentists' profiles by carrying out clustering techniques based on the features of service rates. It is a hybrid of knowledge discovery, statistical and artificial neural network methodologies that extracts knowledge from the dental claim database. Results: The results of clustering highlight characteristics related to dentists' practice patterns, and detailed managerial guidance is illustrated to support the third-party payer in the management of various patterns of dentist practice. Conclusion: This study integrates the development of dentists' practice patterns with the knowledge discovery process. These findings will help the third-party payer to discriminate among patterns of practice, and also shed more light on suspicious claims and practice patterns among dentists. [source]
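The study's hybrid combines statistical and neural-network clustering; the sketch below uses plain k-means on per-dentist service-rate vectors only to show the shape of the task. The feature names (filling and extraction rates per 100 visits) and the data are invented for illustration.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means over tuples of equal length."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each profile to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        # Recompute centres as coordinate-wise means (keep old if empty).
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups


# Each row: (filling_rate, extraction_rate) per 100 visits for one dentist.
profiles = [(10, 2), (11, 3), (9, 2), (40, 25), (42, 24)]
centers, groups = kmeans(profiles, k=2)
```

With these toy rates, the two dentists with unusually high extraction rates separate into their own cluster, which is the kind of group a payer would review for possible overcharging.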


BUILDING A DATA-MINING GRID FOR MULTIPLE HUMAN BRAIN DATA ANALYSIS

COMPUTATIONAL INTELLIGENCE, Issue 2 2005
Ning Zhong
E-science is about global collaboration in key areas of science, such as cognitive science and brain science, and about the next generation of infrastructure, such as the Wisdom Web and Knowledge Grids. As a case study, we investigate the human multiperception mechanism by cooperatively using various psychological experiments, physiological measurements and data mining techniques to develop artificial systems that match human ability in specific aspects. In particular, we observe fMRI (functional magnetic resonance imaging) and EEG (electroencephalogram) brain activations from the viewpoint of peculiarity-oriented mining and propose a peculiarity-oriented mining approach for knowledge discovery in multiple human brain data. Based on such experience and needs, we concentrate on the architectural aspect of a brain-informatics portal from the perspective of the Wisdom Web and Knowledge Grids. We describe how to build a data-mining grid on the Wisdom Web for multi-aspect human brain data analysis. The proposed methodology attempts to shift the perspective of cognitive scientists from a single type of experimental data analysis toward a holistic view over a long-term, global field of vision. [source]
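Peculiarity-oriented mining scores each value by how far it sits from all the others, typically as PF(x_i) = sum_j |x_i - x_j|**alpha with alpha around 0.5, and flags values whose peculiarity factor exceeds a threshold. The sketch below follows that recipe; the mean-plus-one-standard-deviation threshold and the toy one-dimensional "activation" values are illustrative choices, not taken from the paper.

```python
import statistics


def peculiarity_factors(xs, alpha=0.5):
    """PF(x_i) = sum over j of |x_i - x_j| ** alpha."""
    return [sum(abs(x - y) ** alpha for y in xs) for x in xs]


def peculiar_indices(xs, alpha=0.5, beta=1.0):
    """Flag samples whose PF exceeds mean(PF) + beta * stdev(PF)."""
    pf = peculiarity_factors(xs, alpha)
    threshold = statistics.mean(pf) + beta * statistics.pstdev(pf)
    return [i for i, v in enumerate(pf) if v > threshold]


signal = [1.0, 1.1, 0.9, 1.0, 9.0]   # one atypical activation value
flagged = peculiar_indices(signal)
```

Values near the bulk of the data accumulate small distance terms, so only the outlying sample crosses the threshold; on real fMRI or EEG data the same scoring is applied per attribute rather than to a single series.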



Scientific workflow management and the Kepler system

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 10 2006
Bertram Ludäscher
Abstract Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery 'pipelines'. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. 'the Grid'). However, this infrastructure is only a means to an end and ideally scientists should not be too concerned with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps including computationally intensive jobs on high-performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system, currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open source project, and we always welcome related projects and new contributors to join. Copyright © 2005 John Wiley & Sons, Ltd. [source]
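Kepler models a workflow as a network of actors executed under a director inherited from Ptolemy II. The toy runner below mimics only the general dataflow idea, not Kepler's actual API: each step declares which steps feed it, and the runner executes whatever is ready until everything has run. The "query, analyze, mine" step names are invented stand-ins.

```python
def run_workflow(steps):
    """steps: {name: (fn, [input step names])}. Returns {name: result}."""
    results, pending = {}, dict(steps)
    while pending:
        # A step is ready once all of its inputs have produced results.
        ready = [n for n, (_, deps) in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for name in ready:
            fn, deps = pending.pop(name)
            results[name] = fn(*(results[d] for d in deps))
    return results


# A minimal "query -> analyze -> mine" pipeline.
steps = {
    "query":   (lambda: [3, 1, 4, 1, 5], []),
    "analyze": (lambda xs: sorted(xs), ["query"]),
    "mine":    (lambda xs: max(xs) - min(xs), ["analyze"]),
}
out = run_workflow(steps)
```

In a real system each step might be a database query, a cluster job or a mining algorithm, but the dependency-driven execution order is the same idea.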


A framework for evidence-based mental health care and policy

ACTA PSYCHIATRICA SCANDINAVICA, Issue 2006
L. Salvador-Carulla
Objective: Care planning integrates a growing number of disciplines, research fields and analysis techniques. A framework of the main areas of interest with regard to evidence-based health care in mental health is provided here. Method: The framework is based on the experience of working with data analysts and health and social decision makers at the PSICOST/RIRAG network, a Spanish research association that includes psychiatrists, health economists and health policy experts, as well as on a review of the literature. Results: Three main areas have been identified and described here: outcomes management, knowledge discovery from data, and decision support systems. Their use in mental health care is reviewed. Conclusion: It is important to promote bridging strategies among these new fields in order to enhance communication and information transfer between the different parties involved in mental health decision making: i) clinicians and epidemiologists, ii) data analysts, and iii) care policy makers and other end-users. [source]


The application of knowledge discovery in databases to post-marketing drug safety: example of the WHO database

FUNDAMENTAL & CLINICAL PHARMACOLOGY, Issue 2 2008
A. Bate
Abstract After market launch, new information on adverse effects of medicinal products is almost exclusively first highlighted by spontaneous reporting. As data sets of spontaneous reports have become larger, and computational capability has increased, quantitative methods have been increasingly applied to such data sets. The screening of such data sets is an application of knowledge discovery in databases (KDD). Effective KDD is an iterative and interactive process made up of the following steps: developing an understanding of an application domain, creating a target data set, data cleaning and pre-processing, data reduction and projection, choosing the data mining task, choosing the data mining algorithm, data mining, interpretation of results and consolidating and using acquired knowledge. The process of KDD as it applies to the analysis of spontaneous reports can be exemplified by its routine use on the 3.5 million suspected adverse drug reaction (ADR) reports in the WHO ADR database. Examples of new adverse effects first highlighted by the KDD process on WHO data include topiramate glaucoma, infliximab vasculitis and the association of selective serotonin reuptake inhibitors (SSRIs) and neonatal convulsions. The KDD process has already improved our ability to highlight previously unsuspected ADRs for clinical review in spontaneous reporting, and we anticipate that such techniques will be increasingly used in the successful screening of other healthcare data sets such as patient records in the future. [source]
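The screening step of this KDD process at the WHO uses a Bayesian disproportionality measure, the BCPNN information component. The sketch below computes only a simplified, non-Bayesian analogue, IC = log2(observed / expected), from a 2x2 table of report counts; the counts are invented for illustration and the real method adds Bayesian shrinkage for small counts.

```python
import math


def information_component(n_both, n_drug, n_reaction, n_total):
    """log2 of observed vs. expected co-reporting rate (no shrinkage).

    n_both:     reports mentioning both the drug and the reaction
    n_drug:     reports mentioning the drug
    n_reaction: reports mentioning the reaction
    n_total:    all reports in the database
    """
    expected = n_drug * n_reaction / n_total
    return math.log2(n_both / expected)


# Invented counts: 40 reports pair drug D with reaction R out of 100000.
ic = information_component(n_both=40, n_drug=200, n_reaction=500,
                           n_total=100000)
```

A strongly positive score means the pair is reported together far more often than independence would predict, which is what earmarks a drug-reaction combination for clinical review.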


End-user access to multiple sources: incorporating knowledge discovery into knowledge management

INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 4 2002
Katharina Morik
The End-User Access to Multiple Sources (Eams) system integrates given information sources into a knowledge management system. It relates the world of documents with the database world using an ontology. The focus of developing the Eams system is on the acquisition and maintenance of knowledge. Hence, in both worlds, machine learning is applied. In the document world, a learning search engine adapts to user behaviour by analysing the click-through data. This eases the personalization of selecting appropriate documents for users and does not require further maintenance. In the database world, knowledge discovery in databases (KDD) bridges the gap between the fine granularity of relational databases and the actual information needs of users. KDD extracts knowledge from data and, therefore, allows the knowledge management system to make good use of already existing company data, without further acquisition or maintenance. A graphical user interface provides users with uniform access to document collections on the Internet (Intranet) as well as to relational databases. Since the ontology generates the items in the user interface, a change in the ontology automatically changes the user interface without further effort. The Eams system has been applied to customer relationship management in the insurance domain. Questions to be answered by the system concern customer acquisition (e.g. direct marketing), customer up- and cross-selling (e.g. which products sell well together), and customer retention (here, which customers are likely to leave the insurance company or ask for the return of a capital life insurance). Documents about other insurance companies and demographic data published on the Internet contribute to the answers, as do the results of data analysis of the company's contracts. Copyright © 2003 John Wiley & Sons, Ltd. [source]
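The click-through adaptation described for the Eams search engine can be sketched minimally as follows. This is an invented illustration, not the Eams implementation: documents clicked for a query simply receive a per-query boost on top of a static relevance score, so the ranking drifts toward what users actually select.

```python
from collections import defaultdict


class ClickAdaptiveRanker:
    """Toy ranker that re-orders results from click-through data."""

    def __init__(self, base_scores):
        self.base = base_scores            # doc -> static relevance score
        self.clicks = defaultdict(int)     # (query, doc) -> click count

    def record_click(self, query, doc):
        self.clicks[(query, doc)] += 1

    def rank(self, query):
        # Clicked documents get a per-query boost over their base score.
        return sorted(self.base,
                      key=lambda d: self.base[d] + self.clicks[(query, d)],
                      reverse=True)


ranker = ClickAdaptiveRanker({"doc_a": 1.0, "doc_b": 0.9})
before = ranker.rank("life insurance")
ranker.record_click("life insurance", "doc_b")
ranker.record_click("life insurance", "doc_b")
after = ranker.rank("life insurance")
```

Because the adaptation is driven entirely by logged clicks, no editor has to maintain the personalization by hand, which is the maintenance argument the abstract makes.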


Adaptive modeling and discovery in bioinformatics: The evolving connectionist approach

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 5 2008
Nikola Kasabov
Most biological processes currently being researched in bioinformatics are complex, dynamic processes that are difficult to model and understand. The paper presents evolving connectionist systems (ECOS) as a general approach to adaptive modeling and knowledge discovery in bioinformatics. This approach extends traditional machine learning approaches with various adaptive learning and rule extraction procedures. ECOS belong to the class of incremental local learning and knowledge-based neural networks. They are applied here to challenging problems in bioinformatics, such as microarray gene expression profiling, gene regulatory network (GRN) modeling and computational neurogenetic modeling. The ECOS models have several advantages compared to traditional techniques: fast learning, incremental adaptation to new data, and the facilitation of knowledge discovery through fuzzy rules. © 2008 Wiley Periodicals, Inc. [source]
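The core of incremental local learning in ECOS-style models is that new local units are allocated as data arrive, instead of retraining from scratch. The one-dimensional sketch below shows only that allocation rule, with an invented radius parameter: a sample either nudges its nearest centre or, if no centre is close enough, spawns a new one. Real ECOS additionally attach fuzzy rules and outputs to each unit.

```python
def evolve(stream, radius):
    """One-pass evolving clustering: each sample either updates its
    nearest centre (running mean) or spawns a new centre when no centre
    lies within `radius`. Returns the list of centres."""
    centres, counts = [], []
    for x in stream:
        if centres:
            i = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            if abs(x - centres[i]) <= radius:
                # Incremental adaptation: shift the centre toward x.
                counts[i] += 1
                centres[i] += (x - centres[i]) / counts[i]
                continue
        # No centre is close enough: allocate a new local unit.
        centres.append(x)
        counts.append(1)
    return centres


centres = evolve([0.1, 0.2, 5.0, 0.15, 5.1], radius=1.0)
```

Feeding the stream a second time would refine the same two centres rather than rebuild the model, which is the fast, incremental behaviour the abstract highlights.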


Using the extended quarter degree grid cell system to unify mapping and sharing of biodiversity data

AFRICAN JOURNAL OF ECOLOGY, Issue 3 2009
R. Larsen
Abstract Information on the distribution of animal populations is essential for conservation planning and management. Unfortunately, shared coordinate-level data may compromise sensitive species, so generalized data are often shared instead to facilitate knowledge discovery and communication regarding species distributions. Sharing of generalized data is, however, often ad hoc and lacks scalable conventions that permit consistent sharing at larger scales and varying resolutions. One common convention in African applications is the Quarter Degree Grid Cells (QDGC) system. However, the current standard does not support unique references across the Equator and Prime Meridian. We present a method for extending the QDGC nomenclature to support unique references at a continental scale for Africa. The extended QDGC provides an instrument for sharing generalized biodiversity data where laws, regulations or other formal considerations prevent or prohibit distribution of coordinate-level information. We recommend how the extended QDGC may be used as a standard, scalable solution for the exchange of biodiversity information through the development of tools for the conversion and presentation of multi-scale data at a variety of resolutions. In doing so, the extended QDGC represents an important alternative to existing approaches for generalized mapping and can help planners and researchers address conservation issues more efficiently. [source]
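The extended QDGC scheme labels a one-degree square with hemisphere-prefixed integer degrees (so references stay unique across the Equator and Prime Meridian) and then names quarter-degree subdivisions A (NW), B (NE), C (SW), D (SE), recursively for finer levels. The function below is a sketch under that reading; the corner and edge-case conventions (degrees truncated toward zero, points on cell boundaries) are my assumptions, not a reference implementation of the published standard.

```python
import math


def qdgc(lon, lat, levels=1):
    """Extended-QDGC-style reference for a point, with `levels`
    quarter-degree subdivision letters (A=NW, B=NE, C=SW, D=SE)."""
    code = "{}{:03d}{}{:02d}".format(
        "E" if lon >= 0 else "W", int(abs(lon)),
        "N" if lat >= 0 else "S", int(abs(lat)))
    x = lon - math.floor(lon)      # eastward fraction inside the square
    y = lat - math.floor(lat)      # northward fraction inside the square
    for _ in range(levels):
        north, east = y >= 0.5, x >= 0.5
        code += {(True, False): "A", (True, True): "B",
                 (False, False): "C", (False, True): "D"}[(north, east)]
        # Zoom into the chosen quarter and repeat at the next level.
        x = x * 2 % 1
        y = y * 2 % 1
    return code


ref = qdgc(31.2, -24.3, levels=2)
```

A point in South Africa's lowveld, for example, generalizes to a code like `E031S24AC` at the second subdivision level, so datasets can be exchanged at whichever resolution regulation permits by truncating or extending the letter suffix.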



Knowing - in Medicine

JOURNAL OF EVALUATION IN CLINICAL PRACTICE, Issue 5 2008
Joachim P. Sturmberg MBBS DORACOG MFM PhD FRACGP
Abstract In this paper we argue that knowledge in health care is a multidimensional dynamic construct, in contrast to the prevailing idea of knowledge as an objective state. Polanyi demonstrated that knowledge is personal, that knowledge is discovered, and that knowledge has explicit and tacit dimensions. Complex adaptive systems science views knowledge simultaneously as a thing and a flow, constructed as well as in constant flux. The Cynefin framework is one model to help our understanding of knowledge as a personal construct achieved through sense making. Specific knowledge aspects temporarily reside in one of four domains: the known, the knowable, the complex or the chaotic; new knowledge can only be created by challenging the known by moving it into and looping it through the other domains. Medical knowledge is simultaneously explicit and implicit, with certain aspects already well known and easily transferable, and others that are not yet fully known and must still be learned. At the same time, certain knowledge aspects are predominantly concerned with content, whereas others deal with context. Though in clinical care we may operate predominantly in one knowledge domain, we also operate some of the time in the others. Medical knowledge is inherently uncertain, and we require a context-driven, flexible approach to knowledge discovery and application, in clinical practice as well as in health service planning. [source]


Data mining of fractured experimental data using neurofuzzy logic: discovering and integrating knowledge hidden in multiple formulation databases for a fluid-bed granulation process

JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 6 2008
Q. Shao
Abstract In the pharmaceutical field, current practice in gaining process understanding by data analysis or knowledge discovery has generally focused on dealing with single experimental databases. This limits the level of knowledge extracted in situations where data from a number of sources, so-called fractured data, contain interrelated information. This situation is particularly relevant for complex processes involving a number of operating variables, such as fluid-bed granulation. This study investigated three data mining strategies to discover and integrate knowledge "hidden" in a number of small experimental databases for a fluid-bed granulation process using neurofuzzy logic technology. Results showed that more comprehensive domain knowledge was discovered from multiple databases via an appropriate data mining strategy. The study also demonstrated that the textual information excluded from individual databases was a critical parameter and often acted as the precondition for integrating knowledge extracted from different databases. Consequently, generic knowledge of the domain was discovered, leading to an improved understanding of the granulation process. © 2007 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 97:2091-2101, 2008 [source]


Engineering a search engine (weblib) and browser (knowledge navigator) for digital libraries: global knowledge discovery tools exclusively for librarians and libraries on the web

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 14 2002
V. Sreenivasulu
[source]


Here is the evidence, now what is the hypothesis?

BIOESSAYS, Issue 1 2004
The complementary roles of inductive and hypothesis-driven science in the post-genomic era
It is considered in some quarters that hypothesis-driven methods are the only valuable, reliable or significant means of scientific advance. Data-driven or 'inductive' advances in scientific knowledge are then seen as marginal, irrelevant, insecure or wrong-headed, while the development of technology, which is not of itself 'hypothesis-led' (beyond the recognition that such tools might be of value), must be seen as equally irrelevant to the hypothetico-deductive scientific agenda. We argue here that data- and technology-driven programmes are not alternatives to hypothesis-led studies in scientific knowledge discovery but are complementary and iterative partners with them. Many fields are data-rich but hypothesis-poor. Here, computational methods of data analysis, which may be automated, provide the means of generating novel hypotheses, especially in the post-genomic era. BioEssays 26:99-105, 2004. © 2003 Wiley Periodicals, Inc. [source]