Data Mining (data + mining)
Terms modified by Data Mining

Selected Abstracts

Data Mining for Bioprocess Optimization
ENGINEERING IN LIFE SCIENCES (ELECTRONIC), Issue 3 2004. S. Rommel
Abstract: Although developed for completely different applications, the great technological potential of the data analysis methods called "data mining" has increasingly been realized as a way to efficiently analyze optimization potential and to troubleshoot within many application areas of process technology. This paper presents the successful application of data mining methods to the optimization of a fermentation process, and discusses diverse characteristics of data mining for biological processes. For the optimization of biological processes a huge number of possibly relevant process parameters exist. These input variables can be parameters from devices as well as process control parameters. The main challenge of such optimizations is to robustly identify relevant combinations of parameters among this large set of process parameters. For the underlying process, we found through the application of data mining methods that the moment at which a special carbohydrate component is added has a strong impact on the formation of secondary components. The yield could also be increased by using 2 m³ fermentors instead of 1 m³ fermentors. [source]

Multiple classifier integration for the prediction of protein structural classes
JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 14 2009. Lei Chen
Abstract: Supervised classifiers, such as artificial neural networks, partition trees, and support vector machines, are often used for the prediction and analysis of biological data. However, choosing an appropriate classifier is not straightforward because each classifier has its own strengths and weaknesses, and each biological dataset has its own characteristics. By integrating many classifiers together, one can avoid the dilemma of choosing an individual classifier out of many and achieve optimized classification results (Rahman et al., Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variation, Springer, Berlin, 2002, 167-178). The classification algorithms come from Weka (Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005), a collection of software tools for machine learning algorithms. By integrating many predictors (classifiers) together through simple voting, the correct prediction (classification) rates are 65.21% and 65.63% for a basic training dataset and an independent test set, respectively. These results are better than those of any single machine learning algorithm collected in Weka when exactly the same data are used. Furthermore, we introduce an integration strategy that takes care of both classifier weighting and classifier redundancy. A feature selection strategy, called minimum redundancy maximum relevance (mRMR), is transferred into algorithm selection to deal with classifier redundancy, and the weightings are based on the performance of each classifier. The best classification results are obtained when 11 algorithms are selected by the mRMR method and integrated together through majority votes with weightings. As a result, the correct prediction rates are 68.56% and 69.29% for the basic training dataset and the independent test dataset, respectively. The web-server is available at http://chemdata.shu.edu.cn/protein_st/. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009 [source]

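The weighted-voting scheme described in the abstract above can be illustrated with a minimal sketch; the classifier names, weights, and labels below are invented placeholders, not the eleven Weka algorithms actually selected by mRMR in the paper.

    from collections import defaultdict

    def weighted_majority_vote(predictions, weights):
        """Combine class labels from several classifiers.

        predictions: dict mapping classifier name -> predicted class label
        weights:     dict mapping classifier name -> weight (e.g. its
                     accuracy on a validation set)
        Returns the label with the largest total weight.
        """
        scores = defaultdict(float)
        for clf, label in predictions.items():
            scores[label] += weights.get(clf, 1.0)
        return max(scores, key=scores.get)

    # Illustrative use with made-up classifier names and weights.
    predictions = {"naive_bayes": "alpha", "svm": "beta", "random_forest": "alpha"}
    weights = {"naive_bayes": 0.62, "svm": 0.71, "random_forest": 0.68}
    print(weighted_majority_vote(predictions, weights))  # -> "alpha"
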
Data Mining for Gold: Finding Buried Treasure in Unit Log Books
NURSING FOR WOMEN'S HEALTH, Issue 1 2005
First page of article [source]

The Needs and Benefits of Applying Textual Data Mining within the Product Development Process
QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 1 2004. Rakesh Menon
Abstract: As a result of the growing competition in recent years, new trends such as increased product complexity, changing customer requirements and shortening development time have emerged within the product development process (PDP). These trends have added more challenges to the already difficult task of quality and reliability prediction and improvement. They have given rise to an increase in the number of unexpected events in the PDP. Traditional tools are only partially adequate to cover these unexpected events. As such, new tools are being sought to complement traditional ones. This paper investigates the use of one such tool, textual data mining, for the purpose of quality and reliability improvement. The motivation for this paper stems from the need to handle 'loosely structured textual data' within the product development process. Thus far, most of the studies on data mining within the PDP have focused on numerical databases. In this paper, the need for the study of textual databases is established. Possible areas within a generic PDP for consumer and professional products where textual data mining could be employed are highlighted. In addition, successful implementations of textual data mining within two large multi-national companies are presented. Copyright © 2003 John Wiley & Sons, Ltd. [source]

Scientific Data Mining: A Practical Perspective
BIOMETRICAL JOURNAL, Issue 3 2010
No abstract is available for this article. [source]

Re-Engineering the Immigration System: A Case for Data Mining and Information Assurance to Enhance Homeland Security: Part I: Identifying the Current Problems
BULLETIN OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE & TECHNOLOGY (ELECTRONIC), Issue 1 2002. Lee S. Strickland, Visiting Professor
First page of article [source]

Re-Engineering the Immigration System: A Case for Data Mining and Information Assurance to Enhance Homeland Security: Part II: Where Do We Go from Here?
BULLETIN OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE & TECHNOLOGY (ELECTRONIC), Issue 1 2002. Lee S. Strickland, Visiting Professor
First page of article [source]

Coping With Missing Attribute Values Based on Closest Fit in Preterm Birth Data: A Rough Set Approach
COMPUTATIONAL INTELLIGENCE, Issue 3 2001. Jerzy W. Grzymala-Busse
Abstract: Data mining is frequently applied to data sets with missing attribute values. A new approach to missing attribute values, called closest fit, is introduced in this paper. In this approach, for a given case (example) with a missing attribute value, we search for another case that is as similar as possible to the given case. Cases can be considered as vectors of attribute values. The search is for the case that has as many identical attribute values as possible for symbolic attributes, or the smallest possible value differences for numerical attributes. There are two possible ways to conduct the search: within the same class (concept) as the case with the missing attribute value, or over the entire set of all cases.
For comparison, we also experimented with another approach to missing attribute values, where the missing values are replaced by the most common value of the attribute for symbolic attributes or by the average value for numerical attributes. All algorithms were implemented in the system OOMIS. Our experiments were performed on the preterm birth data sets provided by the Duke University Medical Center. [source]

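The closest-fit search summarized in the abstract above lends itself to a small illustrative sketch. The following is not the OOMIS implementation; the attribute handling is simplified, and mixing the symbolic match count with negated numerical differences into a single score is an assumption made here for brevity.

    def closest_fit_impute(cases, target_idx, attr, symbolic, restrict_to_class=None):
        """Fill cases[target_idx][attr] from the most similar other case.

        cases:    list of dicts, one per case; missing values are None
        attr:     name of the attribute that is missing in the target case
        symbolic: set of attribute names treated as symbolic (the rest are numeric)
        restrict_to_class: optional (class_attr, class_value) pair; when given,
                  only cases from the same class/concept are searched
        """
        target = cases[target_idx]
        best_case, best_score = None, float("-inf")
        for i, cand in enumerate(cases):
            if i == target_idx or cand.get(attr) is None:
                continue
            if restrict_to_class and cand.get(restrict_to_class[0]) != restrict_to_class[1]:
                continue
            score = 0.0
            for a, v in target.items():
                if a == attr or v is None or cand.get(a) is None:
                    continue
                if a in symbolic:
                    score += 1.0 if cand[a] == v else 0.0   # count identical symbolic values
                else:
                    score -= abs(cand[a] - v)               # penalize numerical differences
            if score > best_score:
                best_case, best_score = cand, score
        return best_case[attr] if best_case else None

For the comparison approach mentioned in the abstract, the missing value would instead be replaced by the attribute's most common value (symbolic) or its average (numerical) over the complete cases.
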
Data mining: How hackers steal sensitive electronic information
JOURNAL OF CORPORATE ACCOUNTING & FINANCE, Issue 4 2009. Gordon Smith
You may want to rethink your electronic auditing procedures, and others, after you read this article. Computer security expert Gordon Smith reveals how hackers can easily penetrate your safeguards. © 2008 Canaudit Inc. [source]

Data mining of fractured experimental data using neurofuzzy logic: discovering and integrating knowledge hidden in multiple formulation databases for a fluid-bed granulation process
JOURNAL OF PHARMACEUTICAL SCIENCES, Issue 6 2008. Q. Shao
Abstract: In the pharmaceutical field, current practice in gaining process understanding by data analysis or knowledge discovery has generally focused on dealing with single experimental databases. This limits the level of knowledge extracted when data from a number of sources, so-called fractured data, contain interrelated information. This situation is particularly relevant for complex processes involving a number of operating variables, such as fluid-bed granulation. This study investigated three data mining strategies to discover and integrate knowledge "hidden" in a number of small experimental databases for a fluid-bed granulation process using neurofuzzy logic technology. Results showed that more comprehensive domain knowledge was discovered from multiple databases via an appropriate data mining strategy. This study also demonstrated that the textual information excluded from individual databases was a critical parameter and often acted as the precondition for integrating knowledge extracted from different databases. Consequently, generic knowledge of the domain was discovered, leading to an improved understanding of the granulation process. © 2007 Wiley-Liss, Inc. and the American Pharmacists Association. J Pharm Sci 97:2091-2101, 2008 [source]

Applications of ACORN to data at 1.45 Å resolution
JOURNAL OF SYNCHROTRON RADIATION, Issue 1 2004. V. Rajakannan
Abstract: One of the main interests in the molecular biosciences is in understanding structure-function relations, and X-ray crystallography plays a major role in this. ACORN can be used as a comprehensive and efficient phasing procedure for the determination of protein structures when atomic resolution data are available. An initial model can automatically be built by ARP/wARP, followed by REFMAC for refinement. The α-helices and β-sheets occurring in many protein structures can be taken as starting fragments for structure solution in ACORN. ACORN, along with ARP/wARP followed by REFMAC, can be an ab initio method for solving protein structures for which data are better than 1.2 Å (atomic resolution). Here, attempts are made to extend its application to real data at 1.45 Å resolution and also to truncated data at 1.6 Å resolution. Two previously known structures, congerin II and alkaline cellulase N257, were resolved using the above approach. Automatic structure solution, phasing and refinement for real data at still lower resolutions for proteins of various complexities are being carried out. Data mining of secondary structural features using the PDB is being carried out for this new approach of 'seed-phasing' in ACORN. [source]

Data mining for signals in spontaneous reporting databases: proceed with caution
PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, Issue 4 2007. Wendy P. Stephenson MD
Abstract: Purpose: To provide commentary and points of caution to consider before incorporating data mining as a routine component of any pharmacovigilance program, and to stimulate further research aimed at better defining the predictive value of these new tools as well as their incremental value as an adjunct to traditional methods of post-marketing surveillance. Methods/Results: The commentary includes a review of the data mining methodologies currently employed and their limitations, caveats to consider in the use of spontaneous reporting databases, and caution against over-confidence in the results of data mining. Conclusions: Future research should focus on more clearly delineating the limitations of the various quantitative approaches as well as the incremental value that they bring to traditional methods of pharmacovigilance. Copyright © 2006 John Wiley & Sons, Ltd. [source]

Comparing data mining methods on the VAERS database
PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, Issue 9 2005. David Banks PhD
Abstract: Purpose: Data mining may enhance traditional surveillance of vaccine adverse events by identifying events that are reported more commonly after administering one vaccine than after other vaccines. Data mining methods find signals as the proportion of times a condition or group of conditions is reported soon after the administration of a vaccine; this is thus a relative proportion compared across vaccines, not an absolute rate for the condition. The Vaccine Adverse Event Reporting System (VAERS) contains approximately 150,000 reports of adverse events that are possibly associated with vaccine administration. Methods: We studied four data mining techniques: the empirical Bayes geometric mean (EBGM), the lower bound of the EBGM's 90% confidence interval (EB05), the proportional reporting ratio (PRR), and the screened PRR (SPRR). We applied these to the VAERS database and compared the agreement among methods and other performance properties, particularly focusing on the vaccine-event combinations with the highest numerical scores in the various methods. Results: The vaccine-event combinations with the highest numerical scores varied substantially among the methods. Not all combinations representing known associations appeared in the top 100 vaccine-event pairs for all methods. Conclusions: The four methods differ in their ranking of vaccine-COSTART pairs. A given method may be superior in certain situations but inferior in others. This paper examines the statistical relationships among the four estimators. Determining which method is best for public health will require additional analysis that focuses on the true alarm and false alarm rates using known vaccine-event associations. Evaluating the properties of these data mining methods will help determine the value of such methods in vaccine safety surveillance. Copyright © 2005 John Wiley & Sons, Ltd. [source]

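For readers unfamiliar with disproportionality measures, here is a minimal sketch of the proportional reporting ratio (PRR) mentioned in the abstract above, computed from a 2×2 contingency table; the counts are invented for illustration and no screening thresholds from the study are implied.

    def proportional_reporting_ratio(a, b, c, d):
        """PRR for one vaccine-event pair.

        a: reports mentioning the vaccine AND the event
        b: reports mentioning the vaccine without the event
        c: reports of other vaccines that mention the event
        d: reports of other vaccines that do not mention the event
        PRR = [a / (a + b)] / [c / (c + d)]
        """
        return (a / (a + b)) / (c / (c + d))

    # Invented counts, for illustration only.
    print(round(proportional_reporting_ratio(a=40, b=960, c=200, d=48800), 2))  # -> 9.8
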
Screening strategy for the rapid detection of in vitro generated glutathione conjugates using high-performance liquid chromatography and low-resolution mass spectrometry in combination with LightSight® software for data processing
RAPID COMMUNICATIONS IN MASS SPECTROMETRY, Issue 22 2009. César Ramírez-Molina
Abstract: Knowledge of drug metabolism in the early phases of the drug discovery process is vital for minimising compound failure at later stages. As chemically reactive metabolites may cause adverse drug reactions, it is generally accepted that avoiding the formation of reactive metabolites increases the chances of success of a molecule. In order to generate this important information, a screening strategy for the rapid detection of in vitro generated reactive metabolites trapped by glutathione has been developed. The bioassay incorporated the use of native glutathione and its close analogue, glutathione ethyl ester. The generic conditions for detecting glutathione conjugates that undergo a constant neutral loss of 129 Da were optimised using a glutathione-based test mix of four compounds. The final liquid chromatography/tandem mass spectrometry constant neutral loss method used low-resolution settings and a scanning window of 200 amu. Data mining was rapidly and efficiently performed using LightSight® software. Unambiguous identification of the glutathione conjugates was significantly facilitated by the analytical characteristics of the conjugate pairs formed with glutathione and glutathione ethyl ester, i.e. by chromatographic retention time and mass differences. The reliability and robustness of the screening strategy was tested using a number of compounds known to form reactive metabolites. Overall, the developed screening strategy provided comprehensive and reliable identification of glutathione conjugates and is well suited for rapid routine detection of trapped reactive metabolites. This new approach allowed the identification of a previously unreported diclofenac glutathione conjugate. Copyright © 2009 John Wiley & Sons, Ltd. [source]

Spurious Regressions in Financial Economics?
THE JOURNAL OF FINANCE, Issue 4 2003. Wayne E. Ferson
Abstract: Even though stock returns are not highly autocorrelated, there is a spurious regression bias in predictive regressions for stock returns related to the classic studies of Yule (1926) and Granger and Newbold (1974). Data mining for predictor variables interacts with spurious regression bias. The two effects reinforce each other, because more highly persistent series are more likely to be found significant in the search for predictor variables. Our simulations suggest that many of the regressions in the literature, based on individual predictor variables, may be spurious. [source]

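The persistence mechanism behind the spurious regression bias discussed by Ferson can be illustrated with a short simulation sketch: two independent but highly persistent series are regressed on one another, and the nominal t-test rejects far too often. This mirrors only the general mechanism, not the paper's exact return-predictability design; the sample length, persistence parameter, and replication count are arbitrary choices.

    import numpy as np

    def ar1(T, rho, rng):
        """Simulate a mean-zero AR(1) path of length T."""
        x = np.zeros(T)
        for t in range(1, T):
            x[t] = rho * x[t - 1] + rng.standard_normal()
        return x

    rng = np.random.default_rng(0)
    T, reps, rho = 600, 2000, 0.98        # arbitrary illustrative settings
    rejections = 0
    for _ in range(reps):
        y, x = ar1(T, rho, rng), ar1(T, rho, rng)   # independent by construction
        X = np.column_stack([np.ones(T), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (T - 2)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        rejections += abs(beta[1] / se) > 1.96      # naive OLS t-test at the 5% level
    print("share of spurious 'significant' slopes:", rejections / reps)
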
EURONEAR: Data mining of asteroids and Near Earth Asteroids
ASTRONOMISCHE NACHRICHTEN, Issue 7 2009. O. Vaduvescu
Abstract: Besides new observations, mining old photographic plates and CCD image archives represents an opportunity to recover and secure newly discovered asteroids, and also to improve the orbits of Near Earth Asteroids (NEAs), Potentially Hazardous Asteroids (PHAs) and Virtual Impactors (VIs). These are the main research aims of the EURONEAR network. As stated by the IAU, the vast collection of image archives stored worldwide is still insufficiently explored and could be mined for known NEAs and other asteroids appearing occasionally in their fields. This data mining could be eased using a server to search for and classify findings, based on the asteroid class and the discovery date, as "precoveries" or "recoveries". We built PRECOVERY, a public facility which uses the Virtual Observatory SkyBoT webservice of IMCCE to search for all known Solar System objects in a given observation. To data-mine an entire archive, PRECOVERY requires the observing log in a standard format and outputs a database listing the sorted encounters of NEAs, PHAs, and numbered and un-numbered asteroids, classified as precoveries or recoveries based on the daily updated IAU MPC database. As a first application, we considered an archive including about 13 000 photographic plates exposed between 1930 and 2005 at the Astronomical Observatory in Bucharest, Romania. First, we updated the database, homogenizing dates and pointings to a common format using the JD dating system and the J2000 epoch. All the asteroids observed in planned mode were recovered, proving the accuracy of PRECOVERY. Despite the large field of the plates, mostly imaging 2.27° × 2.27° fields, no NEA or PHA could be encountered occasionally in the archive because the small aperture of the 0.38 m refractor is insufficient to detect objects fainter than V ~ 15. PRECOVERY can be applied to other archives, being intended as a public facility offered to the community by the EURONEAR project. This is the first of a series of papers aimed at improving the orbits of PHAs and NEAs using precovery data derived from image archives to be data mined in collaboration with students and amateurs. In the next paper we will search the CFHT Legacy Survey, while data mining of other archives is planned for the near future. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim) [source]

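A sketch of the precovery/recovery bookkeeping described for PRECOVERY is given below: objects matched to an archival observation are labelled by comparing the plate epoch with each object's discovery date. The record fields, example designations, and dates are hypothetical stand-ins, not the actual SkyBoT or PRECOVERY interfaces.

    from dataclasses import dataclass

    @dataclass
    class Encounter:
        name: str            # object designation
        obj_class: str       # e.g. "NEA", "PHA", "numbered", "unnumbered"
        discovery_jd: float  # Julian Date of discovery (from an MPC-style catalogue)

    def classify_encounters(observation_jd, encounters):
        """Label each object found on a plate as a precovery or a recovery.

        A precovery is an image taken before the object's official discovery;
        a recovery is an image taken after it.
        """
        results = []
        for e in encounters:
            kind = "precovery" if observation_jd < e.discovery_jd else "recovery"
            results.append((e.name, e.obj_class, kind))
        return results

    # Hypothetical plate epoch and catalogue matches, for illustration only.
    plate_jd = 2434567.5
    matches = [Encounter("2004 XY12", "NEA", 2453100.5),
               Encounter("(433) Eros", "numbered", 2414000.5)]
    for row in classify_encounters(plate_jd, matches):
        print(row)
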
Data mining: proprietary rights, people and proposals
BUSINESS ETHICS: A EUROPEAN REVIEW, Issue 3 2009. Dinah Payne
This article focuses on the issue of data mining as it relates to the consumer and on the question of whether the consumer's private information has any proprietary status. A brief review of data mining is provided as background for a better understanding of the purposes and uses of data mining. Also examined are several ethical issues of data mining, including a review of stakeholders: who they are and which of them may be most seriously affected by unethical data mining practices. Several suggestions for the improvement of data mining as it relates to the consumer are then presented: suggestions that would allow for data mining that is beneficial to both the business community and the consumer. [source]

Clustering revealed in high-resolution simulations and visualization of multi-resolution features in fluid-particle models
CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 2 2003. Krzysztof Boryczko
Abstract: Simulating natural phenomena at greater accuracy results in an explosive growth of data. Large-scale simulations with particles currently involve ensembles consisting of between 10^6 and 10^9 particles, which cover 10^5-10^6 time steps. Thus, the data files produced in a single run can reach from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio-temporal evolution of both the particle system as a whole and each particle separately. Realistically, looking at a large data set at full resolution at all times is neither possible nor, in fact, necessary. We have developed an agglomerative clustering technique based on the concept of a mutual nearest neighbor (MNN). This procedure can easily be adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI/Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in the algorithmic structure of MNN clustering and particle methods. We show various examples drawn from MNN applications in the visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations using dissipative particle dynamics and fluid particle models. Because data clustering is the first step in this concept extraction procedure, the clustering procedure may also be employed in many other fields such as data mining, earthquake events and stellar populations in nebula clusters. Copyright © 2003 John Wiley & Sons, Ltd. [source]

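The mutual-nearest-neighbor notion underlying the clustering technique above can be sketched in a few lines: two points are merged only when each is the other's nearest neighbor. The toy data and the single-pass search below are simplifications, not the parallel algorithm benchmarked in the paper.

    import numpy as np

    def mutual_nearest_neighbor_pairs(points):
        """Return index pairs (i, j) such that i and j are each other's nearest neighbor."""
        diffs = points[:, None, :] - points[None, :, :]
        dist = np.sqrt((diffs ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)
        nn = dist.argmin(axis=1)                  # nearest neighbor of every point
        pairs = []
        for i, j in enumerate(nn):
            if i < j and nn[j] == i:              # mutual relation, counted once
                pairs.append((i, int(j)))
        return pairs

    # Toy 2-D point set; in an agglomerative scheme each merged pair would become
    # a new cluster and the search would repeat until no mutual pairs remain.
    pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
    print(mutual_nearest_neighbor_pairs(pts))     # -> [(0, 1), (2, 3)]
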
Metrics in the Science of Surge
ACADEMIC EMERGENCY MEDICINE, Issue 11 2006. Jonathan A. Handler MD
Metrics are the driver of positive change toward better patient care. However, research into the metrics of the science of surge is incomplete, research funding is inadequate, and we lack a criterion standard metric for identifying and quantifying surge capacity. Therefore, a consensus working group was formed through a "viral invitation" process. With a combination of online discussion through a group e-mail list and in-person discussion at a breakout session of the Academic Emergency Medicine 2006 Consensus Conference, "The Science of Surge," seven consensus statements were generated. These statements emphasize the importance of funded research in the area of surge capacity metrics; the utility of an emergency medicine research registry; the need to make the data available to clinicians, administrators, public health officials, and internal and external systems; the importance of real-time data, data standards, and electronic transmission; seamless integration of data capture into the care process; the value of having data available from a single point of access through which data mining, forecasting, and modeling can be performed; and the basic necessity of a criterion standard metric for quantifying surge capacity. Further consensus work is needed to select a criterion standard metric for quantifying surge capacity. These consensus statements cover the future research needs, the infrastructure needs, and the data that are needed for a state-of-the-art approach to surge and surge capacity. [source]

The application of knowledge discovery in databases to post-marketing drug safety: example of the WHO database
FUNDAMENTAL & CLINICAL PHARMACOLOGY, Issue 2 2008. A. Bate
Abstract: After market launch, new information on adverse effects of medicinal products is almost exclusively first highlighted by spontaneous reporting. As data sets of spontaneous reports have become larger and computational capability has increased, quantitative methods have been increasingly applied to such data sets. The screening of such data sets is an application of knowledge discovery in databases (KDD). Effective KDD is an iterative and interactive process made up of the following steps: developing an understanding of an application domain, creating a target data set, data cleaning and pre-processing, data reduction and projection, choosing the data mining task, choosing the data mining algorithm, data mining, interpretation of results, and consolidating and using the acquired knowledge. The process of KDD as it applies to the analysis of spontaneous reports can be exemplified by its routine use on the 3.5 million suspected adverse drug reaction (ADR) reports in the WHO ADR database. Examples of new adverse effects first highlighted by the KDD process on WHO data include topiramate glaucoma, infliximab vasculitis, and the association of selective serotonin reuptake inhibitors (SSRIs) with neonatal convulsions. The KDD process has already improved our ability to highlight previously unsuspected ADRs for clinical review in spontaneous reporting, and we anticipate that such techniques will be increasingly used in the successful screening of other healthcare data sets such as patient records in the future. [source]

New multivariate test for linkage, with application to pleiotropy: Fuzzy Haseman-Elston
GENETIC EPIDEMIOLOGY, Issue 4 2003. Belhassen Kaabi
Abstract: We propose a new method of linkage analysis based on using the grade of membership scores resulting from fuzzy clustering procedures to define new dependent variables for the various Haseman-Elston approaches. For a single continuous trait with low heritability, the aim was to identify subgroups such that the grade of membership scores for these subgroups would provide more information for linkage than the original trait. For a multivariate trait, the goal was to provide a means of data reduction and data mining. Simulation studies using continuous traits with relatively low heritability (H = 0.1, 0.2, and 0.3) showed that the new approach does not enhance power for a single trait. However, for a multivariate continuous trait (with three components), it is more powerful than the principal component method and more powerful than the joint linkage test proposed by Mangin et al. ([1998] Biometrics 54:88-99) when there is pleiotropy. Genet Epidemiol 24:253-264, 2003. © 2003 Wiley-Liss, Inc. [source]

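A minimal sketch of the fuzzy Haseman-Elston idea described above follows: grade-of-membership scores from a fuzzy clustering step replace the raw trait, and the classic Haseman-Elston regression of squared sib-pair differences on IBD sharing is run on the new score. Fuzzy c-means is used here as one possible fuzzy clustering procedure, and the simulated sib-pair data carry no real linkage signal; everything below is illustrative only.

    import numpy as np
    from scipy import stats

    def fuzzy_cmeans_memberships(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
        """Return an (n_samples, n_clusters) matrix of grade-of-membership scores."""
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(n_clusters), size=len(X))   # random initial memberships
        for _ in range(n_iter):
            W = U ** m
            centers = (W.T @ X) / W.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
        return U

    # Illustrative sib-pair data: a 3-component trait per sibling and IBD sharing per pair.
    rng = np.random.default_rng(1)
    n_pairs = 200
    traits_sib1 = rng.normal(size=(n_pairs, 3))
    traits_sib2 = rng.normal(size=(n_pairs, 3))
    ibd = rng.choice([0.0, 0.5, 1.0], size=n_pairs)           # proportion of alleles shared IBD

    # Cluster the pooled trait vectors, then use membership in cluster 0 as the new trait.
    U = fuzzy_cmeans_memberships(np.vstack([traits_sib1, traits_sib2]))
    score1, score2 = U[:n_pairs, 0], U[n_pairs:, 0]

    # Haseman-Elston step: regress squared sib-pair differences on IBD sharing;
    # under linkage the slope is expected to be negative.
    y = (score1 - score2) ** 2
    fit = stats.linregress(ibd, y)
    print("slope = %.4f, p = %.3f" % (fit.slope, fit.pvalue))
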
Comparative gene expression profiling of olfactory ensheathing glia and Schwann cells indicates distinct tissue repair characteristics of olfactory ensheathing glia
GLIA, Issue 12 2008. Elske H.P. Franssen
Abstract: Olfactory ensheathing glia (OEG) are a specialized type of glia that support the growth of primary olfactory axons from the neuroepithelium in the nasal cavity to the brain. Transplantation of OEG into the injured spinal cord promotes sprouting of injured axons and results in reduced cavity formation, enhanced axonal and tissue sparing, remyelination, and angiogenesis. Gene expression analysis may help to identify the molecular mechanisms underlying the ability of OEG to recreate an environment that supports regeneration in the central nervous system. Here, we compared the transcriptome of cultured OEG (cOEG) with the transcriptomes of cultured Schwann cells (cSCs) and of OEG directly obtained from their natural environment (nOEG), the olfactory nerve layer of adult rats. Functional data mining by Gene Ontology (GO) analysis revealed a number of overrepresented GO classes associated with tissue repair. These classes include "response to wounding," "blood vessel development," "cell adhesion," and GO classes related to the extracellular matrix, and were overrepresented in the sets of differentially expressed genes from both comparisons. The current screening approach combined with GO analysis has identified distinct molecular properties of OEG that may underlie their efficacy and interaction with host tissue after implantation in the injured spinal cord. These observations can form the basis for studies on the function of novel target molecules for therapeutic intervention after neurotrauma. © 2008 Wiley-Liss, Inc. [source]

Upregulation of the tumor suppressor gene menin in hepatocellular carcinomas and its significance in fibrogenesis
HEPATOLOGY, Issue 5 2006. Pierre J. Zindy
The molecular mechanisms underlying the progression of cirrhosis toward hepatocellular carcinoma were investigated by a combination of DNA microarray analysis and literature data mining. Using a microarray screening of suppression subtractive hybridization cDNA libraries, we first analyzed genes differentially expressed in tumor and nontumor livers with cirrhosis from 15 patients with hepatocellular carcinomas. Seventy-four genes were similarly recovered in tumor (57.8% of differentially expressed genes) and adjacent nontumor tissues (64% of differentially expressed genes) compared with histologically normal livers. Gene ontology analyses revealed that downregulated genes (n = 35) were mostly associated with hepatic functions. Upregulated genes (n = 39) included known genes associated with extracellular matrix remodeling, cell communication, metabolism, and post-transcriptional regulation (e.g., ZFP36L1), as well as the tumor suppressor gene menin (multiple endocrine neoplasia type 1; MEN1). MEN1 was further identified as an important node of a regulatory network graph that integrated array data with array-independent literature mining. Upregulation of MEN1 in tumor was confirmed in an independent set of samples and associated with tumor size (P = .016). In the underlying liver with cirrhosis, increased steady-state MEN1 mRNA levels were correlated with those of collagen α2(I) mRNA (P < .01). In addition, MEN1 expression was associated with hepatic stellate cell activation during fibrogenesis and involved in transforming growth factor beta (TGF-β)-dependent collagen α2(I) regulation. In conclusion, menin is a key regulator of gene networks that are activated in fibrogenesis associated with hepatocellular carcinoma through modulation of the TGF-β response. (HEPATOLOGY 2006;44:1296-1307.) [source]

Diatomaceous Lessons in Nanotechnology and Advanced Materials
ADVANCED MATERIALS, Issue 29 2009. Dusan Losic
Abstract: Silicon, in its various forms, finds widespread use in electronic, optical, and structural materials. Research on uses of silicon and silica has been intense for decades, raising the question of how much diversity is left for innovation with this element. Shape variation is particularly well examined. Here, we review the principles revealed by diatom frustules, the porous silica shells of diatoms, which are microscopic, unicellular algae. The frustules have nanometer-scale detail, and the almost 100,000 species with unique frustule morphologies suggest nuanced structural and optical functions well beyond the ranges currently used in advanced materials. The unique frustule morphologies have arisen through tens of millions of years of evolutionary selection, and so are likely to reflect optimized design and function. Performing the structural and optical equivalent of data mining, and understanding and adopting these designs, affords a new paradigm in materials science: an alternative to combinatorial materials synthesis approaches in spurring the development of new and more nuanced materials. [source]

Combining random forest and copula functions: A heuristic approach for selecting assets from a financial crisis perspective
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 2 2010. Giovanni De Luca
Abstract: In this paper we propose a heuristic strategy aimed at selecting and analysing a set of financial assets, focusing attention on their multivariate tail dependence structure. The selection, obtained through an algorithmic procedure based on data mining tools, assumes the existence of a reference asset in which we are specifically interested. The procedure allows one to opt for two alternatives: to prefer those assets exhibiting either a minimum lower tail dependence or a maximum upper tail dependence. The former could be a recommendable opportunity in a financial crisis period. For the selected assets, the tail dependence coefficients are estimated by means of a proper multivariate copula function. Copyright © 2010 John Wiley & Sons, Ltd. [source]

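The notion of lower tail dependence used in the asset-selection heuristic above can be made concrete with a simple empirical sketch: the frequency with which a candidate asset's return falls in its worst q-quantile given that the reference asset's return does too. The simulated returns and the quantile level are arbitrary, and the paper estimates the coefficients with a fitted copula rather than this raw counting approach.

    import numpy as np

    def empirical_lower_tail_dependence(x, y, q=0.05):
        """Estimate P(Y below its q-quantile | X below its q-quantile) from paired samples."""
        x_thr, y_thr = np.quantile(x, q), np.quantile(y, q)
        x_low = x <= x_thr
        if x_low.sum() == 0:
            return 0.0
        return float(np.mean(y[x_low] <= y_thr))

    # Illustrative correlated returns for a reference asset and two candidates.
    rng = np.random.default_rng(0)
    common = rng.standard_normal(5000)
    reference = common + 0.5 * rng.standard_normal(5000)
    candidate_a = common + 0.5 * rng.standard_normal(5000)      # moves with the reference
    candidate_b = rng.standard_normal(5000)                      # unrelated to the reference
    for name, cand in [("A", candidate_a), ("B", candidate_b)]:
        print(name, round(empirical_lower_tail_dependence(reference, cand), 3))
    # Under the "minimum lower tail dependence" criterion, asset B would be preferred.
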
Integrating intelligent systems into marketing to support market segmentation decisions
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 3 2006. Sally Mckechnie. Article first published online: 13 MAR 200
For the last 50 years market segmentation has been considered a key concept in marketing strategy. As a means of tackling market heterogeneity, the underlying logic and managerial rationale for market segmentation is well established in the marketing literature. However, there is evidence to suggest that attempts by organizations to classify customers into distinct segments, for whom products or services can be specifically tailored, are proving difficult to implement in practice. As the business environment in which many organizations operate becomes increasingly uncertain and highly competitive, greater importance is now being attached to marketing knowledge. The purpose of this paper is to highlight market segmentation problems as a relevant area for a greater level of engagement between intelligent systems academic researchers and practitioners and their counterparts within the marketing discipline, in order to explore how data mining approaches can assist marketers in gaining valuable insights into patterns of consumer behaviour, which can then be used to inform market segmentation decision-making. Since the application of data mining within the marketing domain is only in its infancy, a research agenda is proposed to encourage greater interdisciplinary collaboration between information systems and marketing so that data mining can more noticeably enter the repertoire of analytical techniques employed for segmentation. Copyright © 2007 John Wiley & Sons, Ltd. [source]

A novel clustering algorithm using hypergraph-based granular computing
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 2 2010. Qun Liu
Clustering is an important technique in data mining. In this paper, we introduce a new clustering algorithm. This algorithm, based on granular computing, constructs a hypergraph (simplicial complex) using a hypergraph bisection algorithm, and it discovers the similarities and associations among documents. The proposed algorithm was used in experiments on Web data, and the results are quite satisfactory. © 2009 Wiley Periodicals, Inc. [source]

Intelligent Fril/SQL interrogator
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 3 2007. Dong (Walter) Xie
The intelligent Fril/SQL interrogator is an object-oriented, knowledge-based support query system implemented as a set of logic objects linked to one another. These logic objects integrate SQL queries, the support logic programming language Fril, and Fril queries by processing them in sequence in the slots of each logic object. This approach therefore takes advantage of both an object-oriented system and a logic-programming-based system. Fuzzy logic data mining and a machine learning tool kit built into the intelligent interrogator can automatically provide a knowledge base or rules to assist a human in analyzing huge data sets or creating intelligent controllers. Alternatively, users can write or edit the knowledge base or rules according to their requirements, so that the intelligent interrogator is also a support logic programming environment in which users can write and run various Fril programs through these logic objects. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 279-302, 2007. [source]

A data warehouse/online analytic processing framework for web usage mining and business intelligence reporting
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 7 2004. Xiaohua Hu
Web usage mining is the application of data mining techniques to discover usage patterns and behaviors from web data (clickstream, purchase information, customer information, etc.) in order to understand and serve e-commerce customers better and improve the online business. In this article, we present a general data warehouse/online analytic processing (OLAP) framework for web usage mining and business intelligence reporting. When web data warehouse construction, data mining, and OLAP are integrated into the e-commerce system, this tight integration dramatically reduces the time and effort for web usage mining, business intelligence reporting, and mining deployment. Our data warehouse/OLAP framework consists of four phases: data capture, webhouse construction (clickstream marts), pattern discovery and cube construction, and pattern evaluation and deployment. We discuss data transformation operations for web usage mining and business reporting at the clickstream, session, and customer levels; describe the problems and challenging issues in each phase in detail; provide plausible solutions to the issues; and demonstrate the framework with examples from some real web sites. Our data warehouse/OLAP framework has been integrated into some commercial e-commerce systems.
We believe this data warehouse/OLAP framework would be very useful for developing any real-world web usage mining and business intelligence reporting systems. © 2004 Wiley Periodicals, Inc. [source] |
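As one concrete example of the clickstream-to-session transformation mentioned in the framework above, the sketch below groups page views into per-user sessions using a 30-minute inactivity timeout; the field layout and the timeout value are common conventions assumed here for illustration, not details taken from the article.

    from datetime import datetime, timedelta

    def sessionize(page_views, timeout=timedelta(minutes=30)):
        """Group (user_id, timestamp, url) page views into per-user sessions."""
        sessions = []
        by_user = {}
        for user, ts, url in sorted(page_views, key=lambda r: (r[0], r[1])):
            current = by_user.get(user)
            if current is None or ts - current[-1][0] > timeout:
                current = []                 # start a new session for this user
                by_user[user] = current
                sessions.append((user, current))
            current.append((ts, url))
        return sessions

    # Illustrative clickstream rows: user id, timestamp, requested URL.
    views = [
        ("u1", datetime(2004, 5, 1, 9, 0), "/home"),
        ("u1", datetime(2004, 5, 1, 9, 5), "/product/42"),
        ("u1", datetime(2004, 5, 1, 11, 0), "/home"),      # > 30 min gap -> new session
        ("u2", datetime(2004, 5, 1, 9, 2), "/home"),
    ]
    for user, hits in sessionize(views):
        print(user, [url for _, url in hits])
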