Clustering Procedure

Selected Abstracts


Clustering revealed in high-resolution simulations and visualization of multi-resolution features in fluid–particle models

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 2 2003
Krzysztof Boryczko
Abstract Simulating natural phenomena at greater accuracy results in an explosive growth of data. Large-scale simulations with particles currently involve ensembles consisting of between 10⁶ and 10⁹ particles, which cover 10⁵–10⁶ time steps. Thus, the data files produced in a single run can reach from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio-temporal evolution of both the particle system as a whole and each particle separately. Realistically, inspecting a large data set at full resolution at all times is neither possible nor, in fact, necessary. We have developed an agglomerative clustering technique based on the concept of a mutual nearest neighbor (MNN). This procedure can easily be adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI/Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in the algorithmic structure of MNN clustering and particle methods. We show various examples drawn from MNN applications in visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations, using dissipative particle dynamics and fluid particle models. Because data clustering is the first step in this concept-extraction procedure, the clustering procedure can also be applied in many other fields, such as data mining, earthquake events, and stellar populations in nebula clusters. Copyright © 2003 John Wiley & Sons, Ltd. [source]
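The abstract names the core primitive, mutual nearest neighbours, without spelling it out. The sketch below is a serial toy version of one common way to realize MNN agglomeration: merge every pair of particles that are each other's nearest neighbour and replace the pair with its centroid, repeating to coarsen the data into fewer representatives. The paper's implementation is parallel and far more scalable; function and parameter names here are invented for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def mnn_merge_pass(points):
    """One agglomeration pass: merge every pair of points that are
    mutual nearest neighbours and replace the pair by its centroid."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=2)    # k=2: nearest neighbour besides self
    nn = idx[:, 1]
    parent = np.arange(len(points))
    for i, j in enumerate(nn):
        if nn[j] == i and i < j:        # mutual pair: j joins i's cluster
            parent[j] = i
    groups = {}
    for i in range(len(points)):
        groups.setdefault(parent[i], []).append(points[i])
    return np.array([np.mean(g, axis=0) for g in groups.values()])

# toy usage: repeatedly coarsen a random particle cloud
pts = np.random.rand(1000, 3)
for level in range(3):
    pts = mnn_merge_pass(pts)
    print(f"level {level + 1}: {len(pts)} representatives")
```

Because each point has exactly one nearest neighbour, no merge chains can form within a pass, which keeps each pass cheap; this mirrors why MNN clustering maps well onto particle-method data structures.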


Color reduction for complex document images

INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, Issue 1 2009
Nikos Nikolaou
Abstract A new technique for color reduction of complex document images is presented in this article. It significantly reduces the number of colors in the document image (to fewer than 15 colors in most cases) so as to yield solid characters and uniform local backgrounds, and can therefore serve as a preprocessing step for text information extraction applications. Specifically, using the edge map of the document image, a representative set of samples is chosen that constructs a 3D color histogram. Based on these samples in the 3D color space, a relatively large number of colors (usually no more than 100) are obtained by a simple clustering procedure. The final colors are obtained by applying a mean-shift-based procedure. An edge-preserving smoothing filter is also used as a preprocessing stage that significantly enhances the quality of the initial image. Experimental results demonstrate the method's capability of producing correctly segmented complex color documents in which the character elements can easily be extracted as connected components. © 2009 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 19, 14–26, 2009 [source]
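For orientation, here is a hedged sketch of a pipeline with the same shape as the one described: sample colours from the edge map, coarsen them to on the order of 100 colours, then merge those with mean shift and remap the pixels. OpenCV and scikit-learn are assumed dependencies, all parameter values (Canny thresholds, cluster counts, bandwidth) are illustrative guesses rather than the paper's settings, and the edge-preserving pre-filter is omitted.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans, MeanShift

def reduce_colors(image_bgr, coarse_colors=100, bandwidth=20):
    # 1) edge map picks representative colour samples near character strokes
    edges = cv2.Canny(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY), 100, 200)
    samples = image_bgr[edges > 0].astype(float)
    # 2) coarse reduction to ~100 colours in the 3D colour space
    coarse = KMeans(n_clusters=min(coarse_colors, len(samples)),
                    n_init=4).fit(samples).cluster_centers_
    # 3) mean shift merges the coarse colours into the final small palette
    palette = MeanShift(bandwidth=bandwidth).fit(coarse).cluster_centers_
    # 4) map every pixel to its nearest palette colour (fine for modest images)
    flat = image_bgr.reshape(-1, 3).astype(float)
    labels = np.argmin(((flat[:, None, :] - palette[None, :, :]) ** 2).sum(-1),
                       axis=1)
    return palette[labels].reshape(image_bgr.shape).astype(np.uint8)
```

Running mean shift only on the ~100 coarse colours, rather than on all pixels, is what keeps the second stage cheap; that two-stage structure is the point of the sketch.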


Automatic inference of protein quaternary structure from crystals

JOURNAL OF APPLIED CRYSTALLOGRAPHY, Issue 5 2003
Hannes Ponstingl
The arrangement of the subunits in an oligomeric protein often cannot be inferred without ambiguity from crystallographic studies. The annotation of the functional assembly of protein structures in the Protein Data Bank (PDB) is incomplete and frequently inconsistent. Instructions for the reconstruction, by symmetry, of the functional assembly from the deposited coordinates are often absent. An automatic procedure is proposed for the inference of assembly structures that are likely to be physiologically relevant. The method scores crystal contacts by their contact size and chemical complementarity. The subunit assembly is then inferred from these scored contacts by a clustering procedure involving a single adjustable parameter. When predicting the oligomeric state for a non-redundant set of 55 monomeric and 163 oligomeric proteins from dimers up to hexamers, a classification error rate of 16% was observed. [source]
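The clustering step lends itself to a compact illustration: treat chains as graph nodes, contacts scoring above a threshold (the single adjustable parameter mentioned above) as edges, and read assemblies off the connected components. The scoring of contact size and chemical complementarity is not reproduced here; the scores and threshold below are placeholders.

```python
from collections import defaultdict

def infer_assembly(n_chains, scored_contacts, threshold):
    """scored_contacts: list of (chain_i, chain_j, score) tuples.
    Chains linked by any contact scoring above `threshold` end up in
    one assembly (single linkage over the contact graph)."""
    parent = list(range(n_chains))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in scored_contacts:
        if score >= threshold:
            parent[find(i)] = find(j)

    assemblies = defaultdict(list)
    for c in range(n_chains):
        assemblies[find(c)].append(c)
    return list(assemblies.values())

# toy usage: 4 chains, two strong dimer contacts, one weak lattice contact
contacts = [(0, 1, 8.5), (2, 3, 7.9), (1, 2, 1.2)]
print(infer_assembly(4, contacts, threshold=3.0))   # -> [[0, 1], [2, 3]]
```

Raising the threshold splits assemblies apart, lowering it fuses lattice neighbours into spurious oligomers, which is why a single tunable cut-off suffices to trade the two error modes against each other.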


Multispecies conservation planning: identifying landscapes for the conservation of viable populations using local and continental species priorities

JOURNAL OF APPLIED ECOLOGY, Issue 2 2007
REGAN EARLY
Summary

1. Faced with unpredictable environmental change, conservation managers confront the dual challenges of protecting species throughout their ranges and protecting areas where populations are most likely to persist in the long term. The former can be achieved by protecting locally rare species, to the potential detriment of protecting species where they are least endangered and most likely to survive in the long term.

2. Using British butterflies as a model system, we compared the efficacy of two methods of identifying persistent areas of species' distributions: a single-species approach and a new multispecies prioritization tool called ZONATION. This tool identifies priority areas using population dynamic principles (prioritizing areas that contain concentrations of populations of each species) and the reserve selection principle of complementarity.

3. ZONATION was generally able to identify the best landscapes for target (i.e. conservation priority) species. This ability was improved by assigning higher numerical weights to target species and implementing a clustering procedure to identify coherent biological management units.

4. Weighting British species according to their European rather than UK status substantially increased the protection offered to species at risk throughout Europe. The representation of species that are rare or at risk in the UK, but not in Europe, was not greatly reduced when European weights were used, although some species of UK-only concern were no longer assigned protection inside their best landscapes. The analysis highlights potential consequences of implementing parochial vs. wider-world priorities within a region.

5. Synthesis and applications. Wherever possible, reserve planning should incorporate an understanding of population processes to identify areas that are likely to support persistent populations. While the multispecies prioritization tool ZONATION compared favourably to the selection of 'best' areas for individual species, a user-defined input of species weights was required to produce satisfactory solutions for long-term conservation. Weighting species can allow international conservation priorities to be incorporated into regional action plans, but the potential consequences of any putative solution should always be assessed to ensure that no individual species of local concern will be threatened. [source]
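As a conceptual stand-in for the two ingredients the summary emphasizes, species weights and complementarity, the toy greedy selector below repeatedly picks the site that adds the most weighted, not-yet-represented species. ZONATION itself works quite differently (iterative landscape removal over population densities), so this is only a sketch of the principle, with invented data.

```python
import numpy as np

def greedy_select(occurrence, weights, n_sites):
    """occurrence: sites x species presence (0/1) matrix.
    weights: per-species priority (e.g. higher for European-status species).
    Repeatedly pick the site adding the most weighted, not-yet-covered
    species (the complementarity principle)."""
    covered = np.zeros(occurrence.shape[1], dtype=bool)
    chosen = []
    for _ in range(n_sites):
        gain = (occurrence[:, ~covered] * weights[~covered]).sum(axis=1)
        gain[chosen] = -1                     # never re-pick a site
        best = int(np.argmax(gain))
        chosen.append(best)
        covered |= occurrence[best].astype(bool)
    return chosen

occ = np.random.rand(50, 10) < 0.2            # toy landscape of 50 sites
w = np.ones(10)
w[:3] = 5.0                                   # up-weight three priority species
print(greedy_select(occ, w, n_sites=5))
```

Changing `w` is the analogue of switching between UK-status and European-status weights in point 4: the same landscape data yield a different set of priority sites.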


Unsupervised segmentation of predefined shapes in multivariate images

JOURNAL OF CHEMOMETRICS, Issue 4 2003
J. C. Noordam
Abstract Fuzzy C-means (FCM) is an unsupervised clustering technique that is often used for the unsupervised segmentation of multivariate images. In traditional FCM the clustering is based on spectral information only and the geometrical relationship between neighbouring pixels is not used in the clustering procedure. In this paper, the spatially guided FCM (SG-FCM) algorithm is presented which segments multivariate images by incorporating both spatial and spectral information. Spatial information is described by a geometrical shape description and can vary from a local neighbourhood to a more extended shape model such as Hough circle detection. A modified FCM objective function uses the spatial information as described by the shape model. This results in a segmented image in which the construction of the cluster prototypes is influenced by spatial information. The performance of SG-FCM is compared with both FCM and the sequence of FCM and a majority filter. The SG-FCM segmented image shows more homogeneous regions and less spurious pixels. Copyright © 2003 John Wiley & Sons, Ltd. [source]
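To make the idea of a spatially influenced objective concrete, here is a minimal fuzzy C-means sketch in which each membership update is blended with its local neighbourhood average before the prototypes are recomputed. This is a generic spatially regularized FCM, not the SG-FCM objective of the paper, whose shape model can be much richer (up to Hough circle detection); `alpha` and the 3×3 window are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_fcm(img, c=3, m=2.0, alpha=0.5, iters=30):
    """img: H x W x bands multivariate image. Returns hard labels (H x W)."""
    h, w, b = img.shape
    x = img.reshape(-1, b).astype(float)
    u = np.random.dirichlet(np.ones(c), size=h * w)       # initial memberships
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]    # cluster prototypes
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        u = 1.0 / (d2 ** (1.0 / (m - 1)))                 # standard FCM update
        u /= u.sum(axis=1, keepdims=True)
        # spatial step: blend each pixel's membership with its 3x3 neighbourhood
        u_img = u.reshape(h, w, c)
        u_nb = uniform_filter(u_img, size=(3, 3, 1))
        u = ((1 - alpha) * u_img + alpha * u_nb).reshape(-1, c)
        u /= u.sum(axis=1, keepdims=True)
    return np.argmax(u, axis=1).reshape(h, w)
```

Because the smoothed memberships feed back into the prototype computation, isolated spurious pixels get absorbed by their surroundings, which is the qualitative behaviour the abstract reports for SG-FCM versus plain FCM plus a majority filter.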


Usefulness of Nonlinear Analysis of ECG Signals for Prediction of Inducibility of Sustained Ventricular Tachycardia by Programmed Ventricular Stimulation in Patients with Complex Spontaneous Ventricular Arrhythmias

ANNALS OF NONINVASIVE ELECTROCARDIOLOGY, Issue 3 2008
Ornella Durin M.D.
Introduction: The aim of our study was to assess the effectiveness of nonlinear analysis (NLA) of the ECG in predicting the results of invasive electrophysiologic study (EPS) in patients with ventricular arrhythmias. Methods: We evaluated 25 patients with a history of cardiac arrest, syncope, or sustained or nonsustained ventricular tachycardia (VT). All patients underwent EPS and NLA of the ECG. The study group was compared with a control group of 25 healthy subjects in order to define the normal range of NLA. The ECG was processed to obtain numerical values, which were analyzed by nonlinear mathematical functions. Patients were classified by applying a clustering procedure to the whole set of functions, and the correlation between the results of NLA of the ECG and of EPS was tested. Results: NLA assigned all patients with negative EPS to the same class as the healthy subjects, whereas the patients in whom VT was inducible were correctly and clearly isolated into a separate cluster. In our study, the result of NLA with the clustering technique was significantly correlated with that of EPS (P < 0.001) and was able to predict the result of EPS, with a negative predictive value of 100% and a positive predictive value of 100%. Conclusions: NLA can predict the results of EPS with good negative and positive predictive value. However, further studies are needed to verify the usefulness of this noninvasive tool for sudden death risk stratification in patients with ventricular arrhythmias. [source]
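The abstract leaves the specific nonlinear functions unnamed, so the sketch below substitutes one common nonlinear descriptor, sample entropy, followed by a generic clustering step, purely to show the pipeline shape (signal → nonlinear features → clustering → class assignment). It is in no way the authors' feature set, and the signals are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1D signal (illustrative O(n^2) version)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()
    def count(mm):
        emb = np.lib.stride_tricks.sliding_window_view(x, mm)
        d = np.abs(emb[:, None, :] - emb[None, :, :]).max(-1)
        n = len(emb)
        return ((d <= r).sum() - n) / 2          # pairs, self-matches excluded
    b, a = count(m), count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

# toy usage: regular vs. irregular "ECG-like" traces fall into two clusters
t = np.linspace(0, 10, 2000)
signals = [np.sin(2 * np.pi * t) + 0.05 * np.random.randn(t.size)
           for _ in range(10)]                   # low-entropy, regular
signals += [np.random.randn(t.size) for _ in range(10)]   # high-entropy
feats = np.array([[sample_entropy(s[:500])] for s in signals])
print(KMeans(n_clusters=2, n_init=10).fit_predict(feats))
```

The study's key design point survives the simplification: the clustering is trained without EPS labels, and agreement with EPS is checked only afterwards.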


New multivariate test for linkage, with application to pleiotropy: Fuzzy Haseman-Elston

GENETIC EPIDEMIOLOGY, Issue 4 2003
Belhassen Kaabi
Abstract We propose a new method of linkage analysis based on using the grade of membership scores resulting from fuzzy clustering procedures to define new dependent variables for the various Haseman-Elston approaches. For a single continuous trait with low heritability, the aim was to identify subgroups such that the grade of membership scores to these subgroups would provide more information for linkage than the original trait. For a multivariate trait, the goal was to provide a means of data reduction and data mining. Simulation studies using continuous traits with relatively low heritability (H=0.1, 0.2, and 0.3) showed that the new approach does not enhance power for a single trait. However, for a multivariate continuous trait (with three components), it is more powerful than the principal component method and more powerful than the joint linkage test proposed by Mangin et al. ([1998] Biometrics 54:88,99) when there is pleiotropy. Genet Epidemiol 24:253,264, 2003. © 2003 Wiley-Liss, Inc. [source]