Clustering Approach (clustering + approach)

Selected Abstracts


Estimating within-field variation using a nonparametric density algorithm

ENVIRONMETRICS, Issue 5 2006
A. Castrignanò
Abstract The application of site-specific techniques and technologies in precision farming requires subdividing a field into a generally small number of contiguous homogeneous zones. The proposed algorithm of clustering is based on nonparametric density estimate, where a cluster is defined as a region surrounding a local maximum of the probability density function. Soil samples were collected in a 2-ha field of the experimental farm of the Agricultural Research Institute, located in Foggia (Southern Italy) and some of the most production-affecting soil properties were interpolated by using the geostatistical techniques of kriging and cokriging. The application of the clustering approach to the (co)kriged surface variables produced the subdivision of the field into five distinct classes. The proposed algorithm proves quite promising in identifying spatially contiguous zones, which are more homogeneous in soil properties than the whole-field. Its great advantage consists in giving an additional description of the residual variation within the class and such a piece of information is very useful in precision farming as a basis for the variable-rate application of agronomic inputs. Copyright © 2005 John Wiley & Sons, Ltd. [source]
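The idea of treating a cluster as the region around a local maximum of the density can be illustrated with a minimal mode-seeking sketch. This is not the algorithm of the paper: it assumes one-dimensional data, a Gaussian kernel, and a hand-picked `bandwidth`, and the function name `mean_shift_1d` is hypothetical.

```python
import math

def mean_shift_1d(values, bandwidth, iters=200, tol=1e-6):
    """Assign each value to the local maximum of a Gaussian kernel
    density estimate that it climbs to (illustrative sketch only)."""
    modes = []
    for x in values:
        for _ in range(iters):
            # Kernel weights of all observations as seen from x.
            weights = [math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values]
            new_x = sum(w * v for w, v in zip(weights, values)) / sum(weights)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    # Points whose modes lie within one bandwidth share a cluster label.
    labels, centers = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) < bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

labels, centers = mean_shift_1d([1.0, 1.1, 0.9, 5.0, 5.2, 4.8], bandwidth=0.5)
```

In a field-zoning setting the inputs would be interpolated soil variables at grid nodes rather than scalars, and the bandwidth would control how many zones emerge.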


Diagnostic evaluation of conceptual rainfall–runoff models using temporal clustering

HYDROLOGICAL PROCESSES, Issue 20 2010
N. J. de Vos
Abstract Given the structural shortcomings of conceptual rainfall–runoff models and the common use of time-invariant model parameters, these parameters can be expected to represent broader aspects of the rainfall–runoff relationship than merely the static catchment characteristics that they are commonly supposed to quantify. In this article, we relax the common assumption of time-invariance of parameters, and instead seek signature information about the dynamics of model behaviour and performance. We do this by using a temporal clustering approach to identify periods of hydrological similarity, allowing the model parameters to vary over the clusters found in this manner, and calibrating these parameters simultaneously. The diagnostic information inferred from these calibration results, based on the patterns in the parameter sets of the various clusters, is used to enhance the model structure. This approach shows how diagnostic model evaluation can be used to combine information from the data and the functioning of the hydrological model in a useful manner. Copyright © 2010 John Wiley & Sons, Ltd. [source]


Assessing early warning signals of currency crises: a fuzzy clustering approach

INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE & MANAGEMENT, Issue 4 2006
Shuhua Liu
In the 1990s alone, four waves of financial crises occurred around the world. The repeated occurrence of financial crises stimulated a large number of theoretical and empirical studies on the phenomena, in particular studies on the determinants of or early warning signals of financial crises. Nonetheless, the different studies of early warning systems have achieved mixed results and there remains much room for further investigation. Since, so far, the empirical studies have focused on conventional economic modelling methods such as simplified probabilistic models and regression models, in this study we examine whether new insights can be gained from the application of the fuzzy clustering method. The theories of fuzzy sets and fuzzy logic offer us the means to deal with uncertainties inherent in a wide variety of tasks, especially when the uncertainty is not the result of randomness but the result of unknown factors and relationships that are difficult to explain. They also provide us with the instruments to treat vague and imprecise linguistic values and to model nonlinear relationships. This paper presents empirical results from analysing the Finnish currency crisis in 1992 using the fuzzy C-means clustering method. We first provide the relevant background knowledge and introduce the fuzzy clustering method. We then show how the use of the fuzzy C-means method can help us to identify the critical levels of important economic indicators for predicting financial crises. Copyright © 2007 John Wiley & Sons, Ltd. [source]
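For readers unfamiliar with fuzzy C-means, the standard update rules can be sketched as follows. This is a generic textbook version on one-dimensional data, not the study's own code: the fuzzifier `m = 2`, the initial `centers`, and the function name are assumptions made for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=100, eps=1e-9):
    """Generic fuzzy C-means on scalars (illustrative sketch).
    Returns the final centers and a membership row per point."""
    memberships = []
    for _ in range(iters):
        # Membership update: closer centers receive a larger share,
        # and each row sums to 1.
        memberships = []
        for x in points:
            dists = [abs(x - c) + eps for c in centers]
            row = []
            for dj in dists:
                denom = sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                row.append(1.0 / denom)
            memberships.append(row)

        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for j in range(len(centers)):
            weights = [u[j] ** m for u in memberships]
            new_centers.append(
                sum(w * x for w, x in zip(weights, points)) / sum(weights)
            )

        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < 1e-8:
            break
    return centers, memberships

centers, memberships = fuzzy_c_means(
    [1.0, 1.2, 0.8, 9.0, 9.3, 8.7], centers=[0.0, 10.0]
)
```

Unlike hard clustering, every observation keeps a graded membership in each cluster, which is what makes it possible to read off "critical levels" of an indicator as the values where membership shifts from one cluster to another.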


Spatial grouping of United States climate stations using a hybrid clustering approach

INTERNATIONAL JOURNAL OF CLIMATOLOGY, Issue 7 2001
Arthur T. DeGaetano
Abstract The development of a hybrid clustering technique based on the geographic proximity of observing stations and some application-driven measure of statistical similarity (in this case rank correlation) is described. The procedure is then applied to temperature and precipitation data from the United States (US) Historical Climatology Network. The resulting station groups provide some insight into the number of observation stations that are necessary to monitor adequately the climate of the US. Based on temperature data alone, a 287-station subset of the original 1145 sites would be adequate to account for 80% of the spatial variability in seasonal temperature across the US. Geographically the distribution of these stations would be relatively sparse in the centre of the country with higher station density along the East Coast and from the Rocky Mountains to the West Coast. Generally, the temperature clusters match the existing US climate divisions to some extent. To monitor adequately the spatial variability of precipitation, a network of similar size could be used. However, such a network would only account for 65% of the spatial variability in precipitation. In this case, fairly uniform station density is indicated across the country with the highest station density in Florida and the Dakotas. A similar number of stations, but with slightly different geographic groupings would be adequate to monitor precipitation and temperature simultaneously. Copyright © 2001 Royal Meteorological Society [source]


SPICKER: A clustering approach to identify near-native protein folds

JOURNAL OF COMPUTATIONAL CHEMISTRY, Issue 6 2004
Yang Zhang
Abstract We have developed SPICKER, a simple and efficient strategy to identify near-native folds by clustering protein structures generated during computer simulations. In general, the most populated clusters tend to be closer to the native conformation than the lowest energy structures. To assess the generality of the approach, we applied SPICKER to 1489 representative benchmark proteins ≤200 residues that cover the PDB at the level of 35% sequence identity; each contains up to 280,000 structure decoys generated using the recently developed TASSER (Threading ASSembly Refinement) algorithm. The best of the top five identified folds has a root-mean-square deviation from native (RMSD) in the top 1.4% of all decoys. For 78% of the proteins, the difference in RMSD from native to the identified models and RMSD from native to the absolutely best individual decoy is below 1 Å; the majority belong to the targets with converged conformational distributions. Although native fold identification from divergent decoy structures remains a challenge, our overall results show significant improvement over our previous clustering algorithms. © 2004 Wiley Periodicals, Inc. J Comput Chem 25: 865–871, 2004 [source]
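The notion of ranking folds by cluster population can be sketched with a simple greedy procedure: pick the structure with the most neighbours within a distance cutoff, remove that cluster, and repeat. The sketch below is only an illustration of that idea, not the SPICKER program itself; the fixed `cutoff`, the toy distance matrix, and the function name `rank_clusters` are assumptions.

```python
def rank_clusters(dist, cutoff, top=5):
    """Greedy population-based clustering on a pairwise distance matrix.
    Returns up to `top` (center_index, member_indices) pairs, most
    populated cluster first (illustrative sketch)."""
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < top:
        # The center is the structure with the most remaining neighbours;
        # ties go to the lowest index.
        center = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if dist[i][j] <= cutoff),
        )
        members = {j for j in remaining if dist[center][j] <= cutoff}
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters

# Toy stand-in for pairwise RMSD values: distances between points on a line.
coords = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
dist = [[abs(a - b) for b in coords] for a in coords]
clusters = rank_clusters(dist, cutoff=0.15)
```

With real decoys the matrix would hold pairwise RMSD values after structural superposition, and the cutoff would typically need tuning to the spread of the decoy set.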


A robust clustering approach for NMR spectra of natural product extracts

MAGNETIC RESONANCE IN CHEMISTRY, Issue 5 2005
Gregory K. Pierens
Abstract A robust method was developed to cluster similar NMR spectra from partially purified extracts obtained from a range of marine sponges and a plant biota. The NMR data were acquired using microtiter plate NMR (VAST) in protonated solvents. A sample data set which contained several clusters was used to optimize the protocol. The evaluation of the robustness was performed using three different clustering methods: tree clustering analysis, K-means clustering and multidimensional scaling. These methods were compared for consistency using the sample data set and the optimized methodology was applied to clustering of a set of spectra from partially purified biota extracts. Copyright © 2005 John Wiley & Sons, Ltd. [source]


The genetic structure of cattle populations (Bos taurus) in northern Eurasia and the neighbouring Near Eastern regions: implications for breeding strategies and conservation

MOLECULAR ECOLOGY, Issue 18 2007
MENG-HUA LI
Abstract We investigated the genetic structure and variation of 21 populations of cattle (Bos taurus) in northern Eurasia and the neighbouring Near Eastern regions of the Balkan, the Caucasus and Ukraine employing 30 microsatellite markers. By analyses of population relationships, as well as by a Bayesian-based clustering approach, we identified a genetic distinctness between populations of modern commercial origin and those of native origin. Our data suggested that northern European Russia represents the most heavily colonized area by modern commercial cattle. Further genetic mixture analyses based on individual assignment tests found that native Red Steppe cattle were also employed in the historical breeding practices in Eastern Europe, most probably for incorporating their strong and extensive adaptability. In analysis of molecular variance, within-population differences accounted for ~90% of the genetic variation. Despite some correspondence between geographical proximity and genetic similarity, genetic differentiation was observed to be significantly associated with the difference in breeding purpose among the European populations (percentage of variance among groups and significance: 2.99%, P = 0.02). Our findings give unique genetic insight into the historical patterns of cattle breeding practices in the former Soviet Union. The results identify the neighbouring Near Eastern regions such as the Balkan, the Caucasus and Ukraine, and the isolated Far Eastern Siberia as areas of 'genetic endemism', where cattle populations should be given conservation priority. The results will also be of importance for cost-effective management of their future utilization. [source]


Genetic variation and relationships among eight Indian riverine buffalo breeds

MOLECULAR ECOLOGY, Issue 3 2006
SATISH KUMAR
Abstract Twenty-seven microsatellite loci were used to define genetic variation and relationships among eight Indian riverine buffalo breeds. The total number of alleles ranged from 166 in the Toda breed to 194 each in the Mehsana and the Murrah. Significant departures from the Hardy–Weinberg equilibrium were observed for 26 locus-breed combinations due to heterozygote deficiency. Breed differentiation was analysed by estimation of FST index (values ranging from 0.75% to 6.00%) for various breed combinations. The neighbour-joining tree constructed from chord distances, multidimensional scaling (MDS) display of FST values and Bayesian clustering approach consistently identified the Toda, Jaffarabadi, and Pandharpuri breeds as one lineage each, and the Bhadawari, Nagpuri, Surati, Mehsana and Murrah breeds as admixture. Analysis of molecular variance refuted the earlier classification of these breeds proposed on the basis of morphological and geographical parameters. The Toda buffaloes, reared by a tribe of the same name, represent an endangered breed from the Nilgiri hills in South India. Divergence time of the Toda buffaloes from the other main breeds, calculated from Nei's standard genetic distances based on genotyping data on seven breeds and 20 microsatellite loci, suggested separation of this breed approximately 1800–2700 years ago. The results of the present study will be useful for development of rational breeding and conservation strategies for Indian buffaloes. [source]


Novel Approach for Clustering Zeolite Crystal Structures

MOLECULAR INFORMATICS, Issue 4 2010
M. Lach-hab
Abstract Informatics approaches play an increasingly important role in the design of new materials. In this work we apply unsupervised statistical learning for identifying four framework-type attractors of zeolite crystals in which several of the zeolite framework types are grouped together. Zeolites belonging to these super-classes manifest important topological, chemical and physical similarities. The zeolites form clusters located around four core framework types: LTA, FAU, MFI and the combination of EDI, HEU, LTL and LAU. Clustering is performed in a 9-dimensional space of attributes that reflect topological, chemical and physical properties for each individual zeolite crystalline structure. The implemented machine learning approach relies on a hierarchical top-down clustering approach and the expectation maximization method. The model is trained and tested on ten partially independent data sets from the FIZ/NIST Inorganic Crystal Structure Database. [source]


A clustering approach to identify the time of a step change in Shewhart control charts

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, Issue 7 2008
Mehdi Ghazanfari
Abstract Control charts are the most popular statistical process control tools used to monitor process changes. When a control chart indicates an out-of-control signal it means that the process has changed. However, control chart signals do not indicate the real time of process changes, which is essential for identifying and removing assignable causes and ultimately improving the process. Identifying the real time of the change is known as the change-point estimation problem. Most of the traditional methods of estimating the process change point are developed based on the assumption that the process follows a normal distribution with known parameters, which is seldom true. In this paper, we propose clustering techniques to estimate Shewhart control chart change points. The proposed approach does not depend on the true values of the parameters and even the distribution of the process variables. Accordingly, it is applicable to both phase-I and phase-II of normal and non-normal processes. At the end, we discuss the performance of the proposed method in comparison with the traditional procedures through extensive simulation studies. Copyright © 2008 John Wiley & Sons, Ltd. [source]
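One simple way to picture change-point estimation as a clustering problem is to split the time-ordered observations into two contiguous groups and choose the split that minimises the total within-group sum of squares. This is only a sketch of that general idea under a single-step-change assumption, not the procedure evaluated in the paper; the function names are hypothetical.

```python
def sse(xs):
    """Sum of squared deviations from the group mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def estimate_change_point(series):
    """Return tau, the number of observations before the estimated step
    change, by minimising within-group scatter over all two-way
    contiguous splits (illustrative sketch)."""
    best_tau, best_cost = None, float("inf")
    for tau in range(1, len(series)):
        cost = sse(series[:tau]) + sse(series[tau:])
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau

series = [10.1, 9.9, 10.0, 10.2, 9.8, 12.1, 11.9, 12.0, 12.2]
tau = estimate_change_point(series)
```

No distributional form or in-control parameter values enter the calculation, which is in the spirit of the distribution-free claim made in the abstract.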


A simple and linear time randomized algorithm for computing sparse spanners in weighted graphs

RANDOM STRUCTURES AND ALGORITHMS, Issue 4 2007
Surender Baswana
Abstract Let G = (V, E) be an undirected weighted graph on |V| = n vertices and |E| = m edges. A t-spanner of the graph G, for any t ≥ 1, is a subgraph (V, E_S), E_S ⊆ E, such that the distance between any pair of vertices in the subgraph is at most t times the distance between them in the graph G. Computing a t-spanner of minimum size (number of edges) has been a widely studied and well-motivated problem in computer science. In this paper we present the first linear time randomized algorithm that computes a t-spanner of a given weighted graph. Moreover, the size of the t-spanner computed essentially matches the worst case lower bound implied by a 43-year old girth lower bound conjecture made independently by Erdős, Bollobás, and Bondy & Simonovits. Our algorithm uses a novel clustering approach that avoids any distance computation altogether. This feature is somewhat surprising since all the previously existing algorithms employ computation of some sort of local or global distance information, which involves growing either breadth first search trees up to Θ(t)-levels or full shortest path trees on a large fraction of vertices. The truly local approach of our algorithm also leads to equally simple and efficient algorithms for computing spanners in other important computational environments like distributed, parallel, and external memory. © 2006 Wiley Periodicals, Inc. Random Struct. Alg., 2007 [source]
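The t-spanner definition can be made concrete with a small checker that compares shortest-path distances in the subgraph against those in the full graph. Note that this verifies the definition only; it is not the paper's construction, which is described as avoiding distance computation. The edge format `(u, v, weight)` and both function names are assumptions for illustration.

```python
import heapq

def shortest_paths(n, edges, source):
    """Dijkstra distances from `source` in an undirected weighted graph
    given as (u, v, weight) triples over vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def is_t_spanner(n, edges, sub_edges, t):
    """Check that sub_edges is a subset of edges and stretches no
    pairwise distance by more than a factor t."""
    if not set(sub_edges) <= set(edges):
        return False  # assumes identical (u, v, weight) tuples in both lists
    for s in range(n):
        d_g = shortest_paths(n, edges, s)
        d_h = shortest_paths(n, sub_edges, s)
        for v in range(n):
            if d_g[v] < float("inf") and d_h[v] > t * d_g[v]:
                return False
    return True

edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5)]
sub_edges = [(0, 1, 1.0), (1, 2, 1.0)]  # drops the 0-2 edge
ok = is_t_spanner(3, edges, sub_edges, 2.0)
```

In the toy triangle, removing the edge of weight 1.5 raises the 0 to 2 distance from 1.5 to 2, a stretch of about 1.33, so the subgraph passes for t = 2 but would fail for t = 1.2.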


Genetic variation and relationships among Turkish water buffalo populations

ANIMAL GENETICS, Issue 1 2010
M. Gargani
Summary The genetic variation and relationships among six Turkish water buffalo populations, typical of different regions, were assessed using a set of 26 heterologous (bovine) microsatellite markers. Between seven and 17 different alleles were identified per microsatellite in a total of 254 alleles. The average number of alleles across all loci in all the analysed populations was found to be 12.57. The expected mean heterozygosity (He) per population ranged between 0.5 and 0.58. Significant departures from Hardy–Weinberg equilibrium were observed for 44 locus–population combinations. Population differentiation was analysed by estimation of the Fst index (values ranging from 0.053 to 0.123) among populations. A principal component analysis of variation revealed the Merzifon population to show the highest differentiation compared with the others. In addition, some individuals of the Danamandira population appeared clearly separated, while the Afyon, Coskun, Pazar and Thural populations represented a single cluster. The assignment of individuals to their source populations, performed using the Bayesian clustering approach implemented in the structure 2.2 software, supports a high differentiation of Merzifon and Danamandira populations. The results of this study are useful for the development of conservation strategies for the Turkish buffalo. [source]


Clustering composition vectors using uncertainty information

ENVIRONMETRICS, Issue 8 2007
William F. Christensen
Abstract In the biological and environmental sciences, interest often lies in using multivariate observations to discover natural clusters of objects. In this manuscript, the incorporation of measurement uncertainty information into a cluster analysis is discussed. This study is motivated by a problem involving the clustering of composition vectors associated with each of several chemical species. The observed abundance of each component is available along with its estimated uncertainty (measurement error standard deviation). An approach is proposed for converting the abundance vectors into composition (relative abundance) vectors, obtaining the covariance matrix associated with each composition vector, and defining a Mahalanobis distance between composition vectors that are suitable for cluster analysis. The approach is illustrated using particle size distributions obtained near Houston, Texas in 2000. Computer simulation is used to compare the performance of Mahalanobis-distance-based and Euclidean-distance-based clustering approaches. The use of a modified Mahalanobis distance along with Ward's method is recommended for use. Copyright © 2007 John Wiley & Sons, Ltd. [source]
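The abstract does not give the formulas, but one plausible way to carry out the three steps it names (abundance to composition, covariance of the composition, Mahalanobis distance) is first-order error propagation. The following is a sketch under stated assumptions, not the authors' modified distance: measurement errors are taken as independent with standard deviations \sigma_j, and the symbols a, c, S, J are introduced here for illustration.

```latex
% Abundance vector a = (a_1, ..., a_p) with total S = \sum_j a_j.
% Composition (relative abundance):
c_i = \frac{a_i}{S}, \qquad i = 1, \dots, p.

% Jacobian of the composition with respect to the abundances:
J_{ij} = \frac{\partial c_i}{\partial a_j} = \frac{\delta_{ij} - c_i}{S}.

% First-order covariance of the composition, assuming independent errors:
\Sigma_c \approx J \, \mathrm{diag}(\sigma_1^2, \dots, \sigma_p^2) \, J^{\top}.

% Squared distance between two composition vectors x and y:
d^2(x, y) = (c_x - c_y)^{\top} \left( \Sigma_{c_x} + \Sigma_{c_y} \right)^{+} (c_x - c_y).
```

Because the components of c sum to one, \Sigma_c is singular, so the sketch uses the Moore-Penrose pseudoinverse (denoted by the superscript +) in place of an ordinary inverse; dropping one component before inverting would be an alternative.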


SMIXTURE: strategy for mixture model clustering of multivariate images

JOURNAL OF CHEMOMETRICS, Issue 11-12 2005
Thanh N. Tran
Abstract SMIXTURE, a novel strategy for mixture model clustering of multivariate images, has been developed. Most other clustering approaches require good guesses of the number of components (clusters) and the initial statistical parameters. In our approach, the initial parameters are determined by agglomerative clustering on homogenous regions, identified by region growing segmentation. SMIXTURE can be used in both a normal situation of mixture modeling, where the density of a cluster is modeled by a single normal distribution; and in a more complex situation, where the density of a single cluster is a mixture of several normal sub-clusters. The method has proven to be very robust to noise/outliers, overlapping clusters, is reasonably fast and is suitable for moderate to large images. Copyright © 2006 John Wiley & Sons, Ltd. [source]