Clustering Process (clustering + process)
Selected Abstracts

Evaluation of reduced rank semiparametric models to assess excess of risk in cluster analysis
ENVIRONMETRICS, Issue 4 2009
Marco Geraci
Abstract: The existence of multiple environmental hazards is obviously a threat to human health and, from a statistical point of view, the modeling and the detection of disease clusters potentially related to those hazards offer challenging tasks. In this paper, we consider low rank thin plate spline (TPS) models within a semiparametric approach to focused clustering for small area health data. Both the distance from a putative source and a general, unspecified clustering process are modeled in the same fashion and they are entered log-additively in mixed Poisson-Normal models. Some issues related to the identification of the random effects arising from this approach are investigated. Under different simulated scenarios, we evaluate the proposed models using conditional Akaike's weights and tests for variance components, providing a comprehensive model selection methodology easy to implement. We examine observations of lung cancer deaths taken in Ohio between 1987 and 1988. These data were analyzed on several occasions to investigate the risk associated with a putative source in Hamilton county. In our analysis, we found a strong south-eastward spatial trend which is confounded with a significant radial distance effect decreasing between 0 and 150 km from the point source. Copyright © 2008 John Wiley & Sons, Ltd.

Augmentation of a nearest neighbour clustering algorithm with a partial supervision strategy for biomedical data classification
EXPERT SYSTEMS, Issue 1 2009
Sameh A. Salem
Abstract: In this paper, a partial supervision strategy for a recently developed clustering algorithm, the nearest neighbour clustering algorithm (NNCA), is proposed.
The proposed method (NNCA-PS) offers classification capability with a smaller amount of a priori knowledge, where a small number of data objects from the entire data set are used as labelled objects to guide the clustering process towards a better search space. Experimental results show that NNCA-PS gives promising results of 89% sensitivity at 95% specificity when used to segment retinal blood vessels, and a maximum classification accuracy of 99.5% with 97.2% average accuracy when applied to a breast cancer data set. Comparisons with other methods indicate the robustness of the proposed method in classification. Additionally, experiments on parallel environments indicate the suitability and scalability of NNCA-PS in handling larger data sets.

Using clustering methods to improve ontology-based query term disambiguation
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 7 2006
Ernesto William De Luca
In this article we describe results of our research on the disambiguation of user queries using ontologies for categorization. We present an approach to cluster search results by using classes or "Sense Folders" (prototype categories) derived from the concepts of an assigned ontology, in our case WordNet. Using the semantic relations provided by such a resource, we can assign categories to previously unannotated documents. The disambiguation of query terms in documents with respect to a user-specific ontology is an important issue in order to improve the retrieval performance for the user. Furthermore, we show that a clustering process can enhance the semantic classification of documents, and we discuss how this clustering process can be further enhanced using only the most descriptive classes of the ontology. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 693-709, 2006.
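As a rough illustration of the Sense Folder idea in the abstract above, the sketch below groups search results by the sense whose descriptive terms they overlap most. It is not the authors' implementation: the term sets are hand-written stand-ins for classes that the article derives from WordNet, the overlap score is an assumed similarity measure, and `tokenize`, `assign_sense_folder` and `cluster_results` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Hypothetical sense folders for the query term "bank". In the article the
# classes come from WordNet concepts; these term sets are made up here.
SENSE_FOLDERS = {
    "bank (financial institution)": {"money", "loan", "account", "deposit", "finance"},
    "bank (river side)": {"river", "water", "shore", "slope", "stream"},
}


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def assign_sense_folder(document, folders=SENSE_FOLDERS):
    """Return the folder sharing the most terms with the document, or None."""
    words = tokenize(document)
    best, best_overlap = None, 0
    for name, terms in folders.items():
        overlap = len(words & terms)  # assumed score: count of shared terms
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


def cluster_results(documents, folders=SENSE_FOLDERS):
    """Group documents by assigned folder; unmatched ones fall under None."""
    clusters = defaultdict(list)
    for doc in documents:
        clusters[assign_sense_folder(doc, folders)].append(doc)
    return dict(clusters)
```

Calling `cluster_results` on a list of result snippets returns a dictionary keyed by folder name, with results that match no folder collected under `None`.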

Identifying similar pages in Web applications using a competitive clustering algorithm
JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE, Issue 5 2007
Andrea De Lucia
Abstract: We present an approach based on Winner Takes All (WTA), a competitive clustering algorithm, to support the comprehension of static and dynamic Web applications during Web application reengineering. This approach adopts a process that first computes the distance between Web pages and then identifies and groups similar pages using the considered clustering algorithm. We present an instance of application of the clustering process to identify similar pages at the structural level. The page structure is encoded into a string of HTML tags and then the distance between Web pages at the structural level is computed using the Levenshtein string edit distance algorithm. A prototype to automate the clustering process has been implemented that can be extended to other instances of the process, such as the identification of groups of similar pages at content level. The approach and the tool have been evaluated in two case studies. The results have shown that the WTA clustering algorithm suggests heuristics to easily identify the best partition of Web pages into clusters among the possible partitions. Copyright © 2007 John Wiley & Sons, Ltd.

Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and solutions
JOURNAL OF VEGETATION SCIENCE, Issue 5 2005
János Podani
Abstract
Questions: Are ordinal data appropriately treated by multivariate methods in numerical ecology? If not, what are the most common mistakes? Which dissimilarity coefficients, ordination and classification methods are best suited to ordinal data? Should we worry about such problems at all?
Methods: A new classification model family, OrdClAn (Ordinal Cluster Analysis), is suggested for hierarchical and non-hierarchical classifications from ordinal ecological data, e.g.
the abundance/dominance scores that are commonly recorded in relevés. During the clustering process, the objects are grouped so as to minimize a measure calculated from the ranks of within-cluster and between-cluster distances or dissimilarities.
Results and Conclusions: Evaluation of the various steps of exploratory data analysis of ordinal ecological data shows that consistency of methodology throughout the study is of primary importance. In an optimal situation, each methodological step is order invariant. This property ensures that the results are independent of changes not affecting ordinal relationships, and guarantees that no illusory precision is introduced into the analysis. However, the multivariate procedures that are most commonly applied in numerical ecology do not satisfy these requirements and are therefore not recommended. For example, it is inappropriate to analyse Braun-Blanquet abundance/dominance data by methods assuming that Euclidean distance is meaningful. The solution to all of these problems is that the dissimilarity coefficient should be compatible with ordinal variables and the subsequent ordination or clustering method should consider only the rank order of dissimilarities. A range of artificial data sets exemplifying different subtypes of ordinal variables, e.g. indicator values or species scores from relevés, illustrate the advocated approach. Detailed analyses of an actual phytosociological data set demonstrate the classification by OrdClAn of relevés and species and the subsequent tabular rearrangement, in a numerical study remaining within the ordinal domain from the first step to the last.

Fixation for distributed clustering processes
COMMUNICATIONS ON PURE & APPLIED MATHEMATICS, Issue 7 2010
M. R. Hilário
We study a discrete-time resource flow in $\mathbb{Z}^d$ where wealthier vertices attract the resources of their less rich neighbors.
For any translation-invariant probability distribution of initial resource quantities, we prove that the flow at each vertex terminates after finitely many steps. This answers (a generalized version of) a question posed by van den Berg and Meester in 1991. The proof uses the mass transport principle and extends to other graphs. © 2010 Wiley Periodicals, Inc.
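
The structural distance step described in the Web page clustering abstract above (De Lucia, 2007) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: each page is reduced to its sequence of opening and closing tag names using a simple regular expression, each tag is treated as one symbol, and the standard Levenshtein edit distance is computed over those sequences. The function names `tag_sequence`, `levenshtein` and `structural_distance` are hypothetical.

```python
import re


def tag_sequence(html):
    """Encode page structure as the ordered list of tag names, e.g. 'p', '/p'."""
    # Assumption: a simple regex is enough for well-formed markup; comments
    # and doctype declarations are skipped because they do not start with a letter.
    return [t.lower() for t in re.findall(r"<(/?[a-zA-Z][a-zA-Z0-9]*)", html)]


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def structural_distance(html_a, html_b):
    """Levenshtein distance between the tag sequences of two pages."""
    return levenshtein(tag_sequence(html_a), tag_sequence(html_b))
```

A pairwise distance matrix built with `structural_distance` could then be passed to whichever clustering algorithm is used to group similar pages.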