Web Data (web + data)


Selected Abstracts

A novel clustering algorithm using hypergraph-based granular computing

Qun Liu
Clustering is an important technique in data mining. In this paper, we introduce a new clustering algorithm based on granular computing. The algorithm constructs a hypergraph (simplicial complex) using the hypergraph bisection algorithm and uses it to discover similarities and associations among documents. In experiments on Web data, the proposed algorithm produced satisfactory results. © 2009 Wiley Periodicals, Inc.
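The abstract does not give the algorithm's details. As a rough illustration of the bisection step only, the sketch below assumes that each term shared by two or more documents forms a hyperedge over those documents, then searches exhaustively for the balanced two-way split that cuts the fewest hyperedges. The term-based hyperedges, the exhaustive search, and all function names are hypothetical choices for illustration, not the author's method; the granular-computing and simplicial-complex aspects are not modeled.

```python
from itertools import combinations

def build_hypergraph(docs):
    """Hypothetical construction: one hyperedge per term, spanning the documents that contain it."""
    edges = {}
    for i, text in enumerate(docs):
        for term in set(text.lower().split()):
            edges.setdefault(term, set()).add(i)
    # A hyperedge over a single document carries no association, so drop it.
    return {term: members for term, members in edges.items() if len(members) > 1}

def hyperedge_cut(edges, part):
    """Number of hyperedges with documents on both sides of the split."""
    return sum(1 for members in edges.values() if 0 < len(members & part) < len(members))

def bisect(n_docs, edges):
    """Exhaustively find the balanced bisection with the smallest hyperedge cut (toy sizes only)."""
    best_part, best_cut = None, None
    for combo in combinations(range(n_docs), n_docs // 2):
        part = set(combo)
        cut = hyperedge_cut(edges, part)
        if best_cut is None or cut < best_cut:
            best_part, best_cut = part, cut
    return best_part, set(range(n_docs)) - best_part, best_cut

docs = [
    "hypergraph clustering granular computing",
    "granular computing hypergraph model",
    "food web ecology stability",
    "food web landscape ecology",
]
left, right, cut = bisect(len(docs), build_hypergraph(docs))
print(left, right, cut)  # {0, 1} {2, 3} 0
```

Exhaustive search grows exponentially with the number of documents, so it only suits toy inputs; practical hypergraph partitioners rely on multilevel heuristics instead.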

Data cleansing for Web information retrieval using query independent features

Yiqun Liu
Understanding which kinds of Web pages are most useful to Web search engine users is a critical task in Web information retrieval (IR). Most previous work has used hyperlink analysis algorithms to address this problem; however, little research has focused on query-independent Web data cleansing for Web IR. In this paper, we first analyze the differences between retrieval target pages and ordinary pages, based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We then propose a learning-based data cleansing algorithm that filters out Web pages unlikely to be useful for user requests. We found that both the English and the Chinese Web page corpora contain a large proportion of low-quality pages, and that retrieval target pages can be identified using query-independent features and cleansing algorithms. Experimental results show that our algorithm removes a large portion of Web pages with only a small loss of retrieval target pages. This makes it possible for Web IR tools to meet a large fraction of users' needs using only a small part of the pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
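The abstract names neither the features nor the learner. A minimal sketch of learning-based cleansing, assuming three hypothetical query-independent features (log in-link count, URL depth, and anchor-text ratio) and a plain logistic-regression classifier trained by stochastic gradient descent, might look like this; the features, toy training data, and threshold are illustrative assumptions, not the paper's setup.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def features(page):
    # Hypothetical query-independent features; not the paper's actual feature set.
    return [math.log1p(page["inlinks"]), page["url_depth"], page["anchor_ratio"]]

def train_logistic(pages, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by stochastic gradient descent on log-loss."""
    X = [features(p) for p in pages]
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y  # d(loss)/dz
            w = [wj - lr * g * xj for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b

def cleanse(pages, w, b, threshold=0.5):
    """Keep pages whose predicted probability of being a retrieval target is >= threshold."""
    return [p for p in pages
            if sigmoid(b + sum(wj * xj for wj, xj in zip(w, features(p)))) >= threshold]

# Toy labelled sample: 1 = retrieval target page, 0 = low-quality page (made-up values).
train = [
    {"inlinks": 500, "url_depth": 1, "anchor_ratio": 0.1},
    {"inlinks": 200, "url_depth": 2, "anchor_ratio": 0.2},
    {"inlinks": 0, "url_depth": 6, "anchor_ratio": 0.9},
    {"inlinks": 1, "url_depth": 5, "anchor_ratio": 0.8},
]
w, b = train_logistic(train, [1, 1, 0, 0])

crawl = [
    {"inlinks": 800, "url_depth": 1, "anchor_ratio": 0.05},
    {"inlinks": 0, "url_depth": 7, "anchor_ratio": 0.95},
]
kept = cleanse(crawl, w, b)
```

Raising `threshold` removes more pages at the cost of discarding more retrieval target pages, which is the trade-off the abstract's evaluation measures.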

On web communities mining and recommendation

Yanchun Zhang
Because web documents lack a uniform schema, and because of the sheer volume and dynamic nature of web data, information management and retrieval of web data are often unsatisfactory in both effectiveness and efficiency when conventional data management and search techniques are used. To address this issue, we have adopted web mining and web community analysis approaches. Based on analyses of web document contents, hyperlinks, user access logs, and semantics, we have developed various approaches and algorithms to construct and analyze web communities and to make recommendations. This paper introduces and discusses several approaches to web community mining and recommendation. Copyright © 2009 John Wiley & Sons, Ltd.
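Hyperlink analysis is one of the inputs the abstract lists. To illustrate that ingredient only, the sketch below implements Kleinberg's HITS algorithm, a classic hyperlink-analysis method in which densely interlinked hub and authority pages indicate a topical community. HITS is swapped in here as a well-known example; the abstract does not say it is the authors' algorithm, and the toy link graph is made up.

```python
import math

def hits(links, iterations=50):
    """Kleinberg's HITS on a link graph given as {page: [pages it links to]}."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of q: sum of hub scores of the pages that link to q.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        # Hub score of p: sum of authority scores of the pages p links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # L2-normalize both score vectors so they stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Toy graph: a and b both point to x and y; c points only to x.
links = {"a": ["x", "y"], "b": ["x", "y"], "c": ["x"]}
hub, auth = hits(links)
top_authority = max(auth, key=auth.get)  # "x", linked from all three hubs
```

Pages with high hub scores (here `a` and `b`) together with the authorities they point to form a candidate community; a recommender could then suggest a community's top authorities to users who visit its hubs.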

A landscape theory for food web architecture

Neil Rooney
Ecologists have long searched for structures and processes that impart stability in nature. In particular, food web ecology has held promise in tackling this issue. Empirical patterns in food webs have consistently shown that the distributions of species and interactions found in nature are more likely to be stable than randomly constructed systems with the same numbers of species and interactions. Food web ecology still faces two fundamental challenges, however. First, the quantity and quality of food web data required to document both the species richness and the interaction strengths among all species within food webs are largely prohibitive. Second, where food webs have been well documented, spatial and temporal variation in food web structure has been ignored. Conversely, research that has addressed spatial and temporal variation in ecosystems has generally ignored the full complexity of food web architecture. Here, we incorporate empirical patterns, largely from macroecology and behavioural ecology, into a spatially implicit food web structure to construct a simple landscape theory of food web architecture. This approach both captures important architectural features of food webs and allows food web structure to be explored across a range of spatial scales. Finally, we demonstrate that food webs are hierarchically organized along the spatial and temporal niche axes of species and their use of food resources, in ways that stabilize ecosystems.

A data warehouse/online analytic processing framework for web usage mining and business intelligence reporting

Xiaohua Hu
Web usage mining is the application of data mining techniques to discover usage patterns and behaviors from web data (clickstream, purchase information, customer information, etc.) in order to understand and serve e-commerce customers better and to improve online business. In this article, we present a general data warehouse/online analytic processing (OLAP) framework for web usage mining and business intelligence reporting. Tightly integrating web data warehouse construction, data mining, and OLAP into the e-commerce system dramatically reduces the time and effort required for web usage mining, business intelligence reporting, and mining deployment. Our data warehouse/OLAP framework consists of four phases: data capture; webhouse construction (clickstream marts); pattern discovery and cube construction; and pattern evaluation and deployment. We discuss data transformation operations for web usage mining and business reporting at the clickstream, session, and customer levels; describe the problems and challenges in each phase in detail; provide plausible solutions to these issues; and demonstrate the framework with examples from real web sites. Our data warehouse/OLAP framework has been integrated into commercial e-commerce systems. We believe this framework will be useful for developing real-world web usage mining and business intelligence reporting systems. © 2004 Wiley Periodicals, Inc.
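The abstract mentions data transformations at the clickstream, session, and customer levels without detailing them. A minimal sketch of two such transformations, assuming clickstream rows of (customer_id, timestamp, url) and a 30-minute inactivity timeout (a common sessionization heuristic, not taken from the paper), groups clicks into sessions and then rolls sessions up to per-customer counts. All field and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity gap that starts a new session

def sessionize(clicks):
    """Clickstream level -> session level: group (customer, ts, url) rows into sessions."""
    sessions, last_seen, current = [], {}, {}
    for customer, ts, url in sorted(clicks, key=lambda c: (c[0], c[1])):
        # Start a new session on a customer's first click or after a long gap.
        if customer not in last_seen or ts - last_seen[customer] > SESSION_TIMEOUT:
            current[customer] = {"customer": customer, "start": ts, "pages": []}
            sessions.append(current[customer])
        current[customer]["pages"].append(url)
        last_seen[customer] = ts
    return sessions

def customer_rollup(sessions):
    """Session level -> customer level: count sessions and page views per customer."""
    summary = defaultdict(lambda: {"sessions": 0, "page_views": 0})
    for s in sessions:
        summary[s["customer"]]["sessions"] += 1
        summary[s["customer"]]["page_views"] += len(s["pages"])
    return dict(summary)

t0 = datetime(2024, 1, 1, 9, 0)
clicks = [
    ("c1", t0, "/home"),
    ("c1", t0 + timedelta(minutes=5), "/product/42"),
    ("c1", t0 + timedelta(hours=2), "/cart"),  # more than 30 minutes later: new session
    ("c2", t0 + timedelta(minutes=1), "/home"),
]
sessions = sessionize(clicks)
rollup = customer_rollup(sessions)
```

In a warehouse setting, the session rows would populate a session fact table and the per-customer summary would feed an OLAP cube; the dictionaries here only stand in for those tables.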