Web Documents (web + document)

Selected Abstracts


Relevance of Web documents: Ghosts consensus method

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2002
Andrey L. Gorbunov
The dominant method currently used to improve the quality of Internet search systems is often called "digital democracy." Such an approach relies on the majority opinion of Internet users to determine the most relevant documents: for example, using citation indexes to sort search results (google.com), or enriching a query with terms that are frequently asked in relation to the query's theme. "Digital democracy" is an effective instrument in many cases, but it has an unavoidable shortcoming that is a matter of principle: the average intellectual and cultural level of Internet users is very low, and everyone knows what kind of information dominates Internet query statistics. Therefore, when one searches the Internet by means of "digital democracy" systems, one gets answers that reflect an underlying assumption that the user's intellectual potential is very low and that his cultural interests are undemanding. Thus, it would be more accurate to say that such Internet search systems practice "digital ochlocracy." Based on the well-known mathematical mechanism of linear programming, we propose a method to address this problem. [source]
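The abstract does not detail the proposed ghosts consensus method itself, but the "digital democracy" ranking it critiques can be illustrated with a minimal citation-index (PageRank-style) sketch in Python. The link graph, damping factor, and iteration count below are arbitrary examples, not anything taken from the paper.

```python
# Illustrative sketch only: a minimal citation-index ("digital democracy") ranking
# of the kind the abstract critiques, NOT the ghosts consensus method itself.
# The toy link graph and parameters are hypothetical.

def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links, weighting votes from popular pages more."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                       # dangling page: spread evenly
                share = damping * rank[page] / len(pages)
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical crawl: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```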


Subject categorization of query terms for exploring Web users' search interests

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 8 2002
Hsiao-Tieh Pu
Subject content analysis of Web query terms is essential to understanding Web searching interests. Such analysis includes exploring search topics and observing changes in their frequency distributions over time. To provide a basis for in-depth analysis of users' search interests on a larger scale, this article presents a query categorization approach for automatically classifying Web query terms into broad subject categories. Because a query is short in length and simple in structure, its intended search subject(s) are difficult to judge. Our approach therefore combines the search processes of real-world search engines to obtain highly ranked Web documents for each unknown query term. These documents are used to extract co-occurring terms and to create a feature set. An effective ranking function has also been developed to find the most appropriate categories. Three search engine logs in Taiwan, containing over 5 million queries from different periods of time, were collected and tested. The achieved performance is quite encouraging compared with that of human categorization. The experimental results demonstrate that the approach is efficient in dealing with large numbers of queries and adaptable to the dynamic Web environment. Through good integration of human and machine efforts, the frequency distributions of subject categories can be systematically observed in real time as users' search interests change. The approach has also shown potential for use in various information retrieval applications and provides a basis for further Web searching studies. [source]
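The pipeline described in this abstract (retrieve top-ranked documents for a query, extract co-occurring terms as a feature set, rank candidate categories) can be outlined roughly as below. The category profiles, the stubbed search call, and the overlap-based ranking function are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of query categorization via retrieved documents.
# CATEGORY_TERMS, fetch_top_documents, and the scoring rule are hypothetical.
from collections import Counter

# Hypothetical category profiles: representative terms per broad subject.
CATEGORY_TERMS = {
    "computers": {"software", "internet", "download", "windows", "web"},
    "travel":    {"hotel", "flight", "tour", "ticket", "beach"},
    "health":    {"clinic", "symptom", "diet", "hospital", "vitamin"},
}

def fetch_top_documents(query):
    """Stub for a real search-engine call; returns text snippets for the query."""
    return ["free software download for windows", "web software internet tools"]

def categorize(query, top_k=2):
    # Build a feature set from terms co-occurring in the retrieved documents.
    features = Counter()
    for doc in fetch_top_documents(query):
        features.update(doc.lower().split())
    # Simple ranking function: weighted overlap between features and each category.
    scores = {
        cat: sum(features[t] for t in terms if t in features)
        for cat, terms in CATEGORY_TERMS.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(categorize("winzip"))   # hypothetical short query term
```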


Small-worlds: A review of recent books

NETWORKS: AN INTERNATIONAL JOURNAL, Issue 3 2003
I. Frommer
Abstract Small-worlds research and related fields study a set of network structures with well-defined properties. This new area has been gaining momentum lately. Theoretical studies have advanced our understanding of such networks while empirical studies have shown these networks to be ubiquitous in both nature and society. In particular, systems that appear to be well modeled by such networks include World Wide Web documents, Internet routers, the cellular metabolic network, ecological food webs, social networks, and many others. The two main structures being investigated are small-world networks and scale-free networks. Three recent books, including two just published this summer, describe the research being undertaken in this burgeoning field. We survey and review these books through a discussion of the field of small-worlds research with numerous examples and considerations of the future of the field. © 2003 Wiley Periodicals, Inc. [source]
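For readers unfamiliar with the two structures named above, the standard generative models (Watts-Strogatz for small-world networks, Barabasi-Albert for scale-free networks) can be sampled with networkx; the parameters below are arbitrary examples and the snippet is not drawn from the reviewed books.

```python
# Generate and summarize the two network structures discussed in the review.
import networkx as nx

# Watts-Strogatz small-world network: high clustering, short path lengths.
small_world = nx.connected_watts_strogatz_graph(n=1000, k=6, p=0.1)

# Barabasi-Albert scale-free network: power-law degree distribution.
scale_free = nx.barabasi_albert_graph(n=1000, m=3)

print("small-world: clustering =", round(nx.average_clustering(small_world), 3),
      "avg path length =", round(nx.average_shortest_path_length(small_world), 2))
print("scale-free:  max degree =", max(dict(scale_free.degree()).values()))
```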


On web communities mining and recommendation

CONCURRENCY AND COMPUTATION: PRACTICE & EXPERIENCE, Issue 5 2009
Yanchun Zhang
Abstract Because of the lack of a uniform schema for web documents and the sheer volume and dynamics of web data, both the effectiveness and the efficiency of information management and retrieval of web data are often unsatisfactory when conventional data management and searching techniques are used. To address this issue, we have adopted web mining and web community analysis approaches. On the basis of analysis of web document contents, hyperlink analysis, user access logs, and semantic analysis, we have developed various approaches and algorithms to construct and analyze web communities and to make recommendations. This paper introduces and discusses several approaches to web community mining and recommendation. Copyright © 2009 John Wiley & Sons, Ltd. [source]
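A rough sketch of one ingredient mentioned above, hyperlink-based community mining, is given below; the toy link data and the modularity-based grouping are illustrative assumptions rather than the authors' algorithms.

```python
# Group web pages into densely interlinked communities from hyperlink data.
# The crawl output and the use of greedy modularity clustering are hypothetical.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical crawl output: (source page, target page) hyperlinks.
hyperlinks = [
    ("a.com", "b.com"), ("b.com", "a.com"), ("a.com", "c.com"),
    ("c.com", "b.com"), ("x.org", "y.org"), ("y.org", "z.org"),
    ("z.org", "x.org"), ("b.com", "x.org"),
]

# Treat the link graph as undirected for community detection.
graph = nx.Graph()
graph.add_edges_from(hyperlinks)

# Densely interlinked groups can then feed a recommender, e.g. suggesting
# other pages from the community of the page currently being viewed.
for community in greedy_modularity_communities(graph):
    print(sorted(community))
```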