Query Terms (query + term)

Distribution by Scientific Domains


Selected Abstracts


Subject categorization of query terms for exploring Web users' search interests

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 8 2002
Hsiao-Tieh Pu
Subject content analysis of Web query terms is essential to understand Web searching interests. Such analysis includes exploring search topics and observing changes in their frequency distributions with time. To provide a basis for in-depth analysis of users' search interests on a larger scale, this article presents a query categorization approach to automatically classifying Web query terms into broad subject categories. Because a query is short in length and simple in structure, its intended subject(s) of search is difficult to judge. Our approach, therefore, combines the search processes of real-world search engines to obtain highly ranked Web documents based on each unknown query term. These documents are used to extract cooccurring terms and to create a feature set. An effective ranking function has also been developed to find the most appropriate categories. Three search engine logs in Taiwan were collected and tested. They contained over 5 million queries from different periods of time. The achieved performance is quite encouraging compared with that of human categorization. The experimental results demonstrate that the approach is efficient in dealing with large numbers of queries and adaptable to the dynamic Web environment. Through good integration of human and machine efforts, the frequency distributions of subject categories in response to changes in users' search interests can be systematically observed in real time. The approach has also shown potential for use in various information retrieval applications, and provides a basis for further Web searching studies. [source]


Using clustering methods to improve ontology-based query term disambiguation

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 7 2006
Ernesto William De Luca
In this article we describe results of our research on the disambiguation of user queries using ontologies for categorization. We present an approach to cluster search results by using classes or "Sense Folders" (prototype categories) derived from the concepts of an assigned ontology, in our case WordNet. Using the semantic relations provided from such a resource, we can assign categories to prior, not annotated documents. The disambiguation of query terms in documents with respect to a user-specific ontology is an important issue in order to improve the retrieval performance for the user. Furthermore, we show that a clustering process can enhance the semantic classification of documents, and we discuss how this clustering process can be further enhanced using only the most descriptive classes of the ontology. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 693,709, 2006. [source]


Analysis of query keywords of sports-related queries using visualization and clustering

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 8 2009
Jin Zhang
The authors investigated 11 sports-related query keywords extracted from a public search engine query log to better understand sports-related information seeking on the Internet. After the query log contents were cleaned and query data were parsed, popular sports-related keywords were identified, along with frequently co-occurring query terms associated with the identified keywords. Relationships among each sports-related focus keyword and its related keywords were characterized and grouped using multidimensional scaling (MDS) in combination with traditional hierarchical clustering methods. The two approaches were synthesized in a visual context by highlighting the results of the hierarchical clustering analysis in the visual MDS configuration. Important events, people, subjects, merchandise, and so on related to a sport were illustrated, and relationships among the sports were analyzed. A small-scale comparative study of sports searches with and without term assistance was conducted. Searches that used search term assistance by relying on previous query term relationships outperformed the searches without the search term assistance. The findings of this study provide insights into sports information seeking behavior on the Internet. The developed method also may be applied to other query log subject areas. [source]


Passage detection using text classification

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 4 2009
Saket Mengle
Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, documents are injected with hidden text into passages. Rather than matching query terms against passages to determine their relevance, using text-mining techniques, the passages are classified. Those documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP outperforms statistically significantly (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. Furthermore, we evaluate the effects of the feature selection, passage length, ambiguous passages, and finally training-data category distribution on passage-detection accuracy. [source]


User preference: A measure of query-term quality

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 12 2006
Nina Wacholder
The goal of this research is to understand what characteristics, if any, lead users engaged in interactive information seeking to prefer certain sets of query terms. Underlying this work is the assumption that query terms that information seekers prefer induce a kind of cognitive efficiency: They require less mental effort to process and therefore reduce the energy required in the interactive information-seeking process. Conceptually, this work applies insights from linguistics and cognitive science to the study of query-term quality. We report on an experiment in which we compare user preference for three sets of terms; one had been preconstructed by a human indexer, and two were identified automatically. Twenty-four participants used a merged list of all terms to answer a carefully created set of questions. By design, the interface constrained users to access the text exclusively via the displayed list of query terms. We found that participants displayed a preference for the human-constructed set of terms eight times greater than the preference for either set of automatically identified terms. We speculate about reasons for this strong preference and discuss the implications for information access. The primary contributions of this research are (a) explication of the concept of user preference as a measure of query-term quality and (b) identification of a replicable procedure for measuring preference for sets of query terms created by different methods, whether human or automatic. All other factors being equal, query terms that users prefer clearly are the best choice for real-world information-access systems. [source]


Subject categorization of query terms for exploring Web users' search interests

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 8 2002
Hsiao-Tieh Pu
Subject content analysis of Web query terms is essential to understand Web searching interests. Such analysis includes exploring search topics and observing changes in their frequency distributions with time. To provide a basis for in-depth analysis of users' search interests on a larger scale, this article presents a query categorization approach to automatically classifying Web query terms into broad subject categories. Because a query is short in length and simple in structure, its intended subject(s) of search is difficult to judge. Our approach, therefore, combines the search processes of real-world search engines to obtain highly ranked Web documents based on each unknown query term. These documents are used to extract cooccurring terms and to create a feature set. An effective ranking function has also been developed to find the most appropriate categories. Three search engine logs in Taiwan were collected and tested. They contained over 5 million queries from different periods of time. The achieved performance is quite encouraging compared with that of human categorization. The experimental results demonstrate that the approach is efficient in dealing with large numbers of queries and adaptable to the dynamic Web environment. Through good integration of human and machine efforts, the frequency distributions of subject categories in response to changes in users' search interests can be systematically observed in real time. The approach has also shown potential for use in various information retrieval applications, and provides a basis for further Web searching studies. [source]


Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia

PROCEEDINGS OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE & TECHNOLOGY (ELECTRONIC), Issue 1 2008
Xiaozhong Liu
First page of article [source]