Information Retrieval (information + retrieval)
Selected Abstracts

MEMORY ORGANIZATION AS THE MISSING LINK BETWEEN CASE-BASED REASONING AND INFORMATION RETRIEVAL IN BIOMEDICINE
COMPUTATIONAL INTELLIGENCE, Issue 3-4 2006
Isabelle Bichindaritz
Mémoire proposes a general framework for reasoning from cases in biology and medicine. Part of this project is to propose a memory organization capable of handling large cases and case bases as occur in biomedical domains. This article presents the essential principles for an efficient memory organization based on pertinent work in information retrieval (IR). IR systems have been able to scale up to terabytes of data, taking advantage of research on large databases to build Internet search engines. They search for pertinent documents to answer a query using term-based ranking and/or global ranking schemes. Similarly, case-based reasoning (CBR) systems search for pertinent cases using a scoring function for ranking the cases. Mémoire proposes a memory organization based on inverted indexes, which may be powered by databases, to search and rank efficiently through large case bases. It can be seen as a first step toward large-scale CBR systems, and in addition provides a framework for tight cooperation between CBR and IR.

Integrating Web-Based Documents, Shared Knowledge Bases, and Information Retrieval for User Help
COMPUTATIONAL INTELLIGENCE, Issue 1 2000
Doug Skuce
We describe a prototype system, IKARUS, with which we investigated the potential of integrating web-based documents, shared knowledge bases, and information retrieval for improving knowledge storage and retrieval. As an example, we discuss how to implement both a user manual and an online help system as one system. The following technologies are combined: a web-based design, a frame-based knowledge engine, use of an advanced full-text search engine, and simple techniques to control terminology.
We have combined graphical browsing with several unusual forms of text retrieval, for example, to the sentence and paragraph level.

Interactive Information Retrieval in Digital Environments
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 9 2009
Pia Borlund

Task-based information retrieval: Structuring undergraduate history essays for better course evaluation using essay-type visualizations
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 9 2007
John E. Leide
When domain novices are in C.C. Kuhlthau's (1993) Stage 3, the exploration stage of researching an assignment, they often do not know their information need; this causes them to go back to Stage 2, the topic-selection stage, when they are selecting keywords to formulate their query to an Information Retrieval (IR) system. Our hypothesis is that instead of going backward, they should be going forward toward a goal state: the performance of the task for which they are seeking the information. If they can somehow construct their goal state into a query, this forward-looking query better operationalizes their information need than does a topic-based query. For domain novice undergraduates seeking information for a course essay, we define their task as selecting a high-impact essay structure which will put the students' learning on display for the course instructor who will evaluate the essay. We report a study of first-year history undergraduate students which tested the use and effectiveness of "essay type" as a task-focused query-formulation device. We randomly assigned 78 history undergraduates to an intervention group and a control group. The dependent variable was essay quality, based on (a) an evaluation of the student's essay by a research team member, and (b) the marks given to the student's essay by the course instructor.
We found that conscious or formal consideration of essay type is inconclusive as a basis of a task-focused query-formulation device for IR.

Strategic help in user interfaces for information retrieval
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 5 2002
Giorgio Brajnik
Although no unified definition of the concept of search strategy in Information Retrieval (IR) exists so far, its importance is manifest: nonexpert users, directly interacting with an IR system, apply a limited portfolio of simple actions; they do not know how to react in critical situations; and they often do not even realize that their difficulties are due to strategic problems. A user interface to an IR system should therefore provide some strategic help, focusing the user's attention on strategic issues and providing tools to generate better strategies. Because neither the user nor the system can autonomously solve the information problem, but they complement each other, we propose a collaborative coaching approach, in which the two partners cooperate: the user retains the control of the session and the system provides suggestions. The effectiveness of the approach is demonstrated by a conceptual analysis, a prototype knowledge-based system named FIRE, and its evaluation through informal laboratory experiments.

Should edentulous patients be constrained to removable complete dentures? The use of dental implants to improve the quality of life for edentulous patients
GERODONTOLOGY, Issue 1 2010
doi:10.1111/j.1741-2358.2009.00294.x
Background: Nowadays, there is some speculation among dental educators that the need for complete dentures will significantly decrease in the future and that training in their provision should be removed from the dental curriculum.
Objective: To sensitise the reader to the functional shortcomings of complete denture therapy in the edentulous patient and present restorative options, including implants, to improve quality of life in these patients. Methods: Information retrieval followed a systematic approach using PubMed. English articles published from 1964 to 2008, in which the masticatory performance of patients with implant-supported dentures was assessed by objective methods and compared with performance with conventional dentures, were included. Results: National epidemiological survey data suggested that the adult population in need of one or two complete dentures will increase from 35.4 million adults in 2000 to 37.9 million adults in 2020. Clinical studies have shown that the ratings of general satisfaction were significantly better in the patients treated with implant overdentures post-delivery compared with the complete denture users. In addition, the implant group gave significantly higher ratings on comfort, stability and ability to chew. Furthermore, patients who received mandibular implant overdentures had significantly fewer oral health-related quality of life problems than did the conventional group. Conclusion: Implant-supported dentures, including either complete overdentures or a hybrid prosthesis, significantly improve the quality of life for edentulous patients compared with conventional removable complete dentures. Therefore, the contemporary dental practitioner should consider other options as well as conventional removable complete dentures to restore edentulous patients.

Fuzzy quantification in two real scenarios: Information retrieval and mobile robotics
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 6 2009
Félix Díaz-Hermida
Fuzzy quantification supplies powerful tools for handling linguistic expressions. Nevertheless, its advantages are usually shown at the theoretical level without a proper empirical validation.
In this work, we review the application of fuzzy quantification in two application domains. We provide empirical evidence on the adequacy of fuzzy quantification to support different tasks in the context of mobile robotics and information retrieval. This practical perspective aims at exemplifying the actual benefits that real applications can get from fuzzy quantifiers. © 2009 Wiley Periodicals, Inc.

Tuning the matching function for a threshold weighting semantics in a linguistic information retrieval system
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 9 2005
E. Herrera-Viedma
Information retrieval is an activity that attempts to produce documents that better fulfill user information needs. To carry out this activity, an information retrieval system uses matching functions that specify the degree of relevance of a document with respect to a user query. Assuming linguistic-weighted queries, we present a new linguistic matching function for a threshold weighting semantics that is defined using a 2-tuple fuzzy linguistic approach (Herrera F, Martínez L. IEEE Trans Fuzzy Syst 2000;8:746-752). This new 2-tuple linguistic matching function can be interpreted as a tuning of that defined in "Modelling the Retrieval Process for an Information Retrieval System Using an Ordinal Fuzzy Linguistic Approach" (Herrera-Viedma E. J Am Soc Inform Sci Technol 2001;52:460-475). We show that it simplifies the processes of computing in the retrieval activity, avoids the loss of precision in final results, and, consequently, can help to improve the users' satisfaction. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 921-937, 2005.

Information retrieval and the philosophy of language
ANNUAL REVIEW OF INFORMATION SCIENCE & TECHNOLOGY (ELECTRONIC), Issue 1 2003
David C. Blair
Article first published online: 31 JAN 200

An information retrieval system for telephone dialogue in load dispatch center
ELECTRICAL ENGINEERING IN JAPAN, Issue 3 2008
Osamu Segawa
We have developed an information retrieval system for telephone dialogue in a load dispatch center. In load dispatching operations, the need for recording and information retrieval of telephone dialogue is high. The proposed system gives a solution for the task and realizes an information retrieval function with any keywords. The effectiveness of the system is verified by telephone dialogue transcription and information retrieval experiments. With 30 telephone dialogues in a load dispatch center, we obtain 59.5% in average word correct and 44.4% in average word accuracy. In the information retrieval experiment, with 20 keywords, we obtain 87.3% in average precision and 67.2% in average recall. © 2007 Wiley Periodicals, Inc. Electr Eng Jpn, 162(3): 44-50, 2008; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/eej.20402

Key words and their role in information retrieval
HEALTH INFORMATION & LIBRARIES JOURNAL, Issue 3 2010
Maria J. Grant
As any good library or information worker knows, the accurate and consistent application of keywords can serve to enhance the content representation and retrieval of literature. Research has demonstrated that this aspect of the library and information science evidence base is particularly well represented. Drawing on the thesauri of the Library & Science Abstracts, Library, Information Science & Technology Abstracts and MEDLINE databases, the Health Information and Libraries Journal (HILJ) has recently updated and expanded the HILJ keyword list.
Based on the content of reviews and original articles published in HILJ over the past 4 years, the keyword list will be used by submitting authors to represent the content of the manuscripts and enable more accurate matching of manuscript to HILJ referees.

The number needed to retrieve: a practically useful measure of information retrieval?
HEALTH INFORMATION & LIBRARIES JOURNAL, Issue 3 2006
Andrew Booth

Face as an index: Knowing who is who using a PDA
INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, Issue 1 2003
Jie Yang
In this article, we present a PDA-based system for extending human memory and/or information retrieval using a human face as the lookup index. The system can help a user to remember names of people whom he/she has met before and find useful information, such as names and research interests, about people whom he/she is interested in talking to. The system uses a captured face image as the lookup index to retrieve information from some available resource such as departmental directory, web sites, personal homepages, etc. We describe the development of a PDA-based face recognition system and introduce algorithms for image preprocessing to enhance the quality of the image by sharpening focus and normalizing both lighting condition and head rotation. We use a unified LDA/PCA algorithm for face recognition. We address design issues of the interface to assist in visualization and comprehension of retrieved information. We present user study and experiment results to demonstrate the feasibility of the proposed system. © 2003 Wiley Periodicals, Inc. Int J Imaging Syst Technol 13: 33-41, 2003; Published online in Wiley InterScience (www.interscience.wiley.com).
DOI 10.1002/ima.10046

An algorithm for modelling key terms
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, Issue 1 2008
D. Cai
The ability to formally analyse and represent semantic relations of terms is a major challenge for many areas of computing science and an intriguing problem for other sciences. In applications of evidence theory to, for instance, information retrieval, the problem of analysis and representation becomes apparent because evidence theory is based on set theory and individual key terms have to be modelled as subsets of the frame of discernment. How to find the frame and model the key terms is a challenge. The problem leads to other practical problems, as pointed out repeatedly in the literature. In this study, we focus on such a problem, present a method for simplifying and normalizing a thesaurus, and propose an algorithm for establishing the frame of discernment and for modelling individual key terms as a subset of the frame. The key aim of this study is to treat semantic relations of terms by means of a normalized thesaurus. © 2008 Wiley Periodicals, Inc.

Transitions in search tactics during the Web-based search process
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 11 2010
Iris Xie
Although many studies have identified search tactics, few studies have explored tactic transitions. This study investigated the transitions of search tactics during the Web-based search process. Bringing their own 60 search tasks, 31 participants, representing the general public with different demographic characteristics, participated in the study. Data collected from search logs and verbal protocols were analyzed by applying both qualitative and quantitative methods. The findings of this study show that participants exhibited some unique Web search tactics. They overwhelmingly employed accessing and evaluating tactics; they used fewer tactics related to modifying search statements, monitoring the search process, organizing search results, and learning system features. The contributing factors behind applying the most and least frequently employed search tactics relate to users' efforts, trust in information retrieval (IR) systems, preference, experience, and knowledge, as well as limitations of the system design. A matrix of search-tactic transitions was created to show the probabilities of transitions from one tactic to another. By applying a fifth-order Markov chain, the results also presented the most common search strategies representing patterns of tactic transition occurring at the beginning, middle, and ending phases within one search session. The results of this study generated detailed and useful guidance for IR system design to support the most frequently applied tactics and transitions, to reduce unnecessary transitions, and to support transitions at different phases.

Concepts and semantic relations in information science
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2010
Wolfgang G. Stock
Concept-based information retrieval and knowledge representation are in need of a theory of concepts and semantic relations. Guidelines for the construction and maintenance of knowledge organization systems (KOS) (such as ANSI/NISO Z39.19-2005 in the U.S.A. or DIN 2331:1980 in Germany) do not consider results of concept theory and theory of relations to the full extent. They are not able to unify the currently different worlds of traditional controlled vocabularies, of the social web (tagging and folksonomies) and of the semantic web (ontologies). Concept definitions as well as semantic relations are based on epistemological theories (empiricism, rationalism, hermeneutics, pragmatism, and critical theory). A concept is determined via its intension and extension as well as by definition. We will meet the problem of vagueness by introducing prototypes. Some important definitions are concept explanations (after Aristotle) and the definition of family resemblances (in the sense of Wittgenstein). We will model concepts as frames (according to Barsalou). The most important paradigmatic relation in KOS is hierarchy, which must be arranged into different classes: hyponymy consists of taxonomy and simple hyponymy; meronymy consists of many different part-whole relations. For practical application purposes, the transitivity of the given relation is very important. Unspecific associative relations are of little help to our focused applications and should be replaced by generalizable and domain-specific relations. We will discuss the reflexivity, symmetry, and transitivity of paradigmatic relations as well as the appearance of specific semantic relations in the different kinds of KOS (folksonomies, nomenclatures, classification systems, thesauri, and ontologies). Finally, we will pick out KOS as a central theme of the Semantic Web.

A structuration approach to online communities of practice: The case of Q&A communities
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 9 2010
Howard Rosenbaum
This article describes an approach based on structuration theory (Giddens, 1979, 1984; Orlikowski, 1992, 2000) and communities of practice (Wenger, 1998) that can be used to guide investigation into the dynamics of online question and answer (Q&A) communities. This approach is useful because most research on Q&A sites has focused attention on information retrieval, information-seeking behavior, and information intermediation and has assumed uncritically that the online Q&A community plays an important role in these domains of study. Assuming instead that research on online communities should take into account social, technical, and contextual factors (Kling, Rosenbaum, & Sawyer, 2005), the utility of this approach is demonstrated with an analysis of three online Q&A communities seen as communities of practice. This article makes a theoretical contribution to the study of online Q&A communities and, more generally, to the domain of social reference.

Investigating information retrieval support techniques for different information-seeking strategies
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 8 2010
Xiaojun Yuan
We report on a study that investigated the efficacy of four different interactive information retrieval (IIR) systems, each designed to support a specific information-seeking strategy (ISS). These systems were constructed using different combinations of IR techniques (i.e., combinations of different methods of representation, comparison, presentation and navigation), each of which was hypothesized to be well suited to support a specific ISS.
We compared the performance of searchers in each such system, designated "experimental," to an appropriate "baseline" system, which implemented the standard specified query and results list model of current state-of-the-art experimental and operational IR systems. Four within-subjects experiments were conducted for the purpose of this comparison. Results showed that each of the experimental systems was superior to its baseline system in supporting user performance for the specific ISS (that is, the information problem leading to that ISS) for which the system was designed. These results indicate that an IIR system which intends to support more than one kind of ISS should be designed within a framework which allows the use and combination of different IR support techniques for different ISSs.

Analyzing user interaction with the ViewFinder video retrieval system
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 2 2010
Dan Albertson
This study investigates interactive video retrieval. The basis for this study is that user- and search task-centric research in video information retrieval can assist efforts for developing effective user interfaces and help complement the existing corpus of video retrieval research by providing evidence for the benefits of evaluating systems using such an approach. Accordingly, the results were collected and analyzed from the perspective of certain users and search tasks (i.e., information needs). The methodology of this study employed specially designed interactive search experiments to examine a number of different factors in a video retrieval context, including those that correspond to search tasks of a particular domain, interface features and functions, system effectiveness, and user interactions. The results indicated that the use and effectiveness of certain interface features and functions were dependent on the type of search task, while others were more consistent across the full experiment.
Also included is a review of prior research pertaining to visual search tasks, systems development, and user interaction. ViewFinder, the prototype system used to carry out the interactive search experiments of this study, is fully described.

Unified linear subspace approach to semantic analysis
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 1 2010
Dandan Li
The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable: even when the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, the retrieval performance is not degraded significantly. Experiments performed on standard test collections show that our methods are promising.
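
The limitation of literal term matching mentioned in this abstract, and the way a term co-occurrence kernel can work around it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy collection is invented, `gvsm_score` is a hypothetical helper that uses a common textbook form of the GVSM kernel (comparing two texts through their dot products with every collection document), and it is not the unified kernel function the article derives.

```python
from collections import Counter
from math import sqrt


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def dot(u, v):
    """Sparse dot product of two term-frequency vectors."""
    return sum(weight * v[term] for term, weight in u.items() if term in v)


def cosine(u, v):
    """Basic vector space similarity; 0.0 when either vector is empty."""
    norm = sqrt(dot(u, u)) * sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def gvsm_score(query, doc, collection):
    """GVSM-style score (assumed textbook form, unnormalized): compare
    query and doc through their dot products with each collection document,
    so that terms co-occurring in the collection count as related."""
    return sum(dot(query, d) * dot(doc, d) for d in collection)


# Hypothetical toy collection, invented for illustration only.
collection = [tf_vector(t) for t in (
    "car automobile dealer",
    "automobile engine repair",
    "fruit market prices",
)]
query = tf_vector("car")
target = collection[1]  # shares no literal term with the query

literal = cosine(query, target)                   # literal matching finds nothing
semantic = gvsm_score(query, target, collection)  # co-occurrence links car/automobile
```

Here the query and the second document share no term, so the cosine score is zero, while the co-occurrence of "car" and "automobile" in the first document gives the pair a nonzero GVSM-style score.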

Mobile information retrieval with search results clustering: Prototypes and evaluations
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 5 2009
Claudio Carpineto
Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a Web clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, for PDAs and cell phones, respectively. Next, we evaluate the retrieval performance of the three prototype systems. We measure the effectiveness of their clustered results compared to a ranked list of results on a subtopic retrieval task, by means of the device-independent notion of subtopic reach time together with a reusable test collection built from Wikipedia ambiguous entries. Then, we make a cross-comparison of methods (i.e., clustering and ranked list) and devices (i.e., desktop, PDA, and cell phone), using an interactive information-finding task performed by external participants. The main finding is that clustering engines are a viable complementary approach to plain search engines both for desktop and mobile searches, especially, but not only, for multitopic informational queries.

Data fusion according to the principle of polyrepresentation
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 4 2009
Birger Larsen
We report data fusion experiments carried out on the four best-performing retrieval models from TREC 5. Three were conceptually/algorithmically very different from one another; one was algorithmically similar to one of the former.
The objective of the test was to observe the performance of the 11 logical data fusion combinations compared to the performance of the four individual models and their intermediate fusions when following the principle of polyrepresentation. This principle is based on the cognitive IR perspective (Ingwersen & Järvelin, 2005) and implies that each retrieval model is regarded as a representation of a unique interpretation of information retrieval (IR). It predicts that only fusions of very different, but equally good, IR models may outperform each constituent as well as their intermediate fusions. Two kinds of experiments were carried out. One tested restricted fusions, which entails that only the inner disjoint overlap documents between fused models are ranked. The second set of experiments was based on traditional data fusion methods. The experiments involved the 30 TREC 5 topics that contain more than 44 relevant documents. In all tests, the Borda and CombSUM scoring methods were used. Performance was measured by precision and recall, with document cutoff values (DCVs) at 100 and 15 documents, respectively. Results show that restricted fusions made of two, three, or four cognitively/algorithmically very different retrieval models perform significantly better than do the individual models at DCV100. At DCV15, however, the results of polyrepresentative fusion were less predictable. The traditional fusion method based on polyrepresentation principles demonstrates a clear picture of performance at both DCV levels and verifies the polyrepresentation predictions for data fusion in IR. Data fusion improves retrieval performance over its constituent IR models only if the models are all quite conceptually/algorithmically dissimilar and equally and well performing, in that order of importance.
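
The two scoring methods named in this abstract, CombSUM and Borda, can be sketched as follows. The sketch rests on assumptions the abstract does not specify: scores are min-max normalized per run before summing, a document missing from a run earns nothing from that run, and the run scores and document identifiers are invented. It illustrates plain (traditional) fusion only, not the restricted fusion or the experimental setup of the study.

```python
from collections import defaultdict


def min_max(run):
    """Rescale one run's scores to [0, 1] so runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 1.0 for doc, s in run.items()}


def comb_sum(runs):
    """CombSUM: add up each document's normalized scores over all runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max(run).items():
            fused[doc] += score
    return dict(fused)


def borda(runs):
    """Borda count: a document at 0-based rank r earns (n - r) points,
    n being the number of distinct documents over all runs. Documents
    absent from a run earn nothing there (a simplifying assumption)."""
    n = len({doc for run in runs for doc in run})
    fused = defaultdict(int)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked):
            fused[doc] += n - rank
    return dict(fused)


def ranking(fused):
    """Document ids by descending fused score, ties broken by id."""
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from two retrieval models, invented for illustration.
run_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
run_b = {"d2": 8.0, "d4": 6.0, "d1": 4.0}

comb_ranking = ranking(comb_sum([run_a, run_b]))
borda_ranking = ranking(borda([run_a, run_b]))
```

CombSUM works on the scores themselves and therefore needs the runs brought to a common scale, whereas Borda uses only rank positions; on this toy input both place `d2` first because it is ranked highly by both models.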

English-Arabic proper-noun transliteration-pairs creation
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 10 2008
Mohamed Abdel Fattah
Proper nouns may be considered the most important query words in information retrieval. If the two languages use the same alphabet, the same proper nouns can be found in either language. However, if the two languages use different alphabets, the names must be transliterated. Short vowels are not usually marked on Arabic words in almost all Arabic documents (except very important documents like the Muslim and Christian holy books). Moreover, most Arabic words have a syllable consisting of a consonant-vowel combination (CV), which means that most Arabic words contain a short or long vowel between two successive consonant letters. That makes it difficult to create English-Arabic transliteration pairs, since some English letters may not be matched with any romanized Arabic letter. In the present study, we present different approaches for extraction of transliteration proper-noun pairs from parallel corpora based on different similarity measures between the English and romanized Arabic proper nouns under consideration. The strength of our new system is that it works well for low-frequency proper noun pairs. We evaluate the new approaches presented using two different English-Arabic parallel corpora. Most of our results outperform previously published results in terms of precision, recall, and F-measure.

Controlled user evaluations of information visualization interfaces for text retrieval: Literature review and meta-analysis
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 6 2008
Charles-Antoine Julien
This review describes experimental designs (users, search tasks, measures, etc.) used by 31 controlled user studies of information visualization (IV) tools for textual information retrieval (IR) and a meta-analysis of the reported statistical effects.
Comparable experimental designs allow research designers to compare their results with other reports, and support the development of experimentally verified design guidelines concerning which IV techniques are better suited to which types of IR tasks. The studies generally use a within-subject design with 15 or more undergraduate students performing browsing to known-item tasks on sets of at least 1,000 full-text articles or Web pages on topics of general interest/news. Results of the meta-analysis (N = 8) showed no significant effects of the IV tool as compared with a text-only equivalent, but the set shows great variability suggesting an inadequate basis of comparison. Experimental design recommendations are provided which would support comparison of existing IV tools for IR usability testing. [source]

The influence of indexing practices and weighting algorithms on document spaces
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 1 2008
Dietmar Wolfram

Index modeling and computer simulation techniques are used to examine the influence of indexing frequency distributions, indexing exhaustivity distributions, and three weighting methods on hypothetical document spaces in a vector-based information retrieval (IR) system. The way documents are indexed plays an important role in retrieval. The authors demonstrate the influence of different indexing characteristics on document space density (DSD) changes and document space discriminative capacity for IR. Document environments that contain a relatively higher percentage of infrequently occurring terms provide lower density outcomes than do environments where a higher percentage of frequently occurring terms exists. Different indexing exhaustivity levels, however, have little influence on the document space densities.
A weighting algorithm that favors higher weights for infrequently occurring terms results in the lowest overall document space densities, which allows documents to be more readily differentiated from one another. This in turn can positively influence IR. The authors also discuss the influence on outcomes using two methods of normalization of term weights (i.e., means and ranges) for the different weighting methods. [source]

Relevance: A review of the literature and a framework for thinking on the notion in information science.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 13 2007
Part II: nature, manifestations of relevance

Relevant: Having significant and demonstrable bearing on the matter at hand. Relevance: The ability as of an information retrieval system to retrieve material that satisfies the needs of the user. (Merriam-Webster Dictionary, 2005)

Relevance is a, if not even the, key notion in information science in general and information retrieval in particular. This two-part critical review traces and synthesizes the scholarship on relevance over the past 30 years and provides an updated framework within which the still widely dissonant ideas and works about relevance might be interpreted and related. It is a continuation and update of a similar review that appeared in 1975 under the same title, considered here as being Part I. The present review is organized into two parts: Part II addresses the questions related to nature and manifestations of relevance, and Part III addresses questions related to relevance behavior and effects. In Part II, the nature of relevance is discussed in terms of meaning ascribed to relevance, theories used or proposed, and models that have been developed. The manifestations of relevance are classified as to several kinds of relevance that form an interdependent system of relevances.
In Part III, relevance behavior and effects are synthesized using experimental and observational works that incorporate data. In both parts, each section concludes with a summary that in effect provides an interpretation and synthesis of contemporary thinking on the topic treated or suggests hypotheses for future research. Analyses of some of the major trends that shape relevance work are offered in conclusions. [source]

Data cleansing for Web information retrieval using query independent features
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, Issue 12 2007
Yiqun Liu

Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance. [source]