Abstract
QUERY CATEGORIZATION WEB SEARCH RESULTS INTO MEANINGFUL AND STABLE CATEGORIES USING FAST-FEATURE TECHNIQUES
Pravendra Singh Chauhan*
ABSTRACT
When search results against digital libraries and web resources have limited metadata, augmenting them with meaningful and stable category information can enable better overviews and support user exploration. This paper proposes six ?fast-feature? techniques that use only features available in the search result list, such as title, snippet, and URL, to categorize results into meaningful categories. They use credible knowledge resources, including a US government organizational hierarchy, a thematic hierarchy from the Open Directory Project (ODP) web directory and personal browse histories, to add valuable metadata to search results. In three tests the percent of results categorized for five representative queries was high enough to suggest practical benefits: general web search (76-90%), government web search (39-100%), and the Bureau of Labor Statistics website (48-94%). An additional test submitted 250 TREC queries to a search engine and successfully categorized 66% of the top 100 using the ODP and 61% of the top 350. Fast-feature techniques have been implemented in a prototype search engine. We propose research directions to improve categorization rates and make suggestions about how web site designers could re-organize their sites to support fast categorization of search results. Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.3.7 [Information Storage and Retrieval]: Digital Libraries General Terms Measurement, Design, Experimentation, Human Factors.
[Full Text Article] [Download Certificate]