AAAI 1999

AI & the World Wide Web

Recognizing Structure in Web Pages Using Similarity Queries

Conference Paper Technical Papers Artificial Intelligence

Abstract

We present general-purpose methods for recognizing certain types of structure in HTML documents. The methods are implemented using WHIRL, a "soft" logic that incorporates a notion of textual similarity developed in the information retrieval community. In an experimental evaluation on 82 Web pages, the structure ranked first by our method is "meaningful"--i.e., a structure that was used in a hand-coded "wrapper", or extraction program, for the page--nearly 70% of the time. This improves on a value of 50% obtained by an earlier method. With appropriate background information, the structure-recognition methods we describe can also be used to learn a wrapper from examples, or for maintaining a wrapper as a Web page changes format. In these settings, the top-ranked structure is meaningful nearly 85% of the time.

Authors

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
728602220295830593