Mining the Web

One application area that interests us can be broadly described as mining information on the Web. Knowbot Programs offer new opportunities for searching, browsing, and extracting information on the Internet. The use of mobile code allows new information management techniques to be developed and deployed quickly and efficiently.

Before describing several specific applications we are developing, we outline a few general problems with the existing Web infrastructure that motivate the use of mobile agents.

Web servers and the HTTP protocol are poorly suited for providing database-like access to information. Current protocols do not support operations performed on groups of pages, both because the protocol is page-at-a-time oriented and because the interfaces do not expose the larger structure of a collection of Web pages.
The requirements of information management applications vary widely, and many diverse groups are developing new systems. No single interface can suit every application, nor can the designers of the underlying infrastructure, working in a standards body, be expected to keep up with the new demands that applications place on it.
Mobile agents eliminate the need to develop new interfaces and deploy implementations at each Web site. The basic agent infrastructure allows any application designer to send new code to a remote server on demand. The server can provide a simple, efficient low-level interface, and the application writer can perform higher-level operations using the agent. Because the agent runs at the server, the higher-level interface can perform operations that would make no sense over a wide-area network, e.g., because of the high latency of each low-level operation.
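To make the latency argument concrete, here is a minimal Python sketch under stated assumptions. It models a toy site held in memory; `LowLevelSite`, `get_page`, `run_remotely`, and `count_links_agent` are hypothetical names chosen for illustration, not the Knowbot Program interface. The agent composes many low-level page fetches into one higher-level answer, and `run_remotely` simply calls it in-process to stand in for shipping the code to the server.

```python
# Minimal sketch: a server exposing only a simple page-at-a-time interface,
# plus an agent whose code runs next to the data. All names are hypothetical.
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")
Page = Tuple[str, List[str]]  # (page text, outgoing links)


class LowLevelSite:
    """Toy Web site offering one low-level operation: fetch a single page."""

    def __init__(self, pages: Dict[str, Page]) -> None:
        self._pages = pages
        self.calls = 0  # number of low-level operations performed

    def get_page(self, path: str) -> Page:
        self.calls += 1
        return self._pages[path]


def run_remotely(site: LowLevelSite, agent: Callable[[LowLevelSite], T]) -> T:
    # Stand-in for shipping agent code to the server: the agent is simply
    # called in-process, so every get_page call it makes is local, and only
    # its return value would need to cross the network.
    return agent(site)


def count_links_agent(site: LowLevelSite) -> int:
    """Higher-level operation: total links on all pages reachable from '/'."""
    seen, stack, total = set(), ["/"], 0
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        _, links = site.get_page(path)
        total += len(links)
        stack.extend(links)
    return total


site = LowLevelSite({
    "/": ("home", ["/a", "/b"]),
    "/a": ("page a", ["/b"]),
    "/b": ("page b", []),
})
print(run_remotely(site, count_links_agent), site.calls)  # 3 3
```

If each `get_page` were instead issued by a remote client over a wide-area link with a round-trip time of, say, 100 ms, these three calls alone would cost about 300 ms, and the cost grows with every page visited. With the agent at the server, the client pays roughly one round trip to send the code and receive the answer.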

Several specific applications we are developing or are interested in:

A system for mapping the structure of a Web site. The agent can crawl over a large collection of Web pages and return, e.g., a summary of the contents or a table of contents.
Distributed or meta-searching agents. An agent can perform query operations at multiple sites and efficiently combine the results without bringing all of the results back to the user.
Ad hoc queries and searches. Many sites have search engines, which are quite helpful when they support the kinds of queries a user is interested in making. But when the query is more complex, e.g. when it combines text search and structural queries, the search engine is no help.
Tracking modifications and sending notifications. A resident agent could provide value-added services for a Web site. One example is tracking changes to pages on the Web site.
New interfaces for meta-information. One example would be providing prefetching hints based on analysis of the Web server logs.
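The site-mapping and change-tracking items both come down to an agent walking a site locally. The following minimal Python sketch, under stated assumptions, has a resident agent build a table of contents by breadth-first crawling of same-site links, and detect modified pages by comparing SHA-256 digests between visits. The in-memory `site` dictionary stands in for the server's low-level interface, the regular-expression link and title extraction is a simplification rather than a real HTML parser, `crawl`, `digests`, and `changed` are hypothetical names, and delivering the notifications is left out.

```python
# Minimal sketch of a resident agent that (1) maps a site into a table of
# contents and (2) detects changed pages by hashing their contents.
# The dict-based site and all function names are hypothetical.
import hashlib
import re
from collections import deque
from typing import Dict, List, Tuple

Site = Dict[str, str]  # path -> HTML text

# Simplification: only same-site absolute links like href="/path" are followed.
LINK_RE = re.compile(r'href="(/[^"]*)"')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def crawl(site: Site, start: str = "/") -> List[Tuple[str, str]]:
    """Breadth-first walk of local links; returns (path, title) pairs."""
    toc, seen, queue = [], {start}, deque([start])
    while queue:
        path = queue.popleft()
        html = site.get(path)
        if html is None:  # dangling link: skip it
            continue
        m = TITLE_RE.search(html)
        toc.append((path, m.group(1).strip() if m else path))
        for link in LINK_RE.findall(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return toc


def digests(site: Site) -> Dict[str, str]:
    """Content fingerprint for every page on the site."""
    return {p: hashlib.sha256(h.encode("utf-8")).hexdigest() for p, h in site.items()}


def changed(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Paths added, removed, or modified between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


site = {
    "/": '<title>Home</title><a href="/about">About</a> <a href="/news">News</a>',
    "/about": "<title>About us</title>",
    "/news": '<title>News</title><a href="/">Home</a>',
}
before = digests(site)
print(crawl(site))  # [('/', 'Home'), ('/about', 'About us'), ('/news', 'News')]
site["/news"] = "<title>News</title> updated"
print(changed(before, digests(site)))  # ['/news']
```

Because the agent keeps only compact digests between visits, it can report which pages changed without storing or transmitting full copies of them.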