Mining the Web
One application area we are interested in can be broadly described as
mining information on the Web. Knowbot Programs offer new
opportunities for searching, browsing, and extracting information on
the Internet. The use of mobile code allows new information
management techniques to be developed and deployed quickly.
Before describing the specific applications we are developing, we
note a few general problems with the existing Web infrastructure
that motivate the use of mobile agents.
- Web servers and the HTTP protocol are poorly suited for providing
database-like access to information. Current protocols do not support
operations performed on groups of pages, both because the protocol is
page-at-a-time oriented and because the interfaces do not expose the
larger structure of a collection of Web pages.
- The requirements for information management applications vary
widely, and many diverse groups are developing new systems. No single
interface can be right for every application; nor
can designers of the underlying infrastructure, working in a standards
body, be expected to keep up with the new demands of applications.
- Mobile agents eliminate the need for developing new interfaces and
deploying implementations at each Web site. The basic agent
infrastructure allows any application designer to send new code to a
remote server on demand. The server can provide a simple, efficient
low-level interface and the application writer can perform
higher-level operations using the agent. The use of agents allows the
higher-level interface to perform operations that wouldn't make sense if
they were performed over a wide-area network because of, e.g., the high
latency of low-level operations.
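The division of labor in the last point can be illustrated with a minimal sketch. Everything in it is hypothetical: PageStore stands in for a simple low-level interface a server might offer, and term_count_agent stands in for code an application writer sends to that server. Neither name comes from the Knowbot system, and a dictionary of strings replaces real Web pages.

```python
# Hypothetical sketch: none of these names come from the Knowbot system.

class PageStore:
    """Simple low-level interface a server might expose to visiting agents."""

    def __init__(self, pages):
        self._pages = dict(pages)  # path -> page text

    def list_paths(self):
        return sorted(self._pages)

    def read(self, path):
        return self._pages[path]


def term_count_agent(store, term):
    """Higher-level operation over a group of pages, run next to the data.

    Each read() is a cheap local call here; issued across a wide-area
    network, the same loop would pay one round trip per page.
    """
    term = term.lower()
    counts = {}
    for path in store.list_paths():
        n = store.read(path).lower().count(term)
        if n:
            counts[path] = n
    return counts  # only this small summary travels back to the sender


store = PageStore({
    "/index.html": "Knowbot Programs are mobile agents.",
    "/agents.html": "Agents move to the data. Agents return summaries.",
    "/contact.html": "Send mail to the maintainers.",
})
summary = term_count_agent(store, "agents")
print(summary)
```

The server only has to support listing and reading pages; the counting logic belongs to the application and can change without any change at the server.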
Several specific applications we are
developing or are interested in:
- A system for mapping the structure of a Web site. The agent
can crawl over a large collection of Web pages and return, e.g., a
summary of the contents or a table of contents.
- Distributed or meta searching agents. An agent can perform
query operations at multiple sites and efficiently combine the results
without bringing all of the results back to the user.
- Ad hoc queries and searches. Many sites have search
engines, which are quite helpful when they support the kinds of
queries a user is interested in making. But when the query is more
complex, e.g., when it combines text search and structural queries, the
search engine is no help.
- Tracking modifications and sending notifications. A
resident agent could provide value-added services for a Web site. One
example is tracking changes to pages on the Web site.
- New interfaces for meta-information. One example would be
providing prefetching hints based on analysis of the Web server logs.
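As an illustration of the first application in the list, the following is a rough sketch of an agent that walks a site's links and returns a table of contents. It makes several assumptions: the SITE dictionary stands in for pages the agent would read locally at the server, map_site and _TitleAndLinks are hypothetical names, and only links that exactly match a known path are followed.

```python
from collections import deque
from html.parser import HTMLParser


class _TitleAndLinks(HTMLParser):
    """Collect the <title> text and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def map_site(pages, root):
    """Breadth-first walk from root; returns [(depth, path, title), ...]."""
    seen = {root}
    queue = deque([(root, 0)])
    contents = []
    while queue:
        path, depth = queue.popleft()
        parser = _TitleAndLinks()
        parser.feed(pages[path])
        contents.append((depth, path, parser.title.strip()))
        for href in parser.links:
            # Follow only links to pages held by this site, once each.
            if href in pages and href not in seen:
                seen.add(href)
                queue.append((href, depth + 1))
    return contents


# Hypothetical stand-in for the pages available at one server.
SITE = {
    "/": '<title>Home</title><a href="/docs">Docs</a><a href="/news">News</a>',
    "/docs": '<title>Docs</title><a href="/docs/api">API</a><a href="/">Home</a>',
    "/docs/api": "<title>API</title>",
    "/news": '<title>News</title><a href="http://example.org/">Elsewhere</a>',
}

toc = map_site(SITE, "/")
for depth, path, title in toc:
    print("  " * depth + f"{title} ({path})")
```

Only the short list of depths, paths, and titles needs to be returned to the user; the page bodies never leave the server.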