Web Consultant Agent Architecture

Bob Marcus
bob_marcus@mail.amsinc.com


Introduction:

This paper describes an architecture for integrating diverse knowledge sources in specialized domains using Web and data modeling technologies. Currently, information about a specific topic is usually scattered across many different Web sites and databases. End-users gather information by following links or by using some form of search engine. Search engines generally use full-text indexing combined with some topic indexing (e.g. Yahoo), limited metadata access (e.g. applet:string in Altavista), or automatic semantic grouping (e.g. Visual Live Topics in Altavista). These methods are reaching the limit of their effectiveness as the size of the Internet increases. In addition, information that is accessible only through dynamic Web pages, such as the results of database queries, is generally not available to the search engines.

On individual Web sites or Intranets, it is possible to define a standardized metadata model that is used to describe information. For example, several years ago I built a simple registry system for Web documents, data, and components that could be used to gather and store metadata. Links to data, documents and components could then be retrieved by queries on the metadata.
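As an illustration only, a registry of this kind can be sketched in a few lines of Python. The class names, field names, and example URLs below are assumptions made for the sketch; they are not taken from the original registry system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Metadata describing one Web resource (field names are hypothetical)."""
    url: str
    kind: str                       # "document", "data", or "component"
    title: str
    topics: set[str] = field(default_factory=set)


class MetadataRegistry:
    """In-memory registry: stores metadata and answers queries with links."""

    def __init__(self) -> None:
        self._entries: list[RegistryEntry] = []

    def register(self, entry: RegistryEntry) -> None:
        self._entries.append(entry)

    def query(self, kind: str | None = None, topic: str | None = None) -> list[str]:
        """Return the URLs of entries matching every criterion that was given."""
        return [
            e.url
            for e in self._entries
            if (kind is None or e.kind == kind)
            and (topic is None or topic in e.topics)
        ]


registry = MetadataRegistry()
registry.register(RegistryEntry("http://example.org/report.html", "document",
                                "Quarterly report", {"finance"}))
registry.register(RegistryEntry("http://example.org/rates.csv", "data",
                                "Exchange rates", {"finance", "currency"}))

print(registry.query(topic="finance"))
print(registry.query(kind="data", topic="currency"))
```

The point of the sketch is that retrieval works on the metadata alone; the documents, data, and components themselves stay wherever they already live.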

A more robust approach is to store the Web documents, components and links in a database. The new Universal Servers (e.g. Informix/Illustra) and Web Object Databases (e.g. ObjectStore) make this feasible. It is possible to store and query the metadata within these databases to assemble dynamic documents. I worked closely with Illustra on the original Web Data Blade for these reasons. Unfortunately, providing a unified view of this metadata across products and organizations is a very difficult problem. Information about the same specific topic area may be described differently, stored in different formats, and spread across many Web sites. The key question for this paper is how to make this information available to an end-user with a reasonable amount of time and effort.

The problem can be visualized by considering a person who is doing research on a specific topic in a library system where all of the world's books are stored. The books can be located at sites across the world and written in different languages. The library system has a full-text index of all of the books including those written in different languages. There are extensive cross-references in the bibliographies of the books. Searches based on simple metadata such as title, author, or topic are possible. For most people, it would be very difficult if not impossible to gather useful information in this environment. The size and completeness of the library would actually be a handicap. It is hard to see how any uniform metadata scheme could be imposed on the contents of the library to solve this problem.

In the real world, people are forced to obtain useful timely information in environments that are even more unstructured than this hypothetical library. The solution in many cases is to turn to people who have developed expertise in the topic area over a long period of time. These consultants should be aware of the different information sources and vocabularies that are used in their domain. They are able to filter out incorrect, misleading or incomplete data and to summarize important facts in a clear fashion. On the Web, some of the most valuable pages are collections of links gathered and annotated by an expert in a specific domain. The basic goal of the Web Consultant Agent Architecture is to duplicate this type of service in software.

Building a Web consultant agent:

The initial construction of a Web consultant agent will require the skills of a domain expert, a data modeler, and a Web expert. It will be necessary to build an initial model of the domain, including its synonyms, terminology, and relationships, that can be used as a metadata resource for describing and classifying Web data, documents and components. (This metadata resource is similar to the tree-structured thesauruses already available in some domains for indexing.) The next step is to collect a set of Web documents relevant to the domain, indexed by and linked to the metadata. This collection will serve as the knowledge base for the consultant agent. Ideally, tools will be available to automate much of this process. Many years ago at Hewlett-Packard, I developed a hypertext knowledge acquisition and retrieval tool (KAT) for computer support documents that used this structure.
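The two construction steps above (a domain model, then documents indexed against it) can be sketched as follows. This is a minimal sketch, not a description of KAT: the class names, the sample terms, and the rule that a document is also filed under every broader term are all assumptions made for illustration, and the model is assumed to be a tree with no cycles.

```python
from __future__ import annotations

from collections import defaultdict


class DomainModel:
    """Tree-structured thesaurus: preferred terms, synonyms, broader terms."""

    def __init__(self) -> None:
        self._preferred: dict[str, str] = {}  # any known word -> preferred term
        self._broader: dict[str, str] = {}    # preferred term -> parent term

    def add_term(self, term: str, synonyms: tuple[str, ...] = (),
                 broader: str | None = None) -> None:
        for word in (term, *synonyms):
            self._preferred[word.lower()] = term
        if broader is not None:
            self._broader[term] = broader

    def normalize(self, word: str) -> str | None:
        """Map a word or synonym to its preferred term, if the model knows it."""
        return self._preferred.get(word.lower())

    def ancestors(self, term: str) -> list[str]:
        """Walk up the tree from a term to the root (assumes no cycles)."""
        chain = []
        while term in self._broader:
            term = self._broader[term]
            chain.append(term)
        return chain


class KnowledgeBase:
    """Documents indexed by the preferred terms of a domain model."""

    def __init__(self, model: DomainModel) -> None:
        self.model = model
        self._index: dict[str, set[str]] = defaultdict(set)

    def add_document(self, url: str, keywords: list[str]) -> None:
        for word in keywords:
            term = self.model.normalize(word)
            if term is None:
                continue  # unknown vocabulary is left for the domain expert
            for t in [term, *self.model.ancestors(term)]:
                self._index[t].add(url)

    def lookup(self, word: str) -> list[str]:
        term = self.model.normalize(word)
        return sorted(self._index.get(term, set())) if term else []


model = DomainModel()
model.add_term("database")
model.add_term("object database", synonyms=("OODB", "ODBMS"), broader="database")
model.add_term("relational database", synonyms=("RDBMS",), broader="database")

kb = KnowledgeBase(model)
kb.add_document("http://example.org/a.html", ["OODB"])
kb.add_document("http://example.org/b.html", ["rdbms", "unclassified-word"])

print(kb.lookup("ODBMS"))
print(kb.lookup("database"))
```

A query phrased in one vocabulary ("ODBMS") finds a document described in another ("OODB"), which is the service the synonym lists are meant to provide. Words the model does not recognize are skipped here; in practice they would be candidates for review by the domain expert.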

Once an initial knowledge base is available, it will be necessary to provide users with flexible interfaces to the consultant agent. The consultant agent should have a Web page that can be used as an entry point for interactive end-users. Browsing and full-text searching capabilities similar to the standard Web capabilities should be made available for the metadata repository. Queries to retrieve metadata and Web documents based on metadata attributes should be possible. It should also be possible to obtain information from the consultant agent through an API that can be accessed remotely. For example, this could be done through Web server scripts, distributed object middleware, servlets, or mail agent interfaces. Several years ago I developed an agent, interfaced with a Web server, that could communicate through e-mail, remote procedure calls, or the Web using the old Mosaic Common Client Interface.

One of the key usability questions is how to determine the format and types of queries to which the consultant agent can respond. One solution is to define a standard set of protocols for describing and discovering the syntax of consultant agent interfaces. Another possibility is for consultant agents to provide elaborate help facilities that are available to end-users.

Consultant agent operational issues:

Once a consultant agent has been developed, the next step is making it accessible to end-users. There are several ways this can be accomplished. One possibility is to have a Web site ("agency") where consultant agents could register their capabilities and locations. The agents could then be accessed by manual browsing or a querying mechanism. It will probably be necessary to develop a higher level metadata scheme for classifying consultant agents. This raises some of the same issues that occur when trying to locate reusable software components.
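A rough sketch of such an agency is given below. The listing fields, the ranking rule (agents covering more of the requested domains come first), and the example locations are assumptions made for illustration; the paper does not prescribe a classification scheme.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentListing:
    """What a consultant agent tells the agency about itself (hypothetical)."""
    name: str
    location: str                 # entry point of the agent
    domains: frozenset[str]       # higher level classification terms
    query_types: frozenset[str]   # e.g. "browse", "full-text", "metadata"


class Agency:
    """Registry where consultant agents advertise capabilities and locations."""

    def __init__(self) -> None:
        self._listings: dict[str, AgentListing] = {}

    def register(self, listing: AgentListing) -> None:
        self._listings[listing.name] = listing  # re-registering replaces

    def find(self, domains: set[str],
             query_type: str | None = None) -> list[AgentListing]:
        """Return matching agents, best domain coverage first."""
        scored = []
        for listing in self._listings.values():
            overlap = len(listing.domains & domains)
            if overlap == 0:
                continue
            if query_type is not None and query_type not in listing.query_types:
                continue
            scored.append((overlap, listing))
        scored.sort(key=lambda pair: (-pair[0], pair[1].name))
        return [listing for _, listing in scored]


agency = Agency()
agency.register(AgentListing("databases", "http://example.org/agents/db",
                             frozenset({"databases", "data modeling"}),
                             frozenset({"metadata", "full-text"})))
agency.register(AgentListing("web-tools", "http://example.org/agents/web",
                             frozenset({"web servers", "databases"}),
                             frozenset({"browse"})))

for hit in agency.find({"databases", "data modeling"}):
    print(hit.name, hit.location)
```

Recording the supported query types alongside the domains is one simple way to let a caller discover in advance which kinds of request an agent can answer.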

One of the most critical issues in operating a consultant agent system over time is the maintenance of the knowledge base. Since documents and data are continually changing in content and location, how can the consultant agent provide accurate and up-to-date information? One solution is to use mobile agents to periodically monitor changes in the Web documents and data. Storing the Web objects in a database would allow this to be done using queries. However, this is very resource intensive and is probably not scalable as the number and size of consultant agents increase. An alternate approach is to have Web sites provide local agents that can accept subscriptions to local data and notify subscribers when changes take place. The maintenance of metadata concerning domains will probably require manual processes for the foreseeable future.
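The subscription approach can be sketched as below. Everything here is an assumption made for illustration: the class and method names, the use of a content hash to detect change, and the choice to treat the first publication of a path as a change.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from typing import Callable

Callback = Callable[[str], None]  # called with the path that changed


class LocalSiteAgent:
    """Site-local agent: accepts subscriptions and reports content changes."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}
        self._subscribers: dict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, path: str, callback: Callback) -> None:
        self._subscribers[path].append(callback)

    def publish(self, path: str, content: bytes) -> bool:
        """Record content for a path; notify subscribers only if it changed."""
        digest = hashlib.sha256(content).hexdigest()
        if self._digests.get(path) == digest:
            return False  # identical content, nothing to report
        self._digests[path] = digest
        for callback in self._subscribers.get(path, []):
            callback(path)
        return True


# A consultant agent keeps a list of entries it must re-index.
stale: list[str] = []

site = LocalSiteAgent()
site.subscribe("/docs/guide.html", stale.append)

site.publish("/docs/guide.html", b"version 1")  # first publication: notified
site.publish("/docs/guide.html", b"version 1")  # unchanged: no notification
site.publish("/docs/guide.html", b"version 2")  # changed: notified
site.publish("/docs/other.html", b"unrelated")  # no subscribers for this path

print(stale)
```

Compared with periodic monitoring, the cost here falls on the site only when something changes, and the consultant agent does no work for documents that stay the same.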

An interesting question for the future is whether it will be possible for agents to communicate among themselves to satisfy user requests. For consultant agents, the ability to exchange information would permit the partitioning of domains into smaller modules. The consultant agents for the modules would be far easier to develop and maintain than a single large, monolithic agent. However, many of the issues related to communication between agents with different knowledge bases are still unsolved.

Conclusion:

Transforming Web objects into a source of usable information for the non-expert is an extremely difficult problem. In this paper, an architecture for building consultant agents to assist in this task is proposed and outlined. The motivation for the architecture is the observation that people often rely on human domain experts to search large information sources. Capturing and storing, in a structured fashion, the metadata used by these experts is a first step in building software consultant agents. The indexing of Web information sources by this metadata is a second major requirement for building a knowledge base. Once the knowledge base is available, flexible methods for retrieving and maintaining information and metadata must be provided. There are many challenging issues that must be resolved to create and operate usable, maintainable, scalable systems for information and data retrieval across the Internet. The use of consultant agents is an approach worth exploring.