bob_marcus@mail.amsinc.com
This paper describes an architecture for integrating diverse knowledge sources in specialized domains using Web and data modeling technologies. Currently, information about a specific topic is usually scattered across many different Web sites and databases. End-users gather information by following links or by using some form of search engine. Search engines generally use full-text indexing combined with some topic indexing (e.g. Yahoo), limited metadata access (e.g. applet:string in Altavista), or automatic semantic grouping (e.g. Visual Live Topics in Altavista). These methods are reaching the limit of their effectiveness as the size of the Internet increases. Information that is accessible through dynamic Web pages, such as the results of database queries, is generally not available to the search engines.
On individual Web sites or Intranets, it is possible to define a standardized metadata model that is used to describe information. For example, several years ago I built a simple registry system for Web documents, data, and components that could be used to gather and store metadata. Links to data, documents and components could then be retrieved by queries on the metadata.
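The registry idea can be illustrated with a minimal Python sketch. It is not the original system, whose design is not described here; the class names `RegistryEntry` and `MetadataRegistry`, the attribute names, and the example.com URLs are all hypothetical, and matching is assumed to be exact equality on attribute values.

```python
# Hypothetical sketch of a metadata registry; not the original system.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """One registered resource: its URL, its kind, and its metadata attributes."""
    url: str
    kind: str  # "document", "data", or "component"
    metadata: dict[str, str] = field(default_factory=dict)


class MetadataRegistry:
    """In-memory registry: resources are registered once, then found by metadata."""

    def __init__(self) -> None:
        self._entries: list[RegistryEntry] = []

    def register(self, url: str, kind: str, **metadata: str) -> None:
        self._entries.append(RegistryEntry(url, kind, dict(metadata)))

    def query(self, kind: str | None = None, **criteria: str) -> list[str]:
        """Return the URLs whose metadata matches every given attribute exactly."""
        return [
            e.url
            for e in self._entries
            if (kind is None or e.kind == kind)
            and all(e.metadata.get(k) == v for k, v in criteria.items())
        ]


registry = MetadataRegistry()
registry.register("http://example.com/odb-report.html", "document", topic="object databases")
registry.register("http://example.com/sales.csv", "data", topic="sales")
registry.register("http://example.com/Viewer.class", "component", topic="object databases")

print(registry.query(topic="object databases"))  # the report and the viewer component
print(registry.query(kind="data"))               # the sales data set
```

Because documents, data, and components share one entry type, a single metadata query can return links to all three kinds of resource at once.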
A more robust approach is to store the Web documents, components, and links in a database. The new Universal Servers (e.g. Informix/Illustra) and Web Object Databases (e.g. ObjectStore) make this feasible. Metadata can be stored and queried within these databases to assemble dynamic documents; for these reasons, I worked closely with Illustra on the original Web Data Blade. Unfortunately, providing a unified view of this metadata across products and organizations is a very difficult problem. Information about the same topic may be described differently, stored in different formats, and spread across many Web sites. The key question for this paper is how to make this information available to an end-user within a reasonable amount of time and effort.
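The following sketch shows metadata stored alongside Web objects in a database and queried to assemble a dynamic page. It uses Python's built-in sqlite3 module purely as a lightweight stand-in for a universal server or object database; it does not use the Web Data Blade, and the two-table schema and the `add_object` and `assemble_page` helpers are hypothetical.

```python
import sqlite3
from html import escape

# sqlite3 stands in for a universal server or object database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE web_object (
        id INTEGER PRIMARY KEY,
        url TEXT NOT NULL,
        title TEXT NOT NULL
    );
    CREATE TABLE metadata (
        object_id INTEGER REFERENCES web_object(id),
        attr TEXT NOT NULL,
        value TEXT NOT NULL
    );
    """
)


def add_object(url, title, **attrs):
    """Store a Web object and one metadata row per attribute."""
    cur = conn.execute("INSERT INTO web_object (url, title) VALUES (?, ?)", (url, title))
    conn.executemany(
        "INSERT INTO metadata (object_id, attr, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, k, v) for k, v in attrs.items()],
    )


def assemble_page(attr, value):
    """Build an HTML page listing every stored object whose metadata matches."""
    rows = conn.execute(
        """
        SELECT o.url, o.title FROM web_object o
        JOIN metadata m ON m.object_id = o.id
        WHERE m.attr = ? AND m.value = ?
        ORDER BY o.title
        """,
        (attr, value),
    ).fetchall()
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>' for url, title in rows
    )
    return f"<html><body><h1>{escape(value)}</h1>\n<ul>\n{items}\n</ul></body></html>"


add_object("http://example.com/a.html", "Query optimization notes", topic="databases")
add_object("http://example.com/b.html", "Blade architecture", topic="databases")
add_object("http://example.com/c.html", "Travel policy", topic="admin")

print(assemble_page("topic", "databases"))
```

The page does not exist anywhere as a file; it is regenerated from the current contents of the database on every request.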
The problem can be visualized by considering a person who is researching a specific topic in a library system that holds all of the world's books. The books are located at sites across the world and written in different languages. The library system has a full-text index of all of the books, including those written in different languages. There are extensive cross-references in the bibliographies of the books. Searches based on simple metadata such as title, author, or topic are possible. For most people, it would be very difficult, if not impossible, to gather useful information in this environment. The size and completeness of the library would actually be a handicap. It is hard to see how any uniform metadata scheme could be imposed on the contents of the library to solve this problem.
In the real world, people are forced to obtain useful, timely information in environments that are even more unstructured than this hypothetical library. The solution in many cases is to turn to people who have developed expertise in the topic area over a long period of time. These consultants know the different information sources and vocabularies that are used in their domain. They are able to filter out incorrect, misleading, or incomplete data and to summarize important facts clearly. On the Web, some of the most valuable pages are collections of links gathered and annotated by an expert in a specific domain. The basic goal of the Web Consultant Agent Architecture is to duplicate this type of service in software.
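Two of the consultant's skills described above, recognizing the different vocabularies used in a domain and filtering out unreliable sources, can be sketched as a synonym table plus a set of annotated links that the expert has vetted. This is an illustrative assumption about one way to encode such expertise, not the architecture's actual knowledge representation; the terms, URLs, and annotations are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedLink:
    url: str
    annotation: str  # the expert's one-line summary
    vetted: bool     # False if the expert judged the source incorrect or outdated


# Hypothetical domain knowledge that an expert would maintain by hand.
SYNONYMS = {
    "oodb": "object database",
    "odbms": "object database",
    "object-oriented database": "object database",
    "ordbms": "universal server",
    "object-relational database": "universal server",
}

CURATED = {
    "object database": [
        CuratedLink("http://example.com/odb-intro", "Good survey of products", True),
        CuratedLink("http://example.com/odb-old", "Benchmarks are obsolete", False),
    ],
    "universal server": [
        CuratedLink("http://example.com/us-faq", "Vendor-neutral FAQ", True),
    ],
}


def normalize(term):
    """Map a user's vocabulary onto the consultant's canonical domain term."""
    t = term.strip().lower()
    return SYNONYMS.get(t, t)


def consult(term):
    """Return (url, annotation) pairs that the expert has vetted for this term."""
    return [
        (link.url, link.annotation)
        for link in CURATED.get(normalize(term), [])
        if link.vetted
    ]


print(consult("ODBMS"))
```

A user asking about "ODBMS" receives the same vetted, annotated answer as one asking about "object database", and the link the expert rejected is never shown.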
Once an initial knowledge base is available, users will need flexible interfaces to the consultant agent. The consultant agent should have a Web page that serves as an entry point for interactive end-users. Browsing and full-text search of the metadata repository, similar to standard Web capabilities, should be provided. It should also be possible to retrieve metadata and Web documents by querying on metadata attributes. In addition, information should be obtainable from the consultant agent through an API that can be accessed remotely, for example through Web server scripts, distributed object middleware, servlets, or mail agent interfaces. Several years ago I developed an agent, interfaced with a Web server, that could communicate through e-mail, remote procedure calls, or the Web using the old Mosaic Common Client Interface.
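One way to serve both interactive users and remote programs is to keep the query logic in a single transport-neutral function and let each front end (Web page, server script, servlet, mail agent) translate its requests into calls on it. The sketch below assumes URL-style query strings and JSON replies; the operation names `search` and `attr`, the `ConsultantAgent` class, and the `handle_request` function are hypothetical.

```python
import json
from urllib.parse import parse_qs


class ConsultantAgent:
    """Core query logic, independent of how a request arrives (hypothetical)."""

    def __init__(self, entries):
        # Each entry is a dict with "url", "text", and any metadata attributes.
        self.entries = entries

    def search(self, words):
        """Naive case-insensitive substring match standing in for a full-text index."""
        words = [w.lower() for w in words]
        return [e["url"] for e in self.entries
                if all(w in e["text"].lower() for w in words)]

    def by_attribute(self, name, value):
        """Return URLs of entries whose metadata attribute equals the value."""
        return [e["url"] for e in self.entries if e.get(name) == value]


def handle_request(agent, query_string):
    """Single entry point for every front end: a Web form, CGI script, servlet,
    or mail gateway extracts a query string, calls this, and relays the JSON."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    op = params.get("op", "")
    if op == "search":
        results = agent.search(params.get("q", "").split())
    elif op == "attr":
        results = agent.by_attribute(params.get("name", ""), params.get("value", ""))
    else:
        return json.dumps({"error": f"unknown operation {op!r}"})
    return json.dumps({"op": op, "results": results})


agent = ConsultantAgent([
    {"url": "http://example.com/a", "text": "Survey of object databases", "topic": "databases"},
    {"url": "http://example.com/b", "text": "Mail agent interfaces", "topic": "agents"},
])
print(handle_request(agent, "op=search&q=object+databases"))
print(handle_request(agent, "op=attr&name=topic&value=agents"))
```

Keeping the transports thin means a new access path, such as a mail interface, only has to turn incoming messages into the same query-string form.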
One of the key usability questions is how to determine the format and types of queries to which the consultant agent can respond. One solution is to define a standard set of protocols for describing and discovering the syntax of consultant agent interfaces. Another possibility is for consultant agents to provide elaborate help facilities that are available to end-users.
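The two options above can be combined: if an agent publishes a machine-readable description of its operations, the same description can drive request validation for programs and help text for end-users. The sketch below assumes a JSON descriptor whose field names, like the `describe`, `validate`, and `help_text` functions, are invented for illustration; it does not follow any existing standard.

```python
import json

# Hypothetical self-description an agent could publish; not a standard format.
INTERFACE = {
    "agent": "object-database-consultant",
    "operations": {
        "search": {"params": {"q": "str"},
                   "help": "Full-text search over annotated links."},
        "attr": {"params": {"name": "str", "value": "str"},
                 "help": "Find resources by metadata attribute."},
    },
}


def describe():
    """What a remote client would fetch to discover the agent's query syntax."""
    return json.dumps(INTERFACE, indent=2)


def validate(op, params):
    """Check a request against the published description; return any problems."""
    spec = INTERFACE["operations"].get(op)
    if spec is None:
        return [f"unknown operation {op!r}; try one of {sorted(INTERFACE['operations'])}"]
    problems = [f"missing parameter {p!r}" for p in spec["params"] if p not in params]
    problems += [f"unexpected parameter {p!r}" for p in params if p not in spec["params"]]
    return problems


def help_text():
    """Generate end-user help from the same description."""
    lines = []
    for op, spec in INTERFACE["operations"].items():
        args = ", ".join(spec["params"])
        lines.append(f"{op}({args}): {spec['help']}")
    return "\n".join(lines)


print(help_text())
print(validate("attr", {"name": "topic"}))
```

Deriving both the help and the validation from one description keeps them from drifting apart as the agent's interface evolves.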
One of the most critical issues in operating a consultant agent system over time is maintaining the knowledge base. Since documents and data are continually changing in content and location, how can the consultant agent provide accurate and up-to-date information? One solution is to use mobile agents to periodically monitor changes in the Web documents and data. Storing the Web objects in a database would allow this monitoring to be done with queries. However, this is very resource-intensive and probably does not scale as the number and size of consultant agents increase. An alternative approach is to have Web sites provide local agents that accept subscriptions to local data and notify subscribers when changes take place. The maintenance of metadata concerning domains will probably require manual processes for the foreseeable future.
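The subscription approach can be sketched as a local agent that keeps a digest of each document it serves and pushes a notice to subscribers only when the content actually changes or the document is removed. The `LocalSiteAgent` class and its callback convention are hypothetical; a polling consultant agent could detect changes with the same SHA-256 comparison, but it would have to fetch every document itself to do so.

```python
import hashlib
from collections import defaultdict


class LocalSiteAgent:
    """Hypothetical agent run by a Web site: it watches its own documents and
    pushes change notices to subscribers, so remote consultants need not poll."""

    def __init__(self):
        self._digests = {}                     # url -> SHA-256 of last content
        self._subscribers = defaultdict(list)  # url -> list of callbacks

    def subscribe(self, url, callback):
        self._subscribers[url].append(callback)

    def publish(self, url, content):
        """Called when the site updates a document; notifies only on a real change."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        old = self._digests.get(url)
        self._digests[url] = digest
        if old is not None and old != digest:
            for callback in self._subscribers[url]:
                callback(url, digest)

    def remove(self, url):
        """Document deleted or moved: tell subscribers (digest None) so they drop it."""
        self._digests.pop(url, None)
        for callback in self._subscribers.pop(url, []):
            callback(url, None)


# A consultant agent records which of its links have gone stale or vanished.
stale = []
site = LocalSiteAgent()
site.publish("http://example.com/prices.html", "v1")
site.subscribe("http://example.com/prices.html",
               lambda url, digest: stale.append((url, digest is None)))
site.publish("http://example.com/prices.html", "v1")  # unchanged: no notice
site.publish("http://example.com/prices.html", "v2")  # changed: one notice
site.remove("http://example.com/prices.html")         # removed: final notice
print(stale)
```

The cost of change detection stays at the site that owns the data, and the consultant agent only does work when something it cares about has actually changed.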
An interesting question for the future is whether agents will be able to communicate among themselves to satisfy user requests. For consultant agents, the ability to exchange information would permit a domain to be partitioned into smaller modules. The consultant agents for the modules would be far easier to develop and maintain than a single large, monolithic agent. However, many of the issues related to communication between agents with different knowledge bases are still unsolved.
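Partitioning a domain across module agents can be sketched as a broker that forwards each request to every module claiming the topic and merges the answers. The sketch sidesteps the unsolved issues mentioned above by assuming that all modules share one topic vocabulary; the `ModuleAgent` and `Broker` classes and the sample topics are hypothetical.

```python
class ModuleAgent:
    """A small consultant covering one sub-domain (hypothetical)."""

    def __init__(self, name, topics, answers):
        self.name = name
        self.topics = set(topics)
        self.answers = answers  # topic -> list of URLs

    def ask(self, topic):
        return self.answers.get(topic, [])


class Broker:
    """Front agent that forwards each request to every module claiming the topic.
    Assumes all modules use the same topic vocabulary."""

    def __init__(self, modules):
        self.modules = modules

    def ask(self, topic):
        results = []
        for m in self.modules:
            if topic in m.topics:
                results.extend((m.name, url) for url in m.ask(topic))
        return results


storage = ModuleAgent("storage", ["object database", "indexing"],
                      {"object database": ["http://example.com/odb"],
                       "indexing": ["http://example.com/idx"]})
web = ModuleAgent("web", ["dynamic pages", "indexing"],
                  {"dynamic pages": ["http://example.com/dyn"],
                   "indexing": ["http://example.com/spider"]})
broker = Broker([storage, web])
print(broker.ask("indexing"))
```

Each module stays small and independently maintainable; the hard part, reconciling modules whose vocabularies differ, is exactly what this sketch assumes away.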