An Overview of the MetaMagic Architecture

Leon Shklar, David Makower, and Weiyeh Lee
Pencom Systems, Inc., 40 Fulton St., New York, NY 10038
{ leon | davemak | weiyeh }@pencom.com

Position paper for the Workshop on Object-Oriented Web Servers and Data Modeling,
Sixth International World Wide Web Conference.
Monday, April 7, 1997
Santa Clara, USA.

Abstract

This paper provides a brief overview of the MetaMagic™ architecture, a model that uses metadata objects, generated by and residing on a Web server, to facilitate the flexible presentation of heterogeneous data, which may itself be located elsewhere. To allow access to data in any number of locations, available via any number of protocols, we define an abstract data source interface -- an interface that the metadata objects use in order to obtain data, and in turn implement in order to serve as data sources in their own right. The entire architecture is implemented as a hierarchy of Java classes that extends the object-oriented functionality currently provided by W3C's Jigsaw server. Future plans include adapting the hierarchy to work with the server-side Java implementations emerging for various servers.

1.0 Introduction

The World-Wide Web continues to grow, and while it has become easier to make information available on the Web, it is often impossible to obtain exactly the right information in the right form. As a result, organizations and individuals have devoted substantial effort and expense to building new Web sites by copying and reformatting data that already exists on the Internet. Corporations, recognizing the power of the Web browser as the "universal client" for accessing corporate data, are devoting millions of man-hours to duplicating legacy data and adapting it into forms more suitable for presentation on a Web-based Intranet. This multiplies data-maintenance costs and leads to a proliferation of obsolete data. Such problems may be avoided by keeping the original data in place and using metadata to support the desired search and presentation of existing resources.

Our model focuses on the creation of MetaMagic™ resources, logical entities that use metadata to retrieve their content by accessing one or more data sources and that have one or more methods of presenting that content via the Web. Our notion of a data source is abstract, in that a data source may represent a local file, a URL, a database query, or any other entity that may yield a stream of data. The notion is also recursive; a MetaMagic resource utilizes data sources in order to access its content, but also, through its presentation method (or methods), may itself be utilized as a data source. MetaMagic resources are grouped together into virtual containers, which are in turn grouped into repositories. The containers and repositories may then be associated with any available content-based indexing technology.

An important advantage of our solution is advanced support for automated generation of MetaMagic resources. As in [shk95-2], we define and implement high-level operations to control data analysis and metadata generation and organization.

In addition to supporting flexible access to existing information, MetaMagic provides a reason to rethink seriously how new Web sites are designed and built. With MetaMagic supporting multiple dynamic views of the same data, it becomes beneficial to separate content from presentation. The main task of site design then becomes logically collecting the content in its most transparent form (e.g., plain text files or database entries), without moving data to a single file system or redesigning data maintenance procedures. An important additional benefit of such an approach is that it protects Web sites against the onslaught of new presentation technologies, which have long since made early HTML pages obsolete. Achieving a cutting-edge presentation would then involve only upgrading presentation methods, without changing the physical content.

Instead of implementing yet another HTTP server in order to support the MetaMagic model, we have designed and implemented a hierarchy of Java classes that support data modeling capabilities. These classes extend functionality currently provided by W3C's object-oriented Jigsaw server [bai96], and are capable of being adapted to work with the coming generation of commercial object-oriented Web servers.

2.0 The MetaMagic Model

In MetaMagic, we create logical Web resources -- that is, resources accessed by a URL, but represented by object specifications rather than by some physical file system entity -- that are composed of metadata attributes. We call these resources MetaMagic resources. At presentation time, when a request for such a resource arrives at the MetaMagic server, these metadata attributes, in combination with the request information, determine the content of the reply.

2.1 Data Sources

A MetaMagic data source is an abstraction for an entity that can present its content as a data stream. Thus, a local file may be represented as a data source, as may a file accessed via an FTP connection, an object obtained via an HTTP URL, a resource obtained via a Gopher URL, the results of a database query, etc. Although various data sources may use different retrieval mechanisms, all of them maintain the same interface for presenting their content. For example, a data source for an HTTP URL obtains content through an HTTP GET request, while a data source for an SQL query may execute a shell command that invokes an ad-hoc query tool. Regardless of these differences, both present their content as a stream after receiving a "present content" request.
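
The data source abstraction can be illustrated with a minimal Java sketch. Only the presentContent() method and its InputStream result come from the paper (Section 3.3); the exact interface shape and the class names FileDataSource, UrlDataSource, and StringDataSource are hypothetical. StringDataSource stands in for something like the captured output of a query tool.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Common contract (assumed shape): whatever its retrieval mechanism,
// a data source answers a "present content" request with a byte stream.
interface DataSource {
    InputStream presentContent() throws IOException;
}

// Hypothetical: a local file as a data source.
class FileDataSource implements DataSource {
    private final String path;
    FileDataSource(String path) { this.path = path; }
    public InputStream presentContent() throws IOException {
        return new FileInputStream(path);
    }
}

// Hypothetical: a URL as a data source; for http: URLs, openStream() issues a GET.
class UrlDataSource implements DataSource {
    private final URL url;
    UrlDataSource(URL url) { this.url = url; }
    public InputStream presentContent() throws IOException {
        return url.openStream();
    }
}

// Hypothetical: in-memory text, e.g. output already captured from a query tool.
class StringDataSource implements DataSource {
    private final String text;
    StringDataSource(String text) { this.text = text; }
    public InputStream presentContent() {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Code that consumes a DataSource reads the stream in the same way whichever class produced it.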

Because there may be an arbitrary number of data source classes, all providing the same interface but differing in their implementation, objects that use data sources rely on a data source factory to construct the specific data sources they need.
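
A data source factory might look like the following sketch. The class name DataSourceFactory appears in Section 3.3, but its interface is not described; the identifier format (a scheme prefix such as "file:" followed by the rest of the identifier), the register() and create() methods, and the "text" scheme are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

interface DataSource {
    InputStream presentContent() throws IOException;
}

// Hypothetical factory: maps the scheme prefix of an identifier
// (e.g. "file:/tmp/a.txt") to a constructor for the matching data source.
final class DataSourceFactory {
    private final Map<String, Function<String, DataSource>> makers = new HashMap<>();

    DataSourceFactory() {
        // Assumed built-in schemes, for illustration only.
        register("file", path -> () -> new FileInputStream(path));
        register("text", text -> () -> new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    // New data source classes are added here; callers do not change.
    void register(String scheme, Function<String, DataSource> maker) {
        makers.put(scheme, maker);
    }

    DataSource create(String identifier) {
        int colon = identifier.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No scheme in identifier: " + identifier);
        }
        Function<String, DataSource> maker = makers.get(identifier.substring(0, colon));
        if (maker == null) {
            throw new IllegalArgumentException("Unknown data source type: " + identifier);
        }
        return maker.apply(identifier.substring(colon + 1));
    }
}
```

Callers ask for a source by identifier and never name a concrete class, so supporting a new retrieval mechanism means registering one more constructor.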

2.2 MetaMagic Resources

A MetaMagic resource is a data source capable of packaging its content as an HTTP response. MetaMagic resources can be collected into virtual containers. A virtual container is itself a MetaMagic resource that contains other MetaMagic resources, much in the same way that a directory on a file system is seen as containing files. MetaMagic resources that are not containers are referred to as leaf resources. MetaMagic resources are referred to by HTTP URLs.
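
The recursive relationship between resources and containers is an instance of the Composite pattern. The sketch below assumes a DataSource interface with a presentContent() method; the class names, the path-based lookup(), and the choice to have a container present a plain listing of its children are hypothetical, and packaging content as an HTTP response is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

interface DataSource {
    InputStream presentContent() throws IOException;
}

// Every MetaMagic resource is itself a data source (the recursive part of the model).
abstract class MetaMagicResource implements DataSource {
    final String name;
    MetaMagicResource(String name) { this.name = name; }
}

// Hypothetical leaf resource holding literal text.
class LeafResource extends MetaMagicResource {
    private final String text;
    LeafResource(String name, String text) { super(name); this.text = text; }
    public InputStream presentContent() {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}

// A virtual container holds resources the way a directory holds files.
class VirtualContainer extends MetaMagicResource {
    private final Map<String, MetaMagicResource> children = new LinkedHashMap<>();
    VirtualContainer(String name) { super(name); }

    VirtualContainer add(MetaMagicResource r) { children.put(r.name, r); return this; }

    // Resolve a URL path such as "/docs/readme" relative to this container.
    MetaMagicResource lookup(String path) {
        MetaMagicResource current = this;
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;
            if (!(current instanceof VirtualContainer)) return null; // cannot descend into a leaf
            current = ((VirtualContainer) current).children.get(part);
            if (current == null) return null;
        }
        return current;
    }

    // Assumed presentation for containers: one child name per line.
    public InputStream presentContent() {
        String listing = String.join("\n", children.keySet());
        return new ByteArrayInputStream(listing.getBytes(StandardCharsets.UTF_8));
    }
}
```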

2.3 MetaMagic Repositories

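A MetaMagic repository is a collection of leaf resources and virtual containers. Repositories are specified by sequences of operations that, when executed, generate the resources that constitute them. As described in Section 3.2, the process of generating a MetaMagic repository involves:

  1. encapsulating data streams, i.e., analyzing them to obtain metadata;
  2. performing set operations on the resulting sets of metadata nodes;
  3. grouping the metadata nodes into a hierarchy; and
  4. converting the hierarchy into server resources and installing them in the server's information space.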

3.0 Implementation Considerations

3.1 Platform Independence

Rather than attempt to implement our own HTTP server, we set out to design and implement a hierarchy of classes that would work with a variety of emerging extensible servers.

Albeit with certain qualifications, the choice of Java as an implementation language allows us to take platform-independence in the traditional sense for granted. Instead, we consider portability in terms of server platforms. In this context, the challenge is to ensure that our class hierarchy is general enough to extend different HTTP servers.

We have formulated a number of requirements for HTTP servers that qualify them as acceptable MetaMagic platforms:

  1. The server is object-oriented in the sense that it models exportable information resources as objects.
  2. The server supports the registration of non-physical entities in its exportable space, or may be extended to support such functionality.
  3. The server provides methods that look up and return references to objects registered in its exportable space.
  4. The server provides an extension API whereby attributes and behavior may be defined for new subclasses of server resource objects.
  5. Some mechanism for object serialization and persistence is available, whether provided by the server or the implementation language. This mechanism is desirable so that MetaMagic objects need not remain constantly in memory, and will persist across invocations of the server.
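
Requirement 5 can be met with the implementation language's own mechanism. The sketch below assumes Java object serialization and a hypothetical ResourceSpec class holding a resource's metadata attributes; it writes to a byte array only to stay self-contained, whereas a server would write to persistent storage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical serializable record of a resource's metadata attributes.
class ResourceSpec implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final HashMap<String, String> attributes = new HashMap<>();
    ResourceSpec(String name) { this.name = name; }
}

final class SpecStore {
    private SpecStore() {}

    // Serialize the object graph so it need not stay in memory.
    static byte[] save(ResourceSpec spec) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(spec);
        }
        return bytes.toByteArray();
    }

    // Restore it, e.g. when the server is restarted.
    static ResourceSpec load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ResourceSpec) in.readObject();
        }
    }
}
```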

Having adopted such an approach, we determined that the Jigsaw reference server from W3C was the natural and obvious choice as the basis for our initial prototyping. Jigsaw is implemented in Java and provides a Java API with all of our required features. Furthermore, the Jigsaw server is publicly available, including complete source code.

Plans for adapting and generalizing the architecture to work with server-side Java implementations of various other servers are already underway.

3.2 Repository Generation

Before MetaMagic resources can present their content via the Web, they must be generated and installed in repositories on the Web server. The first operation that must be performed to generate a repository is the encapsulate operation. To support this operation, we define a class Encapsulator with an abstract method encapsulate(). Subclasses of Encapsulator must implement this method in order to provide a specification for the analysis of data streams.
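
A minimal sketch of the Encapsulator contract follows. The paper specifies only an abstract encapsulate() method whose result is a set of MetaDataNodes; the InputStream parameter, the Set return type, and the HeaderEncapsulator subclass (which treats the stream as blank-line-separated records of "Name: value" lines) are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Minimal metadata holder: name-value pairs only; parent/child links come later.
class MetaDataNode {
    final Map<String, String> attributes = new HashMap<>();
}

abstract class Encapsulator {
    // Analyze one data stream and return the metadata found in it (assumed signature).
    abstract Set<MetaDataNode> encapsulate(InputStream in) throws IOException;
}

// Hypothetical subclass: one node per blank-line-separated record,
// one attribute per "Name: value" line.
class HeaderEncapsulator extends Encapsulator {
    Set<MetaDataNode> encapsulate(InputStream in) throws IOException {
        Set<MetaDataNode> nodes = new LinkedHashSet<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        MetaDataNode current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) {          // a blank line ends the current record
                current = null;
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) continue;      // not a header line; ignore it
            if (current == null) {
                current = new MetaDataNode();
                nodes.add(current);
            }
            current.attributes.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return nodes;
    }
}
```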

Fig. 1. The encapsulation process.

While performing the analysis, the Encapsulator obtains metadata, which it stores in MetaDataNode objects (see fig. 1). On completion of the analysis, these objects are returned as a set. Each MetaDataNode is essentially a collection of name-value pairs, and may contain references to a parent node and zero or more child nodes. Neither parent nor child references are set by the encapsulate() method; parent-child relationships are established later by the group operation.

Fig. 2. Grouping Sets into a Hierarchy.

After the Encapsulators have analyzed their streams, and each has returned a set of MetaDataNodes, set operations may be performed in order to remove elements from the sets, combine sets, or create new sets from existing elements. Sets of MetaDataNodes are then grouped together by creating a new MetaDataNode to contain them. The group operation is illustrated in fig. 2. The container node is designated as the parent, and the contained nodes become its children.
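
The set and group operations can be sketched as follows. The method names union(), select(), and group(), the NodeSets helper class, and the "name" attribute on the container node are hypothetical; only the parent-child semantics follow the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class MetaDataNode {
    final Map<String, String> attributes = new HashMap<>();
    MetaDataNode parent;                                   // set only by group()
    final List<MetaDataNode> children = new ArrayList<>();

    MetaDataNode with(String name, String value) {
        attributes.put(name, value);
        return this;
    }
}

final class NodeSets {
    private NodeSets() {}

    // Combine two sets (identity-based, order preserved).
    static Set<MetaDataNode> union(Set<MetaDataNode> a, Set<MetaDataNode> b) {
        Set<MetaDataNode> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Remove elements: keep only the nodes satisfying the predicate.
    static Set<MetaDataNode> select(Set<MetaDataNode> s, Predicate<MetaDataNode> keep) {
        Set<MetaDataNode> result = new LinkedHashSet<>();
        for (MetaDataNode n : s) {
            if (keep.test(n)) result.add(n);
        }
        return result;
    }

    // Group: a new container node becomes the parent of every node in the set.
    static MetaDataNode group(Set<MetaDataNode> members, String containerName) {
        MetaDataNode container = new MetaDataNode().with("name", containerName);
        for (MetaDataNode n : members) {
            n.parent = container;
            container.children.add(n);
        }
        return container;
    }
}
```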

Fig. 3. Converting MetaDataNodes to Jigsaw resources.

Finally, after the MetaDataNodes have been arranged in a suitable hierarchy, the nodes are converted into Jigsaw resources and installed into the server's information space, in a configuration mirroring the hierarchy of MetaDataNodes (fig. 3). This is accomplished via a pre-order traversal of the hierarchy, converting each MetaDataNode as it is visited. MetaDataNodes with children are converted into VirtualContainerResources, and those without children are converted into LeafResources, which are implemented as extensions of Jigsaw's ContainerResource and FilteredResource classes, respectively.
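
The conversion step is a pre-order traversal. In the sketch below, ServerResource, LeafResource, and VirtualContainerResource are simplified stand-ins, not Jigsaw's actual classes or API; the visitLog parameter exists only to make the visiting order observable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MetaDataNode {
    final String name;
    final List<MetaDataNode> children = new ArrayList<>();
    MetaDataNode(String name) { this.name = name; }
    MetaDataNode add(MetaDataNode child) { children.add(child); return this; }
}

// Hypothetical stand-ins for server-side resource classes (not Jigsaw's API).
abstract class ServerResource {
    final String name;
    ServerResource(String name) { this.name = name; }
}

class LeafResource extends ServerResource {
    LeafResource(String name) { super(name); }
}

class VirtualContainerResource extends ServerResource {
    final Map<String, ServerResource> entries = new LinkedHashMap<>();
    VirtualContainerResource(String name) { super(name); }
}

final class Installer {
    private Installer() {}

    // Pre-order: convert the node itself first, then recurse into its children,
    // attaching each converted child to the container built for its parent.
    static ServerResource install(MetaDataNode node, VirtualContainerResource parent, List<String> visitLog) {
        visitLog.add(node.name);
        ServerResource r = node.children.isEmpty()
                ? new LeafResource(node.name)               // no children: leaf
                : new VirtualContainerResource(node.name);  // children: container
        if (parent != null) parent.entries.put(node.name, r);
        if (r instanceof VirtualContainerResource) {
            for (MetaDataNode child : node.children) {
                install(child, (VirtualContainerResource) r, visitLog);
            }
        }
        return r;
    }
}
```

Because each container exists before its children are visited, every child can be attached as soon as it is converted, which is why a pre-order traversal suits this step.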

3.3 Processing HTTP Requests

Fig. 4. Processing HTTP Requests.

When a request arrives at a MetaMagic-enabled server, and the target of the request is a MetaMagic resource (as opposed to any ordinary Web resource, which is handled in the normal way), then the MetaMagic resource responds to the request by sending a presentContent() message to itself, initiating the following procedure:

  1. The MetaMagic resource utilizes metadata attributes to obtain data source identifiers, and passes these to the DataSourceFactory, which instantiates the appropriate data source classes.
  2. Having obtained references to its data source objects, the MetaMagic resource invokes the presentContent() method for each data source, obtaining each data source's content as an InputStream.
  3. Now the MetaMagic resource can read from the InputStreams provided by its data sources, filtering, integrating, processing or interpreting these streams to generate its own content.
  4. After processing its data sources, the MetaMagic resource creates an InputStream containing the generated content. This stream is returned by the getContent() method.
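
The four steps above can be sketched for a single resource. The "sources" metadata attribute (a comma-separated list of identifiers), the "text:" identifier scheme, the MetaMagicLeaf class, and the use of simple concatenation as the integration step are all assumptions. The paper says the generated stream is returned by getContent(); here presentContent() simply delegates to it, which is a guess at how the two relate.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

interface DataSource {
    InputStream presentContent() throws IOException;
}

// Hypothetical minimal factory: understands only "text:<literal>" identifiers.
final class DataSourceFactory {
    DataSource create(String id) {
        if (!id.startsWith("text:")) throw new IllegalArgumentException("Unsupported: " + id);
        byte[] data = id.substring("text:".length()).getBytes(StandardCharsets.UTF_8);
        return () -> new ByteArrayInputStream(data);
    }
}

class MetaMagicLeaf implements DataSource {
    private final Map<String, String> metadata;
    private final DataSourceFactory factory;

    MetaMagicLeaf(Map<String, String> metadata, DataSourceFactory factory) {
        this.metadata = metadata;
        this.factory = factory;
    }

    public InputStream presentContent() throws IOException {
        return getContent();
    }

    InputStream getContent() throws IOException {
        // 1. Metadata yields data source identifiers; the factory builds the sources.
        List<DataSource> sources = new ArrayList<>();
        for (String id : metadata.getOrDefault("sources", "").split(",")) {
            if (!id.isBlank()) sources.add(factory.create(id.trim()));
        }
        // 2-3. Obtain each source's stream and integrate the results
        //      (here: concatenation, one source per line).
        StringBuilder out = new StringBuilder();
        for (DataSource ds : sources) {
            try (InputStream in = ds.presentContent()) {
                out.append(new String(in.readAllBytes(), StandardCharsets.UTF_8)).append('\n');
            }
        }
        // 4. Return the generated content as a new stream.
        return new ByteArrayInputStream(out.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```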

To complete its response to the HTTP request, the MetaMagic resource attaches the generated content to an HTTPReply object, fills in any applicable HTTP header fields in the reply, and then passes the reply to the server to send back to the browser.

Note that the MetaMagic resource, rather than the server, is responsible for attaching the reply header. This has important implications, particularly for the Content-type field. While a typical Web server sets the Content-type field based upon a configurable but static mapping between file extensions and MIME types, MetaMagic resources set their MIME types dynamically at presentation time. This allows a single MetaMagic resource to present its content flexibly, based on information that may be passed in through the HTTP request.
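
Choosing a MIME type at presentation time might look like the sketch below, which negotiates on the request's Accept header. The supported types, the HTML wrapping, the Presentation class, and the decision to ignore q-values are simplifying assumptions; a MetaMagic resource could equally base the choice on query parameters or other request information.

```java
import java.util.List;

final class Presentation {
    private Presentation() {}

    // Types this hypothetical resource can render; the first is the default.
    static final List<String> SUPPORTED = List.of("text/html", "text/plain");

    // Pick the Content-type from the Accept header at request time.
    // Simplification: q-values are ignored; entries are tried in the order given.
    static String chooseContentType(String accept) {
        if (accept != null) {
            for (String entry : accept.split(",")) {
                String type = entry.split(";")[0].trim();
                if (SUPPORTED.contains(type)) return type;
                if (type.equals("*/*")) break;   // anything goes: use the default
            }
        }
        return SUPPORTED.get(0);
    }

    // Render the same content differently depending on the chosen type.
    static String render(String content, String contentType) {
        if (contentType.equals("text/html")) {
            return "<html><body><pre>" + escape(content) + "</pre></body></html>";
        }
        return content;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```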

4.0 Conclusions and Future Work

We believe MetaMagic to be a step toward changing the approach to building Web sites because it provides technology for separating information content from its presentation. This division maps neatly to practical reality: content often changes independently of the method for presenting it, and vice versa. As new extensions to HTML find their way into the latest browsers, only the presentation methods of a MetaMagic-enabled Web site need to change in order to remain in step with cutting-edge trends; the content does not need to change. Conversely, as content is updated, presentation methods can remain the same, providing a consistent look-and-feel to a Web site. This double advantage of separating content from presentation is desirable both for the Internet and for corporate Intranets. MetaMagic enables a very smooth transition to this new approach because it was created to provide flexible access to legacy data.

In addition to streamlining Web site design, MetaMagic provides very high flexibility in personalizing data presentation based on client information. We foresee many possible applications for such personalization, including, but not limited to:

Our current work is mainly concentrated on designing and implementing a scripting language for building information repositories and a visual front-end for this language. We are also investigating techniques for using software agents to maintain referential integrity of both logical and physical resources.

5.0 References

[bai96]
A. Baird-Smith. Jigsaw: An Object-Oriented Server. Design document. World Wide Web Consortium, 1996.

[boo94]
G. Booch. Object-Oriented Analysis and Design with Applications, Second Edition. Menlo Park, CA: Addison-Wesley, 1994.

[bro96]
M. Brown and M. Najork. Distributed Active Objects. Computer Networks and ISDN Systems, 28(7/11), Special Issue on the Fifth International WWW Conference'96, pp. 1037-1052. Elsevier Science, 1996.

[mer96]
P. Merle, C. Gransart, and J.-M. Geib. CorbaWeb: A Generic Object Navigator. Computer Networks and ISDN Systems, 28(7/11), Special Issue on the Fifth International WWW Conference'96, pp. 1269-1282. Elsevier Science, 1996.

[ree95]
O. Rees, N. Edwards, M. Madsen, M. Beasley, and A. McClenaghan. A Web of Distributed Objects. In Proceedings of the Fourth International WWW Conference'95, pp. 75-88. O'Reilly & Assoc., 1995.

[shk95-1]
L. Shklar, A. Sheth, V. Kashyap, and K. Shah. InfoHarness: Use of Automatically Generated Metadata for Search and Retrieval of Heterogeneous Information. Proceedings of CAiSE'95, June 12-16, Jyvaskyla, Finland. Springer-Verlag Lecture Notes in Computer Science #932.

[shk95-2]
L. Shklar, K. Shah, and C. Basu. Putting Legacy Data on the Web: A Repository Definition Language. Computer Networks and ISDN Systems, 27(6), Special Issue on the Third International WWW Conference'95, pp. 939-952. Elsevier Science, 1995.

[shk97]
L. Shklar, D. Makower, and W. Lee. MetaMagic: Generating Virtual Web Sites Through Data Modeling. Sixth International WWW Conference (Poster Session), Santa Clara, CA, April 1997.