
Web Access to NASA's Remote Sensing Data through GeoHarness *

Leon Shklar and Clifford Behrens

Bell Communications Research, 445 South St., Morristown, NJ 07960
and Computer Science Department, Rutgers University, New Brunswick, NJ 08903

The GeoScope Project

Currently, little information is available on the location and availability of geospatial data, and most data browsing tools available to the public require considerable technical knowledge of computing, remote sensing, geodesy, and cartography to be used effectively. Moreover, existing geospatial data and their metadata are stored in a variety of formats and representations. The goal of the GeoScope project is to make NASA's remote sensing imagery and other geospatial data available to the broader public on the Internet, while simultaneously providing a framework for resolving network and data interoperability issues.

To best serve the needs of non-traditional end-users of geospatial data outside NASA's community of scientists and researchers, such as Geographic Information Systems (GIS) practitioners in state and local government, a solution should be flexible enough to:

  1. support a variety of metadata schema and data models, and
  2. support federating widely distributed and heterogeneous metadata catalogs.

In our earlier work [6] [7], we described InfoHarness, a system that supports off-line generation of metadata entities that encapsulate heterogeneous raw data, and run-time use of these entities to search and present the encapsulated data. However, searching remote sensing imagery and other geospatial data also requires support for complex content-descriptive metadata in the form of heterogeneous hierarchical attributes, as in the FGDC metadata content standard [3]. Moreover, the data encapsulation process has to be designed to incorporate previously defined content-descriptive metadata.

In this position paper, we present a brief summary of GeoHarness, a prototype extension of InfoHarness that supports multiple heterogeneous metadata schemas and can serve as a basis for federating metadata catalogs. GeoHarness is being used to implement the GeoScope catalog server, a major component of the GeoScope architecture [4]. An in-memory implementation of GeoHarness is currently available; a scalable version based on a commercial OODBMS will be available in the first half of 1996.

The GeoScope work is being performed for NASA by the Universal Spatial Data Access Consortium (USDAC) under a three-year, $2.3M Cooperative Agreement. Members of the USDAC include Bellcore, Camber Corporation, California Resource Agency, the Open GIS Consortium, Rutgers University's Center for Remote Sensing and Spatial Analysis, and NASA's Space Science Data Operations Office.


Fig. 1. GeoHarness Architecture.


GeoHarness Architecture

As shown in Fig. 1, the main components of the current GeoHarness implementation are:
  1. GeoHarness Server (GH Server), which uses metadata to traverse, search, and retrieve the original information.
  2. HTTP Gateway, which is used to pass requests from HTTP clients to the GH Server and responses back to the clients.
  3. Metadata Generator, which is used for off-line generation of metadata that represents the desired view of the structure and organization of the original information. The server uses this metadata for run-time search and retrieval.
Independent tools for accessing and displaying information (e.g., xv, xman, or a GIS) and for indexing information (e.g., WAIS, LSI) are not part of GeoHarness.

At run-time, HTTP clients may issue query, traversal, or retrieval requests that are passed on to the gateway, which performs the following operations:

  1. Parses the request, and reads input information when the request is associated with an HTML form.
  2. Establishes a socket connection with the GH Server, generates and sends out a request, and waits for a response.
  3. Parses the response, converts it to a combination of HTML forms and hyperlinks, adds the HTTP header, and passes the transformed response to an HTTP browser.
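
The three gateway steps above can be sketched as follows. This is a minimal illustration, not the GeoHarness gateway: the line-oriented wire format, the `verb` and `id` form fields, the default port, and the helper names are all hypothetical, and the sketch emits only hyperlinks, not forms.

```python
import html
import socket
from urllib.parse import parse_qs, quote

# Hypothetical wire format between gateway and GH Server (an assumption):
#   request:  "<verb> <object-id> [name=value ...]\n"
#   response: one "<object-id>\t<label>" line per item, ended by an empty line.


def parse_request(query_string):
    """Step 1: parse the CGI query string, including HTML form input."""
    fields = parse_qs(query_string)
    verb = fields.pop("verb", ["traverse"])[0]
    obj_id = fields.pop("id", ["root"])[0]
    params = {name: values[0] for name, values in fields.items()}
    return verb, obj_id, params


def ask_server(sock, verb, obj_id, params):
    """Step 2: send the request over a connected socket and wait for the reply."""
    args = " ".join(f"{k}={v}" for k, v in sorted(params.items()))
    sock.sendall(f"{verb} {obj_id} {args}".rstrip().encode() + b"\n")
    data = b""
    while not data.endswith(b"\n\n"):
        chunk = sock.recv(4096)
        if not chunk:  # server closed the connection early
            break
        data += chunk
    return data.decode()


def to_html(response):
    """Step 3: turn each returned item into a hyperlink and add the HTTP header."""
    items = []
    for line in response.splitlines():
        if not line:
            continue
        obj_id, label = line.split("\t", 1)
        href = html.escape(f"?verb=traverse&id={quote(obj_id)}")
        items.append(f'<li><a href="{href}">{html.escape(label)}</a></li>')
    return "Content-Type: text/html\r\n\r\n<ul>\n" + "\n".join(items) + "\n</ul>\n"


def handle(query_string, host="localhost", port=7000):
    """Run all three steps against a GH Server at a hypothetical host and port."""
    verb, obj_id, params = parse_request(query_string)
    with socket.create_connection((host, port)) as sock:
        reply = ask_server(sock, verb, obj_id, params)
    return to_html(reply)
```

Keeping the three steps as separate functions lets each be exercised without a running server, for example by passing one end of `socket.socketpair()` to `ask_server`.
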
The GeoHarness architecture is open, modular, extensible, and scalable. GeoHarness implements an abstract class hierarchy that does not have to be modified to support a new data type or a new indexing technology. The methods of GeoHarness' abstract classes are general enough to require no such modification because they are data-driven and may invoke independent programs. The definitions of terminal classes are also data-driven and are not part of the implementation, which makes the system capable of supporting arbitrary information access and management tools (e.g., browsers, indexing technologies, access methods).
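
One way to picture data-driven terminal classes is sketched below, under the assumption that a terminal class is fully described by a name and a command template for an independent program. The class and function names, the type table, and its command templates are hypothetical; the commands are only assembled, never run.

```python
import shlex
from abc import ABC, abstractmethod


class EncapsulationUnit(ABC):
    """Abstract class: server code depends only on this interface."""

    def __init__(self, location):
        self.location = location

    @abstractmethod
    def access_command(self):
        """Return the argv of an independent program that presents the content."""


def make_terminal_class(name, command_template):
    """Create a terminal class from a data record rather than from source code."""

    def access_command(self):
        # Quote the location so paths containing spaces survive the split.
        line = command_template.format(location=shlex.quote(self.location))
        return shlex.split(line)

    return type(name, (EncapsulationUnit,), {"access_command": access_command})


# Hypothetical type table; a running system would load it from a file.
TYPE_TABLE = {
    "GifImage": "xv {location}",
    "TextSection": "more {location}",
}
TERMINAL_CLASSES = {n: make_terminal_class(n, t) for n, t in TYPE_TABLE.items()}
```

Supporting a new data type here means adding a row to `TYPE_TABLE`; neither `EncapsulationUnit` nor `make_terminal_class` changes.
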

The system addresses the goals of scalability and interoperability in a large and geographically distributed environment by supporting multiple servers and multiple clients, and by decentralizing the administration of information repositories.

Object Model

Metadata entities that encapsulate portions of the original information of interest to end-users are called encapsulation units. We will refer to the encapsulated portions of information as the contents of these metadata entities. A unit's content may be a file (e.g., a Landsat Multispectral Scanner (MSS) image or a USGS Digital Line Graph (DLG)), a portion of a file (e.g., a section of the Landsat Data User's Handbook), a set of files (e.g., imagery and digital elevation models related by location), or a request for the retrieval of data from an external source (e.g., a database query). For example, a whole document such as the Landsat Data User's Handbook and its bibliography may each, in different contexts, represent a unit of information of interest to end-users. Consequently, they may be encapsulated by separate units.
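
The four kinds of unit content can be modeled as a small tagged union. This sketch assumes that a file portion is addressed by byte offsets and that external queries are delegated to other tools; all type and function names are illustrative, not GeoHarness identifiers.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class WholeFile:
    path: str


@dataclass(frozen=True)
class FilePortion:
    path: str
    start: int  # byte offsets: an assumed way of addressing a section
    end: int


@dataclass(frozen=True)
class FileSet:
    paths: Tuple[str, ...]


@dataclass(frozen=True)
class ExternalQuery:
    source: str
    query: str


Content = Union[WholeFile, FilePortion, FileSet, ExternalQuery]


@dataclass
class EncapsulationUnit:
    unit_id: str
    content: Content


def fetch(content: Content) -> List[bytes]:
    """Return the encapsulated bytes; external queries are left to other tools."""
    if isinstance(content, WholeFile):
        with open(content.path, "rb") as f:
            return [f.read()]
    if isinstance(content, FilePortion):
        with open(content.path, "rb") as f:
            f.seek(content.start)
            return [f.read(content.end - content.start)]
    if isinstance(content, FileSet):
        return [part for p in content.paths for part in fetch(WholeFile(p))]
    if isinstance(content, ExternalQuery):
        raise NotImplementedError(f"delegate {content.query!r} to {content.source}")
    raise TypeError(f"unknown content kind: {content!r}")


def demo():
    """Two units over one file: the whole handbook and its bibliography section."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "handbook.txt")
        with open(path, "wb") as f:
            f.write(b"INTRO-TEXT|BIBLIOGRAPHY")
        handbook = EncapsulationUnit("handbook", WholeFile(path))
        biblio = EncapsulationUnit("handbook-bib", FilePortion(path, 11, 23))
        return fetch(handbook.content), fetch(biblio.content)
```

`demo()` mirrors the handbook example: the same file backs two separate units, one for the whole document and one for its bibliography.
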

GeoHarness Attributes

We say that an attribute describes an encapsulation unit if its value is one of the following:
  1. One or more numerical, textual, or binary items that in some way describe the encapsulated primary data.
  2. A function of other attributes.
  3. A set of references to other attributes, each of which describes the encapsulation unit.
  4. An encapsulation unit whose content satisfies condition 1.
Notice that item 2 of the definition introduces derived attributes, a primary mechanism for integrating different but related semantic models of geospatial data: new attributes can be defined as transformations of those that are already available. Item 3 provides support for hierarchical attributes, while item 4 extends the notion of encapsulation to portions of metadata. The latter is particularly important for handling lower-resolution browse and thumbnail images.
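
Derived attributes can be sketched as attributes whose value is a function of named input attributes, resolved recursively on access. The stored names below are modeled on the FGDC bounding-coordinate elements; "Envelope" and "Center" are hypothetical attributes of a second, related data model, and `Derived` and `value_of` are illustrative names.

```python
class Derived:
    """An attribute whose value is a function of other attributes (item 2)."""

    def __init__(self, inputs, fn):
        self.inputs = tuple(inputs)  # names of the attributes it depends on
        self.fn = fn


def value_of(attrs, name, _visiting=frozenset()):
    """Resolve an attribute, evaluating derived attributes recursively."""
    if name in _visiting:
        raise ValueError(f"cyclic derivation through {name!r}")
    value = attrs[name]
    if isinstance(value, Derived):
        args = (value_of(attrs, n, _visiting | {name}) for n in value.inputs)
        return value.fn(*args)
    return value


# Stored values use FGDC-style bounding-coordinate names; "Envelope" and
# "Center" are hypothetical attributes of a second, related model.
DEM_ATTRS = {
    "West_Bounding_Coordinate": -75.0,
    "East_Bounding_Coordinate": -74.5,
    "South_Bounding_Coordinate": 40.5,
    "North_Bounding_Coordinate": 41.0,
    "Envelope": Derived(
        ["West_Bounding_Coordinate", "South_Bounding_Coordinate",
         "East_Bounding_Coordinate", "North_Bounding_Coordinate"],
        lambda w, s, e, n: (w, s, e, n),
    ),
    "Center": Derived(
        ["Envelope"],
        lambda env: ((env[0] + env[2]) / 2, (env[1] + env[3]) / 2),
    ),
}
```

Because "Center" is derived from another derived attribute, a second model can be layered on the first without storing any new values.
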

Consider the example in Fig. 2, where all attributes describe the same encapsulation unit, whose content is a Digital Elevation Model (DEM) for some area. Here, the areal extent of the DEM is represented as a Bounding Box, whose value is a set of references to numeric latitude/longitude coordinates. The Bounding Box therefore satisfies condition 3, and the numeric coordinates satisfy condition 1 of the above definition. The Thumbnail Image attribute meets condition 4 of the definition because its value is an encapsulation unit, the content of which is a thumbnail image of the DEM.
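
The attributes of this example can be classified against the four items of the definition. In this sketch, derived attributes are modeled simply as callables and a unit as an id plus a content location; the class names, attribute values, and thumbnail path are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """An encapsulation unit: here just an id and a content location."""
    unit_id: str
    location: str


@dataclass
class Attribute:
    name: str
    value: object


@dataclass
class AttributeRefs:
    """A set of references to other attributes (a hierarchical attribute)."""
    refs: List[Attribute]


def condition(attr: Attribute) -> int:
    """Return which item of the definition the attribute's value satisfies."""
    v = attr.value
    if isinstance(v, (int, float, str, bytes)):
        return 1
    if isinstance(v, AttributeRefs):
        return 3
    if isinstance(v, Unit):
        return 4
    if callable(v):  # a function of other attributes
        return 2
    raise ValueError(f"{attr.name!r} does not describe a unit")


coords = [Attribute(n, c) for n, c in
          [("North", 41.0), ("South", 40.5), ("East", -74.5), ("West", -75.0)]]
bounding_box = Attribute("Bounding Box", AttributeRefs(coords))
thumbnail = Attribute("Thumbnail Image", Unit("dem-1-thumb", "/thumbs/dem-1.gif"))
```

Here `condition(bounding_box)` yields 3, each coordinate yields 1, and `condition(thumbnail)` yields 4.
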


Fig. 2. Sample Geospatial Attributes in GeoHarness.


We say that a set of attributes is a meta-description of an encapsulation unit if the following two conditions hold:

Using the example above, a GeoHarness meta-description for a DEM might consist of the available FGDC attributes together with attributes defined by the Open GIS Data Model [1].

GeoHarness Objects

A GeoHarness object is defined recursively as one of the following:
  1. A simple object, composed of a single encapsulation unit (e.g., a Landsat MSS image) and its meta-description (e.g., an attribute dictionary together with the image header information, and a browse product for this image).
  2. A composite object that combines a simple object (such as the one above) and a set of references to other GeoHarness objects (e.g., the encapsulation unit and meta-description for the Landsat Data User's Handbook together with references to encapsulated sections of this handbook).
  3. A collection object, composed of a set of references to other GeoHarness objects (its children) and a (possibly empty) set of references to independent indices.
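
The recursive definition maps naturally onto three record types whose children are referenced by identifier. This is a sketch under the assumption that a repository is a dictionary from object id to object; the class names and the `walk` helper are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple, Union


@dataclass
class SimpleObject:
    oid: str
    unit: str                                   # location of the encapsulated content
    meta: Dict[str, object] = field(default_factory=dict)


@dataclass
class CompositeObject:
    oid: str
    base: SimpleObject                          # the simple object it combines
    children: List[str] = field(default_factory=list)   # ids of other objects


@dataclass
class CollectionObject:
    oid: str
    children: List[str] = field(default_factory=list)
    indices: List[str] = field(default_factory=list)    # possibly empty


GHObject = Union[SimpleObject, CompositeObject, CollectionObject]


def walk(repo: Dict[str, GHObject], oid: str, depth: int = 0,
         seen=None) -> Iterator[Tuple[int, str]]:
    """Depth-first traversal that follows child references by identifier."""
    seen = set() if seen is None else seen
    if oid in seen:  # objects may be shared; visit each one once
        return
    seen.add(oid)
    yield depth, oid
    for child in getattr(repo[oid], "children", []):
        yield from walk(repo, child, depth + 1, seen)
```

For a collection holding the handbook composite (with two section children) and one MSS image, `walk` yields the collection at depth 0, the handbook and image at depth 1, and the sections at depth 2.
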

Each object has a unique identifier that is recognized and maintained by the GeoScope catalog server. Simple and composite objects store the location of raw data, the logical address of the encapsulated portion of this data, and typing information that determines the data retrieval method. For example, an object that encapsulates a feature in a USGS Digital Line Graph might be assigned the presentation type SDTS and contain the path of the file and the name of the feature. Here, the type serves as an implicit reference to a presentation method that would first separate out a vector feature layer, then convert it to GIF, and forward it to a browser.
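
Type-driven presentation can be sketched as a lookup from the presentation type to a chain of steps. The three steps here are string-producing placeholders for the external conversions described above, and every name, including the file path, is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EncapsulatedFeature:
    oid: str
    path: str      # location of the raw data file
    address: str   # logical address inside it: here, a feature name
    ptype: str     # presentation type, e.g. "SDTS"


# Placeholder steps (hypothetical): a real system would call external converters.
def separate_layer(path, feature):
    return f"layer({feature} from {path})"


def convert_to_gif(layer):
    return f"gif({layer})"


def forward_to_browser(image):
    return f"sent {image}"


# Dispatch goes through this table, so adding a type means adding an entry,
# not changing present().
PRESENTATION_METHODS = {
    "SDTS": (separate_layer, convert_to_gif, forward_to_browser),
}


def present(obj: EncapsulatedFeature) -> str:
    """Run the presentation chain selected by the object's type."""
    try:
        first, *rest = PRESENTATION_METHODS[obj.ptype]
    except KeyError:
        raise LookupError(f"no presentation method for type {obj.ptype!r}") from None
    result = first(obj.path, obj.address)
    for step in rest:
        result = step(result)
    return result
```

The object itself stores only the path, feature name, and type; all knowledge of how to present it lives in the table.
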

Collection objects may contain references to multiple independent indices that in turn reference the children of the collection. Each indexed collection stores the sources and locations of index-related data structures, as well as references to associated query methods. Consequently, an indexed collection may make use of data retrieval methods that are not part of GeoHarness, so GeoHarness repositories can utilize existing heterogeneous index structures.
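
An indexed collection can be sketched as a set of children plus a table of (index source, query method) pairs. The two toy query methods below stand in for external index tools such as WAIS; the class, the methods, and the index contents are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

QueryMethod = Callable[[Any, Any], Iterable[str]]


class IndexedCollection:
    """A collection whose children are reachable through independent indices."""

    def __init__(self, children: Iterable[str]):
        self.children = list(children)
        # index name -> (index source or location, query method)
        self.indices: Dict[str, Tuple[Any, QueryMethod]] = {}

    def add_index(self, name: str, source: Any, method: QueryMethod) -> None:
        self.indices[name] = (source, method)

    def query(self, index_name: str, q: Any) -> List[str]:
        source, method = self.indices[index_name]
        hits = set(method(source, q))
        # Keep only hits that are children of this collection, in child order.
        return [c for c in self.children if c in hits]


# Two toy, independent query methods standing in for external index tools.
def keyword_lookup(index, word):
    return index.get(word, [])


def point_in_box(index, point):
    lon, lat = point
    return [oid for oid, (w, s, e, n) in index if w <= lon <= e and s <= lat <= n]
```

The collection never interprets an index itself; it only stores where the index is and which method queries it, then filters the hits to its own children.
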

Any collection or composite object in a repository may contain references to objects that encapsulate entry points of other repositories. The encapsulation information of these objects determines the translations, conversions, etc. that may be necessary to support retrieval from non-GeoHarness repositories. We plan to support federations of possibly heterogeneous catalog servers later in this project.
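
An object encapsulating a foreign entry point might carry its translations as a pair of functions: one converting a GeoHarness request into the remote catalog's native form and one converting the native response back. The sketch below assumes a hypothetical remote catalog that speaks "KEY:VALUE;..." strings; the transport is passed in so that no real network protocol is implied.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Transport = Callable[[str, str], str]


@dataclass
class RemoteEntryPoint:
    """Encapsulates the entry point of a non-GeoHarness repository."""
    oid: str
    address: str                                  # where the remote catalog lives
    to_native: Callable[[Dict[str, str]], str]    # request translation
    from_native: Callable[[str], Dict[str, str]]  # response conversion


def retrieve(entry: RemoteEntryPoint, request: Dict[str, str],
             send: Transport) -> Dict[str, str]:
    """Translate the request, send it, and convert the reply back."""
    native_request = entry.to_native(request)
    raw = send(entry.address, native_request)
    return entry.from_native(raw)


# Translations for a hypothetical catalog speaking "KEY:VALUE;..." strings.
def to_semicolon_query(request):
    return ";".join(f"{k.upper()}:{v}" for k, v in sorted(request.items()))


def from_semicolon_record(raw):
    return dict(pair.split(":", 1) for pair in raw.split(";") if pair)
```

Supporting another foreign catalog would then mean supplying a different pair of translation functions rather than changing `retrieve`.
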

Current Work

We are now working on scaling up the current in-memory implementation of GeoHarness by using a commercial OODBMS. Later, we hope to support using GeoHarness with a publicly available OODBMS (e.g., SHORE [2]). We would also like to consider using the LOREL [5] query language once it is fully implemented and publicly available.

References

[1] K. Buehler (ed.), "The Open Geodata Interoperability Specification", Draft Base Document, OGIS Project Document 94-025R2, December 22, 1994.
[2] M. Carey, D. DeWitt, J. Naughton, M. Solomon, et al., "Shoring Up Persistent Applications", Proceedings of the 1994 ACM SIGMOD Conference, Minneapolis, MN, May 1994.
[3] Federal Geographic Data Committee, "Content Standards for Digital Geospatial Metadata", June 1994.
[4] A. Cornelio and G. Thomas, "GeoScope Requirements", Bellcore Internal Document, June 1995.
[5] D. Quass, A. Rajaraman, Y. Sagiv, J. D. Ullman, and J. Widom, "Querying semistructured heterogeneous information", Computer Science Department Internal Memorandum, Stanford University, November 1994.
[6] L. Shklar, S. Thatte, H. Marcus, and A. Sheth, "The InfoHarness Information Integration Platform", Proceedings of the Second International WWW Conference '94, Chicago, IL, October 17-20, 1994, pp. 809-819.
[7] L. Shklar, A. Sheth, V. Kashyap, and K. Shah, "InfoHarness: Use of Automatically Generated Metadata for Search and Retrieval of Heterogeneous Information", Lecture Notes in Computer Science, vol. 932, 1995, pp. 217-230.

Footnotes

* This work is a part of the GeoScope project and is supported by the NASA Cooperative Agreement NCC5-102.
TM InfoHarness is a trademark of Bell Communications Research, Inc.