Web Access to NASA's Remote Sensing Data through GeoHarness*
Leon Shklar,
and Clifford Behrens
Bell Communications Research,
445 South St., Morristown, NJ 07960
and Computer Science Department,
Rutgers University,
New Brunswick, NJ 08903
The GeoScope Project
At the moment, there exists limited information on the
location and availability of geospatial data, and most data browsing
tools available to the public require considerable technical knowledge
about computing, remote sensing, geodesy and cartography to use them
effectively. Moreover, existing geospatial data and its metadata are
stored in a variety of formats and representations. The goal of the
GeoScope project is to make NASA's remote sensing imagery and
other geospatial data available to the broader public on the Internet,
while simultaneously providing a framework for resolving network and
data interoperability issues.
To best serve the needs of non-traditional end-users of geospatial
data outside of NASA's community of scientists and researchers, e.g.,
Geographic Information Systems (GIS) practitioners in state and local
government, a solution should be flexible enough to:
- support a variety of metadata schemas and data models, and
- support federating widely distributed and heterogeneous metadata
catalogs.
In our earlier work [6] [7], we have discussed InfoHarness,
a system that supports off-line generation of metadata entities to
encapsulate heterogeneous raw data, and run-time use of these entities
to support search and presentation of encapsulated data. But searches
of remote sensing imagery and other geospatial data also require
support for complex content-descriptive metadata in the form of
heterogeneous hierarchical attributes, as in the FGDC metadata content
standard [3]. Moreover, the data
encapsulation process has to be designed to incorporate previously
defined content-descriptive metadata.
In this position paper, we present a brief summary of GeoHarness,
a prototype extension of InfoHarness that is capable of supporting multiple
heterogeneous metadata schemas and serving as a basis for federating metadata
catalogs. GeoHarness is being used to implement the GeoScope catalog server,
which is a major component of the GeoScope architecture [4]. The
in-memory implementation of GeoHarness is currently available; a
scalable version based on a commercial OODBMS will be available in the
first half of 1996.
The GeoScope work is being performed for NASA by the Universal Spatial
Data Access Consortium (USDAC) under
a three-year, $2.3M Cooperative Agreement. Members of the USDAC include
Bellcore, Camber Corporation, California Resource Agency, the Open GIS
Consortium, Rutgers University's Center for Remote Sensing and Spatial
Analysis, and NASA's Space Science Data Operations Office.
Fig. 1. GeoHarness Architecture.
GeoHarness Architecture
As shown in Fig. 1, the main components of
the current implementation of the GeoHarness architecture include:
- GeoHarness Server (GH Server), which uses metadata to traverse,
search, and retrieve the original information.
- HTTP Gateway, which is used to pass requests from HTTP clients
to the GH Server and responses back to the clients.
- Metadata Generator, which is used for off-line generation of
metadata that represents the desired view of the structure and
organization of the original information. This metadata is used by
the server for run-time search and retrieval.
Independent tools for accessing and displaying information (e.g., xv, xman,
a GIS, etc.), and for indexing information (e.g., WAIS, LSI), are not part of
GeoHarness.
At run-time, HTTP clients may issue query, traversal, or retrieval requests
that are passed on to the gateway, which performs the following operations:
- Parses the request, and reads input information when the request is
associated with an HTML form.
- Establishes a socket connection with the GH Server, generates and sends
out a request, and waits for a response.
- Parses the response, converts it to a combination of HTML forms and
hyperlinks, adds the HTTP header, and passes the transformed response
to an HTTP browser.
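
The three gateway steps above can be sketched in Python as follows. This
is a minimal illustration, not the GeoHarness implementation: the source
does not specify the protocol between the gateway and the GH Server, so
the one-line request format, the tab-separated response format, the
function names, and the /gh?id= link pattern are all assumptions made for
the example.

```python
import html
import socket
from urllib.parse import parse_qs, quote


def parse_request(query_string: str) -> dict:
    """Step 1: parse the request, including any HTML form fields."""
    # parse_qs returns a list of values per field; keep the first one.
    return {key: values[0] for key, values in parse_qs(query_string).items()}


def build_server_request(fields: dict) -> bytes:
    """Encode the fields in a hypothetical one-line request format."""
    body = " ".join(f"{k}={quote(v)}" for k, v in sorted(fields.items()))
    return f"QUERY {body}\n".encode("utf-8")


def ask_server(host: str, port: int, request: bytes) -> str:
    """Step 2: open a socket to the GH Server, send, and wait for the reply."""
    with socket.create_connection((host, port), timeout=30) as conn:
        conn.sendall(request)
        conn.shutdown(socket.SHUT_WR)  # signal the end of the request
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def render_response(raw: str) -> str:
    """Step 3: turn 'id<TAB>title' lines (assumed format) into hyperlinks
    and prepend a CGI-style HTTP header."""
    items = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        object_id, title = line.split("\t", 1)
        items.append(
            f'<li><a href="/gh?id={quote(object_id)}">{html.escape(title)}</a></li>'
        )
    page = "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
    return "Content-Type: text/html\r\n\r\n" + page
```

Keeping parsing, transport, and rendering in separate functions mirrors
the separation of the three steps; only ask_server touches the network,
so the other two can be exercised without a running server.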
The GeoHarness architecture is open, modular, extensible and scalable.
GeoHarness implements an abstract class hierarchy that does not have to
be modified to support a new data type, or a new indexing technology.
Methods associated with GeoHarness' abstract classes remain general
because they are data-driven and may invoke independent programs. The
definitions of terminal classes are also data-driven and are not part of
the implementation, which makes the system capable of supporting arbitrary
information access and management tools (e.g., browsers, indexing
technologies, access methods).
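
One way to picture data-driven terminal classes is a type table that
lives outside the program, so that supporting a new data type means
adding an entry rather than changing code. The sketch below is an
illustration under assumptions: the table layout, the function names, and
the "hypothetical_gis_viewer" command are invented for the example; only
xv is named in the text as an independent display tool.

```python
import shlex

# Hypothetical type table. In a data-driven design it would be read from a
# configuration file rather than written into the program.
TYPE_TABLE = {
    "image": {"present": "xv {path}"},
}


def register_type(table: dict, name: str, present: str) -> None:
    """Add a terminal type by supplying data only; no code changes needed."""
    table[name] = {"present": present}


def presentation_command(table: dict, type_name: str, path: str) -> list:
    """Build the argument list for the independent program that presents
    content of the given type. The command is built, not executed."""
    template = table[type_name]["present"]
    return shlex.split(template.format(path=shlex.quote(path)))
```

The returned argument list could be handed to a process launcher such as
subprocess.run; quoting the path before substitution keeps file names
with spaces intact.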
The system addresses the goals of scalability and interoperability in
a large and geographically distributed environment by supporting multiple
servers and multiple clients, and by decentralizing the administration of
information repositories.
Object Model
Metadata entities that encapsulate portions of the original information
of interest to end-users are called encapsulation units. We will
refer to the encapsulated portions of information as contents
of these metadata entities. A unit content may be a file (e.g., a
Landsat Multispectral Scanner (MSS) image, or a USGS Digital
Line Graph (DLG)), a portion of a file (e.g., a section from the
Landsat Data User's Handbook), a
set of files (e.g., imagery and digital elevation models related by
location), or a request for the retrieval of data from an external
source (e.g., a database query). For example, a whole document such as
the Landsat Data User's Handbook and its bibliography may, in
different contexts, each constitute a unit of information of interest
to end-users. Consequently, they may be encapsulated by separate
units.
GeoHarness Attributes
We say that an attribute describes an encapsulation unit if its
value is one of the following:
1. One or more numerical, textual, or binary items that in some way
describe the encapsulated primary data.
2. A function of other attributes.
3. A set of references to other attributes, each of which describes
the encapsulation unit.
4. An encapsulation unit whose content satisfies condition 1.
Notice that item 2 of the definition introduces derived attributes,
a primary mechanism for supporting integration of different but related
semantic models of geospatial data by enabling the definition of new
attributes as transformations of those that are already
available. Moreover, item 3 provides support for hierarchical
attributes, while item 4 serves to extend the notion of encapsulation
to portions of metadata. The latter is particularly important for
handling lower-resolution browse and thumbnail images.
Consider the example in Fig. 2,
where all attributes describe the same encapsulation unit, the content
of which is a Digital Elevation Model (DEM) for some
area. Here, the areal extent of the DEM is represented as a
Bounding Box, whose value is a set of numeric
latitude/longitude coordinates. The numeric coordinates satisfy
condition 1 of the above definition, and the Bounding Box, as a set of
references to them, satisfies condition 3. The Thumbnail Image
attribute meets condition 4 because its value is an encapsulation
unit, the content of which is a thumbnail image of the DEM.
Fig. 2. Sample Geospatial Attributes in GeoHarness.
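
The kinds of attribute values in the definition can be sketched as a
small Python data model. This is an illustration only, not the GeoHarness
class hierarchy: the class names, the resolve function, the derived
"Width" attribute, the coordinate values, and the thumbnail path are all
hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class EncapsulationUnit:
    """Metadata entity that encapsulates some content (here, its location)."""
    location: str


@dataclass
class Derived:
    """A value defined as a function of other attributes."""
    function: Callable[..., Any]
    inputs: List["Attribute"]


@dataclass
class Attribute:
    name: str
    # One of: literal item(s), a Derived, a list of Attributes (a set of
    # references to other attributes), or an EncapsulationUnit.
    value: Any


def resolve(attribute: Attribute) -> Any:
    """Return a plain value for an attribute, whatever kind its value is."""
    value = attribute.value
    if isinstance(value, Derived):
        # Derived attribute: apply the function to the resolved inputs.
        return value.function(*(resolve(a) for a in value.inputs))
    if isinstance(value, list) and value and all(
        isinstance(v, Attribute) for v in value
    ):
        # Hierarchical attribute: resolve each referenced attribute.
        return {a.name: resolve(a) for a in value}
    # Literal items and encapsulation units are returned unchanged.
    return value


# Illustrative coordinates only; they do not come from the paper.
west = Attribute("West Longitude", -75.0)
east = Attribute("East Longitude", -74.0)
south = Attribute("South Latitude", 40.0)
north = Attribute("North Latitude", 41.0)
bounding_box = Attribute("Bounding Box", [west, east, south, north])
width = Attribute("Width (degrees)", Derived(lambda w, e: e - w, [west, east]))
thumbnail = Attribute("Thumbnail Image", EncapsulationUnit("/browse/dem_thumb.gif"))
```

Because a hierarchical attribute holds references rather than copies, the
Bounding Box and the derived width share the same coordinate attributes.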
We say that a set of attributes is a meta-description of an
encapsulation unit if the following two conditions hold:
- Each of the member attributes describes this unit.
- All available attributes that describe this unit are members of
the set.
Using the example above, a GeoHarness meta-description for a
DEM might consist of the available FGDC attributes together with
attributes defined by the Open GIS Data Model [1].
A GeoHarness object is defined recursively as one of the following:
- A simple object, composed of a single encapsulation unit
(e.g., a Landsat MSS image) and its meta-description (e.g., an attribute
dictionary together with the image header information, and a browse
product for this image).
- A composite object that combines a simple object (such as the one
above) and a set of references to other GeoHarness objects (e.g.,
encapsulation unit and meta-description for the Landsat Data User's
Handbook together with references to encapsulated sections from this
handbook).
- A collection object, composed of a set of references to other
GeoHarness objects (its children) and a (possibly empty) set of references
to independent indices.
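
The recursive definition above can be sketched with a single Python
class whose kind follows from which parts are present. This is a minimal
sketch under assumptions: the field names, the kind property, the walk
function, and the sample identifiers and paths are hypothetical and do
not come from the GeoHarness implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class GHObject:
    object_id: str
    # Location of the encapsulated content; None for a pure collection.
    unit: Optional[str] = None
    meta: Dict[str, object] = field(default_factory=dict)
    # References to other GeoHarness objects.
    children: List["GHObject"] = field(default_factory=list)
    # References to independent indices (names only in this sketch).
    indices: List[str] = field(default_factory=list)

    @property
    def kind(self) -> str:
        if self.unit is None:
            return "collection"
        return "composite" if self.children else "simple"


def walk(root: GHObject) -> Iterator[GHObject]:
    """Yield root and every object reachable through references, once each."""
    seen = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.object_id in seen:
            continue  # references may form a graph, not only a tree
        seen.add(obj.object_id)
        yield obj
        stack.extend(reversed(obj.children))


sec1 = GHObject("sec1", unit="/docs/handbook.txt#section1")
sec2 = GHObject("sec2", unit="/docs/handbook.txt#section2")
handbook = GHObject("handbook", unit="/docs/handbook.txt", children=[sec1, sec2])
mss = GHObject("mss", unit="/images/scene.mss", meta={"sensor": "Landsat MSS"})
catalog = GHObject("catalog", children=[handbook, mss])
```

Tracking visited identifiers keeps the traversal finite even when two
collections reference the same object.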
Each object has a unique identifier that is recognized and
maintained by the GeoScope catalog server. Simple and composite objects
store the location of raw data, the logical address of the encapsulated
portion of this data, and typing information that determines the data
retrieval method. For example, an object that encapsulates a feature in
a USGS Digital Line Graph might be assigned the presentation type
SDTS and contain path information for the file and the name of
the feature. Here, the type serves as an implicit reference to a presentation
method that would first separate out a vector feature layer, then convert
it to GIF and forward it to a browser.
Collection objects may contain references to multiple independent
indices that in turn reference children of the collection.
Each indexed collection stores sources and locations of index-related
data structures, and references to associated query methods. Consequently,
any indexed collection may make use of data retrieval methods that are not
part of GeoHarness. As a result, GeoHarness repositories may utilize existing
heterogeneous index structures.
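
A collection with several independent indices might be sketched as
follows. The sketch assumes that each index is reached through a query
method returning object identifiers; the class, the method names, and the
toy keyword index (a stand-in for an external engine such as WAIS) are
hypothetical.

```python
from typing import Callable, Dict, List

# A query method takes a query string and returns object identifiers.
QueryMethod = Callable[[str], List[str]]


class IndexedCollection:
    def __init__(self, children: Dict[str, str]) -> None:
        self.children = children  # object identifier -> title
        self.indices: Dict[str, QueryMethod] = {}

    def add_index(self, name: str, query_method: QueryMethod) -> None:
        """Attach an independent index by reference to its query method."""
        self.indices[name] = query_method

    def search(self, index_name: str, query: str) -> List[str]:
        """Run the query through the named index and keep only identifiers
        that are children of this collection."""
        hits = self.indices[index_name](query)
        return [object_id for object_id in hits if object_id in self.children]


def make_keyword_index(titles: Dict[str, str]) -> QueryMethod:
    """Toy stand-in for an external index: lowercase word -> identifiers."""
    postings: Dict[str, List[str]] = {}
    for object_id, title in titles.items():
        for word in title.lower().split():
            postings.setdefault(word, []).append(object_id)
    return lambda query: postings.get(query.lower(), [])
```

Because the collection only stores a reference to each query method, an
index built and maintained outside GeoHarness can be attached without
changing the collection itself.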
Any collection or composite object in a repository may contain references
to objects that encapsulate entry points of other repositories. Their
encapsulation information determines translations, conversions, etc. that
may be necessary to support the retrieval from non-GeoHarness
repositories. We are planning to support federations of possibly
heterogeneous catalog servers later in this project.
Current Work
We are now working on scaling up the current in-memory implementation of
GeoHarness by using a commercial OODBMS. Later, we hope to support
using GeoHarness with a public OODBMS (e.g., SHORE [2]). We would also
like to consider using the LOREL [5] query language
once it is fully implemented and available to the public.
[1] K. Buehler (ed.). "The Open Geodata Interoperability Specification",
Draft Base Document - OGIS Project Document 94-025R2, December 22, 1994.
[2] M. Carey, D. DeWitt, J. Naughton, M. Solomon, et al.
"Shoring Up Persistent Applications",
Proceedings of the 1994 ACM SIGMOD
Conference, Minneapolis, MN, May 1994.
[3] "Content Standards for Digital Geospatial Metadata", Federal
Geographic Data Committee, June 1994.
[4] A. Cornelio and G. Thomas.
"GeoScope Requirements", Bellcore
Internal Document, June 1995.
[5] D. Quass, A. Rajaraman, Y. Sagiv, J.D. Ullman, and J. Widom.
"Querying semistructured heterogeneous information",
Computer Science Department internal memorandum, Stanford University,
November 1994.
[6] L. Shklar, S. Thatte, H. Marcus, and A. Sheth.
"The InfoHarness Information Integration Platform",
Proceedings of the Second International WWW Conference '94, Chicago, IL,
October 17-20, 1994, pp. 809-819.
[7] L. Shklar, A. Sheth, V. Kashyap, and K. Shah. "InfoHarness:
Use of Automatically Generated Metadata for Search and Retrieval of
Heterogeneous Information", Lecture Notes in Computer Science,
932, 1995, pp. 217-230.
Footnotes
* This work is a part of the GeoScope project and is
supported by the NASA Cooperative Agreement NCC5-102.
TM InfoHarness is a trademark of Bell Communications Research, Inc.