PhD Defense
8/14/2009 10:30 am
CoRE 301

Wayfinder: A Federated Information Sharing and Management System

Christopher Peery, Rutgers

Defense Committee: Dr. Thu Nguyen (Chair), Dr. Richard Martin, Dr. Amelie Marian, Dr. Sandhya Dwarkadas (external member from University of Rochester)

Abstract

The computing environment of a user consists of the set of resources and information employed to accomplish various computing tasks. The traditional view of these environments, which consists of single devices operating in isolation, has been altered by several ongoing trends. Decreasing costs of computing devices, increased connectivity, and improved performance are allowing users to distribute and access information across many devices. Additionally, groups of users are increasingly able to work collaboratively and develop complex sharing patterns with one another. Unfortunately, while these trends are significantly enriching the user's computing experience, they are also increasing the data management overhead, as users must explicitly reason about data placement and replication across multiple devices and logical sharing groups.

In this thesis, we present the Wayfinder file system, which was designed to simplify the management of information in federated systems. To this end, we focus on three critical management deficiencies present in most current federated environments: 1) the lack of a consistent view of stored information across devices (the same information is potentially named differently on different devices); 2) the required manual management of replicated information (users are forced to continuously reason about information and its location to ensure its availability); and 3) limited search/ranking capability (most existing search/ranking tools rely on content-centric queries and do not use the large amount of structure and metadata stored within file systems to improve the ranking of relevant results).

The Wayfinder file system addresses these deficiencies by providing three synergistic abstractions: 1) a global namespace that is uniformly accessible across connected and disconnected operation, 2) user-centric automatic availability management to ensure continuous access to information based on data-centric availability policies, and 3) a multi-dimensional fuzzy search framework that significantly improves relevance ranking.  We will show that these abstractions simplify the management burden by requiring users to reason only about the data and its properties while ignoring the underlying physical complexities of the system.

Underlying all three abstractions is a common implementation layer that adheres to the following three principles. First, any subset of nodes in a Wayfinder community can interact normally when they are interconnected, regardless of the membership of the subset; that is, the absence of specific nodes may limit the amount of accessible data but will never hinder the interaction (data sharing) of a set of connected nodes. Second, all protocols and interactions tolerate a weakly consistent model, which allows them to withstand unexpected device departures. Third, devices are assumed to be owned by specific users and so should prioritize the needs of their owners in the presence of resource constraints, while using excess resources to benefit the community as a whole (e.g., to increase data availability).
