Pre-Defense
8/3/2009 10:30 am
CoRE 301

Unified Structure and Content Search for Personal Information Management System

Wei Wang, Rutgers

Defense Committee: Amelie Marian (chair), Thu D. Nguyen, Richard Martin, Divesh Srivastava (external member from AT&T Research Labs)

Abstract

The ability to quickly retrieve files in personal information systems is becoming increasingly important as users store and collect ever larger amounts of data. This explosion of information has led to a need for complex search tools to access often very heterogeneous data in a simple and efficient manner. Such tools should provide both flexible high-quality scoring mechanisms and efficient query processing capabilities.

In this work, we first focus on efficient algorithms to identify the most relevant files that match multi-dimensional queries, with an emphasis on the (directory) structure dimension. To this end, we implemented flexible indexes and designed efficient traversal algorithms to identify the most relevant candidate answers. We then present efficient algorithms that unify the external directory structure and the internal file structure. These algorithms identify the most relevant files that match structural queries containing both structure and content components. We perform a thorough experimental evaluation of our file search techniques and show that our query processing strategies exhibit good search behavior, resulting in good overall query performance.
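
As a rough illustration only, and not the indexes or algorithms presented in this work, the idea of ranking files against a query that has both a directory-structure component and a content component might be sketched as follows. The function names, the in-order relaxed path match, the linear score combination, and the `weight` parameter are all assumptions made for the example.

```python
import heapq


def structure_score(query_path, file_dir):
    """Fraction of query path components found, in order, in file_dir.

    Hypothetical relaxation: missing components earn no credit
    instead of disqualifying the file.
    """
    query_parts = [p for p in query_path.split("/") if p]
    file_parts = [p for p in file_dir.split("/") if p]
    if not query_parts:
        return 1.0
    matched = 0
    pos = 0
    for part in query_parts:
        try:
            # Search only at or after the previous match to preserve order.
            pos = file_parts.index(part, pos) + 1
            matched += 1
        except ValueError:
            continue
    return matched / len(query_parts)


def content_score(query_terms, text):
    """Fraction of query terms that occur as words in the file text."""
    if not query_terms:
        return 1.0
    words = set(text.lower().split())
    return sum(t.lower() in words for t in query_terms) / len(query_terms)


def top_k(files, query_path, query_terms, k=3, weight=0.5):
    """Return the k best (score, path) pairs; `files` maps path -> text.

    The weighted sum is an assumed way to combine the two dimensions.
    """
    scored = []
    for path, text in files.items():
        directory = path.rsplit("/", 1)[0]
        score = (weight * structure_score(query_path, directory)
                 + (1 - weight) * content_score(query_terms, text))
        scored.append((score, path))
    return heapq.nlargest(k, scored)
```

In this sketch, a file whose directory matches only part of the query path can still be returned, ranked below files that match the whole path. A brute-force scan like this scores every file; the indexes and traversal algorithms described in the abstract aim to avoid exactly that cost.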
