CAREER: Relaxed Content and Structure Queries over Heterogeneous Data (NSF-IIS 0844935)

The traditional separation between the structure-only Database world and the text-only Information Retrieval world is fading. Databases now routinely include text components, while documents are being augmented with structural information. The goal of this project is to design novel techniques and develop tools to efficiently query and retrieve relevant information in heterogeneous data environments, where query conditions on both the content and the structure of the data need to be flexible.

The first main contribution of the project will be the design of quality scoring mechanisms that unify content and structure scores. The scoring techniques will assign each answer a score based on its similarity to the query.
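
As a rough illustration of what a unified score might look like, the sketch below combines a content similarity and a structure similarity into a single value. It is not the project's scoring method: the Jaccard overlap, the path-based structure measure, the weight `alpha`, and all function names are assumptions chosen only to make the idea concrete.

```python
def content_score(query_terms, answer_text):
    """Jaccard overlap between query terms and answer tokens (assumed measure)."""
    q = {t.lower() for t in query_terms}
    a = set(answer_text.lower().split())
    if not q or not a:
        return 0.0
    return len(q & a) / len(q | a)


def lcs_length(a, b):
    """Length of the longest common subsequence of two label sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def structure_score(query_path, answer_path):
    """Fraction of query path labels matched, in order, by the answer path
    (assumed measure; an empty query path imposes no structural condition)."""
    if not query_path:
        return 1.0
    return lcs_length(query_path, answer_path) / len(query_path)


def combined_score(query_terms, query_path, answer_text, answer_path, alpha=0.5):
    """Weighted sum of content and structure similarity.
    alpha is a hypothetical tuning weight in [0, 1]."""
    c = content_score(query_terms, answer_text)
    s = structure_score(query_path, answer_path)
    return alpha * c + (1 - alpha) * s


# Example: an answer that matches the query terms and part of the query path.
score = combined_score(
    ["xml", "query"],
    ["book", "chapter", "title"],
    "xml query processing",
    ["book", "section", "title"],
)
```

A weighted sum is only one way to unify the two components; it keeps approximate structural matches in the ranking instead of discarding them, which is the kind of flexibility the project targets.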

The second main contribution of the project will be the development of heterogeneous data index structures and query processing algorithms to efficiently identify exact and approximate query answers. The work resulting from this project will be evaluated in two ways: through an in-depth study of how the scoring strategies affect answer quality, and through performance experiments on the query processing techniques.

The results of this project will enable users to identify the data that best fits their needs in a variety of heterogeneous data environments, without requiring prior knowledge of the underlying data schema or content. This project integrates research and education through curriculum development, student advising, and outreach to women in Computer Science.