Mine over Data

As anyone who regularly uses computers knows all too well, the
incredible growth in online information over the last few years has
been both a blessing and a curse. Although vast amounts of information
are now cheaply available at our fingertips, its sheer quantity has
made exploring and comprehending it difficult and time-consuming.
Tools that help people discover trends, patterns, and rules in large
quantities of data are crucial to harnessing the immense amount of
online information now available to us.

This is the motivation behind the Rutgers Data Mining Project. Led by Computer Science Department chairman Tomasz Imielinski and fellow Computer Science Department faculty member Haym Hirsh, the project is striving to do for knowledge discovery applications what relational databases did for business applications over twenty years ago. Before then, database application developers had to create a new system from scratch for each new database need. The advent of relational databases and APIs (Application Programming Interfaces) made it possible to build new database applications quickly, in a clean, well-defined, high-level fashion. The Rutgers Data Mining Project is taking a similar approach to data mining.

The general data-mining problem is to take large collections of data and find the patterns, trends, anomalies, and other interesting information implicit in them. Until recently, each new data-mining application, even one over a single database, was a stand-alone system built from scratch, offering a closed, "black-box" solution with poor extensibility. The Rutgers Data Mining Project is creating a second generation of data-mining tools, in which users perform data mining by specifying high-level information requests in a data-mining query language, analogous to the query-language approach to database access that the relational database model made possible twenty years ago. Further, data-mining system developers are given data-mining APIs that help them build new applications quickly, in a clean, high-level fashion. Finally, a management facility for discovered information (which can sometimes be as overwhelming as the data itself) enables effective exploration of data-mining results and lets those results help define subsequent data-mining requests.
The result is an approach to data mining centered on users' high-level data-mining needs and on the rapid development of tools that meet them. Readers with access to the World Wide Web can find further information on the Rutgers Data Mining Project at http://d-major.rutgers.edu.

From the Chairman