Abstract
In many applications, large volumes of time-sensitive
textual information require triage: rapid, approximate prioritization
for subsequent action. In this paper, we explore the use of
prospective indications of a time-sensitive document's importance
to improve document filtering or ranking. By prospective, we mean
importance that could be assessed from actions that occur in the
future. For example, a
news story may be assessed (retrospectively) as being important,
based on events that occurred after the story appeared, such as a
plummeting stock price or the issuance of many follow-up stories. If
a system could anticipate (prospectively) such occurrences, it could
provide a timely indication of importance. Clearly, perfect
prescience is impossible. However, the content of an information
item is sometimes correlated strongly enough with the events that
occur subsequently to support useful predictions. We describe a
process for creating and
evaluating approximate information-triage procedures that are based on
prospective indications. In many information-retrieval applications,
document labeling is a laborious, manual process; for many
prospective criteria, by contrast, it is possible to build very
large, labeled training corpora automatically. Such corpora can be
used to train text classification procedures that will predict the
(prospective) importance of each document. This paper illustrates the
process with two case studies, demonstrating the ability to predict
whether the stock price of one or more companies mentioned in a news
story will move significantly following the appearance of that story.
We conclude by arguing that the comprehensibility of the learned
classifiers can be critical to success.
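
To make the idea of automatic labeling concrete, the following is a
minimal sketch of how a story might be labeled from subsequent stock
prices. It is an illustration under stated assumptions, not the
procedure used in the case studies: the names Story, price_at, and
label_story, the representation of prices as sorted (time, price)
pairs, and the default horizon and threshold values are all
hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Story:
    ticker: str      # company mentioned in the story (hypothetical field)
    timestamp: int   # time of appearance, in arbitrary time units
    text: str


def price_at(series, t):
    """Return the last quoted price at or before time t, or None.

    `series` is a list of (time, price) pairs sorted by time.
    """
    times = [time for time, _ in series]
    i = bisect_right(times, t)
    return series[i - 1][1] if i > 0 else None


def label_story(story, prices, horizon=60, threshold=0.05):
    """Label a story 1 if the relative price change between its
    appearance and `horizon` time units later is at least `threshold`,
    else 0. Returns None when the required price data is missing.

    The horizon and threshold defaults are illustrative assumptions.
    """
    series = prices.get(story.ticker)
    if not series:
        return None
    before = price_at(series, story.timestamp)
    after = price_at(series, story.timestamp + horizon)
    if before is None or after is None or before == 0:
        return None
    change = abs(after - before) / before
    return 1 if change >= threshold else 0
```

Because the label depends only on data that arrives after the story,
a function of this kind can be applied to an archive of stories and
price quotes without any manual annotation, and the resulting
(text, label) pairs can then be passed to a text classifier.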