![]()
Maintained by web@cs.rutgers.edu |
Rutgers University DCIS Qualifying Exam Date: Monday April 12th, 2004 Time: 1:30 P.M. Location: CoRE Building room 301, Busch Campus, Rutgers University
Abstract: Many imaging systems seek a good interpretation of the scene presented --- i.e., a (perhaps maximally) plausible mapping from aspects of the scene to real-world objects. This work addresses the issue of how to find such likely mappings efficiently. In general, an "interpretation policy" specifies when to apply which "imaging operators", which range from low-level edge-detectors and region-growers through high-level token-combination-rules and expectation-driven object-detectors. Given the costs of these operators, and the distribution of possible scenes, we can determine both the expected cost, and the expected accuracy, of any such policy. Our task is to find a maximally effsctive policy, with sufficient accuracy whose cost is minimal. We show, in particular, that policies whose operators maximize information gain (per unit cost) work more effectively than policies that simply try to establish the putative most-likely interpretation. Secondly, we argue that "myopic" policies produce inferior overall results, compared to "non-myopic" policies. All myopic policies, including those that choose operators with maximum information gain per unit cost, rank the operators based only on their immediate rewards. This can lead to inferior overall results: it may be better to use a relatively expensive operator first, if that operator provides information that will significantly reduce the cost of the subsequent operators. This suggests using some lookahead process to compute the quality for operators non-myopically. Unfortunately, this is prohibitively expensive for most domains, especially for domains that have a large number of complex states (like face recognition). We therefore use ideas from reinforcement learning to compute the "utility" of each operator sequence. In particular, our system first uses dynamic programming, over abstract simplifications of interpretation states, to precompute the utility of each relevant sequence. It does this off-line, over a training sample of images. At run time, our interpretation system uses these estimates to decide when to use which imaging operator. We support our claims with many experimental results in various domains, including that of face recogition.
|