Approximate Pattern Matching - the Hamming Distance Case

Amihood Amir, AT&T Labs - Research (on leave from Bar-Ilan University)

January 29, 4:30 PM, Rutgers Univ. CORE 431

Abstract.

Many different meanings can be assigned to the words "approximation" of pattern matching. One possibility is that of allowing "local" errors.

Intuitively, we group under the label "local" errors that take place in a bounded location, as opposed to changes that permeate the entire data (e.g. scaling, rotation).

Specifically, consider the most limited local error -- the mismatch. This error occurs in a single symbol and effects only its location. In contrast, insertions and deletions have a global effect, although the error itself is confined to a single location.

We discuss known techniques and upper bounds for matching with the mismatch local error. We also present a new algorithm that finds in a length-n text all locations where a length-m pattern matches with at most k mismatches, in time O(n sqrt(k log k).

The talk uses the approximate Hamming distance problem only as an excuse for a quick review of some main techniques in the field. Some of these may come in handy even if the listener is not (heaven forbid) a researcher in pattern matching.

(The new result is joint with Moshe Lewenstein and Ely Porat.)



Back to Discrete Math/Theory of Computing seminar