|Count-Min sketches (Cormode,
Muthukrishnan '04). They support:
Finding frequent items, Returning point estimates, Approximating Inner-products
|Described in What's Hot and
What's Not: Tracking Frequent Items Dynamically,
PODS 2003. They support: finding frequent items.
|Count sketches from Charikar,
Chen, Farach-Colton '02. They support: finding
frequent items, returning point estimates
|"Tug of War" sketches due to
Alon, Matias and Szegedy 96, and
Matias and Szegedy, 99. Some hashing tricks used for faster implementation.
They support: Returning point estimates, Approximating Inner-products, Estimating
the L2 norm of a vector
|Sketches based on stable
Based on Indyk'00, Cormode, Indyk,
Koudas, Muthukrishnan '02, and Cormode, Data, Indyk, Muthukrishnan, '02.
They support estimating the L2 norm of a vector, estimating the L1 norm of a
vector, estimating the L0 norm of a vector, estimating arbitrary Lp norms for 0 < p < 2
|Implements the change detection
group testing for absolute, relative,
and variational changes
(Cormode-Muthukrishnan, Infocom 2004).
See also the sample file changewrapper.c for examples of usage.
|Algorithm Frequent for insert only streams. They are derived from code used in the paper What's Hot and What's Not: Tracking Frequent Items Dynamically, PODS 2003.|
|Lossy Counting algorithm due to Manku and Motwani|
|Each of the .h files exports
Init -- Initialize the data structure. Takes some parameters defining the accuracy and quality of the results. Returns a pointer to an object which must be supplied to the following routines.
Update -- Takes an item identifier. Some routines take an additional parameter of the count to add on for this arrival (positive or negative increment). Others interpret a positive count as an arrival, a negative count as a departure.
Output -- Takes an integer threshold, returns a list of items which should exceed this threshold.
Size -- Returns the space used by the data structure
Destroy -- Destroys the data structure and frees the associated memory.
Example usage of the above routines, finding frequent items on a stream drawn from zipf distributions.
Example usage of the above routines to estimate the L1, L2, and L0 norms of data streams.
Example usage of the change.c library to find items with large changes.
makefile A makefile for the above code
||massdalsketches.zip||All the above .c and .h files together in one easy to download .zip file.|
||This works for text streams.
README.txt file explains usage.