Title |
Code |
Notes |
Count-Min Sketch |
countmin.h countmin.c |
Count-Min sketches (Cormode,
Muthukrishnan '04). They support: Finding frequent items, Returning point estimates, Approximating Inner-products Finding quantiles |
Group testing |
cgt.h cgt.c |
Described in What's Hot and
What's Not: Tracking Frequent Items Dynamically, PODS 2003. They support: finding frequent items. |
CCFC Sketch |
ccfc.h ccfc.c |
Count sketches from Charikar,
Chen, Farach-Colton '02. They support: finding frequent items, returning point estimates |
AMS Sketch |
ams.h ams.c |
"Tug of War" sketches due to
Alon, Matias and Szegedy 96, and
Alon, Gibbons, Matias and Szegedy, 99. Some hashing tricks used for faster implementation. They support: Returning point estimates, Approximating Inner-products, Estimating the L2 norm of a vector |
Stable distributions and sketches |
stable.h stable.c |
Sketches based on stable
distributions.
Based on Indyk'00, Cormode, Indyk, Koudas, Muthukrishnan '02, and Cormode, Data, Indyk, Muthukrishnan, '02. They support estimating the L2 norm of a vector, estimating the L1 norm of a vector, estimating the L0 norm of a vector, estimating arbitrary Lp norms for 0 < p < 2 |
Random number generators |
prng.h prng.c |
|
Common routines |
massdal.h massdal.c |
|
Change (deltoid) detection |
change.h change.c |
Implements the change detection
group testing for absolute, relative,
and variational changes
(Cormode-Muthukrishnan, Infocom 2004). See also the sample file changewrapper.c for examples of usage. |
Frequent items |
frequent.h frequent.c |
Algorithm Frequent for insert only streams. They are derived from code used in the paper What's Hot and What's Not: Tracking Frequent Items Dynamically, PODS 2003. |
Lossy Counting |
lossycount.h lossycount.c |
Lossy Counting algorithm due to Manku and Motwani |
hotitems.c teststab.c changewrapper.c makefile |
Each of the .h files exports
various methods: Init -- Initialize the data structure. Takes some parameters defining the accuracy and quality of the results. Returns a pointer to an object which must be supplied to the following routines. Update -- Takes an item identifier. Some routines take an additional parameter of the count to add on for this arrival (positive or negative increment). Others interpret a positive count as an arrival, a negative count as a departure. Output -- Takes an integer threshold, returns a list of items which should exceed this threshold. Size -- Returns the space used by the data structure Destroy -- Destroys the data structure and frees the associated memory. Example usage of the above routines, finding frequent items on a stream drawn from zipf distributions. Example usage of the above routines to estimate the L1, L2, and L0 norms of data streams. Example usage of the change.c library to find items with large changes. makefile A makefile for the above code |
|
Code BUNDLE |
massdalsketches.zip | All the above .c and .h files together in one easy to download .zip file. |
Title |
Code |
Notes |
Lossy Counting |
Lossy-Counting-Java.zip |
This works for text streams.
README.txt file explains usage. |