A balls and bins analysis of insertion and update buffers

Alexander Conway, Dept. of Computer Science

Examination Committee: Prof. Martin Farach-Colton (Chair), Prof. William Steiger, Prof. Shan Muthukrishnan, Prof. Ahmed Elgammal

Abstract

External memory databases are traditionally based on B-trees. Insertion buffers aggregate database insertions, and only flush into the external memory structure itself when full, attempting to batch multiple insertions destined for the same B-tree leaf, and therefore perform those insertions with only one disk access. Insertion buffers are implemented, for example, in the InnoDB storage engine. We develop a balls-and-bins model for insertion buffers and show that they have inherent limits on the amount of optimization they provide. This speedup is a small factor of m/n, where m is the buffer size and n is the number of leaves. Thus in databases with large rows as well as in big data settings, we might expect m/n = O(1), and therefore insertion buffers would only provide a constant improvement.

Go Back to Colloquiua Listing