

Title Slide

B+-trees









Multi-level file indexing


Why bother with indices?


Tree-structured Indices


B+ Tree Indices

A B+ Tree:


B+ Tree Nodes

Nodes in a B+ Tree:


Valid B+ Trees

The following non-recursive conditions must hold on a valid B+ tree:

1.
Each node may have at most n pointers and, except for the root, must have at least ceiling(n / 2) pointers.

2.
The keys in each node must be packed from the left (i.e. no holes) and must be in increasing order.

3.
There must be a non-null left and right pointer for each key.

4.
The leaves must be chained in sequence by the rightmost pointer (with the exception of the rightmost leaf).
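
The node-local conditions above can be sketched as a single check. This is a minimal sketch, not the author's code: the name node_ok, the demo helper, and the placeholder record pointers are hypothetical. It assumes that a node with num_ptrs pointers holds num_ptrs-1 keys, that the root needs only 2 pointers, and that a leaf's chain link sits in the last pointer slot in use (so it may be null).

```c
#include <stddef.h>

#define ORDER 5

typedef struct Node {
   int keys[ORDER-1];
   struct Node *pointers[ORDER];
   int num_ptrs;
   struct Node *parent;
   int leafp;
} Node;

// Returns 1 if the node passes the node-local checks, 0 otherwise.
int node_ok (const Node *node)
{
   int i;
   int nkeys = node->num_ptrs - 1;
   int min_ptrs = (ORDER + 1) / 2;          // ceiling(ORDER/2)

   // Condition 1 (assumption: the root only needs 2 pointers)
   if (node->num_ptrs > ORDER) return 0;
   if (node->num_ptrs < (node->parent ? min_ptrs : 2)) return 0;

   // Condition 2: keys strictly increasing from the left
   for (i = 1; i < nkeys; i++)
      if (node->keys[i-1] >= node->keys[i]) return 0;

   // Condition 3: pointers in use must be non-null.  In a leaf the
   // last pointer in use is the chain link, which may be null.
   for (i = 0; i < (node->leafp ? nkeys : node->num_ptrs); i++)
      if (node->pointers[i] == NULL) return 0;

   return 1;
}

// Hypothetical self-test: a non-root leaf holding keys 10 and 20.
int node_ok_demo (int swap_keys)
{
   Node parent = {0}, leaf = {0};

   leaf.leafp = 1;
   leaf.parent = &parent;
   leaf.num_ptrs = 3;
   leaf.keys[0] = swap_keys ? 20 : 10;
   leaf.keys[1] = swap_keys ? 10 : 20;
   leaf.pointers[0] = &leaf;   // placeholder record pointers
   leaf.pointers[1] = &leaf;
   return node_ok(&leaf);
}
```

Calling node_ok_demo(0) checks a well-formed leaf; node_ok_demo(1) swaps the two keys so that condition 2 fails.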


Valid B+ Trees

The following recursive conditions must hold on a valid B+tree:

1.
It must be a fully-connected, acyclic tree.

2.
The length of every path from the root to a leaf node must be the same.

3.
Each subtree rooted in a pointer P_i between two keys K_{i-1} and K_i must contain only keys that are greater than or equal to K_{i-1} and strictly less than K_i. For pointers on the ends, only the appropriate half of the condition applies.


The B+ Tree Algorithm

Node definition:

#define ORDER 5     // change for any order > 2

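typedef struct Node {
   int keys[ORDER-1];              // at most ORDER-1 keys
   struct Node *pointers[ORDER];   // at most ORDER pointers
   int num_ptrs;                   // pointers in use; keys in use = num_ptrs-1
   struct Node *parent;            // NULL for the root
   int leafp;                      // non-zero if this node is a leaf
   } Node;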


The B+ Tree Algorithm

The Print function:

// The print routine prints a depth-first traversal 
// of the tree
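void print (Node *tree)
{
   int i;

   // Print the keys of this node (num_ptrs-1 of them)
   for (i=0; i<(tree->num_ptrs-1); i++) {
      printf("key[%d]=%d\n",i,tree->keys[i]);
   }

   // Then recurse into each child of an internal node
   if (!tree->leafp) {
      for (i=0; i<tree->num_ptrs; i++) {
         print(tree->pointers[i]);
      }
   }
}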


The B+ Tree Algorithm

The Find function:

// The find routine returns the node the key *should* be 
//  in, with a flag indicating whether it is.
Node *find (Node *tree, int key, int *foundp)
{
   int i;

   if (!tree->leafp) {
      // Descend into the first subtree whose upper key exceeds key
      for (i=0; i<(tree->num_ptrs-1); i++) {
         if (key < tree->keys[i])
            return find(tree->pointers[i],key,foundp);
      }
      // key >= every key in this node: take the rightmost pointer
      return find(tree->pointers[tree->num_ptrs-1],
                  key,foundp);
   }
   else {
      *foundp = 0;
      for (i=0; i<(tree->num_ptrs-1); i++) {
         if (key == tree->keys[i]) *foundp = 1;
      }
      return tree;
   }
}
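
As a usage illustration, the sketch below runs a find routine of this shape on a small hand-built two-level tree. The helpers init_leaf and demo_find, the key values, and the placeholder record pointers are all hypothetical; the loop in find descends into the first subtree whose bounding key exceeds the search key, and falls through to the rightmost pointer otherwise.

```c
#include <stddef.h>
#include <string.h>

#define ORDER 5

typedef struct Node {
   int keys[ORDER-1];
   struct Node *pointers[ORDER];
   int num_ptrs;
   struct Node *parent;
   int leafp;
} Node;

// Returns the leaf the key should be in; *foundp says whether it is.
Node *find (Node *tree, int key, int *foundp)
{
   int i;

   if (!tree->leafp) {
      for (i = 0; i < tree->num_ptrs - 1; i++) {
         if (key < tree->keys[i])
            return find(tree->pointers[i], key, foundp);
      }
      return find(tree->pointers[tree->num_ptrs - 1], key, foundp);
   }
   *foundp = 0;
   for (i = 0; i < tree->num_ptrs - 1; i++) {
      if (key == tree->keys[i]) *foundp = 1;
   }
   return tree;
}

// Hypothetical helper: a leaf with two keys and a chain pointer.
static void init_leaf (Node *leaf, Node *parent, int k0, int k1,
                       Node *next)
{
   memset(leaf, 0, sizeof *leaf);
   leaf->leafp = 1;
   leaf->parent = parent;
   leaf->keys[0] = k0;
   leaf->keys[1] = k1;
   leaf->pointers[0] = leaf;   // placeholder record pointers
   leaf->pointers[1] = leaf;
   leaf->pointers[2] = next;   // chain to the next leaf
   leaf->num_ptrs = 3;
}

// Builds root [30] over leaves [10,20] and [30,40], then searches.
// Returns the first key of the leaf that find() lands in.
int demo_find (int key, int *foundp)
{
   Node root, left, right;

   init_leaf(&left, &root, 10, 20, &right);
   init_leaf(&right, &root, 30, 40, NULL);
   memset(&root, 0, sizeof root);
   root.keys[0] = 30;
   root.pointers[0] = &left;
   root.pointers[1] = &right;
   root.num_ptrs = 2;
   return find(&root, key, foundp)->keys[0];
}
```

Searching for 25 lands in the left leaf with the flag cleared, which is the node an insert of 25 would target.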


The B+ Tree Algorithm

The Validate function:

// The validate routine returns 1 if the subtree is 
//  a valid B+ tree
int validate (Node *tree, int min, int max, int *ndepth)
{
   int i, smin, smax, sdepth, lastdepth = 0;

   // Check the non-recursive conditions first:
   //   - ceiling(ORDER/2) <= tree->num_ptrs <= ORDER
   //   - keys in ascending order with non-null left 
   //     and right pointers

   if (!tree->leafp) {
      // Check the recursive conditions:
      for (i=0; i<tree->num_ptrs; i++) {
         if (i == 0) smin = min;
         else        smin = tree->keys[i-1];
         if (i == (tree->num_ptrs-1)) smax = max;
         else        smax = tree->keys[i];

         // Every subtree must be a valid B+ tree
         if (!validate(tree->pointers[i],smin,smax,
                         &sdepth)) return 0;

         // The length of each path from this node to a 
         //   leaf must be the same
         if ((i > 0)  && (lastdepth != sdepth)) 
           return 0;
         lastdepth = sdepth;
      }


The B+ Tree Algorithm

The Validate function (continued):

      // Recursive conditions OK, tree is valid
      *ndepth = lastdepth+1;
      return 1;
   }
   else {
      //   Check that rightmost pointer of leaf 
      //    points to next leaf
      *ndepth = 0;
      return 1;
   }
}
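
The leaf-chain check is left as a comment in the routine above. One possible way to sketch it is to walk the chain from the leftmost leaf and confirm that the keys ascend across leaf boundaries. The names walk_leaf_chain and chain_demo are hypothetical, and the sketch assumes the chain link is stored in the last pointer slot in use (pointers[num_ptrs-1]) and that every leaf has num_ptrs >= 1.

```c
#include <stddef.h>

#define ORDER 5

typedef struct Node {
   int keys[ORDER-1];
   struct Node *pointers[ORDER];
   int num_ptrs;
   struct Node *parent;
   int leafp;
} Node;

// Walks the leaf chain starting at the leftmost leaf.  Returns the
// number of keys visited, or -1 if the keys are not strictly
// ascending along the chain.
int walk_leaf_chain (const Node *tree)
{
   int i, count = 0, have_last = 0, last = 0;

   while (!tree->leafp) tree = tree->pointers[0];   // leftmost leaf

   while (tree != NULL) {
      for (i = 0; i < tree->num_ptrs - 1; i++) {
         if (have_last && tree->keys[i] <= last) return -1;
         last = tree->keys[i];
         have_last = 1;
         count++;
      }
      tree = tree->pointers[tree->num_ptrs - 1];    // chain link
   }
   return count;
}

// Hypothetical demo: root [30] over leaves [10,20] and [30,40].
// If linked is 0 the left leaf's chain pointer is left null.
int chain_demo (int linked)
{
   Node root = {0}, left = {0}, right = {0};

   root.num_ptrs = 2;
   root.keys[0] = 30;
   root.pointers[0] = &left;
   root.pointers[1] = &right;

   left.leafp = 1;  left.parent = &root;  left.num_ptrs = 3;
   left.keys[0] = 10;  left.keys[1] = 20;
   left.pointers[0] = left.pointers[1] = &left;     // placeholders
   left.pointers[2] = linked ? &right : NULL;

   right.leafp = 1;  right.parent = &root;  right.num_ptrs = 3;
   right.keys[0] = 30;  right.keys[1] = 40;
   right.pointers[0] = right.pointers[1] = &right;  // placeholders

   return walk_leaf_chain(&root);
}
```

A caller can compare the returned count with the number of keys it expects in the tree; a broken chain shows up as a short count.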


The B+ Tree Algorithm

The Insert function:

// The insert routine inserts a new key,pointer combination into a
//  leaf node

void insert(Node *node, int key, Node *ptr)
{
     // If there is room in the node, insert the key and pointer and
     // return.

     // No room in node: first split the node by creating a new node,
     // and moving the upper half of the keys and pointers from the
     // old node to the new node.  Then insert the key, pointer into
     // the appropriate node.

     // Special case:  Root node splits.
     //   - Create a new node to be the root, and insert pointers
     //     to the original node and the new split node.

     // Now adjust all the pointers:
     //   - Chain the new node in with the other leaf nodes in
     //     sequence.
     //   - Since we have created a new node, we have to insert a new
     //     key and pointer into the parent of the original node.
     //     The key for the new node in the parent will be the
     //     leftmost key value of the new node, and the pointer will
     //     be to the new node (we always insert a key, right-pointer
     //     combination).  Call insert recursively to accomplish the
     //     insert.
}

Because the depth of the tree can only increase by splitting the root and adding a new root node, the length of all paths increases equally, ensuring that the tree always remains balanced.
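
The leaf-level part of the insert can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: leaf_insert, split_leaf and split_demo are hypothetical names, the chain link is assumed to live in pointers[num_ptrs-1], and the split moves the upper half of the entries into the new node so that it becomes the right sibling and its leftmost key is the one handed to the parent. Inserting that key into the parent (and splitting the root) is omitted.

```c
#include <stdlib.h>

#define ORDER 5

typedef struct Node {
   int keys[ORDER-1];
   struct Node *pointers[ORDER];
   int num_ptrs;
   struct Node *parent;
   int leafp;
} Node;

// Inserts key/rec into a leaf that still has room, keeping the keys
// sorted.  Returns 0 if the leaf is full, 1 on success.
int leaf_insert (Node *leaf, int key, Node *rec)
{
   int nkeys = leaf->num_ptrs - 1;
   int i;

   if (leaf->num_ptrs >= ORDER) return 0;
   leaf->pointers[nkeys+1] = leaf->pointers[nkeys];  // move chain link
   for (i = nkeys; i > 0 && leaf->keys[i-1] > key; i--) {
      leaf->keys[i] = leaf->keys[i-1];
      leaf->pointers[i] = leaf->pointers[i-1];
   }
   leaf->keys[i] = key;
   leaf->pointers[i] = rec;
   leaf->num_ptrs++;
   return 1;
}

// Splits a leaf: the upper half moves to a new right sibling, which
// is chained in after the original.  *sep_key receives the key to
// insert into the parent.  Returns NULL if allocation fails.
Node *split_leaf (Node *leaf, int *sep_key)
{
   int nkeys = leaf->num_ptrs - 1;
   int keep = nkeys / 2;
   int i;
   Node *right = calloc(1, sizeof *right);

   if (right == NULL) return NULL;
   right->leafp = 1;
   right->parent = leaf->parent;
   for (i = keep; i < nkeys; i++) {
      right->keys[i-keep] = leaf->keys[i];
      right->pointers[i-keep] = leaf->pointers[i];
   }
   right->pointers[nkeys-keep] = leaf->pointers[nkeys]; // old chain
   right->num_ptrs = nkeys - keep + 1;
   leaf->pointers[keep] = right;                        // new chain
   leaf->num_ptrs = keep + 1;
   *sep_key = right->keys[0];
   return right;
}

// Hypothetical demo: split a full leaf [10,20,30,40], then insert 25.
// Returns the separator key, or -1 if something is inconsistent.
int split_demo (void)
{
   Node leaf = {0};
   Node *right;
   int i, sep, ok;

   leaf.leafp = 1;
   leaf.num_ptrs = ORDER;                 // full: ORDER-1 keys
   for (i = 0; i < ORDER-1; i++) {
      leaf.keys[i] = 10 * (i + 1);
      leaf.pointers[i] = &leaf;           // placeholder record pointers
   }
   right = split_leaf(&leaf, &sep);
   if (right == NULL) return -1;
   // 25 is below the separator, so it belongs in the original leaf
   ok = leaf_insert(25 < sep ? &leaf : right, 25, &leaf)
        && leaf.keys[2] == 25
        && leaf.pointers[leaf.num_ptrs-1] == right
        && right->num_ptrs == 3;
   free(right);
   return ok ? sep : -1;
}
```

Splitting before inserting keeps both routines simple: after the split each half has room, so the ordinary insert-with-room path handles the new key.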


The B+ Tree Algorithm

The Delete function:

// The delete routine deletes a key,pointer combination from a node

void delete (Node *node, int key, Node *ptr)
{
   // Remove the key and pointer from the node, decrement num_ptrs

   // If num_ptrs >= ceiling(ORDER/2) return

   // Special case: If node is root (!node->parent), and num_ptrs < 2
   //   - remove the root, and make the remaining child the new root


   // Case 1: Borrow a pointer
   //   - Check left and right siblings to see if either has an extra
   //     pointer to spare.
   //   - If so, move the adjacent pointer into node
   //   - Fix up the key in the parent to reflect the change

   // Case 2: Consolidate nodes
   //   - If neither sibling can spare a pointer, choose a sibling to
   //     consolidate with
   //   - Copy all of the keys and pointers into a single node
   //   - Recursively call delete on the parent to remove the key and
   //     pointer that point to the now-empty node.
}

Since the depth of the tree can only be reduced by removing the root node, the length of every path to the leaves decreases equally, ensuring that the tree always remains balanced.
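
Case 1 (borrowing) can be sketched for leaves as below. The names leaf_remove, borrow_from_right and delete_demo are hypothetical, and the sketch assumes the chain link is stored in pointers[num_ptrs-1], that both leaves share the same parent, and that the root is exempt from the minimum. Borrowing from a left sibling and consolidation (Case 2) are not shown.

```c
#include <stddef.h>

#define ORDER 5
#define MIN_PTRS ((ORDER + 1) / 2)   // ceiling(ORDER/2)

typedef struct Node {
   int keys[ORDER-1];
   struct Node *pointers[ORDER];
   int num_ptrs;
   struct Node *parent;
   int leafp;
} Node;

// Removes key from a leaf.  Returns -1 if the key is absent, 1 if
// the leaf is now below the minimum, 0 otherwise.
int leaf_remove (Node *leaf, int key)
{
   int nkeys = leaf->num_ptrs - 1;
   int i, pos = -1;

   for (i = 0; i < nkeys; i++)
      if (leaf->keys[i] == key) { pos = i; break; }
   if (pos < 0) return -1;
   for (i = pos; i < nkeys - 1; i++) {
      leaf->keys[i] = leaf->keys[i+1];
      leaf->pointers[i] = leaf->pointers[i+1];
   }
   leaf->pointers[nkeys-1] = leaf->pointers[nkeys];  // move chain link
   leaf->pointers[nkeys] = NULL;
   leaf->num_ptrs--;
   if (leaf->parent == NULL) return 0;               // root is exempt
   return leaf->num_ptrs < MIN_PTRS;
}

// Moves the smallest entry of the right sibling into leaf and fixes
// the separating key in the parent.  Returns 0 if the sibling has
// no pointer to spare, 1 on success.
int borrow_from_right (Node *leaf, Node *right)
{
   int nkeys = leaf->num_ptrs - 1;
   int rkeys = right->num_ptrs - 1;
   Node *parent = right->parent;
   int i;

   if (right->num_ptrs - 1 < MIN_PTRS) return 0;
   leaf->pointers[nkeys+1] = leaf->pointers[nkeys];  // move chain link
   leaf->keys[nkeys] = right->keys[0];
   leaf->pointers[nkeys] = right->pointers[0];
   leaf->num_ptrs++;
   for (i = 0; i < rkeys - 1; i++) {
      right->keys[i] = right->keys[i+1];
      right->pointers[i] = right->pointers[i+1];
   }
   right->pointers[rkeys-1] = right->pointers[rkeys];
   right->pointers[rkeys] = NULL;
   right->num_ptrs--;
   for (i = 1; i < parent->num_ptrs; i++)            // fix parent key
      if (parent->pointers[i] == right)
         parent->keys[i-1] = right->keys[0];
   return 1;
}

// Hypothetical demo: root [30] over leaves [10,20] and [30,40,50].
// Deleting 10 underflows the left leaf, which then borrows 30.
// Returns the parent's new separator key, or -1 on inconsistency.
int delete_demo (void)
{
   Node root = {0}, left = {0}, right = {0};
   int i;

   root.num_ptrs = 2;
   root.keys[0] = 30;
   root.pointers[0] = &left;
   root.pointers[1] = &right;
   left.leafp = 1;  left.parent = &root;  left.num_ptrs = 3;
   left.keys[0] = 10;  left.keys[1] = 20;
   left.pointers[0] = left.pointers[1] = &left;      // placeholders
   left.pointers[2] = &right;                        // chain link
   right.leafp = 1;  right.parent = &root;  right.num_ptrs = 4;
   for (i = 0; i < 3; i++) {
      right.keys[i] = 30 + 10 * i;
      right.pointers[i] = &right;                    // placeholders
   }
   if (leaf_remove(&left, 10) != 1) return -1;
   if (!borrow_from_right(&left, &right)) return -1;
   if (left.keys[0] != 20 || left.keys[1] != 30) return -1;
   if (left.pointers[2] != &right) return -1;
   return root.keys[0];
}
```

After the borrow the parent key becomes the new leftmost key of the right sibling, so every key in the left leaf stays strictly below it.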



 
Jack Keane
2002-03-30