We can represent a heap-structure as an array, simply locating the ith node in the heap-structure sequence in the ith cell of the array. For example, the heap-structure below is represented by the array COMPUTERS (with the C at index 1, and the S at index 9: we reserve index 0 for reasons that will become clear below.)
C
/ \
O M
/ \ / \
P U T E
/ \
R S
Notice that the links between parents and children can be captured with simple
arithmetic on the array indices. That is, the children of the ith node are
in positions 2i and 2i+1 (provided that those indices are not outside the array
bounds). Similarly, the parent of node i is at location i/2 (integer
arithmetic). Thus, all the information in the tree above is contained in the
array. (Why don't we represent all binary trees with this array
construction?)We define a heap as a heap-structure with keys at each node (say single character keys, for simplicity) where the key at any node is no smaller than the keys of either of its children. In a heap, the largest key must therefore be at the root - it can't be anywhere else, otherwise the heap property is not satisfied. We'll discuss simple ways to build heaps (by repeated insertion of single keys into larger and larger heaps) and to remove the largest element - and restore the heap. A heap functions as a priority queue: you can insert new elements, and remove - the element with the largest priority.
A simple (though not the most efficient) way to build the heap is - beginning with a heap of size one (A[1]), we correctly arrange the first two elements in the array so that they form a heap. Then we include (insert) the third element, etc., until all the keys have been rearranged so that the final array represents not only a heap-structure, but a heap. To add the ith key to the heap A[1...i-1], we must compare it with its parent (at location i/2), swapping if the parent key is smaller. If so, we then compare the key (now at position i/2) with its parent (at i/4), swapping if necessary. Finally, the new insertion will find its place, possibly at the root of the heap (index 1). We can facilitate the code for this "upward movement" by first placing a copy of the ith key in position 0. And, this process is carried out for i = 2 to i = n.
char[] A = new char[n+1]; // n elements in cells 1, 2, ..., n
for(int last = 2; last <= n; last++){
int x = last;
A[0] = A[x]; // make a copy for simple loop termination
while(A[x] > A[x/2]) {
swap A[x] with A[x/2];
x = x/2; // integer arithmetic
}
}
Applying this process to the heap-structure above, we generate the heap below:
U
/ \
S T
/ \ / \
R O M E
/ \
C P
A way to verify that we indeed have a heap is to trace paths from any leaf to
the root: the keys must appear in increasing order. In this case, we have
CRSU, PRSU, OSU, MTU, and ETU, so the heap property is satisfied. Suppose we have a heap represented by the n elements (1,2, ..., n) of the array: how much computational effort do we expend to install the next element? More generally, how much effort is required to insert a new element into a heap of n elements? We observe that the element with index x is at level lgx (truncated to an integer) in the binary heap which the array represents. Thus, a heap of n elements is about lgn high. An insertion is accomplished by propagating the new element up a path (from the lowest, rightmost leaf position) towards the root: at worst, the new element will go all the way to the root, and we expend effort proportional to lgn, the (approximate) height of the tree.
To build the entire heap "from scratch" (to convert the array representing a heap structure to an array representing a heap), the total effort is no worse than
WCbh < lg2 + lg3 + ....... + lgn = lgn!
n! is well approximated by nne-n, so we
can say that
WCbh < nlgn - nlge = nlgn - 1.4n < nlgn
We'll learn a (slightly) more efficient method for heap-building soon. Notice how well the heap (a priority queue) supports an application like sorting an array of elements. After building the heap (rearranging the array elements so that a heap is represented), we can perform the simple algorithm below:
/* heap in place in indices 1 through n */
for(int last = n; last > 0;) {
swap A[last--] with A[1];
restoreHeap(1, last);
}
where restoreHeap is a function (class method, say) which "pushes"
the top element down some path until it becomes the "king" of its own little
heap. That is, we compare the element in position 1 with the largest of its
two children, swap them if the child is larger, and continue this process
(iteratively/recursively) until the element finds a place in the binary heap
where it is larger than its largest child. Recall that the children of the
element at position x are at 2x and 2x+1 - provided that
those indices aren't greater than the index last. At worst, the item
at the root will "descend" to the bottom of the tree - to level
lg(last) (truncated to an integer) . Thus, the cost of heapsort
is the cost of building the heap + the cost of the successive restore
operations:
WCHS < nlgn + 2(lgn + lg(n-1) + ..... + lg2) < 3nlgn
(Notice that two compares are required for every change of level in the restore
phase.) In the big-O notation we say that heapsort is worst-case
O(nlgn). Finally, don't forget that we concentrated only on the operations involving comparisons of keys in this discussion. In practice, the array elements (the heap entries) would look like this (assuming single-character keys):
class heapItem {
char key;
Object data;
}
and the heap-building and restore operations would access the keys with
A[i].key. Furthermore, keys - in general - will not just be single
alphabetic characters where we can use the built-in comparison operators
for simple datatypes. More generally, we would define a Comparable interface
and the heap operations would deal with Comparable objects. The user of the
heap would add Comparable functionality to the heapItem class.There is a slightly more efficient heap-building process which works by joining two heaps at a new root node, pushing the root key down as appropriate (restore) so that a single merged heap is created - containing one more node than the two original heaps. This is an "in-place" algorithm, as distinct from the "incremental" algorithm demonstrated above.
Suppose we have a list of n (unordered) keys stored in a contiguous array. Let's map the array to a heap-structure, and - from that perspective - discover how we can systematically rearrange the keys so that the (max) heap property is obtained. To do this, we apply recursively a very simple idea: if two subtrees are both (max) heaps, and we join them to a root, then all we must do to convert the new (single) tree to a heap is to "push the root down" - if necessary, until it is the "king of its own heap" - subtree. That is, we compare the root key with the largest key of its two children, and if the root key is smaller, we exchange it with the largest child. Then we repeat this task at the subtree where the root is now positioned, and continue until no more exchanges are necessary, or the root has migrated down to lowest level and become a leaf. Notice that this "migration" only involves pushing the root down some path in the tree. We pursue this idea from the bottom of the tree up to the root.
Certainly, every leaf node is a heap containing one node, since the leaves have no children. What about the parent with the highest index in the heap-structure? (The first parent node encountered as you move backwards - right to left - starting at the rightmost leaf.) Since the tree has n nodes, the index of this parent is n/2. We adjust the tree of which it is the (possibly temporary) root, and build a (small) heap. And this process iterates backwards thru the indices n/2-1, n/2-2,......, 2, 1. At each step, the two subtrees of the current node are heaps, we are rooting them at this node, and locating the root in one of its subtrees until it comes to rest - at which point the tree rooted at that original index is now a heap. A heap-structure with n nodes will have a height no greater than about lgn. Also, the node with index 2k will be at level k, as will the nodes with indices between 2k and 2(k+1)-1, since every level (except possibly the last) is complete. So when we are positioning the node with index x at level floor(lgx), the worst that might happen in terms of computation effort is that the key at x will find its way down to the lowest level of the tree: floor(lgn). With 2 comparisons required every time the key moves down one level, we have - at worst - 2(floor(lgn) - floor(lgx)) comparisons, or approximately 2lg(n/x) comparisons. And since x takes values from n/2 down to 1, we see that the worst-case cost of building the heap is approximately :
nlgn - 2lg[(n/2)!]
Using the approximation m! = mme-m for "large" m, so
that lgm! is approximately mlgm - mlge, the expression above becomes
nlgn - 2[(n/2)lg(n/2) - (n/2)lge] = n + nlge
or about 2.4n
since lge is approximately 1.4. That is, the worst-case cost of converting an
unordered array of n items into a heap is proportional to n (!!) What is the
best-case cost?