Minimal Spanning Trees: the Nearest Neighbor Method and Shortest Path Trees

Many problems in computer science, engineering, operations research, economics, etc. can be modeled by weighted graphs, where the weights on the edges of the graph represent some cost or distance metric in connection with the problem that is being modeled. If we have a graph (undirected) with positive (say integer) weights associated with the edges, it is often of great importance to find a spanning tree for the graph such that the sum of the weights on the edges of the spanning tree is as small as possible. There are two main approaches to finding a minimal-sum-of-weights spanning tree (minimal, rather than minimum, since there may be several trees with the same smallest sum of weights): we will examine the nearest-neighbor method, which is closely related to the technique for finding a shortest path in a weighted graph.

 We'll generate a spanning tree by accumulating a set of connected edges, in such a way as to cause the sum of the edge weights to be as small as possible (for a given weighted graph). At any point in the growth of the tree, we will divide the vertices in the graph into 3 groups; those already in the tree, those in the "fringe" - there is an edge between some vertex in the tree and a vertex in the fringe, and those vertices that are "unseen" - not in the tree nor in the fringe. Vertices in the fringe contribute to a list of "candidate edges"; a list of those edges for which each fringe vertex is closest (least weight) to the tree. The candidate edge list is scanned, and the one with least weight is removed from the list and added to the (growing) tree. This brings a new vertex into the tree (from the fringe set) , and also adds vertices to the fringe set. (There are now new vertices - previously "unseen" - which are now just an edge away from some vertex in the tree.) Furthermore, the candidate edge list must be updated; are any of the vertices in the fringe "closer" to the tree than they were before as a result of the inclusion of the newest edge - and vertex? Actually, a little thought reveals that we only have to look at the fringe neighbors of the newest tree vertex to see if they are closer to it than the current candidate edge that links them to the tree.

 If the original graph is connected, then a minimal-weight spanning tree is produced. If not, the algorithm will discover this, because the fringe set will empty out, but not enough edges will have been absorbed into the tree. (For a graph to connect n vertices, at least n-1 edges are required. Of course, the spanning tree - which has no cycles (no unnecessary edges) - will require exactly n-1 edges.)

 Thus the essence of the algorithm is the following:

initialize the vertices so that all are unseen; initialize the edge count (of the spanning tree) to 0; choose a vertex (say the first in the vertex list) and incorporate it into the tree; initialize the fringe list and the list of candidate edges and weights. as long as the fringe list is not empty, do the following: scan the fringe list and select the least weight candidate edge, and absorb the vertex at the other end into the tree; add one to the edge count; scan the neighbors of the newly included vertex and update their status; this may add vertices to the fringe list, and may also change some of the candidate edges; if the final edge count is sufficient, then (say) print the edges of the tree, and the sum-of-edge-weights; else the graph is not connected.

The vertex-array edge-list representation is very appropriate for this problem. The edge records will have a weight field, of course, and the vertex records will carry additional fields to facilitate the computations that must be carried out at each step of the algorithm. For each vertex, there should be a status field (tree, fringe, or unseen are the three possible values), a field for the nearest neighbor in the tree (for fringe vertices), and the weight on that edge.

Notice that for every edge (vertex) brought into the tree, the algorithm has to look thru the fringe vertices to find the least-weight edge among the candidates. For a (connected) graph with n vertices, this task will be performed n-1 times. One obvious way to do this each time is to scan the vertex list, and whenever a vertex is in the fringe, examine the weight on its candidate edge. Of course, this simple scan takes n steps each time, so we have introduced an overhead into the problem proportional to n2 !! If the graph in question is dense with edges, then the fringe list will be large most of the time, and we can't avoid this worst-case cost. However, in most cases, the fringe lists will be small, and it is important to devise a simple scheme to avoid scanning the vertex list when we only want to examine the fringe vertices. (This is not unlike the motivation which argues that the adjacency list structure representation of a graph is cheaper to use - for most purposes - than the adjacency matrix.) Can you think of some simple ways to do this?
 

Shortest Path Tree

If we have a graph (undirected) with positive (say integer) weights associated with the edges, it is often of great importance to find the least-cost path between some pair of vertices in the tree. In fact, it is just as easy to find the cost of the least-cost paths (and the paths) between a given source vertex - and all other vertices in the graph: the so-called shortest-path tree, (SPT).

 The algorithm we can use is the nearest-neighbor MST algorithm with a simple modification: instead of labelling the fringe vertices with their distance from the tree (nearest neighbor vertex in the tree), each fringe vertex f is labelled with d(nn) + w(nn,f), where d(nn) is the least distance from vertex nn to the root (source) vertex of the tree, and w(nn,f) is the weight of the edge from nn to vertex f. The label on vertex nn (d(nn)) is determined when that vertex is brought into the tree from the fringe: at that point, there can be no (strictly) shorter path from source vertex to nn. Thus, each vertex record contains a pointer to its neighbor list, its status (T,F,U), its distance from the source vertex (a "final" distance for T vertices, a current (but one that might change) value for F vertices, and infinity for U vertices), and the identity of its nearest neighbor in the tree. When a fringe vertex is "treed", its (last) nearest neighbor becomes its predecessor in the shortest path from source to the new tree vertex. When the algorithm completes (the fringe set is empty), a simple recursive algorithm can reconstruct the shortest path from source vertex s to target vertex t. The algorithm would output the path (s,x)(x,y),......(z,t) of edges from s to t which define the shortest path.

 The nearest-neighbor MST algorithm is associated with Prim: the SPT with Dijkstra.