Parallel Minimum Spanning Tree
I plan to write multiple parallel implementations of Minimum Spanning Tree, and
compare their performance on large graphs of different structures. Initial
implementations will be for the CPU (likely based on OpenMP or pthreads), but if
time permits I will also investigate GPU-based algorithms.
A spanning tree of a connected undirected graph G is a subset of n-1 of G's
edges, where n is the number of vertices, that together with the vertices of G
forms a tree. The Minimum Spanning Tree problem is, given a connected undirected
graph G with edge weights, to find a spanning tree of G whose total edge weight
is minimal. Minimum Spanning Tree (hereafter called MST) has a host of
applications, from travelling-salesman approximations to circuit design to
utility networks. With this in mind, it's useful to develop MST algorithms that
scale well to the amount of concurrency available on modern-day machines.
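For concreteness, a minimal sequential MST sketch might look like the
following. This uses Prim's algorithm with a binary heap; the adjacency-list
representation and the function name are illustrative choices of mine, not the
planned implementation:

```cpp
#include <cassert>
#include <queue>
#include <vector>
#include <utility>
#include <functional>

// Prim's algorithm: grow a tree from vertex 0, repeatedly adding the
// minimum-weight edge that crosses from the tree to a new vertex.
// The graph is an adjacency list of (neighbor, weight) pairs and is
// assumed to be connected.
long long mst_weight(const std::vector<std::vector<std::pair<int, int>>>& adj) {
    int n = adj.size();
    std::vector<bool> in_tree(n, false);
    // Min-heap of (weight, vertex) candidate edges into the tree.
    std::priority_queue<std::pair<int, int>,
                        std::vector<std::pair<int, int>>,
                        std::greater<>> pq;
    pq.push({0, 0});                   // vertex 0 enters for free
    long long total = 0;
    int added = 0;
    while (!pq.empty() && added < n) {
        auto [w, v] = pq.top();
        pq.pop();
        if (in_tree[v]) continue;      // stale heap entry; skip it
        in_tree[v] = true;
        total += w;
        ++added;
        for (auto [u, wu] : adj[v])
            if (!in_tree[u]) pq.push({wu, u});
    }
    return total;
}
```

A baseline like this is also what the parallel versions will be benchmarked
against.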
Essentially all implementations of MST work because of a theorem known as the
Light Edge Theorem: partition the vertices of any graph that has a minimum
spanning tree into two nonempty subsets, and one of the edges of minimum weight
among those that go between the two subsets must be in the minimum spanning
tree. This theorem implies a few easy sequential algorithms, but it also hints
at parallel ones as well. In particular, it suggests that edges from different
parts of the graph could be selected for inclusion in the MST in parallel.
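One classic way the theorem turns into a parallel-friendly algorithm is
Boruvka's method: in each round, every component independently selects its
minimum-weight outgoing edge, all selected edges join the MST at once, and
components merge. The sketch below is a sequential rendering of that round
structure (the `Edge`/`DSU` types and names are my own illustrative
assumptions); the per-component selection loop is the part that parallelizes
naturally:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };

// Minimal union-find to track which component each vertex is in.
struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Boruvka's algorithm; assumes the graph is connected. Ties in weight
// are broken by edge index (strict < keeps the earliest minimum), which
// keeps the selected edges consistent across components.
long long boruvka_mst(int n, const std::vector<Edge>& edges) {
    DSU dsu(n);
    long long total = 0;
    int components = n;
    while (components > 1) {
        // cheapest[c] = index of the lightest edge leaving component c.
        // Each iteration of this scan touches only its own edge, so it
        // is a candidate for parallel execution.
        std::vector<int> cheapest(n, -1);
        for (int i = 0; i < (int)edges.size(); ++i) {
            int a = dsu.find(edges[i].u), b = dsu.find(edges[i].v);
            if (a == b) continue;      // edge is internal to a component
            if (cheapest[a] == -1 || edges[i].w < edges[cheapest[a]].w) cheapest[a] = i;
            if (cheapest[b] == -1 || edges[i].w < edges[cheapest[b]].w) cheapest[b] = i;
        }
        // Add all selected light edges; unite() rejects duplicates.
        for (int c = 0; c < n; ++c) {
            int i = cheapest[c];
            if (i != -1 && dsu.unite(edges[i].u, edges[i].v)) {
                total += edges[i].w;
                --components;
            }
        }
    }
    return total;
}
```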
The main challenges I can foresee are:
- Dividing the work into big enough chunks to utilize the given processors. How
to do this isn't obvious, since it's often not clear which edges will be in the
MST until others have already been added, but I know one way to do it from an
earlier class, and hope to think of others.
- If applicable, merging information about the added edges generated by the
different processors together. One implementation of MST uses a union-find
data structure, and I may need to do something similar but in parallel.
- Avoiding contention over the graph structure. Since all the different processors
will be reading the same data structures, having them also frequently update
those same data structures would cause a lot of cache misses and coherency
traffic.
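One standard pattern for the first and third challenges is to split the work
into contiguous chunks, have each thread write only to its own slot, and merge
the per-thread results sequentially at the end. As a sketch of the idea
(a minimum-weight scan over a plain array, not a full MST implementation;
`parallel_min` is a hypothetical helper of mine):

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <thread>
#include <vector>

// Each thread scans its own contiguous chunk and records a purely local
// minimum, so there are no shared writes during the scan and thus no
// contention; the final merge over num_threads values is cheap.
int parallel_min(const std::vector<int>& weights, int num_threads) {
    std::vector<int> local_min(num_threads, std::numeric_limits<int>::max());
    std::vector<std::thread> threads;
    size_t chunk = (weights.size() + num_threads - 1) / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            size_t lo = t * chunk;
            size_t hi = std::min(weights.size(), lo + chunk);
            for (size_t i = lo; i < hi; ++i)   // thread-private writes only
                if (weights[i] < local_min[t]) local_min[t] = weights[i];
        });
    }
    for (auto& th : threads) th.join();
    int best = std::numeric_limits<int>::max();
    for (int m : local_min)                    // cheap sequential merge
        best = std::min(best, m);
    return best;
}
```

A real implementation would also pad `local_min` to avoid false sharing
between adjacent slots; that is omitted here for brevity.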
I will be working on the Gates machines, starting from the graph I/O code we
used in assignment 3. I know of various sequential implementations of MST, as
well as one parallel one, from an earlier CS class (15-210), but will not be
using a reference implementation or research papers. I've done research with
Professor Guy Blelloch in the past, and if the project goes well I may ask him
for permission to run my code on the 40-core machine he uses for his research.
I plan to achieve the following:
- A fast sequential MST implementation for the CPU.
- At least two parallel MST implementations for the CPU with fundamentally
different approaches.
- Meaningful comparisons of these approaches on different types of graphs, for
example random, planar, and real-world.
- Speedup of at least n/2 on n cores for small n, compared to the parallel
implementation running on one core.
I hope to achieve the following:
- Performance meeting or beating Professor Guy Blelloch's solution on the Gates machines.
- Performance that scales well to high numbers of cores (at least 32).
- An efficient GPU implementation of MST, with performance comparing well to
the CPU implementation.
I'll be working with straight C++ and pthreads on the CPU, possibly with OpenMP
at some point. I believe this is the best choice of platform and libraries
because I don't yet know what algorithms will work well, and I don't want to
restrict myself to a specific paradigm until then.
|Dates|| What We Plan To Do|
|Apr 1-7|| Write/compare sequential implementations|
|Apr 8-14|| Write first parallel implementation|
|Apr 15-21|| Write second parallel implementation|
|Apr 22-28|| Write third parallel implementation|
|Apr 29-May 5|| Write GPU implementation|
|May 6-11|| Schedule padding|