15-418/618 Final Project Proposal

Parallelizing Dinic's Max-Flow Algorithm

Connor Mowry (cmowry), Wenqi Deng (wenqid)

Summary

We will parallelize both the BFS and DFS phases of Dinic's max-flow algorithm, using OpenMP for shared memory and MPI for distributed memory.

Background

Dinic's algorithm is a well-known method for solving the maximum flow problem. Like Ford-Fulkerson, it reaches the optimal solution by repeatedly finding augmenting paths, but it is more efficient because it uses a two-phase process. First, it performs a breadth-first search (BFS) to construct a layer graph, which organizes vertices by their distance from the source. Then, it uses a depth-first search (DFS) on this graph to compute a blocking flow: a collection of augmenting paths after which no more flow can be sent along any shortest path in the current layer graph without exceeding the edge capacities.
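To make the structure we are starting from concrete, below is a minimal sketch of sequential Dinic's algorithm in C++ (the struct and function names are our own placeholders, not a committed interface). It shows the two phases we intend to parallelize: the BFS that assigns layers and the DFS that pushes blocking flow along layer-increasing edges.

// Minimal sketch of sequential Dinic's algorithm (placeholder names, not our final interface).
// Each edge stores its remaining capacity and the index of its reverse edge so that
// residual updates touch both directions.
#include <algorithm>
#include <climits>
#include <queue>
#include <vector>

struct Edge { int to, rev; long long cap; };

struct Dinic {
    std::vector<std::vector<Edge>> g;   // adjacency list of residual edges
    std::vector<int> level, iter;       // BFS layers and current-arc pointers
    Dinic(int n) : g(n), level(n), iter(n) {}

    void add_edge(int u, int v, long long c) {
        g[u].push_back({v, (int)g[v].size(), c});
        g[v].push_back({u, (int)g[u].size() - 1, 0});  // residual back edge
    }

    // Phase 1: BFS builds the layer graph; returns false if the sink is unreachable.
    bool bfs(int s, int t) {
        std::fill(level.begin(), level.end(), -1);
        std::queue<int> q;
        level[s] = 0; q.push(s);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (const Edge &e : g[u])
                if (e.cap > 0 && level[e.to] < 0) {
                    level[e.to] = level[u] + 1;
                    q.push(e.to);
                }
        }
        return level[t] >= 0;
    }

    // Phase 2: DFS sends flow only along edges that go exactly one layer deeper.
    long long dfs(int u, int t, long long f) {
        if (u == t) return f;
        for (int &i = iter[u]; i < (int)g[u].size(); ++i) {
            Edge &e = g[u][i];
            if (e.cap > 0 && level[e.to] == level[u] + 1) {
                long long d = dfs(e.to, t, std::min(f, e.cap));
                if (d > 0) { e.cap -= d; g[e.to][e.rev].cap += d; return d; }
            }
        }
        return 0;
    }

    long long max_flow(int s, int t) {
        long long flow = 0;
        while (bfs(s, t)) {                       // one layer graph per phase
            std::fill(iter.begin(), iter.end(), 0);
            long long f;
            while ((f = dfs(s, t, LLONG_MAX)) > 0) flow += f;  // blocking flow
        }
        return flow;
    }
};

The iter array implements the usual current-arc optimization: once an edge dead-ends, the same blocking-flow phase never re-examines it.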

The core advantage of Dinic's algorithm is that the BFS phase ensures only the shortest paths from the source to the sink are explored during the DFS phase. Because each dead-end edge is retired after a single traversal per blocking flow, and aside from these dead-end edges each search follows only the \(O(n)\) edges of a shortest path, each DFS runs in amortized \(O(n)\) time. This significantly improves the overall efficiency of the algorithm by eliminating redundant exploration.
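For reference, this is the standard overall bound that the sequential algorithm achieves and that our parallel versions should not regress from: the source-to-sink distance strictly increases after each blocking flow, so there are at most \(O(n)\) phases, and each blocking-flow computation costs \(O(nm)\) with dead-end retirement, where \(n\) and \(m\) are the numbers of vertices and edges.

\[
  T_{\text{Dinic}} = O(n) \cdot O(nm) = O(n^2 m)
\]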

Our project focuses on further accelerating Dinic's algorithm by parallelizing two main phases. In the BFS phase, each layer of the graph can be constructed in parallel by exploring all neighboring vertices simultaneously. In the DFS phase for blocking flow computation, multiple potential augmenting paths can be explored in parallel. The challenge in the DFS phase is ensuring that, when a path reaches the sink and updates the capacities of its edges, other paths correctly respect these updated capacities to maintain flow feasibility.
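As a rough illustration of the BFS-phase idea, here is a sketch of level-synchronous frontier expansion with OpenMP, reusing the Edge struct from the sketch above (the function name and frontier representation are assumptions, not a final design). Every vertex in the current frontier is expanded in parallel, and a compare-and-swap on the level array decides which thread claims a newly discovered vertex.

// Sketch of level-synchronous BFS expansion with OpenMP (placeholder names).
// All vertices in the current frontier are expanded in parallel; a compare-and-swap
// on the level array ensures exactly one thread assigns a vertex its layer.
#include <atomic>
#include <vector>
#include <omp.h>

std::vector<int> expand_level(const std::vector<std::vector<Edge>> &g,
                              std::vector<std::atomic<int>> &level,
                              const std::vector<int> &frontier,
                              int depth) {
    std::vector<int> next;
    #pragma omp parallel
    {
        std::vector<int> local;                 // per-thread buffer, merged at the end
        #pragma omp for nowait
        for (int i = 0; i < (int)frontier.size(); ++i) {
            int u = frontier[i];
            for (const Edge &e : g[u]) {
                if (e.cap <= 0) continue;       // skip saturated residual edges
                int expected = -1;
                // Only one thread can move level[v] from -1 to depth + 1.
                if (level[e.to].compare_exchange_strong(expected, depth + 1))
                    local.push_back(e.to);
            }
        }
        #pragma omp critical
        next.insert(next.end(), local.begin(), local.end());
    }
    return next;
}

Per-thread buffers merged under a single critical section keep contention on the shared next-frontier vector low.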

We plan to implement Dinic's algorithm using both OpenMP and MPI. In the shared memory model with OpenMP, updating capacities is more straightforward since all threads have access to shared data. However, implementing this in MPI, where processes do not share memory, presents additional challenges in synchronizing capacity updates across distributed processes.
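For the shared-memory DFS phase, one scheme we are considering (an assumption at this stage, not a finished design) is to have each thread atomically reserve capacity edge by edge along its path and roll the reservation back if the path later dead-ends, for example:

// Sketch of one way concurrent augmenting paths could respect each other's capacity
// updates in the OpenMP version (our assumption, not a final design): a thread
// atomically reserves flow on each edge of its path and rolls the reservation back
// if a later edge turns out to lack capacity.
#include <algorithm>
#include <atomic>

struct AtomicEdge { int to, rev; std::atomic<long long> cap; };

// Try to reserve `amount` units on edge e; returns how much was actually reserved.
long long reserve(AtomicEdge &e, long long amount) {
    long long cur = e.cap.load();
    while (cur > 0) {
        long long take = std::min(cur, amount);
        if (e.cap.compare_exchange_weak(cur, cur - take))
            return take;                        // reservation succeeded
        // cur was reloaded by compare_exchange_weak; retry with the new value
    }
    return 0;                                   // no residual capacity left
}

// Undo a reservation (e.g., when the path dead-ends before reaching the sink).
void release(AtomicEdge &e, long long amount) {
    e.cap.fetch_add(amount);
}

A thread that reaches the sink keeps its reservations and records the augmenting flow; a thread that dead-ends calls release on the edges it reserved, returning that capacity to other searches.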

The Challenge

One of the main challenges of our project stems from the inherent difficulty of parallelizing traversals like BFS and DFS, in which the exploration of future paths depends on the outcomes of previously explored ones. To maximize performance gains, we will need to manage synchronization carefully and minimize contention among threads.

Additionally, because the algorithm continually updates edge capacities during the DFS phase, we must derive paths from the most current state of the graph and promptly terminate searches that reach dead ends. This is particularly challenging in the MPI implementation, where efficiently communicating the updated state of the graph across processes is crucial to maintaining consistency.
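One option we may explore for the MPI version (again an assumption, not a settled design) is to let each rank accumulate the flow it pushed on each edge during a round of path searches and then combine the per-edge deltas with an all-reduce, so every rank applies the same updates to its local copy of the residual capacities:

// Sketch of one possible MPI synchronization step (an assumption, not a settled design):
// each rank accumulates the flow it pushed on each edge during a round of DFS work,
// then all ranks sum the deltas and apply them locally so everyone sees the same
// residual capacities.
#include <algorithm>
#include <vector>
#include <mpi.h>

void sync_capacities(std::vector<long long> &cap,          // local residual capacities, one per edge
                     std::vector<long long> &local_delta)  // flow pushed by this rank this round
{
    std::vector<long long> global_delta(local_delta.size(), 0);
    MPI_Allreduce(local_delta.data(), global_delta.data(),
                  (int)local_delta.size(), MPI_LONG_LONG, MPI_SUM,
                  MPI_COMM_WORLD);
    for (size_t i = 0; i < cap.size(); ++i)
        cap[i] -= global_delta[i];              // apply everyone's updates locally
    std::fill(local_delta.begin(), local_delta.end(), 0);
}

A step like this alone does not prevent two ranks from overcommitting the same edge within a round, so it would need to be paired with conflict detection or rollback, which is exactly the consistency challenge described above.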

Resources

Currently, we do not plan to use any existing codebases for the core implementation of our project. We will consult pseudocode and lecture notes from 15-451 [2], [3] as the primary guide for implementing the sequential version of Dinic's algorithm, which we will then parallelize. Additionally, we may refer to research papers on parallel flow algorithms for optimization strategies, such as the report by Jonas Hafermas and Zoya Masih [1], which discusses techniques for parallelizing the BFS phase using MPI.

For testing and benchmarking our implementation, we will use a combination of synthetic and real-world datasets:

Goals and Deliverables

Our primary goal is to parallelize both the BFS and DFS phases of Dinic's algorithm using OpenMP for shared memory and MPI for distributed memory systems. The project will focus on implementing efficient parallel strategies for each phase, optimizing for both performance and scalability.

Planned Achievements

Stretch Goals

Demo for Poster Session

For the poster session, we plan to present:

By the end of the project, our goal is to demonstrate that parallelizing both phases of Dinic's algorithm can significantly reduce computation time and efficiently scale across both shared-memory and distributed-memory systems.

Platform Choice

For our initial implementation and testing, we will use the GHC machines available at CMU, which provide a shared-memory environment ideal for developing and optimizing our OpenMP implementation. These machines will allow us to experiment with multi-threaded performance and gather detailed metrics such as cache misses, memory bandwidth utilization, and synchronization overhead.

Once we have a stable OpenMP version, we will conduct further testing on the Pittsburgh Supercomputing Center (PSC) machines. The PSC systems offer a distributed-memory environment with high-core-count nodes, making them well-suited for evaluating the scalability and performance of our MPI implementation. This setup will enable us to analyze how the algorithm scales with increasing numbers of processors and to optimize inter-process communication.

If we meet our stretch goals, we plan to utilize the GPUs available on some of the GHC machines to explore further optimizations using CUDA. This will allow us to compare the performance of our CPU-based parallel implementations with GPU acceleration.

Schedule

Week 1 (Nov 10 - Nov 16)

Week 2 (Nov 17 - Nov 23)

Week 3 (Nov 24 - Nov 30)

Week 4 (Dec 1 - Dec 7)

Week 5 (Dec 8 - Dec 15)

Stretch Goal (If Time Permits)

References

  1. Jonas Hafermas and Zoya Masih, Parallel BFS with MPI
  2. Lecture Notes on Network Flow (Part 1), CMU 15-451 Lecture 11
  3. Lecture Notes on Network Flow (Part 2), CMU 15-451 Lecture 12