Proposal | Checkpoint | Final Report

Trobo Proposal

Eric Wong

Jonathan Goldman

Summary

We are going to implement a parallel Tron AI for multi-core CPU platforms. We will use the six-core machines in the clusters to make smarter and faster AIs than the single threaded bots in the Google Tron AI Challenge. .

tron-o.gif

Background

Tron is a two-player game in which players drive around on a grid, leaving behind a wall as they travel. The game ends when one player crashes into a wall, and thus the objective is to outlast your opponent.

In 2010, Google sponsored a Tron AI Challenge that received over 700 entries worldwide. In the challenge, bots are given one second to make each move (i.e. go forward one square, turn left, or turn right). Here is an example match from the challenge: https://www.youtube.com/watch?v=Jyys22xoWDI.

To get a better feel for the game, you can watch two of the top (human) Tron players compete here: https://www.youtube.com/watch?v=kxhkhKrr4vs.

While the rules of the challenge prohibited multithreading and other forms of parallelism, we intend to build our parallel Tron AI (named TroBo™) within their framework, and run it against some of the top-performing bots from the competition. We believe that parallel computation will not only enable our AI to make faster decisions, but will also enable our AI to make smarter decisions by performing a deeper search and analysis of the game tree.

To make decisions, TroBo will use an adaptation of alpha-beta pruning. At the heart of alpha-beta pruning lies the mini-max algorithm. Mini-max works by constructing a tree with vertices representing game states and edges representing possible transitions between states. The root of the tree is the current game state. The second level of the tree has a node for each of the three moves TroBo may make from this state, while the third level of the tree has nodes for each of the three moves the opponent may make. Levels alternate between TroBo’s moves, and the opponent’s moves. Since each player always has three possible moves, the nth­ level of the tree will consist of 3n nodes. Since the tree grows very quickly, the depth of the tree must be capped to some constant (for example, 10 or 15.)

Mini-max relies on a heuristic function, which assigns a score to every node in the tree. The heuristic assigns a value of 100 to nodes in which TroBo has won, a value of -100 to nodes in which the opponent has won, and values between -100 and 100 for games that are still in progress. While refining this heuristic will take a fair amount of thought, one basic idea is to consider the number of unoccupied spaces that TroBo can reach before the opponent.

Faithful to its name, mini-max chooses the move that maximizes the heuristic score of the worst possible outcome for TroBo. For example, if the worst leaf node of the left branch (corresponding to TroBo turning left) has a score of -17, the worst leaf node in the center branch (corresponding to TroBo moving straight) has a score of 12, and the worst leaf node in the right branch (corresponding to TroBo moving right) has a score of 20, TroBo will turn right.

Alpha-beta pruning extends mini-max by cutting off unfavorable branches of the tree. For example, if we know that TroBo is guaranteed a score of at least 5 by taking the left branch, and we encounter a node with value -40 in the center branch, we can stop exploring the center branch.

We believe that the majority of the parallelism of this problem lies in the branching-structure of the game tree. In the mini-max algorithm, each branch can be explored independently. That being said, alpha-beta pruning imposes some synchronization constraints that make the problem somewhat more challenging.

The Challenge

There are several interesting challenges related to parallelizing alpha-beta pruning.

  1. The dependency graph of mini-max is tree-structured, something we have not seen in other applications from class. Depending on the computational complexity of the heuristic function, there is a risk of a high communication-to-computation ratio, since each node potentially needs to communicate with 3 neighbors. There are some positives, however. Since the tree is balanced, divergent execution may not be a problem, and we should be able to take advantage of SIMD. Further, it should be relatively straightforward to achieve good load balance.
  2. Alpha-beta pruning is inherently sequential in a sense. In order to prune a suboptimal branch, we must have first already explored a better branch (if not the optimal branch). If we naively choose to explore all branches in parallel, we may end up exploring paths that would have been pruned in the sequential algorithm. Therefore, we must instead make decisions about how to order the branches, and when to dig into sub branches in parallel .There has been a lot of work in the area of parallel tree pruning—we plan to spend time researching these methods during the first week.

Resources

We will be using the machines in the Gates 3000 cluster as the target platform for the Tron AI.

We will also be using the starter code provided by the Google AI challenge as the framework for running and evaluating our bot. It provides an environment with maps that allows us to test bots against each other. We will also use several publicly available tron bots written for the competition to run against our bot (i.e. a1k0n's Tron bot).

As a reference for the main strategy of the bot, we are following the post-mortems of the top winners of the Google AI challenge. Specifically, we will be following the main strategy of the Tron AI described here.

Finally, to parallelize minimax, we are looking at the ideas in using the GPU in this paper.  

Goals

The main goal is the implement a parallelized Tron AI that gets a reasonable speedup over the serialized Tron AI. Ideally, a faster Tron AI would allow us to do more in-depth decision making, and thus allow the bot to make better decisions than the serialized Tron AI. Speed is especially important due to the Google AI challenge’s limit of one second to make a decision, and makes the AI more applicable to real games of Tron that occur in real time. As a result, our main goal is to aim for better performance.

We will evaluate our performance by comparing the runtime of the sequential algorithm with the runtime of our parallel algorithm. Within the Google challenge, there was also the additional goal of providing a decision in one second, so we can also evaluate our bot by how much of the minimax tree the bot can explore in one second.

Platform Choice

We will be using C++ for maximum speed, and for familiarity with threading capabilities.

We chose the Gates 3000 machines since they have the highest number of cores, and thus the highest potential for multithreaded speedup.

We use the Google AI challenge starter code as the framework for testing and evaluation.

Schedule

Week

Plan

April 5th - April 11th

Read papers on parallel minimax and alpha beta pruning, decide on final AI algorithm

April 12th - April 18th

Implement a basic sequential Tron AI with minimax / alpha-beta pruning

April 19th - April 25th

Parallelize the Tron AI using multithreaded (parallel minimax / alpha-beta pruning, parallelize heuristic)

April 26th - May 2nd

Further improvements / optimizations to the Tron AI

May 3rd - May 9th

Test against other bots and obtain final performance data, prepare for presentation and final report