Project Proposal:
Pricing American Asian Options with the Binomial Model
Daniel Lu (dylu), Da-Yoon Chung (dayoonc)
Main Project Page
Summary
We will implement an engine to price path-dependent (Asian) stock options in parallel. Here is a link to a more in-depth summary.
Background (from our detailed summary)
Challenge
• Since exhaustively considering every path is intractable, our algorithm generates averages of sample paths in parallel to be stored with each node and uses interpolation to find the values of the option that have not already been computed. Since we need partial running averages of our sample paths at each node, this step could be done independently in parallel but since the paths are of different lengths, divergent execution may occur. Thus, we need to find a way to assign the work to reduce/minimize divergent execution.

• For the actual backward-inductive algorithm that we use to price the option, we will try several approaches. One of the ways to do this is to divide the recombining tree into blocks to maximize the use of shared memory (see Figure 1). We shall monitor the number of levels we divide the blocks per iteration of the algorithm as well as consider different ways of grouping the computation into such blocks.

Figure 1 (source)
Resources
In terms of code, we will be writing our implementation from scratch. We will run our code on the GHC machines with NVidia GPUs.

Here are some outside resources we will be using:
• Hull, J., and White, A. Efficient Procedures for valuing European and American path-dependent options. Journal of Deriva- tives, 1, 1993
• Klassen, T. Simple, fast, and flexible pricing of Asian options.
• Zhang, N., Lim, E., Man, K. Lei, C. CPU-GPU Hybrid Parallel Binomial American Option Pricing. Proceedings of the International MultiConference of Engineers and Computer Scientists 2012 Vol II, 2012
• An NVidia document about a CUDA implementation of the binomial pricing model
Goals/Deliverables
Our first goal is to implement a correct, sequential CPU version of our option pricing algorithm. This version will be used as a benchmark to compare the accuracy and speedup of our parallel version.

Next, we wish to implement a parallel version of our option pricing algorithm in order to achieve the maximum possible speedup on the Nvidia GTX 480 GPUs. Since there are O(N^2) number of nodes in our recombining tree, for sufficiently large N, we would undoubtedly have sufficient amount of work to keep all of the processors in the GPU busy.

Finally, we would like to compare the results (speed and accuracy) of our algorithm on real-world data and present our statistical findings at the parallelism competition.
Platform
We will be implementing our project on the GHC 3000 machines with NVidia GTX 480 GPUs using C++, CUDA, and Thrust. We decided to write our code for GPUs rather than multi-core systems using OpenMP or MPI because of the small granularity of the units of work we will be parallelizing over, the large number of these units of work, and the many opportunities for sharing memory.
Proposed Schedule
[Please do not modify the schedule on this page after the proposal process (it is your proposed schedule, and it will be useful to compare it to your actually project logs at the end of the project). Update the schedule on your project main page if your goals or timeline changes over the course of the project.]

 Week What We Plan To Do Apr 1-7 Develop the proposal Apr 8-14 Write and test a serial implementation Apr 15-21 Begin writing and testing a parallel implementation Apr 22-28 Analyze our current results, and implement optimizations Apr 29-May 5 Test using real world data, compare results and performance to existing alternatives May 6-11 Write the final presentation