Initial Proposal: Project Proposal
Final Writeup: Final Writeup

Programming Interface

I strived to make the database API clean and usable from a C++ environment. Further, one weakness I see in SQL often is the lack of type checking, every action is specified as a string ("SELECT * FROM x WHERE y..."). In my database I attempted to construct the interface in such a way that it would detect at compile-time any errors with query construction. For example, it is impossible to specify criteria on a column not in the table or to try to insert an incomplete or incorrect row into the database.

Table Creation

The database is constructed to store structs. To create a database for a particular struct, you issue the following command:

struct record {
 int id;
 long time;

table<record> table = db_table_create(&record::id, &record::time);

This creates a new table for the record struct with two indexes, id and time. Indexes are specified using struct member pointers.


To insert rows into a table, issue the following command:

record* recs;
int num_recs;

db_insert_bulk(table, recs, num_recs);

This inserts num_recs records into the table. Insert operations are asynchronously performed on the GPU, however, so this call returns fairly quickly.


Finally to select a row from the table, all that's needed is to specify the criteria. Criteria are constructed by nested calls to LT, LE, EQ, GE, GT, AND, and OR. To issue a query that selects records with id 4 or 13 from records with time greater than 1000, issue the following series of commands:

select_handle handle = db_select_prepare(table, GT(&record::time, 1000), OR(EQ(&record::id, 4), EQ(&record::id, 13)));

record* rec;
while((rec = db_select_next(handle)) != NULL) {
 cout << "found record {id = " << rec->id << ", time = " << rec->time << "}" << endl;

Currently a limitation of the system is that you must specify which index you want to perofrm the select using. I plan to eventually allow the user to pass a single criteria and the database will figure out the best index to use to perform the select


The database is implemented as a set of btrees located on the graphics card. When the user creates a table, a btree is created for each index. The btrees are 16-32 btrees which allow efficient processing of the nodes by a warp (32 lanes). Nodes are able to be processed in constant time by having each lane check a particular key. Inserts work by transferring the data to the card, launching a block for each index, and sequentially inserting it into each index. Selects work by launching a single warp that traverses the relevent parts of the tree looking and continuously fills a variable size buffer that is transferred to the host upon filling it. I see lots of room for improvement in both these processes as most of my time was spent creating and debugging a btree which could be operated on in a SIMD environment and creating a generic criteria specification protocol.

Preliminary Results

Preliminary results show my database is about on par in terms of speed with an in-memory database like SQLite. I'm fairly pleased with this result since a lot of work went into optimizing SQLite and I would consider my code fairly unoptimized. One important performance consideration that CUDA-DB faces is PCI bus bandwidth, but this limit vanishes as soon as queries become even a little complex. When we compare two basic queries "SELECT * FROM record" and "SELECT * FROM record WHERE id = 2" we see that my database quickly matches the speed of SQLite.

In the following charts the x-axis is the size of the table and the y-axis is the total select query time or average select query time per row in milliseconds. The "WHERE id = 2" clause filters the number of results down to 1/20 the table size

The following charts so benchmarks on a more complex query. CUDA-DB does not seem to perform as well in this case which leads me to believe that criteria evaluation is not very efficient.

Future Improvements

I've really enjoyed this project and I find the results fairly impressive so I plan to continue working on it/optimizing it after the school year ends. I've made a list of possible improvements I'd like to implement:


April 14-18Created a parallel btree implementation that supports insert and lookup operations
April 21Found a bug in NVCC :(
April 25Worked around the NVCC bug and created the initial database interface. NVCC doesn't support C++11 at all so getting everything to compile was fun.
May 3Can perform inserts and selects based on basic "x = y" criteria
May 4Can perform slects based on various comparison operators (<, <=, ==, >=, >)
May 5Overhauled criteria to allow arbitrarily nested conjunctions/disjunctions of various comparison operators. Selects require index to be specified