| Dates | Plan |
|---|---|
| 4/20 - 4/23 | Improve cache locality of optimistic concurrent cuckoo hashmap. Create a better test suite for benchmarking. |
| 4/24 - 4/30 | Additional optimizations for our optimistic concurrent cuckoo hashmap implementation. |
| 5/1 - 5/9 | Parallel cuckoo hashmap using BFS to speed up insertion time. |
We have implemented the following:
| Implementation | 20M Insert Time (Seconds) | 20M Read Time (Seconds) |
|---|---|---|
| C++11 Unordered Map | 32.91 | 12.95 |
| Parallel (Bucket-Level) Separate Chaining with 48 Threads | 3.50 | 1.47 |
| Optimistic Concurrent Cuckoo with 48 Threads | 21.91 | 1.22 |
We have included preliminary results of our hashmap implementations above, running our test suite on an Intel Xeon E5645 CPU. As expected, the parallel bucket-level hashmap is faster than sequential implementations. One perhaps surprising result is that our optimistic concurrent cuckoo hashmap insertion is even slower than the sequential cuckoo hashmap. This is because the optimistic concurrent cuckoo hashmap is optimized for a read-heavy workload. We sacrifice insertion throughput but improve read throughput by minimizing the chances of concurrent read-write conflicts.
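To illustrate the read/write trade-off described above, here is a minimal, single-slot sketch of a seqlock-style optimistic read: readers never take a lock, but instead validate a version counter before and after reading, retrying if a writer intervened. The `Slot`, `optimistic_write`, and `optimistic_read` names are hypothetical and simplified for illustration; our actual implementation operates on whole buckets.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical slot guarded by a version counter (seqlock style).
// An odd version means a writer is mid-update; readers retry instead of locking.
struct Slot {
    std::atomic<uint64_t> version{0};
    uint64_t key = 0;
    uint64_t value = 0;
};

// Writer: bump version to odd, mutate, bump back to even.
void optimistic_write(Slot& s, uint64_t key, uint64_t value) {
    s.version.fetch_add(1, std::memory_order_acq_rel);  // now odd
    s.key = key;
    s.value = value;
    s.version.fetch_add(1, std::memory_order_acq_rel);  // even again
}

// Reader: lock-free; retry if the version was odd or changed during the read.
std::optional<uint64_t> optimistic_read(Slot& s, uint64_t key) {
    for (;;) {
        uint64_t v1 = s.version.load(std::memory_order_acquire);
        if (v1 & 1) continue;  // writer in progress, spin
        uint64_t k = s.key;
        uint64_t val = s.value;
        uint64_t v2 = s.version.load(std::memory_order_acquire);
        if (v1 == v2) {        // snapshot was consistent
            if (k == key) return val;
            return std::nullopt;
        }
    }
}
```

This is why inserts get slower while reads get faster: every writer pays two atomic increments per mutation so that readers can proceed without acquiring any lock at all.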
We have managed to implement most of what we originally planned to achieve by the end of the semester. That said, we believe that for versions (4) and (5) there are performance bugs or implementation details we may be missing that are preventing us from achieving even greater speedup. Specifically, we had not been able to achieve speedup by adding tags in (4) to improve cache locality and reduce pointer dereferences. (UPDATE: We have solved this issue!)
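For context on the tag optimization mentioned above, the idea is to store a small partial hash ("tag") inline in each bucket slot, so that most failed lookups are rejected by a one-byte comparison within the bucket's cache line instead of chasing the key pointer. The following is a hypothetical sketch of that check, not our actual bucket layout; `make_tag`, `TaggedSlot`, and `slot_matches` are illustrative names.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical 1-byte tag derived from the key's hash. We reserve tag 0 to
// mean "empty slot" so a valid tag is never confused with an empty one.
inline uint8_t make_tag(std::size_t hash) {
    uint8_t tag = static_cast<uint8_t>(hash >> (8 * (sizeof(std::size_t) - 1)));
    return tag == 0 ? 1 : tag;
}

struct TaggedSlot {
    uint8_t tag = 0;                  // 0 = empty
    const std::string* key = nullptr; // full key lives out-of-line
};

// Only dereference the key pointer when the cheap inline tag check passes;
// on most misses the mismatch is detected without leaving the cache line.
bool slot_matches(const TaggedSlot& slot, const std::string& key, std::size_t hash) {
    if (slot.tag != make_tag(hash)) return false;
    return slot.key != nullptr && *slot.key == key;
}
```

With a 1-byte tag, roughly 255/256 of non-matching slots are filtered out before any pointer dereference, which is where the cache-locality win comes from.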
We will create graphs to visualize our benchmarking results. These graphs will primarily show the performance of the different key-value store implementations as we vary the ratio of read to write requests.