Learning a Portfolio-Based Checker for Provenance-Similarity of Binaries
This is an ongoing Independent Research & Development (IRAD)
project at the Software Engineering Institute, Carnegie Mellon
University. The goal of this project is to explore the use of
supervised learning (a.k.a. classification) in detecting
provenance-similarity between binaries, or executables. Broadly,
two binaries are provenance-similar if they have been compiled
from similar source code with similar compilers. Detecting
provenance-similarity is a challenging area of research, with
important applications that include code clone detection,
understanding the impact of software updates, judging the
provenance of untrusted software, and fighting malware.
The project is being led by Sagar
Chaki, Arie Gurfinkel, and Cory Cohen. Our current focus is on
detecting similarity between functions. Intuitively, a
function is a fragment of a binary derived by compiling a
source-level procedure or method. We believe that functions are an
ideal basis for judging binary similarity: they are the
fundamental units of a binary's behavior. If two binaries have
many functions in common, then they are very likely to be
similar. The greater the share of common functions, the higher the
degree of similarity. We have recently blogged
about our work.
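The idea that a larger share of common functions means a higher degree of similarity can be illustrated with a minimal Python sketch. It makes two simplifying assumptions that are not part of the project: each function is represented by a hypothetical fingerprint string, and two functions count as "common" only when their fingerprints match exactly. The project itself uses a learned classifier to judge function similarity; the exact-match test below is only a stand-in.

```python
def shared_function_ratio(funcs_a, funcs_b):
    """Return the share of common functions between two binaries.

    Each argument is an iterable of function fingerprints (hypothetical
    identifiers, e.g. hashes of function bodies). The score is the
    Jaccard index |A & B| / |A | B|, which lies between 0.0 and 1.0.
    """
    a, b = set(funcs_a), set(funcs_b)
    if not a and not b:
        # Two binaries with no functions: define the score as 0.0
        # rather than dividing by zero.
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Made-up fingerprints, for illustration only.
    binary_1 = {"f1", "f2", "f3"}
    binary_2 = {"f2", "f3", "f4"}
    print(shared_function_ratio(binary_1, binary_2))
```

A classifier-based checker would replace the exact-match set intersection with a count of function pairs that the classifier judges to be similar.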
Benchmark
We are releasing a benchmark and some tools that we have
developed and are using as part of our project. Once you download and unpack the
distribution (using tar -xjvf), read the README.txt file for
further instructions. The benchmark is derived from some of the
most downloaded open-source software available from SourceForge. We compiled the source
code using three versions of Microsoft Visual Studio: 2003 .NET,
2005 and 2008. We then extracted functions from the resulting
binaries using IDA Pro, together with
our custom extensions. Finally, we extracted features using a
custom ROSE plugin. The
benchmark is packaged as a SQLite3 database. The tools
should run on a modern Linux distribution (we have used them on
Ubuntu 8.04 and 9.04).
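Because the benchmark is a SQLite3 database, it can be inspected with any SQLite client before running the tools. The following Python sketch lists the tables in a database using only the standard sqlite3 module. It assumes nothing about the benchmark's schema, and the database path is supplied on the command line, since the actual file name is given in the distribution and not here. README.txt remains the authoritative guide.

```python
import sqlite3
import sys


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Usage: python list_tables.py <path-to-benchmark-database>
    # Note: sqlite3.connect creates an empty file if the path does not
    # exist, so check the path before running.
    conn = sqlite3.connect(sys.argv[1])
    try:
        for table in list_tables(conn):
            print(table)
    finally:
        conn.close()
```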
Publications
Contact