Abstract
Random mutational fuzz testing (fuzzing) and symbolic executions are program testing
techniques that have been gaining popularity in the security research community. Fuzzing
finds bugs in a target program by natively executing it with random inputs while monitoring
the execution for abnormal behaviors such as crashes. While fuzzing may have a reputation
of being able to explore deep into a program’s state space efficiently, naive fuzzers usually
have limited code coverage for typical programs since unconstrained random inputs are
unlikely to drive the execution down many different paths. In contrast, symbolic execution
tests a program by treating the program’s input as symbols and interpreting the program
over such symbolic inputs. Although in theory symbolic execution is guaranteed to be effective
in achieving code coverage if we explore all possible paths, this generally requires exponential
resource and is thus not practical for many real-world programs.
This thesis presents our attempt to attain the best of both worlds by combining fuzzing with
symbolic execution in a novel manner. Our technique, called hybrid fuzzing, first uses symbolic
execution to discover frontier nodes that represent unique paths in the program. After collecting
as many frontier nodes as possible under a user-specifiable resource constraint, it transits to
fuzz the program with preconditioned random inputs, which are provably random inputs that respect
the path predicate leading to each frontier node. Our current implementation supports programs with
linear path predicates and can automatically generate preconditioned random inputs from a polytope
model of the input space extracted from binaries. These preconditioned random inputs can then be
used with any fuzzer. Experiments show that our implementation is efficient in both time and space,
and the inputs generated by it are able to gain extra breath and depth over previous approaches.