My overarching interest is causation, specifically learning the causal relationships between variables from observational data. This falls under the heading of "causal network structure learning", or more concisely, "causal search".

Why it's called "search": We start with observations on a set of variables, and we wish to learn which of those variables influence which others. If we picture each variable as a node in a network, and the relationship "A causally influences B" as a directed edge A → B, then we are trying to find the true network: the true set of edges over our set of variables. In other words, we are searching the space of all possible sets of edges to find the one graph that represents the true causal relationships.
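To give a feel for the size of that search space, here is a minimal sketch that enumerates every candidate graph over three variables. It is only an illustration, not any particular causal search algorithm. The function names `all_edge_sets` and `is_acyclic` are made up for this example, and the restriction to graphs without directed cycles is an added assumption (a common one in causal search, but not something stated above).

```python
from itertools import combinations, permutations


def all_edge_sets(variables):
    """Yield every possible set of directed edges (no self-loops) over the variables."""
    # Each ordered pair (a, b) stands for the edge a -> b, i.e. "a causally influences b".
    possible_edges = list(permutations(variables, 2))
    for k in range(len(possible_edges) + 1):
        for edges in combinations(possible_edges, k):
            yield frozenset(edges)


def is_acyclic(variables, edges):
    """Return True if the graph has no directed cycle.

    Repeatedly strips out nodes that have no incoming edge from the nodes
    still remaining; if we get stuck before the graph is empty, there is a cycle.
    """
    remaining = set(variables)
    while remaining:
        sources = {
            v for v in remaining
            if not any(b == v and a in remaining for a, b in edges)
        }
        if not sources:
            return False
        remaining -= sources
    return True


variables = ["A", "B", "C"]  # hypothetical variable names
candidates = list(all_edge_sets(variables))
dags = [g for g in candidates if is_acyclic(variables, g)]

print(len(candidates))  # 64 edge sets: 2 ** 6, one choice per ordered pair
print(len(dags))        # 25 of them contain no directed cycle
```

With n variables there are 2^(n(n-1)) possible edge sets, so brute-force enumeration like this stops being feasible after a handful of variables. That is why practical methods have to search the space cleverly rather than list it.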

For example: we think that certain genes regulate other genes, controlling the degree to which those genes are expressed. If we have observed gene expression levels for a set of genes, we can search for the true regulatory network over those genes.

Causal search is useful when controlled experiments are impossible or unethical. Even when experiments are possible, they are often expensive. Causal search can help us prioritize the experiments that are most likely to be informative. For example, a biologist who wants to find new genetic regulatory relationships could run a causal search algorithm on gene expression data, to choose which genes to mutate. A tech company modifying its software could run a causal search algorithm on features of the interface and user behavior, to decide which features to change in an A/B test.

Currently I am developing methods for learning genetic regulatory networks when we have a small sample size, by using information from related species. This is an instance of transfer learning.

Previous work

Poster: Grace Hopper Conference, 2014

Poster (with Richard Scheines and Ilya Goldin): Educational Data Mining, 2014 (and the paper we originally submitted)

NIPS workshop on Causation, 2013: [Slides] [Video]

My Master's thesis (2013)