### Contact Information

PhD Student

Department of Statistics and

Machine Learning Department

Office: GHC 7412

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213

Phone: (615) 364-7734


I'm a fourth-year PhD student, jointly in the Machine Learning Department and the Department of Statistics at Carnegie Mellon University (CMU). I graduated from CMU in 2014 with a BS in Computer Science and a BS in Mathematical Sciences, as well as an MS in Mathematical Sciences. I'm fortunate to be supported by a Graduate Research Fellowship from the NSF and the Richard King Mellon Foundation Fellowship in the Life Sciences.

### Research

My broad research interests are in the mechanisms and limits of stochastic information processing in both artificial and biological systems, viewed through a rigorous mathematical lens. I am also passionate about most other parts of mathematics, statistics, and theoretical computer science, especially analysis, information theory, and algorithms/complexity, as well as more biological areas such as computational and systems biology, genomics, and cognitive development. Specific areas I have worked in include nonparametric statistics, sparse dictionary learning, and modeling and prediction in biological systems, including (neural) functional connectivity, bacterial swarms, and higher-order genome organization.

##### Some Current Projects

**Nonparametric Density Functional Estimation**

With Barnabás Póczos, I'm studying estimators for integral functionals of probability densities, including information-theoretic quantities (e.g., families of entropy, mutual information, and divergence) and Sobolev norms. We first derived convergence rates for boundary-corrected plug-in estimators of Rényi divergence and more general functionals. We then derived the first convergence rates for popular but poorly understood k-nearest neighbor estimators of entropy, and later of more general functionals. Achieving optimal convergence rates with these estimators often requires knowledge of the smoothness of the densities in question, which is rarely available in practice. Recently, we studied estimation of Sobolev norms (a common measure of the smoothness of a function), which may help us develop estimators that can adapt to densities of unknown smoothness. Most recently, we studied estimation of mutual information and entropy under a Gaussian copula assumption. This assumption somewhat restricts the interactions that can be modeled, but it often leads to significantly faster convergence in high dimensions. Even if the data are not truly from a Gaussian copula, the resulting estimator, despite being asymptotically inconsistent, often gives a more useful result than a fully nonparametric estimator, which can fail to converge at realistic sample sizes.
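To make the Gaussian copula idea concrete, here is a minimal sketch for the bivariate case. It is an illustration under stated assumptions, not necessarily the estimator studied in this project: it uses a simple normal-scores (rank) transform, and the function names `normal_scores`, `pearson`, and `gaussian_copula_mi` are hypothetical. Under a Gaussian copula with correlation $\rho$, the mutual information is $-\tfrac{1}{2}\log(1-\rho^2)$, so only a single correlation has to be estimated, and the result does not depend on the marginal distributions.

```python
import math
import random
from statistics import NormalDist


def normal_scores(values):
    """Replace each value by the standard normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) lies strictly inside (0, 1), so inv_cdf is finite.
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def gaussian_copula_mi(x, y):
    """Mutual information (in nats) assuming (x, y) has a Gaussian copula."""
    rho = pearson(normal_scores(x), normal_scores(y))
    gap = 1.0 - rho ** 2
    if gap <= 0.0:
        # Perfect monotone dependence: the mutual information is infinite.
        return math.inf
    return -0.5 * math.log(gap)


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): correlated Gaussians,
    # with y pushed through a monotone map so its marginal is non-Gaussian.
    rng = random.Random(0)
    rho = 0.8
    x, y = [], []
    for _ in range(2000):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x.append(z1)
        y.append(math.exp(z2))
    print("estimated MI:", gaussian_copula_mi(x, y))
    print("true MI:     ", -0.5 * math.log(1.0 - rho ** 2))
```

Because the estimate depends on the data only through ranks, applying any strictly increasing transformation to either variable leaves it unchanged. In $d$ dimensions the same idea would replace the single correlation with a $d \times d$ correlation matrix and its log-determinant.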

**Bacterial Swarm Optimization**

With Ziv Bar-Joseph and Saket Navlakha, I'm working on biologically inspired distributed algorithms. Specifically, inspired by the cooperative foraging behavior of E. coli swarms, we are studying a distributed optimization framework that provably converges for very general objective and constraint functions, even under strong restrictions on communication. We're also trying to understand the communication networks and mechanisms used by real E. coli cells.

**Deep Learning for Prediction from Gene Sequence Data**

With Jian Ma, I'm working on leveraging recent advances in deep learning to develop a system that uses only DNA sequence information to predict certain functional properties of genes. Historically, most prediction methods have relied on epigenetic markers as predictors, due to the difficulty of extracting relevant features directly from the genetic code. We recently implemented a deep neural network model that helps by automatically learning informative features of the genetic sequence. We also implemented a gradient-boosting model, which is better able to explicitly identify predictive sequence features. These models may eventually reduce the need for gathering expensive and time-consuming experimental data, and help elucidate the mechanisms by which the DNA sequence determines its own function in different cell types.

##### Some Past Projects

**Computational Neuroscience of Vision**

With Tai Sing Lee and Jason Samonds, I studied computational models of binocular disparity perception in the visual systems of primates, with an interest in understanding how neural network dynamics encode perceptual uncertainty. I'm keen to revisit neural data analysis after developing well-understood tools for estimating information-theoretic quantities.