Fanyi Xiao

Graduate Student

216 Smith Hall
Robotics Institute
Carnegie Mellon University

Email: fanyix [dot] cs [at] gmail [dot] com


Hello! I am a second-year Master's student at the Robotics Institute, Carnegie Mellon University, where I am co-advised by Prof. Martial Hebert and Prof. Yaser Sheikh. I received my Bachelor's degree in Computer Science from Central South University, China, in 2012.

Research Interests

I am interested in Computer Vision and Machine Learning. More specifically, I am interested in leveraging the large amounts of data emerging today to reason about our visual world from a machine's perspective. I am broadly interested in applications such as object detection, scene understanding, fine-grained instance categorization, and large-scale data mining for visual representations. I am currently working on scaling up exemplar-based object detection at runtime using model recommendation.


Model Recommendation For Large Scale Exemplar-based Object Detection
Fanyi Xiao, Martial Hebert, Yaser Sheikh, Mei Chen, Yair Movshovitz-Attias, Denver Dash

In this project, we explore ways of scaling up exemplar-based methods for object instance detection. Despite the many benefits of exemplar-based methods (e.g., their data-driven nature, accurate matching, and meta-data transfer), a serious problem arises with this class of approaches, as with all non-parametric methods: they cannot be applied when the model collection is huge (say, tens of thousands of models, which is precisely what makes these methods attractive) because the runtime computation becomes intractable. However, by exploiting the fact that the models are not independent of one another (e.g., similar viewpoints have similar appearance), we can substantially reduce the number of models applied at runtime while retaining performance close to exhaustive search, thus making exemplar-based approaches scalable to the large-scale regime.
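The runtime selection step can be illustrated with a toy sketch: run a few cheap "probe" models on the image, then predict the scores of all remaining models from their similarity to the probes, and evaluate only the most promising ones. The function names and the similarity-weighted scoring rule below are illustrative assumptions, not the project's actual recommender.

```python
import numpy as np

def recommend_models(probe_scores, probe_idx, model_similarity, budget):
    """Predict scores for all models from a few probe models' detections,
    then keep only the top-`budget` models for full evaluation.

    probe_scores:     (P,) detection scores of the probe models on this image
    probe_idx:        (P,) indices of the probe models
    model_similarity: (M, M) pairwise model similarity (e.g., correlation of
                      responses on a held-out set)
    budget:           number of models actually run at detection time
    """
    sim = model_similarity[:, probe_idx]                      # (M, P)
    weights = sim / (np.abs(sim).sum(axis=1, keepdims=True) + 1e-8)
    predicted = weights @ probe_scores                        # similarity-weighted average
    return np.argsort(predicted)[::-1][:budget]               # most promising models

# Toy example: 6 exemplar models, 2 probes, a budget of 3.
rng = np.random.default_rng(0)
S = rng.random((6, 6)); S = (S + S.T) / 2                     # symmetric similarity
chosen = recommend_models(np.array([0.9, 0.1]), np.array([0, 3]), S, budget=3)
print(chosen)
```

Only the `budget` selected models are then run exhaustively, so the cost per image is controlled by the budget rather than by the total number of exemplars.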

Discriminative Patches For Scalable Fine-grained Object Detection
Fanyi Xiao, Martial Hebert, Yaser Sheikh, Mei Chen, Yair Movshovitz-Attias, Denver Dash

This project uses model recommendation and discriminative mid-level representations to detect and categorize objects in a fine-grained manner (e.g., detecting a car in an image and identifying its make and model). The idea is to exploit the discriminative power and representativeness of mid-level visual representations (e.g., image patches) by drawing connections between them and higher-level object representations (i.e., exemplar models).

Multi-Task Regularization with Covariance Dictionary for Linear Classifiers
Fanyi Xiao, Ruikun Luo, Zhiding Yu

In this project we propose a multi-task linear classifier learning problem called D-SVM (Dictionary SVM). D-SVM uses a dictionary of parameter covariances shared by all tasks to transfer knowledge among different tasks. We formally define the D-SVM learning problem and give two interpretations of it, one probabilistic and one kernel-based. From the probabilistic perspective, we show that our learning formulation is a MAP estimation over all optimization variables. We also show its equivalence to a multiple kernel learning problem in which one seeks a re-weighting kernel for features from a dictionary of bases (despite the fact that only linear classifiers are learned). Finally, we describe an alternating optimization scheme to minimize the objective function and present empirical studies to validate our algorithm.
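One plausible form of such an objective (a sketch for illustration only; the exact formulation, symbols, and constraints in the project may differ) couples per-task SVM weight vectors through a covariance matrix built from the shared dictionary:

```latex
\min_{\{w_t\},\, \theta \ge 0}\;
\sum_{t=1}^{T} \sum_{i=1}^{n_t}
  \max\!\bigl(0,\; 1 - y_{ti}\, w_t^\top x_{ti}\bigr)
\;+\; \lambda \sum_{t=1}^{T} w_t^\top \Sigma(\theta)^{-1} w_t,
\qquad
\Sigma(\theta) \;=\; \sum_{k=1}^{K} \theta_k \Sigma_k .
```

Here each task t has its own hinge loss and weight vector w_t, while the regularizer ties all tasks to a common covariance Σ(θ) combined from dictionary atoms Σ_k; alternating between the w_t (a standard SVM subproblem) and θ is the natural optimization scheme for this form.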

Efficient Temporal Commonality Discovery
Fanyi Xiao*, Wen-Sheng Chu*, David Fouhey* (equal contribution)

This project aims to improve the efficiency of Temporal Commonality Discovery (TCD), the task of finding correspondences in temporal data in an unsupervised manner. The problem is relatively unexplored and poses many challenges. We present an approach to TCD that blends the local-minima-avoiding advantages of previous branch-and-bound approaches with the computational efficiency of standard local search techniques.
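To make the task concrete, here is a brute-force baseline sketch: find the pair of fixed-length segments, one per sequence, whose frame-feature histograms are closest. This is the kind of exhaustive search that branch-and-bound methods accelerate; the histogram objective and all names here are assumed stand-ins, not the project's actual formulation.

```python
import numpy as np

def naive_tcd(seq_a, seq_b, win):
    """Exhaustively find the pair of length-`win` segments (one per sequence)
    whose summed frame-feature histograms are closest in l1 distance."""
    best = (np.inf, None, None)
    for i in range(len(seq_a) - win + 1):
        ha = seq_a[i:i + win].sum(axis=0)          # histogram of segment in A
        for j in range(len(seq_b) - win + 1):
            hb = seq_b[j:j + win].sum(axis=0)      # histogram of segment in B
            d = np.abs(ha - hb).sum()
            if d < best[0]:
                best = (d, i, j)
    return best

# Toy sequences of one-hot "frame features" with a common motif planted.
eye = np.eye(3)
a = eye[[0, 1, 2, 2, 1, 0]]
b = eye[[2, 0, 1, 2, 2, 0]]
d, i, j = naive_tcd(a, b, win=3)
print(d, i, j)  # → 0.0 0 0 (a perfectly matching pair of segments)
```

The brute force costs O(|A| * |B|) histogram comparisons per window length; branch-and-bound prunes most of these candidate pairs using bounds on the distance, while still guaranteeing the globally best pair.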

Marvin: A Multi-modal Intelligent Retailing Assistant
Iljoo Baek, Taylor Stine, Denver Dash, Fanyi Xiao, Yair Movshovitz-Attias, Mei Chen, Yaser Sheikh, Martial Hebert
[PDF][Demo Video]

We present Marvin, a system that can search for physical objects using a mobile or wearable device. It integrates HOG-based object recognition, SURF-based localization, automatic speech recognition, and user feedback with a probabilistic model to recognize the "object of interest" at high accuracy and interactive speeds. Once the object of interest is recognized, the information the user is querying (e.g., reviews, options) is displayed on the user's mobile or wearable device. We tested this prototype in a real-world retail store during business hours, with varying degrees of background noise and clutter. We show that this multi-modal approach achieves superior recognition accuracy compared to a vision system alone, especially in cluttered scenes where a vision system could not determine which object interests the user without additional input. The system scales computationally to large numbers of objects by focusing compute-intensive resources on the objects most likely to be of interest, as inferred from user speech and implicit localization information. We present the system architecture, the probabilistic model that integrates the multi-modal information, and empirical results showing the benefits of multi-modal integration.
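The fusion idea can be sketched in a few lines: treat each modality's per-object scores as a likelihood in a naive-Bayes-style model, so the posterior is the normalized product of a prior and the per-modality scores. The independence assumption and the scores below are illustrative only, not Marvin's exact probabilistic model.

```python
import numpy as np

def fuse_modalities(vision, speech, location_prior):
    """Combine per-object scores from independent modalities:
    posterior ∝ prior * vision_likelihood * speech_likelihood."""
    post = location_prior * vision * speech
    return post / post.sum()                  # normalize to a distribution

# Three candidate objects: vision alone is ambiguous between objects 0 and 1,
# but the speech cue disambiguates in favor of object 0.
vision = np.array([0.45, 0.45, 0.10])
speech = np.array([0.80, 0.10, 0.10])
prior  = np.array([1/3, 1/3, 1/3])
posterior = fuse_modalities(vision, speech, prior)
print(posterior.argmax())  # → 0
```

This also suggests how compute can be focused: cheap cues (speech, localization) narrow the candidate set before the expensive vision models are applied to the remaining objects.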

Honors & Awards

Graduate Research Fellowship, CMU, 2013
"Best Undergraduate Thesis Award", CSU, 2012
"Best Intern Group", ChinaSoft Corporation, 2011
"Excellence Scholarship" (University-wide highest honor, 0.8%), CSU, 2010
"Sunward Excellent" Scholarship (0.4%), Sunward Corporation, 2010
National Scholarship (1%), Ministry of Education, 2009
"1st Grade Scholarship" (6%), CSU, 2009
Outstanding Student, CSU, 2009-2012