Hongfei Zhang

1 Bayard Rd, Apt A6, Pittsburgh, PA, 15213

Work Experience

Platfora, San Mateo, CA, June 2015 -- August 2015
Software Engineer in Distributed Systems Team

  • Developed an Amazon Elastic MapReduce (EMR) check service for Hadoop-check, built on Platfora abstract services
  • Added support for YARN, Hortonworks, Cloudera and Spark services
  • Enabled automatic troubleshooting of Platfora distributed configurations within 5-6 minutes

Harbin Institute of Technology, Harbin, China, October 2012 -- May 2014
Research Assistant in Computer Science School

  • Designed and implemented algorithms for distributed entity resolution
  • Developed XML duplicate detection based on entity description attributes
  • Developed data-cleaning components for conflict resolution on the MapReduce framework

Teaching Experience

Carnegie Mellon University, Pittsburgh, PA, September 2015 -- Present
Head TA in Advanced Algorithms and Data Structures (15-650/02-613/15-351)


Education

Carnegie Mellon University, Pittsburgh, PA, USA, August 2014 -- Present
M.S. in School of Computer Science, Language Technologies Institute, GPA 3.8/4.0

Harbin Institute of Technology, Harbin, China, August 2010 -- July 2014
B.E. in Honors School, Elite Program, Computer Science and Technology, Top 5%

Academic Projects

Search Engine on Large Dataset, Carnegie Mellon University, September 2015 -- Present

  • Designed and implemented a large-scale text search engine, indexed with the Lucene API, over a corpus of 500,000+ documents from the ClueWeb09 dataset
  • Supported score and inverted-list query operators and retrieval models including Unranked/Ranked Boolean, Okapi BM25 and statistical language models such as Indri
  • Evaluated retrieval quality with metrics including MAP, MRR and F-score

Fax to EMR System (F2E), Carnegie Mellon University, October 2014 -- May 2015
Collaboration with Allegheny General Hospital, Agile team of 9, digitizing paper-based medical records

  • Extracted field content from paper-based medical forms and segmented character images with JavaCV and image processing algorithms, including graph rotation, line breaking, space removal and layout detection
  • Recognized segmented words using text mining and machine learning algorithms like random forest
  • Integrated open-source OCR engines such as Google Tesseract, improving precision to 85%

Language Model Classifier on Large Dataset, Carnegie Mellon University, March 2015 -- May 2015

  • Designed several NLP classifiers on the POS-tagged WSJ corpus, reducing corpus perplexity by 30%
  • Implemented machine learning models including decision tree, EM, HMM and neural networks
  • Incorporated smoothing techniques such as the Dirichlet model and the mixture model

Biomedical Question Answering System (BioQ&A), Carnegie Mellon University, November 2014
Apache UIMA framework, intelligent question answering system, agile team of 4

  • Extracted query keywords using NER and NLP tools such as LingPipe, OpenNLP and Stanford NLP, with GoPubMed as a web service
  • Implemented a pipeline comprising query parsing, document retrieval, snippet retrieval, answer generation and evaluation, improving document retrieval precision to over 95%
  • Retrieved relevant documents with retrieval models such as Ranked Boolean, BM25 and VSM

Distributed XML Data Conflicts Resolution System, Harbin Institute of Technology, March 2014 -- June 2014

  • Developed XML duplicate detection based on entity description attributes
  • Implemented XML true-value discovery using a Bayesian graph model
  • Designed both single-node and distributed processing based on the MapReduce framework

Network Connection Administration System in Ubuntu, Harbin Institute of Technology, November 2013

  • Developed the system and server components, invoking and scheduling different application software
  • Built a buffer pool for data transfer in case of disconnection
  • Implemented unicast and multicast modes using different protocols (UDP & TCP)

Selected Courses

  • Advanced Algorithms & Data Structures
  • Machine Learning
  • Language and Statistics
  • Big Data Systems
  • Software Engineering
  • Search Engines
  • Robotics and Machine Learning


Skills

Languages: Java, Python, C/C++, Scala, C#, XML, SQL, HTML, Hive, Pig, R, MATLAB, VHDL

Other Skills: Machine Learning, Hadoop, AWS EMR, Spark, Distributed Systems, Algorithms Design, NLP, UIMA, Statistical Analysis, Compiler Theories, Databases

Honors and Awards

Silver Medal in International Genetically Engineered Machine Competition (iGEM)
Massachusetts Institute of Technology, October 2013

Honorable Mention in Interdisciplinary Contest in Modeling (ICM)
Consortium for Mathematics and Its Applications (COMAP), February 2013

The 2nd Prize of People's Scholarship
Harbin Institute of Technology, Honors School top 3 in each class of Elite Program

Runner-up in HIT Football Champions League
Harbin Institute of Technology, May 2011

Gold Medal (twice), Chinese National High School Mathematics Competition
Chinese Mathematical Society, 2008, 2009