ABOUT

Mohan Liu received a B.S. degree in Physics from Nanjing University in 2013. He is a PhD student in Materials Science and Engineering at Northwestern University. His research focuses on using high-throughput computational databases to accelerate materials design and discovery. In addition to his research, he serves as a code debugger, database maintainer, and web developer for his research group. He also has strong experience in deep learning, image processing, natural language processing, RESTful APIs, and web design. In his free time, he plays bass in a rock/metal band that has performed its original songs several times in Chicago.

EDUCATION

PhD, Materials Science and Engineering
Northwestern University, Evanston, IL
Cumulative GPA: 3.71/4.0
Transcript

Sep. 2013 - Aug. 2019

Bachelor of Science, Physics
Nanjing University, China
Cumulative GPA: 3.89/4.0 | Major GPA: 3.96/4.0 | Class Rank: 2/170
Transcript

Sep. 2009 - July 2013

Data Science
The Data Incubator, San Francisco, CA
Credential ID 68978

June 2019 - Aug. 2019

Exchange Program, Cross-disciplinary Scholars in Science and Technology (CSST)
UCLA, Los Angeles, CA

July 2012 – Sep. 2012



TECHNICAL SKILLS

Data Science
Data Mining (Pandas), Data Visualization (Plotly, Bokeh, D3.js, Google Charts), Machine and Deep Learning, Time Series Analysis/Forecasting (Prophet), Image Processing (OpenCV), Natural Language Processing (NLTK, spaCy, Polyglot)

Machine Learning Algorithms
Support Vector Machine, Random Forest (Scikit-learn), Gradient Boosting Decision Tree (LightGBM, XGBoost), Convolutional Neural Networks (TensorFlow, PyTorch), Latent Semantic Analysis

Data Engineering
SQL (MySQL, PostgreSQL), Apache Spark, RESTful API Implementation (Django), GUI Programming (Tkinter), Web Scraping (BeautifulSoup, Selenium), Web Development (Flask, Dash, HTML, CSS, JavaScript)

Languages and Platforms
Python, C/C++, GCP (Cloud SQL, Cloud Functions, App Engine), AWS (EC2), Git, Docker, Heroku


Python
Data Analysis
Machine Learning
Deep Learning
Data/Web API
MySQL
MongoDB
C++
R
AWS
Docker


DATA SCIENCE PROJECTS

Divvy Chicago Bicycle Sharing project
Predict the daily demand for bikes at each Divvy station and identify promising locations for new Divvy stations. (Web application)

  • Explored the Divvy bike sharing system in Chicago and deployed an end-to-end product to predict daily bike demand at each Divvy station on a future date
  • Applied time series forecasting to historical bike-trip data, accounting for stationarity, seasonality, special events and historical Chicago weather; decreased the error by 50% compared with the baseline model
  • Developed a Live Station Status Monitor to visualize the trend of available bikes/docks at each station over the past week
  • Built an ETL pipeline that extracts real-time Divvy station status data with a cron job scheduler, transforms the JSON records into PostgreSQL tables and loads the data into a Google Cloud database


  • Keywords: EDA, Time Series, Random Forest, Google Cloud Platform
    Github repository 1 Github repository 2

    June 2019 - Aug. 2019
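
A minimal sketch of the transform-and-load step of the ETL pipeline described for the Divvy project, under stated assumptions: the feed layout and field names (`station_id`, `num_bikes_available`, `num_docks_available`) are hypothetical, and an in-memory SQLite database stands in for the PostgreSQL instance on Google Cloud. It is an illustration, not the project's actual code.

```python
import json
import sqlite3

# Hypothetical sample of a station-status feed; the layout and field
# names are assumptions made for illustration only.
SAMPLE_FEED = json.dumps({
    "last_updated": 1561939200,
    "data": {
        "stations": [
            {"station_id": "35", "num_bikes_available": 7, "num_docks_available": 12},
            {"station_id": "192", "num_bikes_available": 0, "num_docks_available": 23},
        ]
    },
})


def transform(raw_json):
    """Flatten the nested JSON feed into (station_id, ts, bikes, docks) rows."""
    feed = json.loads(raw_json)
    ts = feed["last_updated"]
    return [
        (s["station_id"], ts, s["num_bikes_available"], s["num_docks_available"])
        for s in feed["data"]["stations"]
    ]


def load(conn, rows):
    """Insert rows into the status table, creating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS station_status ("
        "station_id TEXT, ts INTEGER, bikes INTEGER, docks INTEGER, "
        "PRIMARY KEY (station_id, ts))"
    )
    # INSERT OR IGNORE (SQLite syntax) keeps the load idempotent when the
    # same snapshot is fetched twice.
    conn.executemany(
        "INSERT OR IGNORE INTO station_status VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
rows = transform(SAMPLE_FEED)
load(conn, rows)
load(conn, rows)  # a repeated scheduled run with the same snapshot adds nothing
count = conn.execute("SELECT COUNT(*) FROM station_status").fetchone()[0]
```

In PostgreSQL the equivalent of `INSERT OR IGNORE` is `INSERT ... ON CONFLICT DO NOTHING`; a cron entry would call the extract, transform and load steps on a fixed schedule.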
      

    Avito Demand Prediction Challenge
    Predict demand for an online classified advertisement (Kaggle Challenge Link)

  • Applied latent semantic analysis for natural language processing (using NLTK and polyglot) to extract text features from advertisement titles and descriptions. Trained a convolutional neural network (using TensorFlow) to extract image features from the item images provided by each seller.
  • Stored large datasets (more than 15 GB in total) in Hierarchical Data Format (HDF) and Feather format to facilitate data I/O. Used multiprocessing to accelerate the feature engineering process.
  • Trained a gradient boosting decision tree model (using LightGBM) on a dataset with more than 1,500,000 observations and 900 features, achieving an RMSE of 0.2236 (ranked top 11%).



  • Keywords: NLP, SVD, CNN, OpenCV, Multiprocessing
    Github repository

    June 2018
         

    Home Credit Default Risk
    Predict how capable each applicant is of repaying a loan (Kaggle Challenge Link)

  • Applied one-hot encoding to categorical features and performed PCA to reduce dimensionality. Conducted recursive feature elimination to further select the most important features.
  • Trained a gradient boosting decision tree model (using LightGBM) on a large dataset with more than 300,000 observations, achieving an area under the ROC curve of 0.796.
  • Used ensembling and cross-validation to reduce potential overfitting. Used Amazon Web Services (AWS) to accelerate the training process.



  • Keywords: PCA, GBDT, AWS
    Github repository

    May 2018
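
As an illustration of the one-hot encoding step mentioned for the Home Credit project, here is a minimal pure-Python sketch. The column names (`income`, `contract_type`) and the sample records are hypothetical; in practice a library encoder (for example in pandas or scikit-learn) would normally be used instead.

```python
def one_hot_encode(records, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    # Sort the categories so the indicator columns are created in a stable order.
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        # Copy every field except the categorical one being encoded.
        row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(row)
    return encoded


# Hypothetical applicant records, for illustration only.
applicants = [
    {"income": 50000, "contract_type": "cash"},
    {"income": 72000, "contract_type": "revolving"},
    {"income": 41000, "contract_type": "cash"},
]
encoded = one_hot_encode(applicants, "contract_type")
```

Each category becomes its own indicator column, which is why the feature count grows and a dimensionality reduction step such as PCA can be useful afterwards.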
         



    RESEARCH PROJECTS

    Materials informatics: large databases and machine learning for materials design and discovery
    Advisor: Professor Chris Wolverton, Northwestern University, IL

  • Managing and maintaining a computational materials database (using MySQL) containing calculated physical and chemical properties for >600,000 compounds.
  • Developed a simple and efficient web API based on REpresentational State Transfer (REST) principles to give the community easy access to our database.
  • Trained three regression models (LASSO, SVR and random forest) on our materials dataset and predicted materials band gaps with ~20% relative RMSE. Constructed the feature space from elemental-property-based attributes and performed univariate feature selection to reduce feature dimensions.



  • Keywords: Materials Database, API, MySQL, Django, Machine Learning
    Github repository DockerHub repository

    June 2017 - Aug. 2019
    IN PROGRESS
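
The "relative RMSE" quoted for the regression models is not defined on this page. One common convention, assumed here purely for illustration, divides the RMSE by the mean of the true values; other normalizations (for example by the range or standard deviation) are also in use.

```python
import math


def relative_rmse(y_true, y_pred):
    """RMSE divided by the absolute mean of the true values (assumed convention)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    mean_true = sum(y_true) / len(y_true)
    if mean_true == 0:
        raise ValueError("mean of true values is zero; relative RMSE is undefined")
    return math.sqrt(mse) / abs(mean_true)


# Toy values: each prediction is off by 1.0 and the true mean is 2.0.
example = relative_rmse([2.0, 2.0], [1.0, 3.0])  # 0.5, i.e. 50% relative RMSE
```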
         

    Predictive modeling of adsorbate coverage and compositional effects on catalytic activity
    Advisor: Professor Chris Wolverton, Northwestern University, IL

  • Investigated the interplay of compositional and local atomic ordering on adsorption at a bimetallic surface alloy to search for an optimal catalyst with high activity and low cost.
  • Generated computed materials data for >500 different crystal structures and applied a linear regression model to predict catalytic activity at alloyed surfaces with less than 1% relative RMSE.
  • Trained a classification model to create a strategy to determine which method should be used when studying a certain alloy catalyst, which can reduce the overall computational cost by over 50%.



  • Keywords: Catalysts, Alloys, Linear Regression

    Sep. 2013 - June 2017

    Improved Heisenberg Model calculation for ferromagnetic system
    Supervisor: Professor Vidvuds Ozolins, UCLA, CA

  • Improved the Heisenberg model by adding long-range and multi-body (triplet and quadruplet) interactions and calculated the Curie temperature of ferromagnetic materials.
  • Mastered the Vienna Ab initio Simulation Package (VASP), a program for atomic-scale materials modeling from first principles.
  • Learned the compressive sensing method, wrote my own implementation and applied it to physical calculations.

  • Keywords: Heisenberg Model, Compressive Sensing, First-principles

    July 2012 – Sep. 2012

    Synthesis and magnetic properties analysis of carbon-based nanotubes
    Supervisor: Professor Wei Zhong, Nanjing University, China

  • Studied the synthesis and magnetic properties of carbon-based nanotubes.
  • Mastered chemical vapor deposition (CVD), a commonly used method of carbon nanomaterial production, as well as the Transmission Electron Microscope (TEM) and Scanning Electron Microscope (SEM).
  • Searched for the most effective water-soluble catalysts for the reaction (as an alternative to commonly used metal catalysts, which are difficult to remove from the product).

  • Keywords: Nanotubes, CVD, SEM, TEM

    Sep. 2012 - June 2013



    PUBLICATIONS

    1. Peng-Cheng Chen, Mohan Liu, Jingshan Du, Brian Meckes, Shunzhi Wang, Haixin Lin, Vinayak P. Dravid, Chris Wolverton, Chad A. Mirkin, "Interface and heterostructure design in polyelemental nanoparticles", Science 363, Issue 6430 (2019), pp. 959-964, Download

    2. Liliang Huang, Mohan Liu, Haixin Lin, Yaobin Xu, Jinsong Wu, Vinayak P. Dravid, Chris Wolverton, Chad A. Mirkin, "Shape regulation of high-index facet nanoparticles by dealloying", Science 365, Issue 6458 (2019), pp. 1159-1163, Download

    3. Liliang Huang, Peng-Cheng Chen, Mohan Liu, Xianbiao Fu, Pavlo Gordiichuk, Yanan Yu, Chris Wolverton, Yijin Kang, Chad A. Mirkin, "Catalyst design by scanning probe block copolymer lithography", Proceedings of the National Academy of Sciences 115, Issue 15 (2018), pp. 3764-3769, Download

    4. Jingshan Du, Yi-Ge Zhou, Mohan Liu, Edward J. Kluender, Andrey Ivankir, Peng-Cheng Chen, James L. Hedrick, Chris Wolverton, Vinayak P. Dravid, Chad A. Mirkin, "Combinatorial Assessment of Au-Cu Alloy Nanoparticle Electrocatalysts", (under preparation, 2019)

    5. Mohan Liu, William F. Schneider, and Chris Wolverton, "Configuration-dependent adsorption energy at bimetallic surfaces", (under preparation, 2019)

    6. Mohan Liu, Vinay I. Hegde, and Chris Wolverton, "High-throughput hybrid-functional DFT investigations on materials band gaps and formation energies", (under preparation, 2019)

    7. Mohan Liu, Vinay I. Hegde, and Chris Wolverton, "qmpy: A RESTful API for accessing materials properties in the Open Quantum Materials Database", (under preparation, 2019)



    AWARDS

    Hierarchical Materials Cluster Program fellowship
    Northwestern University

    Oct. 2014

    Excellent Leader of Student Union
    Nanjing University

    July 2011

    Outstanding Student Model of Nanjing University
    Nanjing University, awarded to top 0.15%

    May 2011

    National Scholarship
    Nanjing University, awarded to top 1.5%

    Nov. 2010



    ADDITIONAL INFORMATION

    Languages
    Native Speaker of Mandarin; Fluent in English

    Interests
    Bass, guitar, basketball, skiing, swimming