I'm a Data Scientist and a graduate of UC Berkeley's Master in Data Science (MIDS) program. Currently, I work as a consulting Data Scientist for Austin Capital Data, helping deliver insights for a variety of clients, from Fortune 500 corporations to startups to the federal government. I also work part-time as a Course Coordinator for the MIDS Statistics class, coordinating grading and lecturing on statistics concepts. Before going into Data Science, I worked as a structural engineer designing bridges and other heavy infrastructure.
In my time at MIDS, I have had the opportunity to work on several very cool projects. Two of them are on the way to being published in a journal. Most links here are public; some are private to encourage academic honesty amongst my students. If you want access, give me a ring and I can add you as a collaborator.
Projects are presented in reverse chronological order.
Screen Ahead Rx - A Machine Learning Model to Predict Cancer Drug Efficacy
Course: Capstone Project
Term: Summer 2021
Description: A suite of machine learning models designed to predict a cancer drug's efficacy based on drug chemical structure and on mutations in the cancer's DNA. Winner of the prestigious Hal Varian award for best capstone project of the term.
Technologies: Python, Pandas, sklearn, TensorFlow, Keras
Links:
- Project Website hosted on iSchool
- Zoom Recording of Final Presentation
- Final Presentation Slide Deck
- Medium Article by the UC Berkeley iSchool
A Deep Learning Model to Identify Rhinos from Drone Video
Course: Deep Learning in the Cloud and on the Edge
Term: Summer 2021
Description: A project in cooperation with WildTracks to identify rhinos and humans from drone video with the goal of preventing poaching in the Kuzikus wildlife park in Namibia.
Technologies: Python, Pandas, MQTT, Flask, YOLOv5, PyTorch
Links:
Seattle Transit Connectivity Visualization
Course: Data Visualization
Term: Spring 2021
Description: An interactive data visualization exploring transit connectivity in the greater Seattle area. It is intended for home buyers, renters, and travelers deciding where to stay in the city.
Technologies: Python, Altair, Pandas, Tableau
Links:
Machine Learning Model to Predict Domestic US Flight Delays
Course: Machine Learning at Scale
Term: Spring 2021
Description: A machine learning model that uses five years of US domestic flight data to predict, from information available an hour before departure, whether a given flight will be delayed.
Technologies: Python, Databricks, Parquet, HDFS, PySpark
Links:
Randomized Control Trial Testing the Effectiveness of Date Picker Widgets
Course: Experiments and Causality
Term: Fall 2020
Description: A randomized control trial conducted on Amazon MTurk that tested whether "date picker widgets" are faster than manually entering a date.
Technologies: R statistical software, Amazon MTurk
Links:
Forest Tree Cover Prediction Model
Course: Introduction to Machine Learning w207
Term: Fall 2020
Description: A final project forked from a Kaggle competition in which we attempt to predict the species of tree growing on a given piece of land in the Roosevelt National Forest based on topographical factors.
Technologies: Python, Jupyter, sklearn, Docker, Google Cloud Platform
Links:
An End to End Data Collection Pipeline
Course: Introduction to Data Engineering w205
Term: Summer 2020
Description: A Flask web app that takes in web requests and publishes them to a Kafka queue. It can be scaled to high-velocity, high-volume workloads.
Technologies: Python, Jupyter, Docker, Pandas, Kafka, Flask, Google Cloud Platform, Spark
Links:
- Final Report (rough version; the final was lost)
Statistical Predictors of COVID Spread
Course: Statistics w203
Term: Summer 2020
Description: Using data available in the summer of 2020, we created a linear model that showed a negative linear relationship between the amount of unemployment benefits and the rate of COVID infection in a state.
Technologies: R statistical software
Links: