  • UC Berkeley
  • Seattle

blulightspecial/README.md

About Me

I'm a Data Scientist and a graduate of UC Berkeley's Master of Information and Data Science (MIDS) program. I currently work as a consulting Data Scientist for Austin Capital Data, helping deliver insights for a variety of clients, from Fortune 500 corporations to startups to the federal government. I also work part-time as a Course Coordinator for the MIDS Statistics class, coordinating grading and lecturing on statistics concepts. Before going into Data Science, I worked as a structural engineer designing bridges and other heavy infrastructure.

Projects

In my time at MIDS I had the opportunity to work on several very cool projects. Two of them are on the way to being published in a journal. Most links here are public. Some are private to encourage academic honesty amongst my students. If you want access, give me a ring and I can add you as a collaborator.

Projects are presented in reverse chronological order.

Screen Ahead Rx - A Machine Learning Model to Predict Cancer Drug Efficacy

Course: Capstone Project

Term: Summer 2021

Description: A suite of machine learning models designed to predict a cancer drug's efficacy based on drug chemical structure and on mutations in the cancer's DNA. Winner of the prestigious Hal Varian award for best capstone project of the term.

Technologies: Python, Pandas, sklearn, TensorFlow, Keras

Links:

A Deep Learning Model to Identify Rhinos from Drone Video

Course: Deep Learning in the Cloud and on the Edge

Term: Summer 2021

Description: A project in cooperation with WildTracks to identify rhinos and humans from drone video with the goal of preventing poaching in the Kuzikus wildlife park in Namibia.

Technologies: Python, Pandas, MQTT, Flask, YOLOv5, PyTorch

Links:

Seattle Transit Connectivity Visualization

Course: Data Visualization

Term: Spring 2021

Description: An interactive data visualization exploring transit connectivity in the greater Seattle area. It is intended for home buyers, renters, and travelers deciding where to stay in the city.

Technologies: Python, Altair, Pandas, Tableau

Links:

Machine Learning Model to Predict Domestic US Flight Delays

Course: Machine Learning at Scale

Term: Spring 2021

Description: A machine learning model that uses five years of US domestic flight data to predict whether a given flight will be delayed, using only information available an hour before departure.

Technologies: Python, Databricks, Parquet, HDFS, PySpark

Links:

Randomized Controlled Trial Testing the Effectiveness of Date Picker Widgets

Course: Experiments and Causality

Term: Fall 2020

Description: A randomized controlled trial conducted on Amazon MTurk that tested whether "date picker widgets" are faster than entering a date manually.

Technologies: R statistical software, Amazon MTurk

Links:

Forest Tree Cover Prediction Model

Course: Introduction to Machine Learning w207

Term: Fall 2020

Description: A final project forked from a Kaggle competition in which we attempt to predict the species of tree growing on a given piece of land in the Roosevelt National Forest based on topographical factors.

Technologies: Python, Jupyter, sklearn, Docker, Google Cloud Platform

Links:

An End to End Data Collection Pipeline

Course: Introduction to Data Engineering w205

Term: Summer 2020

Description: A Flask web app that takes in web requests and publishes them to a Kafka queue. It can be scaled to high-velocity, high-volume workloads.

Technologies: Python, Jupyter, Docker, Pandas, Kafka, Flask, Google Cloud Platform, Spark

Links:

Statistical Predictors of COVID Spread

Course: Statistics w203

Term: Summer 2020

Description: Using data available in the summer of 2020, we created a linear model that showed a negative linear relationship between the amount of unemployment benefits and the rate of COVID infection in a state.

Technologies: R statistical software

Links:

Popular repositories

  1. Course-Overview Public

    Forked from kayashaolu/Course-Overview

    Jupyter Notebook 1

  2. MIDS-w203_docker_image Public

    Dockerfile 1

  3. syllabus Public

    Forked from mids-w203/syllabus

    https://mids-w203.github.io/syllabus/

  4. w241 Public

    Forked from 2U-jruth/w241

    This is the course repository for w241 and 290 -- Experiments and Causality.

    R

  5. reading Public

    Forked from mids-w203/reading

    Reading materials to be fed to the ISVC.

  6. aoc_2020 Public

    Solutions to Advent of Code 2020

    HTML