This repository contains code and documentation for the MSDS 434 Analytics Application Development final project (Part I). It covers the learning objectives for Weeks 1-10, along with the documentation, code, and videos that correspond to them.
- Class Contents
- Week 1 - Introduction to GCP
- Week 2 - GitHub and Continuous Integration
- Week 3 - Google Cloud Platform (GCP)
- Week 4 - Cloud-Native Database Choice and Design
- Week 5 - Applied Data Engineering
- Week 6 - Managed ML Platforms
- Week 7 - Operationalizing ML Models
- Week 8 - Total Cost of Ownership (TCO) Estimation for Engineering Projects
- Week 9 - Monitoring
- Week 10 - MVP
- Learnings from my classmates
- Final Thoughts
- Repository Information
- Demo the instantiation of an instance on both Google Cloud Platform (GCP) and Amazon Web Services (AWS).
- Describe one application of AI that you would be interested in pursuing on GCP, leveraging the technologies identified in this course.
- How to start up and stop projects within GCP and AWS
- How to access the console within GCP and AWS
- Start up a project in both GCP and AWS
- Create a GitHub repository and push code to it
- Clone GitHub repository to both GCP and AWS projects
- Edit code in GCP and AWS projects and push the changes back to the GitHub repository
- How to clone a GitHub repository in both GCP and AWS
- How to edit code in GCP and AWS, then push it to my GitHub repository
- In GCP, you need to install git yourself (for example, `sudo apt-get install git` on a Debian-based VM), as git is not installed automatically
- Create a “hello world” pipeline to Google Cloud that calls into a Python-based Google App Engine (GAE) project and returns “hello world” as a JavaScript Object Notation (JSON) response.
- How to set up APIs & Services on GCP
- Enabling the Cloud Build API
- Connecting my billing account to the project
- Using a YAML configuration file to deploy the app
- How to shut down the project after I am done using it
- Use gcloud commands to verify the shutdown of the project
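
The "hello world" service above can be sketched as a plain WSGI application. This is a minimal sketch rather than the project's actual code: the module name `main.py`, the object name `app`, and the `{"message": "hello world"}` payload are assumptions, and the standard library stands in for a web framework such as Flask.

```python
import json


def app(environ, start_response):
    """Minimal WSGI app that answers every request with a JSON greeting."""
    # Hypothetical payload: the objective only asks for "hello world" as JSON.
    body = json.dumps({"message": "hello world"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8080, app).serve_forever()` serves it on port 8080. On App Engine, an `app.yaml` that names a Python 3 runtime would sit next to the module, and `gcloud app deploy` uploads both, assuming the default entrypoint that looks for `main:app`.
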
- Create an ingest to ETL pipeline using CSV files and Google BigQuery
- Schedule a recurring cron job to batch update the data
- How to access Cloud Shell Editor in GCP and use the GUI to manage directories
- Use Python to call an API and save the JSON output in a pandas DataFrame, then export the DataFrame to a CSV file in my GCP project
- Use the command line to grab a CSV file in my project directory, move it to a GCP bucket, then load the data into BigQuery
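
The JSON-to-CSV step can be sketched as follows. To keep the example free of third-party dependencies, the standard library's `csv` module is swapped in for pandas here; with pandas, the equivalent is roughly `pandas.DataFrame(records).to_csv(path, index=False)`. The sample records and field names are hypothetical and stand in for a real API response.

```python
import csv
import io
import json


def records_to_csv(json_text, out_file):
    """Write a JSON array of flat objects to out_file as CSV; return the row count."""
    records = json.loads(json_text)
    # Collect column names in first-seen order so rows with extra keys still fit.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # restval fills in an empty cell when a record lacks one of the columns.
    writer = csv.DictWriter(out_file, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return len(records)


# Hypothetical API response; a real call would fetch this text over HTTP.
sample = (
    '[{"city": "Chicago", "temp_f": 41},'
    ' {"city": "Evanston", "temp_f": 39, "wind_mph": 12}]'
)
buffer = io.StringIO()
row_count = records_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Writing to an open file instead of `io.StringIO` produces the CSV on disk. From there, `gsutil cp` can copy the file to a bucket and `bq load --autodetect --source_format=CSV` can load it into a BigQuery table; the bucket and table names depend on the project.
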
- Return an aggregated result with a machine-learning (ML) prediction using Google BigQuery ML
- Serve out results using Google App Engine
- How to load a dataset into BigQuery
- Run a basic ML regression model within BigQuery
- View the model's results and evaluate how it performed
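
As a rough illustration of the BigQuery ML steps above, the sketch below builds the two SQL statements involved: one to train a linear regression and one to evaluate it. The dataset, table, model, and label names (`demo_dataset.trips`, `demo_dataset.trip_model`, `fare`) are hypothetical, and linear regression is an assumption about which regression type was used.

```python
def create_model_sql(model, source_table, label_column):
    """Return a CREATE MODEL statement for a BigQuery ML linear regression."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'linear_reg', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`\n"
        f"WHERE {label_column} IS NOT NULL"
    )


def evaluate_model_sql(model):
    """Return a query that reports evaluation metrics for a trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


# Hypothetical names, used only to show the generated SQL.
train_sql = create_model_sql("demo_dataset.trip_model", "demo_dataset.trips", "fare")
eval_sql = evaluate_model_sql("demo_dataset.trip_model")
```

Each string can be pasted into the BigQuery console or passed to `google.cloud.bigquery.Client().query(...)`. For a regression model, `ML.EVALUATE` returns metrics such as mean absolute error and R² that indicate how the model performed.
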
- Train a multi-class classification model on AutoML
- How to use AutoML within BigQuery
- AutoML takes time to run and might not provide the best model results
- Create a production and development environment
- Deploy your final project to both environments
- Different ways you can deploy, monitor, and maintain multiple environments
- Built staging and production project environments within GCP
- Use the Google Cloud Platform Billing API
- Create a cost forecast using BigQuery ML
- Connect data transfer to a GCP billing table you create for your project
- Go to 'Billing Export' within GCP, enable the billing options you would like to export, and connect them to a BigQuery table
- It takes time for billing data to load into the table
- Ran a simple query to take a look at my billing usage
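
The billing query and cost forecast can be sketched as SQL built from Python. This is a sketch under assumptions: the table name is a placeholder for the export table created under 'Billing Export', the column names (`usage_start_time`, `cost`, `service.description`) follow the standard billing export schema, and `ARIMA_PLUS` is one possible BigQuery ML model type for forecasting, not necessarily the one used in this project.

```python
def daily_cost_sql(billing_table):
    """Return a query that totals cost per day and per service."""
    return (
        "SELECT DATE(usage_start_time) AS usage_date,\n"
        "       service.description AS service,\n"
        "       ROUND(SUM(cost), 2) AS total_cost\n"
        f"FROM `{billing_table}`\n"
        "GROUP BY usage_date, service\n"
        "ORDER BY usage_date, total_cost DESC"
    )


def forecast_model_sql(model, billing_table):
    """Return a CREATE MODEL statement for a daily-cost time-series model."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (model_type = 'ARIMA_PLUS',\n"
        "         time_series_timestamp_col = 'usage_date',\n"
        "         time_series_data_col = 'total_cost') AS\n"
        "SELECT DATE(usage_start_time) AS usage_date, SUM(cost) AS total_cost\n"
        f"FROM `{billing_table}`\n"
        "GROUP BY usage_date"
    )


def forecast_sql(model, horizon_days):
    """Return a query that forecasts the next horizon_days of cost."""
    return (
        f"SELECT * FROM ML.FORECAST(MODEL `{model}`, "
        f"STRUCT({int(horizon_days)} AS horizon))"
    )


# Hypothetical table and model names, used only to show the generated SQL.
usage_sql = daily_cost_sql("billing.gcp_billing_export")
model_sql = forecast_model_sql("billing.cost_forecast", "billing.gcp_billing_export")
next_month_sql = forecast_sql("billing.cost_forecast", 30)
```

Because billing data takes time to arrive in the export table, the forecast model only becomes useful once enough days of history have accumulated.
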
- Set up a monitoring dashboard within GCP
- Perform a simple load test using ApacheBench or a similar tool
- Started up a Virtual Machine (VM) within GCP
- Installed Go on the VM
- Created a simple Go program to create activity on the VM
- Built a monitoring dashboard within GCP
- Used ApacheBench (`ab`) from the command line to run basic metric checks
- How to create, edit, and save a file using the command line
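
For comparison with an ApacheBench run such as `ab -n 100 -c 10 http://<vm-address>/`, the same kind of simple load test can be sketched in Python. This is a minimal sketch, not the tooling used in the project: the function names and the summary fields are assumptions, and the `fetch` parameter exists only so the request function can be swapped out.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url):
    """Return the HTTP status code for one GET request, or 0 if it never connected."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (urllib.error.URLError, OSError):
        return 0


def run_load_test(url, total_requests, concurrency, fetch=fetch_status):
    """Send total_requests GETs using up to `concurrency` worker threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch, [url] * total_requests))
    elapsed = time.perf_counter() - started
    succeeded = sum(1 for status in statuses if 200 <= status < 300)
    return {
        "requests": total_requests,
        "succeeded": succeeded,
        "failed": total_requests - succeeded,
        "seconds": elapsed,
    }
```

Calling `run_load_test(url, 100, 10)` against the VM's address while the monitoring dashboard is open should show the resulting traffic in the dashboard's charts.
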
See GitHub Repository for full coding documentation.
I acquired numerous valuable cloud architecture skills from my peers, which helped me improve my own projects. Among the key concepts I learned, containerizing environments with Docker was one of the most significant. Docker provides a uniform platform for developers and system administrators to bundle, deploy, and operate applications. For the final project, I built a Docker image to containerize my application, and it worked. My colleagues also illustrated the importance of Software as a Service (SaaS) and the critical role that continuous integration (CI) systems play in SaaS development and deployment, enabling developers to automate the testing and integration of new code changes. Finally, I discovered that traditional SQL servers update data tables more rapidly than BigQuery, because BigQuery restricts the number of operations allowed per table.
MSDS 434 Analytics Application Development proved to be an incredibly valuable course during my time at Northwestern University. As cloud computing continues to be an increasingly essential skill for career advancement, I anticipate looking back on my work in this course with fondness. Professor Ostrowski's guidance throughout the course was excellent, providing clear explanations for each week's assignments and remaining responsive to all my questions. In hindsight, I wish I had taken this course before MSDS 436 Analytics Systems Engineering, as they share similar content, but Professor Ostrowski focused on GCP and a little AWS, whereas MSDS 436 was more self-directed and heavily emphasized AWS. Nevertheless, I feel much more well-rounded now, having gained experience in two distinct cloud environments and developing a fundamental understanding of how to leverage cloud computing resources.
My big takeaways from this class include the following:
- Learned how to stop billing completely by unlinking the billing account from the project.
- The ability to use machine learning within BigQuery is not only powerful but also lowers the barrier to entry, as you only need to know SQL to run models.
- GitHub Actions is a cool resource to use for continuous integration and continuous delivery.
- There isn’t a set way you should or have to set up your cloud environments. There are several tooling options that will help you build what you are looking to deploy, and it’s good to know the pros and cons of each one to determine what best fits your project's needs.
- Setting up billing alerts and monitoring is a great way to make sure you stay within your budget and minimize the fear of running up a large bill on your credit card.
Created by: Nicholas Drake
Create date: 01/03/2023