Snowflake for Data Science

Getting Started

Although we have recorded videos, we are constantly upgrading and adding to this repo, so the videos may differ slightly from its current contents. Overall they cover the same material, and we will continue to upload new videos as additions are made.

Configuration Setup

  1. Create a .env file and populate it with your account details:

    SNOWFLAKE_ACCOUNT = abc123.us-east-1
    SNOWFLAKE_USER = username
    SNOWFLAKE_PASSWORD = yourpassword
    SNOWFLAKE_ROLE = sysadmin
    SNOWFLAKE_WAREHOUSE = compute_wh
    SNOWFLAKE_DATABASE = snowpark
    SNOWFLAKE_SCHEMA = titanic
    
  2. Use the environment.yml file to set up your Python environment for the demo:

    • Example terminal commands:
      • conda env create -f environment.yml
      • micromamba create -f environment.yml -y
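
Once the .env file is in place and the environment is created, the notebooks open a Snowpark session from those variables. A minimal sketch, assuming python-dotenv and snowflake-snowpark-python are installed (the variable names mirror the .env example above):

    import os
    from dotenv import load_dotenv                # python-dotenv
    from snowflake.snowpark import Session

    load_dotenv()                                 # read the .env file into os.environ

    connection_parameters = {
        "account":   os.environ["SNOWFLAKE_ACCOUNT"],
        "user":      os.environ["SNOWFLAKE_USER"],
        "password":  os.environ["SNOWFLAKE_PASSWORD"],
        "role":      os.environ["SNOWFLAKE_ROLE"],
        "warehouse": os.environ["SNOWFLAKE_WAREHOUSE"],
        "database":  os.environ["SNOWFLAKE_DATABASE"],
        "schema":    os.environ["SNOWFLAKE_SCHEMA"],
    }

    session = Session.builder.configs(connection_parameters).create()
    print(session.sql("SELECT CURRENT_WAREHOUSE()").collect())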

Why we partner with Anaconda


Review of distributed hyperparameter tuning benefits

Local run time: 8 min 27 seconds


Snowflake ML run time: 1 min 17 seconds (a ~6.5x speedup, leveraging a Large warehouse)

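The speedup comes from pushing the cross-validated grid search into the warehouse instead of running it on a laptop. A hedged sketch of what that looks like with snowflake-ml-python's GridSearchCV (the parameter grid and column names are illustrative, and the data is assumed to already be numeric/encoded):

    from snowflake.ml.modeling.model_selection import GridSearchCV
    from snowflake.ml.modeling.xgboost import XGBClassifier

    # Assumes an open Snowpark session and a prepared TITANIC table.
    train_df, test_df = session.table("TITANIC").random_split([0.8, 0.2], seed=42)
    feature_cols = [c for c in train_df.columns if c != "SURVIVED"]

    # The search executes inside Snowflake, so cross-validation folds and
    # parameter combinations run in parallel; a larger warehouse gives more parallelism.
    grid_search = GridSearchCV(
        estimator=XGBClassifier(),
        param_grid={                      # illustrative grid, not the notebook's exact values
            "n_estimators": [100, 200, 400],
            "max_depth": [3, 5, 7],
            "learning_rate": [0.05, 0.1, 0.3],
        },
        input_cols=feature_cols,
        label_cols=["SURVIVED"],
        output_cols=["PREDICTION"],
    )
    grid_search.fit(train_df)
    print(grid_search.to_sklearn().best_params_)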

Data Processing & ML Operations

Load & Transform Data

Execute the load_data notebook to accomplish the following (a condensed sketch follows the list):

  • Load the Titanic dataset from Seaborn, convert the column names to uppercase, and save it as a CSV
  • Upload the CSV file to a Snowflake Internal Stage
  • Create a Snowpark DataFrame from the staged CSV
  • Write the Snowpark DataFrame to Snowflake as a table
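
A condensed sketch of those steps, assuming an open Snowpark session named session and an internal stage named TITANIC (the notebook's actual names, options, and uppercasing step may differ):

    import seaborn as sns

    # Load the Titanic dataset locally and uppercase the column names
    titanic = sns.load_dataset("titanic")
    titanic.columns = [c.upper() for c in titanic.columns]
    titanic.to_csv("titanic.csv", index=False)

    # Upload the CSV to an internal stage
    session.sql("CREATE STAGE IF NOT EXISTS TITANIC").collect()
    session.file.put("titanic.csv", "@TITANIC", auto_compress=False, overwrite=True)

    # Create a Snowpark DataFrame from the staged CSV
    # (CSV schema inference requires a reasonably recent snowpark-python)
    df = (
        session.read.option("INFER_SCHEMA", True)
        .option("PARSE_HEADER", True)
        .csv("@TITANIC/titanic.csv")
    )

    # Persist it as a table
    df.write.mode("overwrite").save_as_table("TITANIC")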

Machine Learning Operations (snowml)

In the snowml notebook (a sketch of the core steps follows the list):

  • Generate a Snowpark DataFrame from the Titanic table
  • Validate and handle null values
  • Remove columns with high null counts and highly correlated columns
  • Adjust the Fare datatype and impute categorical nulls
  • One-hot encode categorical values
  • Split the data into train and test sets
  • Train an XGBoost classifier with hyperparameter tuning
  • Conduct predictions on the test set
  • Display Accuracy, Precision, and Recall metrics
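
A hedged sketch of the core modeling steps using snowflake-ml-python's sklearn-style APIs (column names are illustrative, and the null-handling and column-dropping steps above are omitted for brevity):

    from snowflake.ml.modeling.preprocessing import OneHotEncoder
    from snowflake.ml.modeling.xgboost import XGBClassifier
    from snowflake.ml.modeling.metrics import accuracy_score, precision_score, recall_score

    df = session.table("TITANIC")

    # One-hot encode the categorical columns (illustrative column names)
    encoder = OneHotEncoder(
        input_cols=["SEX", "EMBARKED"],
        output_cols=["SEX", "EMBARKED"],
        drop_input_cols=True,
        sparse=False,
    )
    df = encoder.fit(df).transform(df)

    # Split into train and test Snowpark DataFrames
    train_df, test_df = df.random_split([0.8, 0.2], seed=42)

    # Train the classifier inside Snowflake
    feature_cols = [c for c in train_df.columns if c != "SURVIVED"]
    clf = XGBClassifier(input_cols=feature_cols, label_cols=["SURVIVED"], output_cols=["PREDICTION"])
    clf.fit(train_df)

    # Predict on the held-out set and report metrics
    preds = clf.predict(test_df)
    print(accuracy_score(df=preds, y_true_col_names="SURVIVED", y_pred_col_names="PREDICTION"))
    print(precision_score(df=preds, y_true_col_names="SURVIVED", y_pred_col_names="PREDICTION"))
    print(recall_score(df=preds, y_true_col_names="SURVIVED", y_pred_col_names="PREDICTION"))

Hyperparameter tuning follows the same pattern as the GridSearchCV sketch shown earlier.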

Advanced MLOps with Live/Batch Inference & Streamlit

After completing the load_data steps, use the deployment notebook to do the following (a sketch of the registration and batch-inference steps follows the list):

  • Create a Snowpark DataFrame from the Titanic table
  • Assess and eliminate columns with high null counts and correlated columns
  • Adjust Fare datatype and handle categorical nulls
  • One-hot encode categorical values
  • Split the data into train and test sets
  • Train an XGBoost classifier, optimizing with grid search
  • Display model accuracy and best parameters
  • Register the model in the model registry
  • Deploy the model as a vectorized UDF (User Defined Function)
  • Execute batch predictions on a table
  • Perform real-time predictions using Streamlit for interactive inference
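
A hedged sketch of the registration and batch-inference portion, using the snowflake.ml.registry API (model and version names are placeholders; the notebook may instead deploy an explicit vectorized UDF or use an older registry interface depending on the snowflake-ml-python version):

    from snowflake.ml.registry import Registry

    # Register the trained model (clf from the training step) in the model registry
    reg = Registry(session=session)
    model_version = reg.log_model(
        clf,
        model_name="TITANIC_XGB",        # placeholder name
        version_name="V1",
        comment="XGBoost survival classifier from the deployment notebook",
    )

    # Batch inference: run the registered model's predict function over a table
    scored = model_version.run(session.table("TITANIC"), function_name="predict")
    scored.write.mode("overwrite").save_as_table("TITANIC_SCORED")

For interactive inference, the Streamlit app can collect passenger features from the user and call the same registered model (or its UDF) for a real-time prediction.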

About

Introduction to performing Machine Learning on Snowflake
