This tutorial covers how to train a PyTorch model on AI Platform with Hyperparameter Tuning using a Custom Container (Docker image). The PyTorch model is trained on the sonar dataset from the UCI Machine Learning Repository and predicts whether a given sonar signal bounced off a metal cylinder or off a roughly cylindrical rock.
Citation: Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
- Create your model
- Add argument parsing for the hyperparameter values. (These values are chosen for you in this tutorial.)
- Add code to track the performance of your hyperparameter values.
- Create the docker image
- Build the docker image
- Test your docker image locally
- Deploy the docker image to Cloud Container Registry
- Submit your training job
Before you jump in, let’s cover some of the different tools you’ll be using to get your container up and running on AI Platform.
Google Cloud Platform lets you build and host applications and websites, store data, and analyze data on Google's scalable infrastructure.
AI Platform is a managed service that enables you to easily build machine learning models that work on any type of data, of any size.
Cloud Container Registry is a single place for your team to manage Docker images, perform vulnerability analysis, and decide who can access what with fine-grained access control.
Google Cloud Storage (GCS) is a unified object storage service for developers and enterprises, covering live data serving, data analytics/ML, and data archiving.
Cloud SDK is a command-line tool that lets you interact with Google Cloud products. To run this tutorial, make sure that Cloud SDK is installed in the same environment as your Jupyter kernel.
Hyperparameter tuning takes advantage of the processing infrastructure of Google Cloud Platform to test different hyperparameter configurations when training your model.
Docker is a containerization technology that lets developers package their applications and dependencies so that they can be run anywhere.
- Create a project on GCP
- Create a Google Cloud Storage Bucket
- Enable AI Platform Training and Prediction, Container Registry, and Compute Engine APIs
- Install Cloud SDK
- Install docker
- Configure docker for Cloud Container Registry
- Install PyTorch [Optional: used if running locally]
- Install pandas [Optional: used if running locally]
These variables will be needed for the following steps.
Replace these variables:
# PROJECT_ID: your project's id. Use the PROJECT_ID that matches your Google Cloud Platform project.
export PROJECT_ID=YOUR_PROJECT_ID
# BUCKET_ID: the bucket id you created above.
export BUCKET_ID=YOUR_BUCKET_ID
Additional variables:
# JOB_DIR: the path to a Google Cloud Storage location to use for job output.
export JOB_DIR=gs://$BUCKET_ID/hp_tuning
# IMAGE_REPO_NAME: where the image will be stored on Cloud Container Registry
export IMAGE_REPO_NAME=sonar_hp_tuning_pytorch_container
# IMAGE_TAG: an easily identifiable tag for your docker image
export IMAGE_TAG=sonar_hp_tuning_pytorch
# IMAGE_URI: the complete URI location for Cloud Container Registry
export IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG
# REGION: select a region from https://cloud.google.com/ml-engine/docs/regions
# or use the default 'us-central1'. The region is where the training job will run.
export REGION=us-central1
# JOB_NAME: the name of your job running on AI Platform.
export JOB_NAME=hp_tuning_container_job_$(date +%Y%m%d_%H%M%S)
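If you drive the tutorial from a Jupyter notebook rather than a shell, the same values can be assembled in Python. The following is only a sketch: the function name `build_settings` is hypothetical, and the strings simply mirror the exports above.

```python
from datetime import datetime


def build_settings(project_id, bucket_id, now=None):
    """Mirror the tutorial's shell exports as a plain dict (hypothetical helper)."""
    now = now or datetime.now()
    image_repo_name = "sonar_hp_tuning_pytorch_container"
    image_tag = "sonar_hp_tuning_pytorch"
    return {
        "PROJECT_ID": project_id,
        "BUCKET_ID": bucket_id,
        "JOB_DIR": f"gs://{bucket_id}/hp_tuning",
        "IMAGE_URI": f"gcr.io/{project_id}/{image_repo_name}:{image_tag}",
        "REGION": "us-central1",
        # Same timestamp format as $(date +%Y%m%d_%H%M%S)
        "JOB_NAME": "hp_tuning_container_job_" + now.strftime("%Y%m%d_%H%M%S"),
    }


settings = build_settings("YOUR_PROJECT_ID", "YOUR_BUCKET_ID")
for name, value in settings.items():
    print(f"{name}={value}")
```

The timestamp suffix keeps each job name unique, which matters because AI Platform rejects a job whose name has already been used in the project.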
The example model.py trains a PyTorch model to predict whether a given sonar signal bounced off a metal cylinder or off a roughly cylindrical rock. The sample also includes the code that parses the hyperparameter arguments and the code that reports performance for hyperparameter tuning.
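To make the argument-parsing step concrete, here is a minimal sketch of how hyperparameter values might be read from the command line. AI Platform passes each tuned hyperparameter to the container as a command-line flag named after the parameter. Only `--epochs` appears in this tutorial's commands; `--lr`, `--momentum`, the defaults, and the function name `get_args` are assumptions for illustration and may not match the sample's actual code.

```python
import argparse


def get_args(argv=None):
    """Parse hyperparameter flags (flag names other than --epochs are assumed)."""
    parser = argparse.ArgumentParser(description="Sonar classifier (sketch)")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of passes over the training data")
    parser.add_argument("--lr", type=float, default=0.01,
                        help="learning rate (hypothetical flag)")
    parser.add_argument("--momentum", type=float, default=0.5,
                        help="SGD momentum (hypothetical flag)")
    # gcloud forwards --job-dir to the trainer when it is set on submission.
    parser.add_argument("--job-dir", default=None,
                        help="GCS location for job output")
    return parser.parse_args(argv)


args = get_args(["--epochs", "1", "--lr=0.05"])
print(args.epochs, args.lr, args.momentum, args.job_dir)
```

Passing `argv` explicitly makes the parser easy to exercise in a notebook; calling `get_args()` with no arguments reads `sys.argv` as usual.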
Open task.py to see exactly how the model is called during training.
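For the performance-tracking step, the trainer has to report a metric that the tuning service can optimize. One common way to do this from a custom container is the `cloudml-hypertune` package. The sketch below assumes that package, a metric tag of `accuracy`, and the hypothetical helper names `accuracy` and `report_accuracy`; the sample's own code may differ. It falls back to printing when the package is not installed so that it can be tried locally.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal their labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def report_accuracy(value, epoch, tag="accuracy"):
    """Report a tuning metric; returns True if hypertune was available."""
    try:
        import hypertune  # provided by the cloudml-hypertune package
    except ImportError:
        # Local fallback so the sketch still runs without the package.
        print(f"{tag}={value:.4f} at step {epoch}")
        return False
    hpt = hypertune.HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag=tag,
        metric_value=value,
        global_step=epoch,
    )
    return True


acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
report_accuracy(acc, epoch=1)
```

Whatever tag is reported here has to match the metric tag named in the tuning configuration, otherwise the service has nothing to compare across trials.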
data_utils.py downloads and loads the data, exports your trained model, and uploads the model to Google Cloud Storage.
The dataset for the model was originally hosted at the UCI Machine Learning Repository. We've hosted a copy of the sonar dataset in Cloud Storage for use with this sample.
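As a rough illustration of the loading step, the sonar data is a headerless CSV in which each row holds 60 numeric readings followed by a label, `M` (metal cylinder) or `R` (rock). The sketch below uses only the standard library; the function name `load_sonar`, the mapping of `M` to 1, and the two shortened sample rows are made up for illustration and are not taken from data_utils.py or from the real dataset.

```python
import csv
import io


def load_sonar(csv_file):
    """Split rows into float features and 0/1 labels (M -> 1, R -> 0)."""
    features, labels = [], []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        features.append([float(value) for value in row[:-1]])
        labels.append(1 if row[-1].strip() == "M" else 0)
    return features, labels


# Made-up rows with 3 readings each instead of 60, purely for illustration.
sample = io.StringIO("0.02,0.04,0.43,R\n0.05,0.01,0.80,M\n")
features, labels = load_sonar(sample)
print(len(features), labels)
```

The same function works on a real file by passing `open(path, newline="")` in place of the in-memory sample.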
Open the Dockerfile to see how the Docker image that will run on AI Platform is created.
docker build -f Dockerfile -t $IMAGE_URI ./
docker run $IMAGE_URI --epochs 1
If it ran successfully, the output should look similar to: Accuracy: 58%.
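If you want to check the local run programmatically, for example from a notebook cell that has captured the container's output, a small parser is enough. This sketch assumes the output contains a line in the `Accuracy: 58%` form shown above; the helper name `parse_accuracy` is hypothetical, and capturing the output of `docker run` is left to you.

```python
import re

# Matches lines such as "Accuracy: 58%" or "Accuracy: 58.5%".
ACCURACY_PATTERN = re.compile(r"Accuracy:\s*(\d+(?:\.\d+)?)%")


def parse_accuracy(output):
    """Return the last reported accuracy as a percentage, or None if absent."""
    matches = ACCURACY_PATTERN.findall(output)
    return float(matches[-1]) if matches else None


captured = "Epoch 1 finished\nAccuracy: 58%\n"
print(parse_accuracy(captured))
```

A single epoch on a small dataset gives a noisy number, so treat the parsed value as a smoke test that the container runs end to end rather than as a quality threshold.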
You should already have configured Docker to use Cloud Container Registry, as listed in the setup steps above.
docker push $IMAGE_URI
Open hptuning_config.yaml to see how to configure the hyperparameters that are passed into the model.
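To show the general shape of such a configuration, the sketch below builds an equivalent structure as a Python dict and prints it as JSON, which `gcloud` also accepts for `--config`. The field names follow the AI Platform Training `HyperparameterSpec`; the parameter names (`lr`, `momentum`), their ranges, the trial counts, and the metric tag are assumptions for illustration and are not copied from the sample's hptuning_config.yaml.

```python
import json


def build_hptuning_config(metric_tag="accuracy", max_trials=4, max_parallel_trials=2):
    """Return a trainingInput.hyperparameters spec (illustrative values only)."""
    return {
        "trainingInput": {
            "hyperparameters": {
                "goal": "MAXIMIZE",  # higher accuracy is better
                "hyperparameterMetricTag": metric_tag,
                "maxTrials": max_trials,
                "maxParallelTrials": max_parallel_trials,
                "params": [
                    {
                        "parameterName": "lr",  # hypothetical flag name
                        "type": "DOUBLE",
                        "minValue": 0.0001,
                        "maxValue": 0.1,
                        "scaleType": "UNIT_LOG_SCALE",
                    },
                    {
                        "parameterName": "momentum",  # hypothetical flag name
                        "type": "DOUBLE",
                        "minValue": 0.0,
                        "maxValue": 0.9,
                        "scaleType": "UNIT_LINEAR_SCALE",
                    },
                ],
            }
        }
    }


config = build_hptuning_config()
print(json.dumps(config, indent=2))
```

Each `parameterName` becomes a command-line flag passed to the container, so the names have to match what the trainer's argument parser expects. Running fewer trials in parallel gives the tuning service more completed results to learn from before it picks the next values, at the cost of a longer job.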
Submit the training job to AI Platform using gcloud.
Note: You may need to install gcloud beta to submit the training job.
gcloud components install beta
gcloud beta ml-engine jobs submit training $JOB_NAME \
--job-dir=$JOB_DIR \
--region=$REGION \
--master-image-uri $IMAGE_URI \
--config=hptuning_config.yaml \
--scale-tier BASIC
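When submitting from a notebook, it can help to assemble the same command as an argument list so that no shell quoting is involved. The sketch below only builds and prints the command shown above; the function name `build_submit_command` is hypothetical, and nothing is executed.

```python
import shlex


def build_submit_command(job_name, job_dir, region, image_uri,
                         config_path="hptuning_config.yaml"):
    """Return the gcloud submit command from this tutorial as an argv list."""
    return [
        "gcloud", "beta", "ml-engine", "jobs", "submit", "training", job_name,
        "--job-dir", job_dir,
        "--region", region,
        "--master-image-uri", image_uri,
        "--config", config_path,
        "--scale-tier", "BASIC",
    ]


command = build_submit_command(
    job_name="hp_tuning_container_job_20200102_030405",
    job_dir="gs://YOUR_BUCKET_ID/hp_tuning",
    region="us-central1",
    image_uri="gcr.io/YOUR_PROJECT_ID/sonar_hp_tuning_pytorch_container:sonar_hp_tuning_pytorch",
)
print(shlex.join(command))  # shlex.join requires Python 3.8+
```

To actually submit the job, pass the list to `subprocess.run(command, check=True)` in an environment where Cloud SDK is installed and authenticated.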
You can view the logs for your training job:
- Go to https://console.cloud.google.com/
- Select "Logging" in the left-hand pane
- Select "Cloud ML Job" resource from the drop-down
- In filter by prefix, use the value of $JOB_NAME to view the logs
View the contents of the destination model folder to verify that the model file has been uploaded to GCS.
Note: The model can take a few minutes to train and show up in GCS.
gsutil ls gs://$BUCKET_ID/hp_tuning/*