Add readmes for CLI (Azure#1779)
xanwal authored Oct 20, 2022
1 parent bda3ae4 commit 358c65c
Showing 9 changed files with 44 additions and 0 deletions.
20 changes: 20 additions & 0 deletions cli/endpoints/online/custom-container/README.md
@@ -0,0 +1,20 @@
# Azure Custom Container Examples

This directory contains examples of how to use custom containers to deploy endpoints to Azure. In each example, a Dockerfile defines an image that is either an extension of an Azure-originated image, such as the AzureML Minimal Inference image, or a third-party BYOC (bring-your-own-container) image such as Triton.


## Example Directory

Each example consists of a script located in the [CLI](../../..) directory as well as an example subdirectory that contains assets and a README.

|Example|Script|Description|
|-------|------|---------|
|[minimal/multimodel](minimal/multimodel)|[deploy-custom-container-minimal-multimodel](../../../deploy-custom-container-minimal-multimodel.sh)|Deploy multiple models to a single deployment by extending the AzureML Inference Minimal image.|
|[minimal/single-model](minimal/single-model)|[deploy-custom-container-minimal-single-model](../../../deploy-custom-container-minimal-single-model.sh)|Deploy a single model by extending the AzureML Inference Minimal image.|
|[mlflow/multideployment-scikit](mlflow/multideployment-scikit)|[deploy-custom-container-mlflow-multideployment-scikit](../../../deploy-custom-container-mlflow-multideployment-scikit.sh)|Deploy two MLFlow models with different Python requirements to two separate deployments behind a single endpoint using the AzureML Inference Minimal Image.|
|[r/multimodel-plumber](r/multimodel-plumber)|[deploy-custom-container-r-multimodel-plumber](../../../deploy-custom-container-r-multimodel-plumber.sh)|Deploy three regression models to one endpoint using the Plumber R package.|
|[tfserving/half-plus-two](tfserving/half-plus-two)|[deploy-custom-container-tfserving-half-plus-two](../../../deploy-custom-container-tfserving-half-plus-two.sh)|Deploy a simple Half Plus Two model in a TFServing custom container using the standard model registration process.|
|[tfserving/half-plus-two-integrated](tfserving/half-plus-two-integrated)|[deploy-custom-container-tfserving-half-plus-two-integrated](../../../deploy-custom-container-tfserving-half-plus-two-integrated.sh)|Deploy a simple Half Plus Two model using a TFServing custom container with the model integrated into the image.|
|[torchserve/densenet](torchserve/densenet)|[deploy-custom-container-torchserve-densenet](../../../deploy-custom-container-torchserve-densenet.sh)|Deploy a single model using a Torchserve custom container.|
|[torchserve/huggingface-textgen](torchserve/huggingface-textgen)|[deploy-custom-container-torchserve-huggingface-textgen](../../../deploy-custom-container-torchserve-huggingface-textgen.sh)|Deploy Huggingface models to an online endpoint and follow along with the Huggingface Transformers Torchserve example.|
|[triton/single-model](triton/single-model)|[deploy-custom-container-triton-single-model](../../../deploy-custom-container-triton-single-model.sh)|Deploy a Triton model using a custom container.|
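
As an illustrative sketch of running one of these scripts (assuming an existing Azure resource group and workspace; the names below are placeholders), the Azure CLI `ml` extension is installed and the scripts are run from the repository's `cli` directory:

```bash
# One-time setup (assumed): install the ml extension and set defaults.
az extension add -n ml
az configure --defaults group=<resource-group> workspace=<workspace-name>

# Run an example script from the repository's cli directory, e.g.:
cd cli
bash deploy-custom-container-minimal-single-model.sh
```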
@@ -0,0 +1,24 @@
# Deploy Huggingface models using Torchserve

This example demonstrates how to deploy Huggingface models to a managed online endpoint, following along with the [Serving Huggingface Transformers using TorchServe](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) example from the TorchServe repository.

In this example we deploy a BERT model for text generation.

## How to deploy

This example can be run end-to-end using the `deploy-custom-container-torchserve-huggingface-textgen.sh` script in the `CLI` folder. Torchserve does not need to be installed locally to run it.
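
A minimal sketch of invoking it, assuming the Azure CLI `ml` extension is installed and a default resource group and workspace are already configured:

```bash
# Run from the repository's cli directory (assumed); the script performs the
# full deployment described in this README.
cd cli
bash deploy-custom-container-torchserve-huggingface-textgen.sh
```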

## Image

The image used for this example is defined in the file `ts-hf-tg.dockerfile`. It uses `pytorch/torchserve` as a base image and overrides the default `CMD` so that, at initialization, the `model-store` points to the location of the mounted model (referenced via the `AZUREML_MODEL_DIR` env var) and the `models` to load are taken from the custom env var `TORCHSERVE_MODELS`.
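
As a rough illustration only (not the actual contents of `ts-hf-tg.dockerfile`), the overridden `CMD` amounts to launching TorchServe along these lines:

```bash
# Approximate shape of the launch command; the real CMD may add flags or point
# the model store at a subdirectory of the mounted model path.
torchserve --start \
    --model-store "$AZUREML_MODEL_DIR" \
    --models "$TORCHSERVE_MODELS"
```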

## Model

To prepare the model, the [Huggingface_Transformers](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) directory is cloned from the `pytorch/serve` GitHub repo. The model is then prepared per the instructions in that example, using the same image that is built above for deployment.
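
A minimal sketch of that flow (the image tag is a placeholder, and the exact archiving commands come from the linked Huggingface_Transformers README):

```bash
# Clone the TorchServe examples and prepare the model inside the deployment image.
git clone https://github.com/pytorch/serve.git
cd serve/examples/Huggingface_Transformers
# Open a shell in the image built above (tag is a placeholder) and follow the
# Huggingface_Transformers instructions to produce the model archive.
docker run -it --rm -v "$(pwd)":/workspace -w /workspace ts-hf-tg:latest /bin/bash
```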

## Environment

The environment is defined inline in the deployment YAML and references the ACR URL of the image. For the deployment to succeed, the ACR must be attached to the workspace or be accessible through a user-assigned managed identity with AcrPull permissions.

We define an additional env var called `TORCHSERVE_MODELS`, which is used by the image upon initialization.

The environment also contains an `inference_config` block that defines the `liveness`, `readiness`, and `scoring` routes by path and port. Because the images used in this example are based on the AzureML Inference Minimal images, these values are the same as those in a non-BYOC deployment; however, they must be included explicitly since we are now using a custom image.
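
As an illustrative example (the endpoint name, deployment name, and request file below are placeholders), the scoring route can be exercised after deployment with:

```bash
# Send a test request through the endpoint's scoring route.
az ml online-endpoint invoke --name <endpoint-name> \
    --deployment-name <deployment-name> \
    --request-file sample-request.json
```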
