# Azure Custom Container Examples

This directory contains examples showing how to use custom containers to deploy endpoints to Azure. In each example, a Dockerfile defines an image that is either an extension of an Azure-originated image, such as the AzureML Minimal Inference image, or a third-party BYOC image such as Triton.
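
As a minimal sketch of the first pattern, a Dockerfile extending an AzureML inference base image might look like the following; the base image tag and installed packages are illustrative assumptions, not taken from any specific example here:

```dockerfile
# Sketch only: extend an AzureML minimal inference base image.
# The base image tag below is an assumption; check the individual
# example's Dockerfile for the tag it actually pins.
FROM mcr.microsoft.com/azureml/minimal-ubuntu20.04-py38-cpu-inference:latest

# Add whatever Python dependencies the scoring script needs.
RUN pip install --no-cache-dir scikit-learn pandas
```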
## Example Directory

Each example consists of a script located in the [CLI](../../..) directory as well as an example subdirectory that contains assets and a README.
|Example|Script|Description|
|-------|------|-----------|
|[minimal/multimodel](minimal/multimodel)|[deploy-custom-container-minimal-multimodel](../../../deploy-custom-container-minimal-multimodel.sh)|Deploy multiple models to a single deployment by extending the AzureML Inference Minimal image.|
|[minimal/single-model](minimal/single-model)|[deploy-custom-container-minimal-single-model](../../../deploy-custom-container-minimal-single-model.sh)|Deploy a single model by extending the AzureML Inference Minimal image.|
|[mlflow/multideployment-scikit](mlflow/multideployment-scikit)|[deploy-custom-container-mlflow-multideployment-scikit](../../../deploy-custom-container-mlflow-multideployment-scikit.sh)|Deploy two MLFlow models with different Python requirements to two separate deployments behind a single endpoint using the AzureML Inference Minimal image.|
|[r/multimodel-plumber](r/multimodel-plumber)|[deploy-custom-container-r-multimodel-plumber](../../../deploy-custom-container-r-multimodel-plumber.sh)|Deploy three regression models to one endpoint using the Plumber R package.|
|[tfserving/half-plus-two](tfserving/half-plus-two)|[deploy-custom-container-tfserving-half-plus-two](../../../deploy-custom-container-tfserving-half-plus-two.sh)|Deploy a simple Half Plus Two model using a TFServing custom container with the standard model registration process.|
|[tfserving/half-plus-two-integrated](tfserving/half-plus-two-integrated)|[deploy-custom-container-tfserving-half-plus-two-integrated](../../../deploy-custom-container-tfserving-half-plus-two-integrated.sh)|Deploy a simple Half Plus Two model using a TFServing custom container with the model integrated into the image.|
|[torchserve/densenet](torchserve/densenet)|[deploy-custom-container-torchserve-densenet](../../../deploy-custom-container-torchserve-densenet.sh)|Deploy a single model using a TorchServe custom container.|
|[torchserve/huggingface-textgen](torchserve/huggingface-textgen)|[deploy-custom-container-torchserve-huggingface-textgen](../../../deploy-custom-container-torchserve-huggingface-textgen.sh)|Deploy Huggingface models to an online endpoint, following the Huggingface Transformers TorchServe example.|
|[triton/single-model](triton/single-model)|[deploy-custom-container-triton-single-model](../../../deploy-custom-container-triton-single-model.sh)|Deploy a Triton model using a custom container.|
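
Each script is intended to be run from the repo's `cli` directory. As a usage sketch, assuming the Azure CLI with its `ml` extension is installed and a workspace is configured:

```bash
# Run from the cli directory of the azureml-examples repo.
cd cli

# Each deploy script is self-contained: it builds and pushes the image,
# creates the endpoint and deployment, and invokes a test request.
bash deploy-custom-container-minimal-single-model.sh
```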
`cli/endpoints/online/custom-container/torchserve/huggingface-textgen/README.md`
# Deploy Huggingface models using Torchserve

This example demonstrates how to deploy Huggingface models to a managed online endpoint, following the [Serving Huggingface Transformers using TorchServe](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) example from the `pytorch/serve` repo.

In this example, we deploy a BERT model for text generation.
## How to deploy

This example can be run end-to-end using the `deploy-custom-container-torchserve-huggingface-textgen.sh` script in the `CLI` folder. TorchServe does not need to be installed locally.
## Image

The image used for this example is defined in the file `ts-hf-tg.dockerfile`. It uses `pytorch/torchserve` as a base image and overrides the default `CMD` so that, on initialization, the `model-store` points to the location of the mounted model (via the `AZUREML_MODEL_DIR` env var) and the `models` to load are read from the custom env var `TORCHSERVE_MODELS`.
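
The authoritative definition is `ts-hf-tg.dockerfile` itself; a rough sketch of the override it describes might look like this, where the exact base tag, flags, and keep-alive strategy are assumptions:

```dockerfile
# Sketch of the idea behind ts-hf-tg.dockerfile; the real file may differ.
FROM pytorch/torchserve:latest

# Override the default CMD: resolve the model store at container start from
# AZUREML_MODEL_DIR (where AzureML mounts the registered model) and take the
# list of models to load from the custom TORCHSERVE_MODELS env var.
CMD ["/bin/bash", "-c", "torchserve --start --model-store $AZUREML_MODEL_DIR --models $TORCHSERVE_MODELS --ncs && sleep infinity"]
```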
## Model

To prepare the model, the [Huggingface_Transformers](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) directory is cloned from the `pytorch/serve` GitHub repo. We use the same image built for deployment above to prepare the model per the instructions in the Huggingface example.
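
A rough sketch of that flow, assuming Docker is available locally; the image name, model name, and archiver arguments are illustrative placeholders rather than the script's exact commands:

```bash
# Clone the TorchServe repo to get the Huggingface example assets.
git clone https://github.com/pytorch/serve.git
cd serve/examples/Huggingface_Transformers

# IMAGE is a placeholder for the image built above for this example.
IMAGE="<acr-name>.azurecr.io/ts-hf-tg:latest"

# Prepare the model inside a container per the Huggingface example:
# download the weights, then package them into a .mar archive.
docker run --rm -v "$(pwd)":/workspace -w /workspace "$IMAGE" /bin/bash -c \
  "python Download_Transformer_models.py && \
   torch-model-archiver --model-name bert_tg --version 1.0 \
     --serialized-file Transformer_model/pytorch_model.bin \
     --handler Transformer_handler_generalized.py"
```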
## Environment

The environment is defined inline in the deployment YAML and references the ACR URL of the image. The ACR must be attached to the workspace (or accessible through a user-assigned managed identity with the `AcrPull` role) in order to deploy successfully.

We define an additional env var called `TORCHSERVE_MODELS`, which the image reads on initialization.

The environment also contains an `inference_config` block that defines the `liveness`, `readiness`, and `scoring` routes by path and port. Because the images used in this example are based on the AzureML Inference Minimal images, these values are the same as those in a non-BYOC deployment; however, they must be included explicitly since we are now using a custom image.
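
As a hedged sketch of what that inline environment could look like (the image name, model details, ports, and paths below are placeholders, not the example's actual values):

```yaml
# Illustrative managed online deployment with an inline BYOC environment.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: torchserve-deployment
endpoint_name: hf-textgen-endpoint
model:
  name: bert-tg
  version: 1
  path: ./model-store            # local folder containing the .mar archive
environment_variables:
  TORCHSERVE_MODELS: "bert_tg=bert_tg.mar"   # read by the image's CMD at start
environment:
  image: <acr-name>.azurecr.io/ts-hf-tg:latest
  inference_config:              # routes must match what the server exposes
    liveness_route:
      port: 8080
      path: /ping
    readiness_route:
      port: 8080
      path: /ping
    scoring_route:
      port: 8080
      path: /predictions/bert_tg
instance_type: Standard_DS3_v2
instance_count: 1
```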