Add readmes for CLI (Azure#1779)
xanwal authored Oct 20, 2022
1 parent bda3ae4 commit 358c65c
Showing 9 changed files with 44 additions and 0 deletions.
20 changes: 20 additions & 0 deletions cli/endpoints/online/custom-container/README.md
@@ -0,0 +1,20 @@
# Azure Custom Container Examples

This directory contains examples of how to use custom containers to deploy endpoints to Azure. In each example, a Dockerfile defines an image that is either an extension of an Azure-originated image, such as the AzureML Minimal Inference image, or a third-party BYOC (bring-your-own-container) image such as Triton.


## Example Directory

Each example consists of a script located in the [CLI](../../..) directory as well as an example subdirectory that contains assets and a README.

|Example|Script|Description|
|-------|------|---------|
|[minimal/multimodel](minimal/multimodel)|[deploy-custom-container-minimal-multimodel](../../../deploy-custom-container-minimal-multimodel.sh)|Deploy multiple models to a single deployment by extending the AzureML Inference Minimal image.|
|[minimal/single-model](minimal/single-model)|[deploy-custom-container-minimal-single-model](../../../deploy-custom-container-minimal-single-model.sh)|Deploy a single model by extending the AzureML Inference Minimal image.|
|[mlflow/multideployment-scikit](mlflow/multideployment-scikit)|[deploy-custom-container-mlflow-multideployment-scikit](../../../deploy-custom-container-mlflow-multideployment-scikit.sh)|Deploy two MLFlow models with different Python requirements to two separate deployments behind a single endpoint using the AzureML Inference Minimal Image.|
|[r/multimodel-plumber](r/multimodel-plumber)|[deploy-custom-container-r-multimodel-plumber](../../../deploy-custom-container-r-multimodel-plumber.sh)|Deploy three regression models to one endpoint using the Plumber R package.|
|[tfserving/half-plus-two](tfserving/half-plus-two)|[deploy-custom-container-tfserving-half-plus-two](../../../deploy-custom-container-tfserving-half-plus-two.sh)|Deploy a simple Half Plus Two model in a TFServing custom container using the standard model registration process.|
|[tfserving/half-plus-two-integrated](tfserving/half-plus-two-integrated)|[deploy-custom-container-tfserving-half-plus-two-integrated](../../../deploy-custom-container-tfserving-half-plus-two-integrated.sh)|Deploy a simple Half Plus Two model using a TFServing custom container with the model integrated into the image.|
|[torchserve/densenet](torchserve/densenet)|[deploy-custom-container-torchserve-densenet](../../../deploy-custom-container-torchserve-densenet.sh)|Deploy a single model using a Torchserve custom container.|
|[torchserve/huggingface-textgen](torchserve/huggingface-textgen)|[deploy-custom-container-torchserve-huggingface-textgen](../../../deploy-custom-container-torchserve-huggingface-textgen.sh)|Deploy Huggingface models to an online endpoint and follow along with the Huggingface Transformers Torchserve example.|
|[triton/single-model](triton/single-model)|[deploy-custom-container-triton-single-model](../../../deploy-custom-container-triton-single-model.sh)|Deploy a Triton model using a custom container.|
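
As an illustrative sketch of running one of these scripts (assuming an existing Azure resource group and workspace; the names below are placeholders), the Azure CLI `ml` extension is installed and the scripts are run from the repository's `cli` directory:

```bash
# One-time setup (assumed): install the ml extension and set defaults.
az extension add -n ml
az configure --defaults group=<resource-group> workspace=<workspace-name>

# Run an example script from the repository's cli directory, e.g.:
cd cli
bash deploy-custom-container-minimal-single-model.sh
```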
@@ -0,0 +1,24 @@
# Deploy Huggingface models using Torchserve

This example demonstrates how to deploy Huggingface models to a managed online endpoint, following along with the [Serving Huggingface Transformers using TorchServe](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) example from the TorchServe repository.

In this example we deploy a BERT model for text generation.

## How to deploy

This example can be run end-to-end using the `deploy-custom-container-torchserve-huggingface-textgen.sh` script in the `CLI` folder. Torchserve does not need to be installed locally to run it.
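
A minimal sketch of invoking it, assuming the Azure CLI `ml` extension is installed and a default resource group and workspace are already configured:

```bash
# Run from the repository's cli directory (assumed); the script performs the
# full deployment described in this README.
cd cli
bash deploy-custom-container-torchserve-huggingface-textgen.sh
```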

## Image

The image used for this example is defined in the file `ts-hf-tg.dockerfile`. It uses `pytorch/torchserve` as a base image and overrides the default `CMD` so that, at initialization, the `model-store` points to the location of the mounted model (referenced via the `AZUREML_MODEL_DIR` env var) and the `models` to load are taken from the custom env var `TORCHSERVE_MODELS`.
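
As a rough illustration only (not the actual contents of `ts-hf-tg.dockerfile`), the overridden `CMD` amounts to launching TorchServe along these lines:

```bash
# Approximate shape of the launch command; the real CMD may add flags or point
# the model store at a subdirectory of the mounted model path.
torchserve --start \
    --model-store "$AZUREML_MODEL_DIR" \
    --models "$TORCHSERVE_MODELS"
```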

## Model

To prepare the model, the [Huggingface_Transformers](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) directory is cloned from the `pytorch/serve` GitHub repo. The model is then prepared per the instructions in that example, using the same image that is built above for deployment.
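
A minimal sketch of that flow (the image tag is a placeholder, and the exact archiving commands come from the linked Huggingface_Transformers README):

```bash
# Clone the TorchServe examples and prepare the model inside the deployment image.
git clone https://github.com/pytorch/serve.git
cd serve/examples/Huggingface_Transformers
# Open a shell in the image built above (tag is a placeholder) and follow the
# Huggingface_Transformers instructions to produce the model archive.
docker run -it --rm -v "$(pwd)":/workspace -w /workspace ts-hf-tg:latest /bin/bash
```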

## Environment

The environment is defined inline in the deployment YAML and references the ACR URL of the image. For the deployment to succeed, the ACR must be attached to the workspace or be accessible through a user-assigned managed identity with AcrPull permissions.

We define an additional env var called `TORCHSERVE_MODELS`, which is used by the image upon initialization.

The environment also contains an `inference_config` block that defines the `liveness`, `readiness`, and `scoring` routes by path and port. Because the images used in this example are based on the AzureML Inference Minimal images, these values are the same as those in a non-BYOC deployment; however, they must be included explicitly since we are now using a custom image.
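
As an illustrative example (the endpoint name, deployment name, and request file below are placeholders), the scoring route can be exercised after deployment with:

```bash
# Send a test request through the endpoint's scoring route.
az ml online-endpoint invoke --name <endpoint-name> \
    --deployment-name <deployment-name> \
    --request-file sample-request.json
```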
