Added docs for raw deployment autoscaling. #312
base: main
Conversation
Signed-off-by: Andrews Arokiam <[email protected]>
@@ -0,0 +1,89 @@
## Autoscaler for KServe's Raw Deployment Mode
KServe supports `RawDeployment` mode to enable `InferenceService` deployment with Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Comparing to serverless deployment it unlocks Knative limitations such as mounting multiple volumes, on the other hand `Scale down and from Zero` is not supported in `RawDeployment` mode.
Suggested change:

KServe supports `RawDeployment` mode to enable `InferenceService` deployment with the following Kubernetes resources:

- [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment)
- [`Service`](https://kubernetes.io/docs/concepts/services-networking/service)
- [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress)
- [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale)

Compared to Serverless deployment, it lifts Knative limitations such as the restriction on mounting multiple volumes; on the other hand, scaling down to and from zero is not supported in `RawDeployment` mode.
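For readers landing here, a minimal sketch of an `InferenceService` that opts into this mode may help. The `serving.kserve.io/deploymentMode` annotation and the sklearn predictor layout below follow the existing KServe examples, but please verify the exact field names against your KServe version:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    # Ask KServe to create plain Deployment/Service/Ingress/HPA resources
    # instead of Knative objects (RawDeployment mode).
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```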
### HPA in Raw Deployment
When using Kserve with the `RawDeployment` mode, Knative is not installed. In this mode, if you deploy an `InferenceService`, Kserve uses **Kubernetes’ Horizontal Pod Autoscaler (HPA)** for autoscaling instead of **Knative Pod Autoscaler (KPA)**. For more information about Kserve's autoscaler, you can refer [`this`](https://kserve.github.io/website/master/modelserving/v1beta1/torchserve/#knative-autoscaler)
Suggested change:

When using KServe with the `RawDeployment` mode, Knative is not required. In this mode, if you deploy an `InferenceService`, KServe uses **Kubernetes’ Horizontal Pod Autoscaler (HPA)** for autoscaling instead of the **Knative Pod Autoscaler (KPA)**. For more information about KServe's autoscalers, refer to [`this documentation`](https://kserve.github.io/website/master/modelserving/v1beta1/torchserve/#knative-autoscaler).
storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```
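To make the HPA behaviour described above concrete, here is a sketch of the predictor-level replica bounds (field names taken from the v1beta1 `ComponentExtensionSpec`; verify against your KServe version). In `RawDeployment` mode KServe should turn these bounds into an HPA attached to the predictor `Deployment`, which you can inspect with `kubectl get hpa`:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    # In RawDeployment mode these bounds become the min/max replicas of
    # the HPA that KServe creates for the predictor Deployment.
    minReplicas: 1
    maxReplicas: 5
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```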
`ScaleTarget` specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by Knative Pod Autoscaler. you can refer [`this`](https://knative.dev/docs/serving/autoscaling/autoscaling-targets/). |
Suggested change:

`ScaleTarget` specifies the integer target value of the metric type the autoscaler watches for. Concurrency and RPS (Requests Per Second) targets are supported by the Knative Pod Autoscaler; see [`this documentation`](https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
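As a hedged illustration of `scaleTarget` with a concurrency target (Serverless mode, since concurrency and RPS targets go through the Knative Pod Autoscaler; field placement assumed from the v1beta1 predictor spec):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # Aim for roughly 2 in-flight requests per replica; honoured by the
    # Knative Pod Autoscaler in Serverless mode, not in RawDeployment mode.
    scaleMetric: concurrency
    scaleTarget: 2
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```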
`ScaleMetric` defines the scaling metric type watched by autoscaler. Possible values are concurrency, rps, cpu, memory. concurrency, rps are supported via Knative Pod Autoscaler. you can refer [`this`](https://knative.dev/docs/serving/autoscaling/autoscaling-metrics). |
Suggested change:

`ScaleMetric` defines the scaling metric type watched by the autoscaler. Possible values are:

- concurrency
- rps
- cpu
- memory

Concurrency and RPS are supported via the Knative Pod Autoscaler; see [`this documentation`](https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
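For `RawDeployment` mode specifically, only the HPA-backed metrics (cpu, memory) apply. A sketch with a CPU target follows; interpreting `scaleTarget` as a target average CPU utilization percentage is my assumption, so double-check it against the KServe reference:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    # HPA scales on CPU; 80 is assumed to mean 80% average utilization.
    scaleMetric: cpu
    scaleTarget: 80
    minReplicas: 1
    maxReplicas: 5
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```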
### Disable HPA in Raw Deployment
If you want to control the scaling of the deployment created by KServe inference service with an external tool like [`KEDA`](https://keda.sh/). You can disable KServe's creation of the **HPA** by replacing **external** value with autoscaler class annotaion that should be disable the creation of HPA |
Suggested change:

If you want to control the scaling of the deployment created by the KServe inference service with an external tool like [`KEDA`](https://keda.sh/), you can disable KServe's creation of the **HPA** by setting the autoscaler class annotation to **external**.
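A sketch of the opt-out described above; I am assuming the annotation key is `serving.kserve.io/autoscalerClass`, so please confirm the exact key against the KServe source or docs:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
    # "external" tells KServe not to create an HPA for the predictor,
    # leaving scaling to an external controller such as KEDA.
    serving.kserve.io/autoscalerClass: external
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```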
"Fixes #303" Update Autoscaling docs for Raw deployment mode
Proposed Changes