From 993e72a67fcd42074bb5034c56f5b4a9f317121b Mon Sep 17 00:00:00 2001
From: Scott Carruthers
Date: Tue, 19 Mar 2024 13:40:21 +0000
Subject: [PATCH] GITBOOK-1290: SHM SDL adds

---
 SUMMARY.md                                    |   1 +
 .../shared-memory-shm-enablement.md           | 130 ++++++++++++++++++
 readme/stack-definition-language.md           |  43 +++++-
 3 files changed, 168 insertions(+), 6 deletions(-)
 create mode 100644 providers/build-a-cloud-provider/shared-memory-shm-enablement.md

diff --git a/SUMMARY.md b/SUMMARY.md
index 5401a6bb..a179899f 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -154,6 +154,7 @@
     * [Verify Node Labels For Storage Classes](providers/build-a-cloud-provider/helm-based-provider-persistent-storage-enablement/label-nodes-for-storage-classes.md)
     * [Additional Verifications](providers/build-a-cloud-provider/helm-based-provider-persistent-storage-enablement/verifications.md)
     * [Teardown](providers/build-a-cloud-provider/helm-based-provider-persistent-storage-enablement/teardown.md)
+  * [Shared Memory (SHM) Enablement](providers/build-a-cloud-provider/shared-memory-shm-enablement.md)
   * [Akash Provider Bid Pricing Calculation](providers/build-a-cloud-provider/akash-provider-bid-pricing/README.md)
     * [Download Git Repository](providers/build-a-cloud-provider/akash-provider-bid-pricing/download-git-repository.md)
     * [Calculate Pricing](providers/build-a-cloud-provider/akash-provider-bid-pricing/example-command-use.md)
diff --git a/providers/build-a-cloud-provider/shared-memory-shm-enablement.md b/providers/build-a-cloud-provider/shared-memory-shm-enablement.md
new file mode 100644
index 00000000..d719c10f
--- /dev/null
+++ b/providers/build-a-cloud-provider/shared-memory-shm-enablement.md
@@ -0,0 +1,130 @@
+# Shared Memory (SHM) Enablement
+
+## Update Provider Configuration File
+
+Providers must be updated with attributes in order to bid on SHM deployments.
+
+> _**NOTE**_ - in the Akash Provider build documentation a `provider.yaml` file was created, which stores provider attributes and other settings. In this section we will update that `provider.yaml` file with SHM-related attributes. The remainder of the pre-existing file should be left unchanged.
+
+### Access Provider Configuration File
+
+* The steps in this code block open the existing `provider.yaml` file in the expected directory for editing
+
+```
+cd ~
+
+cd provider
+
+vim provider.yaml
+```
+
+### **Update the Provider YAML File With SHM Attribute**
+
+* When the `provider.yaml` update is complete, the SHM attributes should look like the following example.
+
+```
+  - key: capabilities/storage/3/class
+    value: ram
+  - key: capabilities/storage/3/persistent
+    value: false
+```
+
+#### Example Provider Config File
+
+```
+
+---
+from: "$ACCOUNT_ADDRESS"
+key: "$(cat ~/key.pem | openssl base64 -A)"
+keysecret: "$(echo $KEY_PASSWORD | openssl base64 -A)"
+domain: "$DOMAIN"
+node: "$AKASH_NODE"
+withdrawalperiod: 12h
+attributes:
+  - key: host
+    value: akash
+  - key: tier
+    value: community
+  - key: capabilities/storage/3/class
+    value: ram
+  - key: capabilities/storage/3/persistent
+    value: false
+```
+
+## Update Provider Via Helm
+
+```
+
+helm upgrade --install akash-provider akash/provider -n akash-services -f provider.yaml \
+--set bidpricescript="$(cat /root/provider/price_script_generic.sh | openssl base64 -A)"
+```
+
+## Verify Health of Akash Provider
+
+Use the following command to verify the health of the Akash Provider and Hostname Operator pods
+
+```
+kubectl get pods -n akash-services
+```
+
+#### Example/Expected Output
+
+```
+root@node1:~/provider# kubectl get pods -n akash-services
+NAME                                       READY   STATUS    RESTARTS   AGE
+akash-hostname-operator-5c59757fcc-kt7dl   1/1     Running   0          17s
+akash-provider-0                           1/1     Running   0          59s
+```
+
+## Verify Provider Attributes On Chain
+
+* In this step we verify that your updated Akash Provider attributes are present on the blockchain. Confirm that the SHM-related attributes are now in place.
+
+> _**NOTE**_ - conduct this verification from your Kubernetes control plane node
+
+```
+# Ensure that an RPC node environment variable is present for query
+export AKASH_NODE=https://rpc.akashnet.net:443
+
+# Replace the provider address with your own value
+provider-services query provider get <provider-address>
+```
+
+#### Example/Expected Output
+
+provider-services query provider get akash1mtnuc449l0mckz4cevs835qg72nvqwlul5wzyf
+
+attributes:
+- key: region
+  value: us-central
+- key: host
+  value: akash
+- key: tier
+  value: community
+- key: organization
+  value: akash test provider
+- key: capabilities/storage/3/class
+  value: ram
+- key: capabilities/storage/3/persistent
+  value: false
+host_uri: https://provider.akashtestprovider.xyz:8443
+info:
+  email: ""
+  website: ""
+owner: akash1mtnuc449l0mckz4cevs835qg72nvqwlul5wzyf
+
+
+## Verify Akash Provider Image
+
+Verify the Provider image is correct by running this command:
+
+```
+kubectl -n akash-services get pod akash-provider-0 -o yaml | grep image: | uniq -c
+```
+
+#### Expected/Example Output
+
+```
+root@node1:~/provider# kubectl -n akash-services get pod akash-provider-0 -o yaml | grep image: | uniq -c
+      4 image: ghcr.io/akash-network/provider:0.5.4
+```
diff --git a/readme/stack-definition-language.md b/readme/stack-definition-language.md
index c4d674cd..697690c9 100644
--- a/readme/stack-definition-language.md
+++ b/readme/stack-definition-language.md
@@ -13,6 +13,7 @@ A complete deployment has the following sections:
 * [persistent storage](../features/persistent-storage/)
 * [gpu support](stack-definition-language.md#gpu-support)
 * [stable payment](stack-definition-language.md#stable-payment)
+* [shared memory (shm)](stack-definition-language.md#shared-memory-shm)
 
 An example deployment configuration can be found [here](https://github.com/akash-network/docs/tree/62714bb13cfde51ce6210dba626d7248847ba8c1/sdl/deployment.yaml).
 
@@ -200,7 +201,7 @@ This says that the 20 instances of the `web` service should be deployed to a dat
 
 ## GPU Support
 
-GPUs can be added to your workload via inclusion the compute profile section. The placement of the GPU stanza can be viewed in the full compute profile example shown below.
+GPUs can be added to your workload via inclusion in the compute profile section. The placement of the GPU stanza can be viewed in the full compute profile example shown below.
 
 > _**NOTE**_ - currently the only accepted vendor is `nvidia` but others will be added soon
 
@@ -232,7 +233,7 @@ To view an example GPU enabled SDL in full for greater context, review this [exa
 
 #### Model Specification Optional
 
-The declaration of a GPU model is optional in the SDL. If your deployment does not require a specific GPU model, leave the model declaration blank as seen in the following example.
+The declaration of a GPU model is optional in the SDL. If your deployment does not require a specific GPU model, leave the model declaration blank as seen in the following example.
 
 ```
 gpu:
@@ -244,9 +245,7 @@ The declaration of a GPU model is optional in the SDL. If your deployment does
 
 #### Multiple Models Declared
 
-If your deployment is optimized to run on multiple GPU models, include the appropriate list of models as seen in the following example. In this usage, any Akash provider that has a model in the list will bid on the deployment.
-
-
+If your deployment is optimized to run on multiple GPU models, include the appropriate list of models as seen in the following example. In this usage, any Akash provider that has a model in the list will bid on the deployment.
 
 ```
 gpu:
@@ -276,6 +275,38 @@ Use of Stable Payments is supported in the Akash SDL and is declared in the plac
 amount: 100
 ```
 
-#### Full GPU SDL Example
+#### Full GPU SDL Example
 
 To view an example Stable Payment enabled SDL in full for greater context, review this [example](https://gist.github.com/chainzero/040d19bdb20d632009b8ae206fb548f5).
+
+## Shared Memory (SHM)
+
+A new storage class named `ram` may be added to the SDL to enable shared memory access for multiple services running in the same container.
+
+> _**NOTE**_ - SHM must not be persistent. The SDL validations will error if SHM is defined as persistent.
+
+```
+profiles:
+  compute:
+    grafana:
+      resources:
+        cpu:
+          units: 1
+        memory:
+          size: 1Gi
+        storage:
+          - size: 512Mi
+          - name: data
+            size: 1Gi
+            attributes:
+              persistent: true
+              class: beta2
+          - name: shm
+            size: 1Gi
+            attributes:
+              class: ram
+```
+
+#### Full SHM SDL Example
+
+To view an example SHM enabled SDL in full for greater context, review this [example](https://gist.github.com/chainzero/0dea9f2e1c4241d2e4d490b37153ec86).