NOTE - This step should be completed from the Kubespray host only
With inventory in place we are ready to build the Kubernetes cluster via Ansible.
NOTE - the cluster creation may take several minutes to complete
- If the Kubespray process fails or is interrupted, run the Ansible playbook again. The subsequent run will complete any steps left unfinished.
cd ~/kubespray
source venv/bin/activate
ansible-playbook -i inventory/akash/hosts.yaml -b -v --private-key=~/.ssh/id_rsa cluster.yml
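Because the playbook can safely be run again after a failure or interruption, the rerun can be automated. The following is a minimal sketch only: the `retry` function and the attempt limit of 3 are hypothetical and not part of Kubespray.

```shell
# Hypothetical helper (not part of Kubespray): run a command until it
# succeeds or the given number of attempts is used up.
retry() {
  retry_max="$1"; shift
  retry_attempt=1
  until "$@"; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "giving up after ${retry_attempt} attempts" >&2
      return 1
    fi
    retry_attempt=$((retry_attempt + 1))
    echo "command failed; starting attempt ${retry_attempt} of ${retry_max}" >&2
  done
}

# Example (run from ~/kubespray with the virtual environment activated):
# retry 3 ansible-playbook -i inventory/akash/hosts.yaml -b -v --private-key=~/.ssh/id_rsa cluster.yml
```

If the playbook keeps failing at the same task, inspect the Ansible output before retrying further, since repeated runs will not fix a configuration error.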
Each node that provides GPUs must be labeled correctly.
NOTE - these configurations should be completed on a Kubernetes control plane node
- Use this label template in the kubectl label command in the Label Application sub-section below
NOTE - please do not assign any value other than true to these labels. Setting the value to false may have unexpected consequences on the Akash provider. If GPU resources are removed from a node, simply remove the Kubernetes label completely from that node.
akash.network/capabilities.gpu.vendor.<vendor name>.model.<model name>=true
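As an illustration of how the template is filled in, the sketch below builds the label from a vendor and model name. The function names `gpu_label` and `gpu_label_removal` are hypothetical helpers introduced here. The removal form relies on standard kubectl behavior, where a label key followed by a trailing `-` removes that label.

```shell
# Hypothetical helper: fill in the label template. The value is always "true".
gpu_label() {
  printf 'akash.network/capabilities.gpu.vendor.%s.model.%s=true\n' "$1" "$2"
}

# Hypothetical helper: build the argument that removes the same label
# (kubectl removes a label when its key is followed by "-").
gpu_label_removal() {
  printf 'akash.network/capabilities.gpu.vendor.%s.model.%s-\n' "$1" "$2"
}

gpu_label nvidia a4000
# prints akash.network/capabilities.gpu.vendor.nvidia.model.a4000=true

# To remove the label after GPUs are taken out of a node:
# kubectl label node <node-name> "$(gpu_label_removal nvidia a4000)"
```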
NOTE - if you are unsure of the <node-name> to be used in this command, issue kubectl get nodes from one of your Kubernetes control plane nodes and take the value from the NAME column of the command output
kubectl label node <node-name> <label>
NOTE - issue this command/label application for all nodes hosting GPU resources
kubectl label node node1 akash.network/capabilities.gpu.vendor.nvidia.model.a4000=true
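When several nodes host the same GPU model, the label command can be repeated in a loop. This is a sketch under stated assumptions: the function `label_gpu_nodes`, the `KUBECTL` override variable, and the node name `node2` are hypothetical and used only for illustration.

```shell
# Setting KUBECTL=echo previews the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: apply one label to every node name that follows it.
label_gpu_nodes() {
  gpu_node_label="$1"; shift
  for gpu_node in "$@"; do
    $KUBECTL label node "$gpu_node" "$gpu_node_label" || return 1
  done
}

# Example (node2 is a hypothetical second GPU node):
# label_gpu_nodes akash.network/capabilities.gpu.vendor.nvidia.model.a4000=true node1 node2
```

The loop stops at the first node that fails to label, so a mistyped node name is noticed immediately rather than being buried in later output.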
###Apply labels
root@node1:~/provider# kubectl label node node1 akash.network/capabilities.gpu.vendor.nvidia.model.a4000=true
node/node1 labeled
###Verification of applied labels
root@node1:~/provider# kubectl describe node node1 | grep -A10 Labels
Labels: akash.network/capabilities.gpu.vendor.nvidia.model.a4000=true
...
...
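As an alternative to scanning the describe output, the GPU labels can be filtered out of the node's label list. The sketch below assumes the labels arrive as a comma-separated list on standard input, as in the LABELS column printed by kubectl get with --show-labels. The function name `filter_gpu_labels` is hypothetical.

```shell
# Hypothetical helper: read a comma-separated label list on stdin and
# print only the Akash GPU capability labels, one per line.
filter_gpu_labels() {
  tr ',' '\n' | grep '^akash\.network/capabilities\.gpu\.'
}

# Example: show only the GPU labels set on node1
# kubectl get node node1 --no-headers --show-labels | awk '{print $NF}' | filter_gpu_labels

# Example: list every node that carries a given GPU label
# kubectl get nodes -l akash.network/capabilities.gpu.vendor.nvidia.model.a4000=true
```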
NOTE - these configurations should be completed on a Kubernetes control plane node
kubectl create ns akash-services
kubectl label ns akash-services akash.network/name=akash-services akash.network=true
kubectl create ns lease
kubectl label ns lease akash.network=true
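If these commands might be run more than once, for example after a partially completed setup, the namespace creation can be guarded so that an existing namespace is not treated as an error. This is only a sketch: the function `ensure_namespace` and the `KUBECTL` override variable are hypothetical, and the labels themselves are unchanged from the commands above.

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: create the namespace only if it does not exist yet.
ensure_namespace() {
  $KUBECTL get ns "$1" >/dev/null 2>&1 || $KUBECTL create ns "$1"
}

# Example, followed by the same label commands as above:
# ensure_namespace akash-services
# kubectl label ns akash-services akash.network/name=akash-services akash.network=true
# ensure_namespace lease
# kubectl label ns lease akash.network=true
```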