chore(docs): various docs updates (#282)
andy108369 authored Nov 7, 2023
1 parent c03683a commit a07cf01
Showing 21 changed files with 36 additions and 50 deletions.
@@ -16,7 +16,7 @@ kubectl label ns akash-services akash.network/name=akash-services akash.network=
## Install Helm

* Install Helm for Kubernetes package management if not already installed
* Execute these steps on a Kubernetes master node
* Execute these steps on a Kubernetes control plane node

```
wget https://get.helm.sh/helm-v3.11.0-linux-amd64.tar.gz
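# NOTE: the remainder of this block is truncated in the diff view.
# As a hedged sketch only, a typical continuation that installs the
# downloaded Helm release and verifies it would be:
tar -zxvf helm-v3.11.0-linux-amd64.tar.gz
install linux-amd64/helm /usr/local/bin/helm
helm version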
6 changes: 1 addition & 5 deletions operator/provider/README.md
@@ -32,7 +32,7 @@ At this point you would be left with a Kubernetes cluster that is ready to be a p

The recommended method for setting up a Kubernetes cluster is to use the [Kubespray](https://github.com/kubernetes-sigs/kubespray) project. This project is a collection of Ansible resources for setting up a Kubernetes cluster.

The recommended minimum number of machines is three. One machine hosts the Kubernetes master node & provider, with the other machines hosting the compute nodes. It is possible however to provision a single-machine cluster if you choose to, but this configuration is not recommended.
The recommended minimum number of machines is three. One machine hosts the Kubernetes control plane node & provider, with the other machines hosting the compute nodes. It is possible however to provision a single-machine cluster if you choose to, but this configuration is not recommended.

### Getting kubespray & setup

@@ -67,7 +67,6 @@ Example single node configuration \(not recommended\)
```text
all:
vars:
cluster_id: "1.0.0.1"
ansible_user: root
hosts:
mynode:
@@ -97,14 +96,11 @@ This Ansible inventory file defines a single node file with a host named "mynode

The host is placed into the groups `kube-master`, `etcd`, `kube-node`, and `calico-rr`. All hosts in those groups are then placed into the `k8s-cluster` group. This is similar to a standard configuration for a Kubernetes cluster, but utilizes Calico for networking. Calico is the only networking solution for the Kubernetes cluster that Akash officially supports at this time.
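For orientation, a minimal sketch of how that group membership is typically laid out in a Kubespray inventory file is shown below (the host name `mynode` mirrors the example above; the exact file in the repository may differ):

```text
all:
  children:
    kube-master:
      hosts:
        mynode:
    etcd:
      hosts:
        mynode:
    kube-node:
      hosts:
        mynode:
    calico-rr:
      hosts: {}
    k8s-cluster:
      children:
        kube-master:
        kube-node:
        calico-rr:
```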

One important detail is the value `cluster_id` which is assigned to all nodes by using the `all` group under `vars` in the YAML file. This value is used by Calico to uniquely identify a set of resources. For a more in depth explanation [see this document](https://hub.docker.com/r/calico/routereflector/).

Example multinode configuration, with a single master

```text
all:
vars:
cluster_id: "1.0.0.1"
ansible_user: root
hosts:
mymaster:
@@ -4,7 +4,7 @@

Each node that provides GPUs must be labeled correctly.

> _**NOTE**_ - these configurations should be completed on a Kubernetes master/control plane node
> _**NOTE**_ - these configurations should be completed on a Kubernetes control plane node
## Label Template
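
The template content is truncated in this diff view. As a hedged illustration only, a GPU capability label of the kind shown in a later hunk of this commit is typically applied with `kubectl label`; the node name and GPU model below are placeholders:

```
# hypothetical node name and model; confirm the exact key and value against the label template
kubectl label node <node-name> akash.network/capabilities.gpu.vendor.nvidia.model.a4000=true
```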

@@ -102,8 +102,8 @@ Update the nvidia-container-runtime config in order to prevent `NVIDIA_VISIBLE_D
Make sure the config file `/etc/nvidia-container-runtime/config.toml` contains these lines uncommented and set to these values:

```
accept-nvidia-visible-devices-envvar-when-unprivileged = false
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
```

> _**NOTE**_ - `/etc/nvidia-container-runtime/config.toml` is part of the `nvidia-container-toolkit-base` package; because it is listed in `/var/lib/dpkg/info/nvidia-container-toolkit-base.conffiles`, package upgrades will not override the customer-set parameters in it.
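The same change can also be applied non-interactively; a hedged sketch using `sed` (the key names are taken verbatim from the block above, so review the file afterwards):

```
# sketch only: uncomment/set both keys in place, then inspect the file
sudo sed -i \
  -e 's/^#\?\s*accept-nvidia-visible-devices-as-volume-mounts\s*=.*/accept-nvidia-visible-devices-as-volume-mounts = true/' \
  -e 's/^#\?\s*accept-nvidia-visible-devices-envvar-when-unprivileged\s*=.*/accept-nvidia-visible-devices-envvar-when-unprivileged = false/' \
  /etc/nvidia-container-runtime/config.toml
```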
@@ -120,10 +120,6 @@ In this step we add the NVIDIA runtime configuration into the Kubespray inventory

```
cat > ~/kubespray/inventory/akash/group_vars/all/akash.yml <<'EOF'
ansible_user: root
ansible_connection: ssh
containerd_additional_runtimes:
- name: nvidia
type: "io.containerd.runc.v2"
@@ -139,9 +135,6 @@ EOF
```
cd ~/kubespray
###Execute following command if not already in the Python virtual environment
###Creation and activation of virtual evironment described further here:
###https://docs.akash.network/providers/build-a-cloud-provider/kubernetes-cluster-for-akash-providers/step-2-install-ansible
source venv/bin/activate
ansible-playbook -i inventory/akash/hosts.yaml -b -v --private-key=~/.ssh/id_rsa cluster.yml
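# (optional, hedged) once the playbook completes, a quick sanity check from a
# node with a configured kubectl:
kubectl get nodes -o wide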
@@ -97,7 +97,7 @@ dmesg -T | grep -Ei 'nvidia|nvml|cuda|mismatch'

## Ensure Correct Version/Presence of NVIDIA Device Plugin

> _**NOTE**_ - conduct this verification step on the Kubernetes master node on which Helm was installed during your Akash Provider build
> _**NOTE**_ - conduct this verification step on the Kubernetes control plane node on which Helm was installed during your Akash Provider build
```
helm -n nvidia-device-plugin list
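# (optional, hedged) the device plugin pods can be checked as well:
kubectl -n nvidia-device-plugin get pods -o wide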
@@ -4,7 +4,7 @@

An Akash Provider leases compute to users launching new deployments. Follow the steps in this guide to build your own provider.

This guide uses a single Kubernetes master node.
This guide uses a single Kubernetes control plane node.

## Overview and links to the steps involved in Akash Provider Build:

@@ -56,7 +56,7 @@ In this section we perform the following DNS adjustments:

> _**NOTE**_ - the DNS resolution issue & the Netplan fix addressed in this step are described [here](https://github.com/akash-network/support/issues/80)
Apply the following to all Kubernetes master and worker nodes.
Apply the following to all Kubernetes control plane and worker nodes.

> _**IMPORTANT**_ - Make sure you do not have any other config files under the `/etc/netplan` directory, otherwise it could cause unexpected networking issues / issues with booting up your node.
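Before editing, it can help to see what is already present; a hedged convenience check to run on each node:

```
# list existing netplan configs; review anything unexpected before proceeding
ls -l /etc/netplan/

# optionally confirm the effective DNS configuration before and after the change
resolvectl status
```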
@@ -16,6 +16,9 @@ To disable unattended upgrades, execute these two commands on your Kubernetes wo
echo -en 'APT::Periodic::Update-Package-Lists "0";\nAPT::Periodic::Unattended-Upgrade "0";\n' | tee /etc/apt/apt.conf.d/20auto-upgrades
apt remove unattended-upgrades
systemctl stop unattended-upgrades.service
systemctl mask unattended-upgrades.service
```

## Verify
@@ -2,7 +2,7 @@

Create Provider namespaces on your Kubernetes cluster.

Run these commands from a Kubernetes master node which has kubectl access to the cluster.
Run these commands from a Kubernetes control plane node which has kubectl access to the cluster.

```
kubectl create ns akash-services
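# (hedged) the remaining namespace commands are truncated in this diff view;
# afterwards, confirm the namespace exists:
kubectl get ns akash-services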
@@ -2,9 +2,9 @@

## Initial Guidance and Assumptions

* Conduct all steps in this guide from a Kubernetes master node in your Akash Provider cluster.
* Conduct all steps in this guide from a Kubernetes control plane node in your Akash Provider cluster.
* Guide assumes that your Akash Provider was installed via Helm Charts as detailed in this [guide](../../providers/build-a-cloud-provider/helm-based-provider-persistent-storage-enablement/).
* Guide assumes that the Kubernetes master node used has Helm installed. Refer to this [guide](../../providers/build-a-cloud-provider/akash-cloud-provider-build-with-helm-charts/step-4-helm-installation-on-kubernetes-node.md) step if a Helm install is needed. Return to this guide once Helm install is completed.
* Guide assumes that the Kubernetes control plane node used has Helm installed. Refer to this [guide](../../providers/build-a-cloud-provider/akash-cloud-provider-build-with-helm-charts/step-4-helm-installation-on-kubernetes-node.md) step if a Helm install is needed. Return to this guide once Helm install is completed.

## Caveats

@@ -6,7 +6,7 @@ Use the steps covered in this section to verify the current settings of your run

> Steps in this section assume the provider was installed via Akash Provider Helm Charts.
>
> Conduct the steps from a Kubernetes master node with `kubectl` access to the cluster.
> Conduct the steps from a Kubernetes control plane node with `kubectl` access to the cluster.
## View Provider Current Settings
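
The body of this section is truncated in the diff view. As a hedged illustration only, with a Helm-chart based install the current values are typically inspected along these lines (the release name `akash-provider` and namespace `akash-services` are assumptions):

```
helm -n akash-services get values akash-provider
```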

@@ -4,7 +4,7 @@

Install Helm and add the Akash repo if not done previously by following the steps in this [guide](../akash-cloud-provider-build-with-helm-charts/step-4-helm-installation-on-kubernetes-node.md)**.**

All steps in this section should be conducted from the Kubernetes master node on which Helm has been installed.
All steps in this section should be conducted from the Kubernetes control plane node on which Helm has been installed.

Rook has published the following Helm charts for the Ceph storage provider:

@@ -2,7 +2,7 @@

## Attribute Adjustments

* Conduct the steps in this section on the Kubernetes master from which the provider was configured in prior steps
* Conduct the steps in this section on the Kubernetes control plane node from which the provider was configured in prior steps
* Adjust the following key-value pairs as necessary within the `provider-storage.yaml` file created below:
  * Update the value of the `capabilities/storage/2/class` key to the correct class type (i.e. `beta2`). Reference the [Storage Class Types](storage-class-types.md) doc section for additional details.
  * Update the region value from the current `us-west` to an appropriate value such as `us-east` OR `eu-west`
@@ -2,7 +2,7 @@

## **Overview**

Akash leases are deployed via Kubernetes pods on provider clusters. This guide details the build of the provider’s Kubernetes control plane and worker nodes.
Akash leases are deployed as Kubernetes pods on provider clusters. This guide details the build of the provider’s Kubernetes control plane and worker nodes.

The setup of a Kubernetes cluster is the responsibility of the provider. This guide provides best practices and recommendations for setting up a Kubernetes cluster. This document is not a comprehensive guide and assumes pre-existing Kubernetes knowledge.

@@ -15,7 +15,7 @@ The Kubernetes instructions in this guide are intended for audiences that have t
* **Server Administration Skills** - necessary for setting up servers/network making up the Kubernetes cluster
* **Kubernetes Experience** - a base level of Kubernetes administration is highly recommended

Please consider using the [Praetor](../../community-solutions/praetor.md) application to build an Akash Provider for small and medium sized environments which require little customization.
> Please consider using the [Praetor](../../community-solutions/praetor.md) application to build an Akash Provider for small and medium sized environments which require little customization.
## Guide Sections

@@ -6,26 +6,26 @@ We recommend using the Kubespray project to deploy a cluster. Kubespray uses Ans

The recommended minimum number of hosts is four for a production Provider Kubernetes cluster. This is meant to allow:

* Three hosts serving as redundant control plane/master instances
* Three hosts serving as redundant control plane (aka master)/etcd instances
* One host to serve as a Kubernetes worker node to host provider leases.

### Additional Cluster Sizing Considerations

> While a production Kubernetes cluster would typically require three redundant control plane nodes, in circumstances in which the control plane node is easily recoverable the use of a single control instance for Akash providers should suffice.
* While a production Kubernetes cluster would typically require three redundant control plane nodes, in circumstances in which the control plane node is easily recoverable the use of a single control instance for Akash providers should suffice.

> The number of control plane nodes in the cluster should always be an odd number to allow the cluster to reach consensus.
* The number of control plane nodes in the cluster should always be an odd number to allow the cluster to reach consensus.

> We recommend running a single worker node per physical server as CPU is typically the largest resource bottleneck. The use of a single worker node allows larger workloads to be deployed on your provider.
* We recommend running a single worker node per physical server as CPU is typically the largest resource bottleneck. The use of a single worker node allows larger workloads to be deployed on your provider.

> If you intend to build a provider with persistent storage please refer to host requirements detailed [here](../helm-based-provider-persistent-storage-enablement/persistent-storage-requirements.md).
* If you intend to build a provider with persistent storage please refer to host storage requirements detailed [here](../helm-based-provider-persistent-storage-enablement/persistent-storage-requirements.md).

## Kubernetes Cluster Software/Hardware Requirements and Recommendations

### Software Recommendation

Akash Providers have been tested on Ubuntu 22.04 with the default Linux kernel. Your experience may vary should the install be attempted using a different Linux distro/kernel.
Akash Providers have been tested on **Ubuntu 22.04** with the default Linux kernel. Your experience may vary should the install be attempted using a different Linux distro/kernel.

### Kubernetes Master Node Requirements
### Kubernetes Control Plane Node Requirements

* Minimum Specs
* 2 CPU
@@ -36,7 +36,7 @@
* 8 GB RAM
* 40 GB disk

### Kubernetes Work Node Requirements
### Kubernetes Worker Node Requirements

* Minimum Specs
* 4 CPU
@@ -48,7 +48,7 @@

## **etcd Hardware Recommendations**

* Use this [guide](https://etcd.io/docs/v3.3/op-guide/hardware) to ensure Kubernetes control plane nodes meet the recommendations for hosting an `etcd` database.
* Use this [guide](https://etcd.io/docs/v3.5/op-guide/hardware) to ensure Kubernetes control plane nodes meet the recommendations for hosting an `etcd` database.

## **Kubespray Clone**

@@ -65,7 +65,7 @@ Obtain Kubespray and navigate into the created local directory:
```
cd ~
git clone -b v2.23.0 --depth=1 https://github.com/kubernetes-sigs/kubespray.git
git clone -b v2.23.1 --depth=1 https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
```
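
Later hunks in this commit run `source venv/bin/activate` before `ansible-playbook`; a minimal sketch of creating that Python virtual environment and installing Kubespray's requirements (the canonical steps are in the Ansible install guide referenced elsewhere in these docs):

```
cd ~/kubespray
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
```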
@@ -43,7 +43,7 @@ ssh-copy-id -i ~/.ssh/id_rsa.pub <username>@<ip-address>

### **Example**

* Conduct this step for every Kubernetes master and worker node in the cluster
* Conduct this step for every Kubernetes control plane and worker node in the cluster

```
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
@@ -63,7 +63,7 @@ ssh -i ~/.ssh/id_rsa <username>@<ip-address>

### **Example**

* Conduct this access test for every Kubernetes master and worker node in the cluster
* Conduct this access test for every Kubernetes control plane and worker node in the cluster

```
ssh -i ~/.ssh/id_rsa [email protected]
@@ -118,7 +118,6 @@ vi ~/kubespray/inventory/akash/hosts.yaml
```

* Within the YAML file’s “all” stanza and prior to the “hosts” sub-stanza level - insert the following vars stanza
* We currently recommend disabling TCP offloading on vxlan.calico interface until calico fixes a related bug. This only applies when Calico is configured to use VXLAN encapsulation. Read more about this bug [here](https://github.com/kubernetes-sigs/kubespray/pull/9261#issuecomment-1248844913).

```
vars:
@@ -24,4 +24,6 @@ container_manager: containerd

## **gVisor Issue - No system-cgroup v2 Support**

> Skip if you are not using gVisor
If you are using a newer systemd version, your container will get stuck in ContainerCreating state on your provider with gVisor enabled. Please reference [this document](../gvisor-issue-no-system-cgroup-v2-support.md) for details regarding this issue and the recommended workaround.
@@ -13,9 +13,6 @@ With inventory in place we are ready to build the Kubernetes cluster via Ansible
```
cd ~/kubespray
###Execute following command if not already in the Python virtual environment
###Creation and activation of virtual evironment described further here:
###https://docs.akash.network/providers/build-a-cloud-provider/kubernetes-cluster-for-akash-providers/step-2-install-ansible
source venv/bin/activate
ansible-playbook -i inventory/akash/hosts.yaml -b -v --private-key=~/.ssh/id_rsa cluster.yml
@@ -25,7 +22,7 @@ ansible-playbook -i inventory/akash/hosts.yaml -b -v --private-key=~/.ssh/id_rsa

Each node that provides GPUs must be labeled correctly.

> _**NOTE**_ - these configurations should be completed on a Kubernetes master/control plane node
> _**NOTE**_ - these configurations should be completed on a Kubernetes control plane node
### Label Template

@@ -71,7 +68,7 @@ Labels: akash.network/capabilities.gpu.vendor.nvidia.model.a4000=tru

## Additional Kubernetes Configurations

> _**NOTE**_ - these configurations should be completed on a Kubernetes master/control plane node
> _**NOTE**_ - these configurations should be completed on a Kubernetes control plane node
```
kubectl create ns akash-services
@@ -18,7 +18,7 @@ In this section we perform the following DNS adjustments:

> _**NOTE**_ - the DNS resolution issue & the Netplan fix addressed in this step are described [here](https://github.com/akash-network/support/issues/80)
Apply the following to all Kubernetes master and worker nodes.
Apply the following to all Kubernetes control plane and worker nodes.

> _**IMPORTANT**_ - Make sure you do not have any other config files under the `/etc/netplan` directory, otherwise it could cause unexpected networking issues / issues with booting up your node.
@@ -11,8 +11,8 @@ Update the nvidia-container-runtime config in order to prevent `NVIDIA_VISIBLE_D
Make sure the config file `/etc/nvidia-container-runtime/config.toml` contains these lines uncommented and set to these values:

```
accept-nvidia-visible-devices-envvar-when-unprivileged = false
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
```

> _**NOTE**_ - `/etc/nvidia-container-runtime/config.toml` is part of the `nvidia-container-toolkit-base` package; because it is listed in `/var/lib/dpkg/info/nvidia-container-toolkit-base.conffiles`, package upgrades will not override the customer-set parameters in it.
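A hedged one-liner to confirm both settings after editing the file:

```
grep -E 'accept-nvidia-visible-devices' /etc/nvidia-container-runtime/config.toml
```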
@@ -25,10 +25,6 @@ In this step we add the NVIDIA runtime configuration into the Kubespray inventory

```
cat > ~/kubespray/inventory/akash/group_vars/all/akash.yml <<'EOF'
ansible_user: root
ansible_connection: ssh
containerd_additional_runtimes:
- name: nvidia
type: "io.containerd.runc.v2"