Merge pull request #1332 from run-ai/RUN-24217-TW-Support-node-affinity-and-tolerations-for-Run-ai-cluster-services

Copied from Doc 360
SherinDaher-Runai authored Jan 1, 2025
2 parents d44d8e5 + 769fe8f commit fe0ef11
Showing 3 changed files with 6 additions and 3 deletions.
5 changes: 4 additions & 1 deletion docs/admin/config/advanced-cluster-config.md
@@ -28,6 +28,9 @@ The following configurations allow you to enable or disable features, control pe
| spec.global.syncServices (object) | Defines resource constraints uniformly for the entire set of Run:ai sync services. For more information, see Resource requests and limits of Pod and container | `{resources: {}}` |
| spec.global.workloadServices (object) | Defines resource constraints uniformly for the entire set of Run:ai workload services. For more information, see Resource requests and limits of Pod and container | `{resources: {}}` |
| spec.global.nodeAffinity.restrictScheduling (boolean) | Enables setting node roles and restricting workload scheduling to designated nodes | false |
| spec.global.affinity (object) | Sets the system nodes where Run:ai system-level services are scheduled. Using global.affinity will overwrite the [node roles](node-roles.md) set using the Administrator CLI (runai-adm). | Nodes labelled with `node-role.kubernetes.io/runai-system` |
| spec.global.tolerations (object) | Configure Kubernetes tolerations for Run:ai system-level services. | |
| spec.daemonSetsTolerations (object) | Configure Kubernetes tolerations for Run:ai daemonSets / engine. | |
| spec.runai-container-toolkit.logLevel (string) | Specifies the run:ai-container-toolkit logging level: either 'SPAM', 'DEBUG', 'INFO', 'NOTICE', 'WARN', or 'ERROR' | INFO |
| spec.global.core.dynamicFractions.enabled (boolean) | Enables dynamic GPU fractions | true |
| spec.global.core.swap.enabled (boolean) | Enables memory swap for GPU workloads | false |
@@ -46,7 +49,7 @@ The following configurations allow you to enable or disable features, control pe
| spec.pod-grouper.args.gangSchedulingKnative (boolean) | Enables gang scheduling for inference workloads. For backward compatibility with versions earlier than v2.19, change the value to false | true |
| spec.runai-scheduler.args.verbosity (int) | Configures the level of detail in the logs generated by the scheduler service | 4 |
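
The new `spec.global.affinity`, `spec.global.tolerations`, and `spec.daemonSetsTolerations` entries in the table above take standard Kubernetes affinity and toleration structures. The sketch below is illustrative only: the `node-role.kubernetes.io/runai-system` label comes from the table, but the taint keys, values, and effects are assumptions, not Run:ai defaults. As the table notes, affinity set this way overrides node roles previously applied with the Administrator CLI.

```yaml
# Illustrative sketch -- layout follows the table above; taint keys, values,
# and effects are assumptions. Tolerations are written as standard Kubernetes
# toleration entries.
spec:
  global:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-role.kubernetes.io/runai-system   # label referenced in the table above
                  operator: Exists
    tolerations:
      - key: dedicated              # assumed taint key
        operator: Equal
        value: runai-system
        effect: NoSchedule
  daemonSetsTolerations:
    - key: nvidia.com/gpu           # assumed taint on GPU worker nodes
      operator: Exists
      effect: NoSchedule
```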

### Exclude nodes from Run:ai
### Run:ai Managed Nodes

To include or exclude specific nodes from running workloads within a cluster managed by Run:ai, use the `nodeSelectorTerms` flag. For additional details, see [Kubernetes nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector).
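
The `nodeSelectorTerms` flag accepts the standard Kubernetes node-selector structure. The sketch below only illustrates that structure; the label key and node names are placeholders, and the exact location of the flag within the cluster configuration is not shown in this diff.

```yaml
# Illustrative sketch -- label key and node names are placeholders.
nodeSelectorTerms:
  - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In        # use NotIn to exclude the listed nodes instead
        values:
          - gpu-worker-1
          - gpu-worker-2
```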

2 changes: 1 addition & 1 deletion docs/admin/config/node-roles.md
@@ -7,7 +7,6 @@ For optimal performance in production clusters, it is essential to avoid extensi
* Run:ai system-level services run on dedicated CPU-only nodes.
* Workloads that do not request GPU resources (e.g. Machine Learning jobs) are executed on CPU-only nodes.

The Run:ai cluster applies [Kubernetes Node Affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity){target=_blank} using node labels to manage scheduling for cluster services (system) and DaemonSets (worker).

## Prerequisites

@@ -39,6 +38,7 @@ To set a system role for a node in your Kubernetes cluster, follow these steps:

The `runai-adm` CLI will label the node and set relevant cluster configurations.
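
For reference, the label applied by the CLI corresponds to the `node-role.kubernetes.io/runai-system` label used by the cluster configuration above. A hedged sketch of labeling a node manually with `kubectl` is shown below; the `=true` value is an assumption, and the documented path remains the `runai-adm` CLI, which also updates the related cluster configuration.

```bash
# Sketch only: manually applying the system-node label referenced above.
# The label value is an assumption; prefer the runai-adm CLI.
kubectl label node <node-name> node-role.kubernetes.io/runai-system=true
```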

The Run:ai cluster applies [Kubernetes Node Affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity){target=_blank} using node labels to manage scheduling for cluster services (system).

!!! Warning
Do not assign a system node role to the Kubernetes master node. This may disrupt Kubernetes functionality, particularly if the Kubernetes API Server is configured to use port 443 instead of the default 6443.
2 changes: 1 addition & 1 deletion docs/developer/rest-auth.md
@@ -8,7 +8,7 @@ Run:ai APIs are accessed using *bearer tokens*. A token can be obtained by creat
An application contains a client ID and a client secret. With the client credentials you can obtain a token and use it within subsequent API calls.

* To create applications for your organization, see [Applications](../admin/authentication/applications.md).
* To create your own user applications, see [User Applications](../Researcher/best-practices/user-applications.md).
* To create your own user applications, see [User Applications](user-applications.md).
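
As a rough illustration of the client-credentials flow described above, the sketch below exchanges an application's client ID and secret for a token and passes it as a bearer token on a later call. The endpoint path, request fields, and response field name are assumptions; consult the Run:ai API reference for the actual request.

```bash
# Hypothetical sketch of the client-credentials flow. Endpoint paths, request
# fields, and the response field name are assumptions.
RUNAI_URL="https://<company>.run.ai"   # assumed base URL placeholder

TOKEN=$(curl -s -X POST "$RUNAI_URL/api/v1/token" \
  -H "Content-Type: application/json" \
  -d '{"grantType": "client_credentials", "clientId": "<CLIENT_ID>", "clientSecret": "<CLIENT_SECRET>"}' \
  | jq -r '.accessToken')   # requires jq; response field name is an assumption

# Use the token as a bearer token in a subsequent API call (assumed endpoint)
curl -H "Authorization: Bearer $TOKEN" "$RUNAI_URL/api/v1/whoami"
```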


## Request an API Token