Merge pull request #1332 from run-ai/RUN-24217-TW-Support-node-affinity-and-tolerations-for-Run-ai-cluster-services

Copied from Doc 360
SherinDaher-Runai authored Jan 1, 2025
2 parents d44d8e5 + 769fe8f commit fe0ef11
Showing 3 changed files with 6 additions and 3 deletions.
5 changes: 4 additions & 1 deletion docs/admin/config/advanced-cluster-config.md
@@ -28,6 +28,9 @@ The following configurations allow you to enable or disable features, control pe
| spec.global.syncServices (object) | Defines resource constraints uniformly for the entire set of Run:ai sync services. For more information, see Resource requests and limits of Pod and container | `{resources: {}}` |
| spec.global.workloadServices (object) | Defines resource constraints uniformly for the entire set of Run:ai workload services. For more information, see Resource requests and limits of Pod and container | `{resources: {}}` |
| spec.global.nodeAffinity.restrictScheduling (boolean) | Enables setting node roles and restricting workload scheduling to designated nodes | false |
| spec.global.affinity (object) | Sets the system nodes where Run:ai system-level services are scheduled. Using global.affinity will overwrite the [node roles](node-roles.md) set using the Administrator CLI (runai-adm). | Nodes labelled with `node-role.kubernetes.io/runai-system` |
| spec.global.tolerations (object) | Configure Kubernetes tolerations for Run:ai system-level services. | |
| spec.daemonSetsTolerations (object) | Configure Kubernetes tolerations for Run:ai daemonSets / engine. | |
| spec.runai-container-toolkit.logLevel (string) | Specifies the run:ai-container-toolkit logging level: either 'SPAM', 'DEBUG', 'INFO', 'NOTICE', 'WARN', or 'ERROR' | INFO |
| spec.global.core.dynamicFractions.enabled (boolean) | Enables dynamic GPU fractions | true |
| spec.global.core.swap.enabled (boolean) | Enables memory swap for GPU workloads | false |
@@ -46,7 +49,7 @@ The following configurations allow you to enable or disable features, control pe
| spec.pod-grouper.args.gangSchedulingKnative (boolean) | Enables gang scheduling for inference workloads. For backward compatibility with versions earlier than v2.19, change the value to false | true |
| spec.runai-scheduler.args.verbosity (int) | Configures the level of detail in the logs generated by the scheduler service | 4 |
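
The new `spec.global.affinity`, `spec.global.tolerations`, and `spec.daemonSetsTolerations` entries in the table above take standard Kubernetes affinity and toleration structures. The sketch below is illustrative only: the `node-role.kubernetes.io/runai-system` label comes from the table, but the taint keys, values, and effects are assumptions, not Run:ai defaults. As the table notes, affinity set this way overrides node roles previously applied with the Administrator CLI.

```yaml
# Illustrative sketch -- layout follows the table above; taint keys, values,
# and effects are assumptions. Tolerations are written as standard Kubernetes
# toleration entries.
spec:
  global:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-role.kubernetes.io/runai-system   # label referenced in the table above
                  operator: Exists
    tolerations:
      - key: dedicated              # assumed taint key
        operator: Equal
        value: runai-system
        effect: NoSchedule
  daemonSetsTolerations:
    - key: nvidia.com/gpu           # assumed taint on GPU worker nodes
      operator: Exists
      effect: NoSchedule
```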

### Exclude nodes from Run:ai
### Run:ai Managed Nodes

To include or exclude specific nodes from running workloads within a cluster managed by Run:ai, use the `nodeSelectorTerms` flag. For additional details, see [Kubernetes nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector).
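
The `nodeSelectorTerms` flag accepts the standard Kubernetes node-selector structure. The sketch below only illustrates that structure; the label key and node names are placeholders, and the exact location of the flag within the cluster configuration is not shown in this diff.

```yaml
# Illustrative sketch -- label key and node names are placeholders.
nodeSelectorTerms:
  - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In        # use NotIn to exclude the listed nodes instead
        values:
          - gpu-worker-1
          - gpu-worker-2
```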

2 changes: 1 addition & 1 deletion docs/admin/config/node-roles.md
@@ -7,7 +7,6 @@ For optimal performance in production clusters, it is essential to avoid extensi
* Run:ai system-level services run on dedicated CPU-only nodes.
* Workloads that do not request GPU resources (e.g. Machine Learning jobs) are executed on CPU-only nodes.

The Run:ai cluster applies [Kubernetes Node Affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity){target=_blank} using node labels to manage scheduling for cluster services (system) and DaemonSets (worker).

## Prerequisites

@@ -39,6 +38,7 @@ To set a system role for a node in your Kubernetes cluster, follow these steps:

The `runai-adm` CLI will label the node and set relevant cluster configurations.
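
For reference, the label applied by the CLI corresponds to the `node-role.kubernetes.io/runai-system` label used by the cluster configuration above. A hedged sketch of labeling a node manually with `kubectl` is shown below; the `=true` value is an assumption, and the documented path remains the `runai-adm` CLI, which also updates the related cluster configuration.

```bash
# Sketch only: manually applying the system-node label referenced above.
# The label value is an assumption; prefer the runai-adm CLI.
kubectl label node <node-name> node-role.kubernetes.io/runai-system=true
```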

The Run:ai cluster applies [Kubernetes Node Affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity){target=_blank} using node labels to manage scheduling for cluster services (system).

!!! Warning
Do not assign a system node role to the Kubernetes master node. This may disrupt Kubernetes functionality, particularly if the Kubernetes API Server is configured to use port 443 instead of the default 6443.
2 changes: 1 addition & 1 deletion docs/developer/rest-auth.md
@@ -8,7 +8,7 @@ Run:ai APIs are accessed using *bearer tokens*. A token can be obtained by creat
An application contains a client ID and a client secret. With the client credentials you can obtain a token and use it within subsequent API calls.

* To create applications for your organization, see [Applications](../admin/authentication/applications.md).
* To create your own user applications, see [User Applications](../Researcher/best-practices/user-applications.md).
* To create your own user applications, see [User Applications](user-applications.md).
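
As a rough illustration of the client-credentials flow described above, the sketch below exchanges an application's client ID and secret for a token and passes it as a bearer token on a later call. The endpoint path, request fields, and response field name are assumptions; consult the Run:ai API reference for the actual request.

```bash
# Hypothetical sketch of the client-credentials flow. Endpoint paths, request
# fields, and the response field name are assumptions.
RUNAI_URL="https://<company>.run.ai"   # assumed base URL placeholder

TOKEN=$(curl -s -X POST "$RUNAI_URL/api/v1/token" \
  -H "Content-Type: application/json" \
  -d '{"grantType": "client_credentials", "clientId": "<CLIENT_ID>", "clientSecret": "<CLIENT_SECRET>"}' \
  | jq -r '.accessToken')   # requires jq; response field name is an assumption

# Use the token as a bearer token in a subsequent API call (assumed endpoint)
curl -H "Authorization: Bearer $TOKEN" "$RUNAI_URL/api/v1/whoami"
```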


## Request an API Token