forked from operate-first/odh-manifests
Commit
Use k8s jobs for cuda build chains deployment (opendatahub-io#472)
* Use k8s job for cuda-11.0.3 build chain deployment

  Add cuda-version=11.0.3 labels to the buildconfig and imagestream

  Signed-off-by: Landon LaSmith <[email protected]>

* Restore default serviceaccount group namespace to image-pullers RoleBinding

  Signed-off-by: Landon LaSmith <[email protected]>
Showing 12 changed files with 542 additions and 339 deletions.
36 changes: 19 additions & 17 deletions
jupyterhub/notebook-images/overlays/cuda-11.0.3/README.md
@@ -1,40 +1,42 @@
# CUDA Build Chain

This overlay contains CUDA build chain to produce CUDA based images for TensorFlow, PyTorch, and Minimal jupyter notebooks.
This overlay contains CUDA build chain to produce CUDA 11.0.3 based images for TensorFlow, PyTorch, and Minimal jupyter notebooks.

## Version Details:
When this overlay is applied, the BuildConfigs and Imagestreams required for the build chain will be created by an OpenShift [job](./cuda-build-job.yaml). Once the BuildConfigs and ImageStreams are deployed, the Open Data Hub operator will no longer reconcile or re-deploy these objects unless the job and buildchain objects are manually deleted by the user. Any changes you make will persist until you manually delete all associated build chain objects and job.

```
notebook = ">=6.0.2"
jupyterhub = ">=1.3"
jupyterlab = ">=3.0.0"
TensorFlow: v2.4.1
PyTorch: v1.8.0
CUDA: 11.0.3
```
## Build Details:

- [CUDA-ubi8-build-chain](./cuda-ubi8-build-chain.yaml): This yaml contains CUDA build chain which creates the base image which is used by the jupyter notebook images.
The CUDA build chain is stored in the cuda-build-chain [configMap](./cuda-buildchain.configmap.yaml). This configmap contains the yaml files for deploying the BuildConfigs and Imagestreams for the CUDA build chain and GPU notebooks.
- `cuda-ubi8-build-chain.yaml`: This yaml contains CUDA build chain which creates the base image which is used by the jupyter notebook images.

- [gpu-notebook](./gpu-notebook.yaml): This yaml contains CUDA build chain which creates the GPU supported jupyter notebook images like s2i-minimal-gpu-notebook, s2i-tensorflow-gpu-notebook, and s2i-pytorch-gpu-notebook.
- `gpu-notebook.yaml`: This yaml contains CUDA build chain which creates the GPU supported jupyter notebook images like s2i-minimal-gpu-notebook, s2i-tensorflow-gpu-notebook, and s2i-pytorch-gpu-notebook.

## Resource Requirements:

**_NOTE:_** If users don't have quota restrictions then they can remove the resource requirements from the [gpu-notebook](./gpu-notebook.yaml)

### Minimal GPU Notebook

The Minimal notebook requires atleast **3GB** of memory while build-time as the minimal notebook installs `jupyterhub`, `jupyterlab` and `jupyter notebook` packages along with the supported extension that requires this much amount of memory.
The Minimal notebook requires atleast **3GB** of memory while build-time as the minimal notebook installs `jupyterhub`, `jupyterlab` and `jupyter notebook` packages along with the supported extension that requires this much amount of memory.
we have added **4GB** generously to avoid issues.

### TensorFlow GPU Notebook

The TensorFlow notebook requires atleast **6GB** of memory while build-time as the TensorFlow notebook installs `jupyterlab` and `jupyter notebook` supported extension and `jupyterlab build` requires this much amount of memory.
The TensorFlow notebook requires atleast **6GB** of memory while build-time as the TensorFlow notebook installs `jupyterlab` and `jupyter notebook` supported extension and `jupyterlab build` requires this much amount of memory.
we have added **6GB** generously to avoid issues.

### PyTorch GPU Notebook

The PyTorch notebook requires atleast **6GB** of memory while build-time as the PyTorch notebook installs `jupyterlab` and `jupyter notebook` supported extension and `jupyterlab build` requires this much amount of memory.
The PyTorch notebook requires atleast **6GB** of memory while build-time as the PyTorch notebook installs `jupyterlab` and `jupyter notebook` supported extension and `jupyterlab build` requires this much amount of memory.
we have added **6GB** generously to avoid issues.

## Deleting CUDA build objects
All the job and all objects created by the job have the `cuda-version = 11.0.3` label applied. This label can be used to purge all of the CUDA objects so that the operator can restore the original CUDA build chain

```
oc delete build -l cuda-version=11.0.3
oc delete bc -l cuda-version=11.0.3
oc delete is -l cuda-version=11.0.3
oc delete cm -l cuda-version=11.0.3
oc delete job -l cuda-version=11.0.3
```
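Reviewer note: the label-based purge in the README above can be previewed before anything is deleted. The sketch below is not part of this commit. `cuda_purge_cmds` is a hypothetical helper name, and the only assumption is that the label key and the five resource kinds match the README. It prints the `oc delete` commands for a given version instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): print the purge commands
# for one cuda-version label so they can be reviewed before being run.
cuda_purge_cmds() {
  version="${1:-11.0.3}"
  # Same resource kinds and order as the README: builds first, job last.
  for kind in build bc is cm job; do
    echo "oc delete ${kind} -l cuda-version=${version}"
  done
}

# Preview only; nothing is deleted by this call.
cuda_purge_cmds 11.0.3
```

Piping the output to `sh` would execute the deletions. Printing first keeps the destructive step explicit.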
51 changes: 51 additions & 0 deletions
jupyterhub/notebook-images/overlays/cuda-11.0.3/cuda-build-job.yaml
@@ -0,0 +1,51 @@
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
  name: cuda-11-0-3-build
  labels:
    cuda-version: "$(cuda_version)"
spec:
  backoffLimit: 2
  template:
    spec:
      containers:
      - image: registry.redhat.io/openshift4/ose-cli:v4.7
        volumeMounts:
        - name: cuda-ubi8-build-chain
          mountPath: /tmp/

        # work around unwriteable HOME dir / for unprivileged pods causing OC commands to be slow in pods
        env:
        - name: HOME
          value: /tmp
        - name: BUILD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath:
                metadata.namespace

        command:
        - /bin/bash
        - -c
        - |
          set -x
          echo "PWD: $PWD"
          oc create -n ${BUILD_NAMESPACE} -f /tmp/gpu-notebook.yaml
          oc create -n ${BUILD_NAMESPACE} -f /tmp/cuda-ubi8-build-chain.yaml
        imagePullPolicy: IfNotPresent
        name: cuda-11-0-3-build
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      serviceAccount: cuda-11.0.3-build-job
      serviceAccountName: cuda-11.0.3-build-job
      terminationGracePeriodSeconds: 30
      volumes:
      - name: cuda-ubi8-build-chain
        configMap:
          name: cuda-build-chain
          items:
          - key: gpu-notebook.yaml
            path: gpu-notebook.yaml
          - key: cuda-ubi8-build-chain.yaml
            path: cuda-ubi8-build-chain.yaml
80 changes: 80 additions & 0 deletions
jupyterhub/notebook-images/overlays/cuda-11.0.3/cuda-build-role.yaml
@@ -0,0 +1,80 @@
apiVersion: authorization.openshift.io/v1
kind: Role
metadata:
  labels:
    app: cuda-11.0.3-build-job
  name: cuda-11.0.3-build-job
rules:
- apiGroups:
  - ""
  - build.openshift.io
  resources:
  - builds
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  - image.openshift.io
  resources:
  - imagestreams
  verbs:
  - create
  - patch
  - update
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - events
  - persistentvolumeclaims
  - pods
  - services
  - endpoints
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  - template.openshift.io
  resources:
  - processedtemplates
  - templateconfigs
  - templateinstances
  - templates
  verbs:
  - create
  - delete
  - deletecollection
  - patch
  - update
- apiGroups:
  - ""
  - template.openshift.io
  resources:
  - processedtemplates
  - templateconfigs
  - templateinstances
  - templates
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - build.openshift.io
  resources:
  - builds
  - buildconfigs
  verbs:
  - create
  - patch
  - update
  - get
  - list
  - watch
12 changes: 12 additions & 0 deletions
jupyterhub/notebook-images/overlays/cuda-11.0.3/cuda-build-rolebinding.yaml
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cuda-11.0.3-build-job
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cuda-11.0.3-build-job
subjects:
- kind: ServiceAccount
  name: cuda-11.0.3-build-job
  namespace: $(namespace)
4 changes: 4 additions & 0 deletions
jupyterhub/notebook-images/overlays/cuda-11.0.3/cuda-build.serviceaccount.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cuda-11.0.3-build-job