
Commit

Merge pull request #5 from QGreenland-Net/use-custom-image

Use custom image

trey-stafford authored Apr 18, 2024
2 parents 53dd96a + 7d94f81 commit 547a431
Showing 10 changed files with 227 additions and 95 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
-runinfo*
+runinfo*
+src/
6 changes: 5 additions & 1 deletion Dockerfile
@@ -1,8 +1,12 @@
FROM mambaorg/micromamba:1.5.8 AS micromamba

USER root
# git is needed so pip can install the Parsl fork from GitHub (see environment.yml)
RUN apt update && apt install -y git
USER $MAMBA_USER

COPY --chown=$MAMBA_USER:$MAMBA_USER . .

RUN micromamba install --yes --name "base" --file "environment.yml"
RUN micromamba clean --all --yes

-# ENV PATH "/opt/conda/bin:${PATH}"
+ENV PATH "/opt/conda/bin:${PATH}"
97 changes: 82 additions & 15 deletions README.md
@@ -8,42 +8,81 @@ processing pipelines in k8s:
> laptops to supercomputers.

-## Configuring Parsl on Kubernetes
+## Cluster setup

-The
-[Parsl user guide's "Kubernetes Clusters" section](https://parsl.readthedocs.io/en/stable/userguide/configuring.html#kubernetes-clusters)
-is a good place to start.
+The following must be set up on the target cluster for this to work.

> [!NOTE]
> Some of this may be "done for you" on the ADC cluster, but you'll still need to set up
> your local (e.g. Rancher Desktop) cluster.
### `qgnet` Namespace

```bash
kubectl create namespace qgnet
```

Update your kubeconfig to add a context that references the new namespace:

```bash
kubectl config --kubeconfig={config-file-path} \
  set-context {context-name} \
  --cluster={cluster-name} \
  --namespace=qgnet \
  --user={user}
```

For a local Rancher Desktop cluster, this looks like:

```bash
kubectl config --kubeconfig="${HOME}/.kube/config" \
  set-context rancher-desktop-qgnet \
  --cluster=rancher-desktop \
  --namespace=qgnet \
  --user=rancher-desktop
```
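
To confirm that the new context exists, list the contexts in your kubeconfig:

```bash
kubectl config get-contexts
```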


### `qgnet` ServiceAccount

The target cluster should be configured with a `qgnet` service account that has
permission to create and delete pods.

```bash
kubectl apply -f k8s/serviceaccount.yml
```
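
To sanity-check the permissions, you can impersonate the service account with
`kubectl auth can-i` (assuming the role binding grants pod management in the
`qgnet` namespace as described above, these should print "yes"):

```bash
kubectl auth can-i create pods --namespace qgnet --as system:serviceaccount:qgnet:qgnet
kubectl auth can-i delete pods --namespace qgnet --as system:serviceaccount:qgnet:qgnet
```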

> [!NOTE]
> TODO
> Role bindings are based on
> [DataONE example](https://github.com/DataONEorg/k8s-cluster/tree/main/authorization).

## Submitting jobs

First, select the appropriate k8s context. E.g., to run locally:

-```
-kubectl config use-config rancher-desktop
+```bash
+kubectl config use-context rancher-desktop-qgnet
```

-to run on the `dev-qgnet` k8s cluster:
+To run on the remote `dev-qgnet` k8s cluster:

> [!WARNING]
> Deployment to `dev-qgnet` currently does not work. See
> https://github.com/QGreenland-Net/parsl-exploration/issues/3

-```
-kubectl config use-config dev-qgnet
+```bash
+kubectl config use-context dev-qgnet
```


Submit the example job defined in `run.py` with:

-```
+```bash
python run.py
```
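
The note below warns that local and remote versions must match; one quick way to
check the local side:

```bash
python --version
python -c "import parsl; print(parsl.__version__)"
```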

> [!NOTE]
> The local versions of Python and Parsl must match the remote versions!
@@ -52,7 +91,7 @@ python run.py
> up properly. This may require manual cleanup!

-### Submitting jobs on the ADC cluster
+### Submitting jobs to a remote cluster

Running a Parsl "job" on a remote cluster has a frustrating complexity: the remote
workers need to be able to connect back to the host running the Parsl program. If you're
@@ -62,6 +101,24 @@ The workaround we're using is to submit a Kubernetes Job that runs the Parsl init
program from a ConfigMap. See `run-on-remote-cluster.sh` and `job.yml` for an
example of this.
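
In outline, the workaround looks roughly like this (see `run-on-remote-cluster.sh`
for the actual script):

```bash
# Package run.py as a ConfigMap so the Job can mount it as a file
kubectl create configmap parsl-init-script --from-file run.py
# Launch a Job that runs run.py inside the cluster
kubectl apply -f k8s/job.yml
```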

> [!IMPORTANT]
> We are currently using our own fork of Parsl to add support for getting "in-cluster"
> configuration. See: https://github.com/Parsl/parsl/pull/3357

### Viewing job output file(s)

See [Inspect a Kubernetes PersistentVolumeClaim by Frank
Sauerburger](https://frank.sauerburger.io/2021/12/01/inspect-k8s-pvc.html) for an
excellent tutorial.

* `kubectl apply -f k8s/pvc-inspector.yml`
* You may need to wait a few seconds for the pod to start...
* `kubectl exec -it pvc-inspector -- sh`
* Inspect the `/pvc` directory (or use the non-interactive one-liner shown below)
* Exit the shell
* `kubectl delete pod pvc-inspector`
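
For a quick, non-interactive look at the volume's contents, a one-liner along these
lines should also work:

```bash
kubectl exec pvc-inspector -- ls -la /pvc
```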


## Troubleshooting

@@ -70,12 +127,22 @@ example of this.
Some failure states leave pods stuck in a restart loop, and these pods do not get
cleaned up automatically. To find pods in this state:

-```
+```bash
kubectl get pods
```

To remove a pod that is stuck:

-```
-kubectl delete pod <pod name>
-```
+```bash
+kubectl delete pod {pod-name}
+```
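
If several pods are stuck, a field selector can remove them in one go (a sketch; this
assumes the stuck pods have reached the `Failed` phase):

```bash
kubectl delete pods --field-selector=status.phase=Failed
```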

### File not found error starting Rancher Desktop

You must have a valid `$KUBECONFIG` path. A path containing `~`, or a path to a file
that does not exist, will cause Rancher Desktop to fail to start the cluster.
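
A minimal fix is to point `$KUBECONFIG` at an absolute path to a file that exists:

```bash
# "~" is not expanded here; use ${HOME} or an absolute path
export KUBECONFIG="${HOME}/.kube/config"
```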


## Reference

* The [Parsl user guide's "Kubernetes Clusters" section](https://parsl.readthedocs.io/en/stable/userguide/configuring.html#kubernetes-clusters) is a good place to start.
14 changes: 11 additions & 3 deletions environment.yml
@@ -4,6 +4,14 @@ channels:
- "nodefaults"
dependencies:
- python ~=3.12.0
- parsl-with-kubernetes ~=2024.4
# TODO: Remove once this build is completed: https://github.com/conda-forge/parsl-feedstock/pull/72
- python-kubernetes
# We want to allow incrementing to 2024.5, but forbid 2014.4.8, which had a
# dependency specification problem:
# https://github.com/conda-forge/parsl-feedstock/pull/72
- parsl-with-kubernetes ~=2024.4,>=2014.4.15

# We forked Parsl to get around this issue:
# https://github.com/Parsl/parsl/pull/3357
# TODO: Once the PR above is merged, ensure built for conda-forge and move
# parsl back out of the pip section
- pip:
- "--editable=git+https://github.com/QGreenland-Net/parsl.git@k8s-use-incluser-config-fallback#egg=parsl"
37 changes: 0 additions & 37 deletions job.yml

This file was deleted.

52 changes: 52 additions & 0 deletions k8s/job.yml
@@ -0,0 +1,52 @@
apiVersion: "batch/v1"
kind: "Job"
metadata:
name: "parsl-init"
spec:
# TODO: when completions is 1, parallelism must be 1, but do we need to specify it?
parallelism: 1
completions: 1
# TODO: Supported in k8s 1.23, but ADC has 1.22; this would be nice :)
# ttlSecondsAfterFinished: 60
template:
metadata:
name: "parsl-init"
spec:
serviceAccountName: "qgnet"
automountServiceAccountToken: true
volumes:
# This is how the Python script gets to the cluster
- name: "parsl-init-script-volume"
configMap:
name: "parsl-init-script"
# Data storage:
- name: "data"
persistentVolumeClaim:
claimName: "qgnet-pvc-test-1"
containers:
- name: "parsl-init"
image: "ghcr.io/qgreenland-net/parsl-exploration:v0.1.1"
volumeMounts:
- name: "parsl-init-script-volume"
mountPath: "/parsl-init-script"
- name: "data"
mountPath: "/data"
command:
- "bash"
- "-c"
- "micromamba run -n base python /parsl-init-script/run.py"
restartPolicy: "Never"
---
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
name: "qgnet-pvc-test-1"
namespace: "qgnet"
spec:
accessModes:
- "ReadWriteOnce"
volumeMode: "Filesystem"
resources:
requests:
storage: "100Mi"
storageClassName: "csi-rbd-sc"
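
Once applied (and assuming the `qgnet` namespace from the setup above), the Job can
be monitored with standard kubectl commands:

```bash
kubectl get jobs --namespace qgnet
# Stream logs from the Job's pod; "parsl-init" is the Job name in the manifest above
kubectl logs --namespace qgnet -f job/parsl-init
```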
17 changes: 17 additions & 0 deletions k8s/pvc-inspector.yml
@@ -0,0 +1,17 @@
apiVersion: "v1"
kind: "Pod"
metadata:
name: "pvc-inspector"
spec:
containers:
- image: "busybox"
name: "pvc-inspector"
command: ["tail"]
args: ["-f", "/dev/null"]
volumeMounts:
- mountPath: "/pvc"
name: "pvc-mount"
volumes:
- name: "pvc-mount"
persistentVolumeClaim:
claimName: "qgnet-pvc-test-1"
38 changes: 38 additions & 0 deletions k8s/serviceaccount.yml
@@ -0,0 +1,38 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: qgnet
  namespace: qgnet

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: qgnet
  namespace: qgnet
rules:
  - apiGroups: ["", "networking.k8s.io", "extensions", "apps", "autoscaling"]
    # apiGroups: ["", "networking.k8s.io", "extensions", "apps", "autoscaling", "rbac.authorization.k8s.io"]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: ["batch"]
    resources:
      - jobs
      - cronjobs
    verbs: ["*"]

---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: qgnet
  namespace: qgnet
subjects:
  - kind: ServiceAccount
    name: qgnet
    namespace: qgnet
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: qgnet
2 changes: 1 addition & 1 deletion run-on-remote-cluster.sh
@@ -10,7 +10,7 @@ kubectl create configmap parsl-init-script --from-file run.py \

# Submit a "Job" to the cluster which runs our script
# TODO: Should we delete any pre-existing job? We're manually doing `kubectl delete` now.
-kubectl apply -f job.yml
+kubectl apply -f k8s/job.yml


# TODO: Can we also attach to monitor `kubectl describe job` or something?
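# One option (a sketch; "parsl-init" is the Job name from k8s/job.yml):
#   kubectl wait --for=condition=complete --timeout=600s job/parsl-init
#   kubectl logs job/parsl-init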