deploying to kubernetes guide
jamiedemaria committed Aug 27, 2024
1 parent 8478a26 commit 7869057
Showing 9 changed files with 1,494 additions and 1 deletion.
219 changes: 218 additions & 1 deletion docs/docs-beta/docs/guides/deployment/kubernetes.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,221 @@
---
title: "Deploy to Kubernetes"
sidebar_position: 21
---

TODO - INTRODUCTION

## What you'll learn

- How to use Kubernetes and Helm to deploy Dagster

<details>
<summary>Prerequisites</summary>

To follow the steps in this guide, you'll need:

- Familiarity with [Docker](https://docs.docker.com/)
- Familiarity with [Kubernetes](https://kubernetes.io/docs/home/)
- Familiarity with [Helm](https://helm.sh/docs/)
- A Dagster project to deploy. You can also use the [example project](/todo). TODO - add a command readers can use to clone the project

- Docker installed. See the [Docker installation guide](https://docs.docker.com/engine/install/).
- `kubectl` installed. See the [Kubernetes installation guide](https://kubernetes.io/docs/tasks/tools/).
- A Kubernetes cluster. To follow along with this guide on your local machine, you can install Docker Desktop and turn on the included Kubernetes server. See the [Docker Desktop and Kubernetes guide](https://docs.docker.com/desktop/kubernetes/).
- Access to a Docker image registry, such as Amazon Elastic Container Registry (ECR) or Docker Hub

- Helm 3 installed. See the [Helm installation guide](https://helm.sh/docs/intro/install/).

</details>


## Step 0a: Understand the Dagster deployment architecture
TODO - decide whether this step is needed, or whether it should be a prerequisite that we link to.

## Step 0b: Example project tour
If you are deploying your own Dagster project, skip ahead to Step 1. However, if you are using the example project for this guide, this step will walk you through the contents of the project.

The example project should contain the following files:
```bash
TODO FILE TREE
```

The Dagster project is in the `iris_analysis` directory, and its code is in the `iris_analysis/__init__.py` file. This file contains a single asset that downloads a dataset about iris flowers and logs the number of rows.

The example project also contains a `workspace.yaml` file and a `Dockerfile`. These files will be covered in the following steps.

## Step 1: Get your Dagster project ready to deploy
You will need to add a `workspace.yaml` file to your Dagster project to be ready to deploy with Kubernetes. If you are using the [example project](/todo), you will already have this file.

The `workspace.yaml` file tells Dagster where to find the `Definitions` object in your project. This file should be at the LOCATION of your project.

<CodeExample filePath="guides/deployment/kubernetes/workspace.yaml" language="yaml" title="Example workspace.yaml" />
For example, in the sample `iris_analysis` project, the `Definitions` object is found in the `iris_analysis/__init__.py` file. The `workspace.yaml` tells Dagster to load this project and give it the name `iris_analysis`.

You can also point to a Python module in the `workspace.yaml` file. To learn more about writing a `workspace.yaml` file, see the [Workspace](/todo) guide.
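
To make the two forms concrete, here is a minimal Python sketch that renders the text of a `workspace.yaml` for either a file or a module. The helper name `render_workspace_yaml` is made up for this illustration; the `python_file`/`relative_path` and `python_module`/`module_name` keys are the ones the workspace file format uses.

```python
def render_workspace_yaml(
    location_name: str, *, relative_path: str = "", module_name: str = ""
) -> str:
    """Return workspace.yaml text for one code location (hypothetical helper)."""
    if bool(relative_path) == bool(module_name):
        raise ValueError("pass exactly one of relative_path or module_name")
    if relative_path:
        # Load the Definitions object from a Python file
        source = f"  - python_file:\n      relative_path: {relative_path}\n"
    else:
        # Load the Definitions object from an importable Python module
        source = f"  - python_module:\n      module_name: {module_name}\n"
    return f"load_from:\n{source}      location_name: {location_name}\n"


print(render_workspace_yaml("iris_analysis", relative_path="iris_analysis/__init__.py"))
```

Passing `module_name="iris_analysis"` instead of `relative_path` would produce the module-based form of the same file.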

## Step 2: Write and build a Docker image containing your Dagster project
### Step 2.1: Write a Dockerfile

You will need to build a Docker image that contains your Dagster project and all of its dependencies. The Dockerfile should copy your Dagster project and the `workspace.yaml` file created in Step 1 into the image and install `dagster`, `dagster-postgres`, and `dagster-k8s`, along with any other libraries your project depends on. Finally, ensure that port 80 is exposed. This will be used to set up port-forwarding later.

<CodeExample filePath="guides/deployment/kubernetes/Dockerfile" language="docker" title="Example Dockerfile" />
The example project depends on `pandas`, so it's included in the `pip install` command.

### Step 2.2: Build and push a Docker image

To build your Docker image, run the following command from the directory where your Dockerfile is located:

```bash
docker build . -t iris_analysis:1
```
This builds the Docker image from Step 2.1 and gives it the name `iris_analysis` and tag `1`. You can set custom values for both the name and the tag. We recommend that each time you rebuild your Docker image, you assign a new value for the tag to ensure that the correct image is used when running your code.
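
One possible way to follow the tagging recommendation is to derive the tag from the build time, so every rebuild gets a new value. The sketch below only assembles the `docker build` command; the helper name `docker_build_command` and the timestamp format are assumptions for this example, not part of the guide's project.

```python
from datetime import datetime, timezone
from typing import List, Optional


def docker_build_command(image_name: str, now: Optional[datetime] = None) -> List[str]:
    """Build the `docker build` argument list with a timestamp-based tag."""
    now = now or datetime.now(timezone.utc)
    tag = now.strftime("%Y%m%d-%H%M%S")  # for example, 20240827-153000
    return ["docker", "build", ".", "-t", f"{image_name}:{tag}"]


# Print the command so it can be copied into a shell or passed to subprocess.run
print(" ".join(docker_build_command("iris_analysis")))
```

Whatever scheme you choose, the same tag has to be set in the Helm values later so that the deployment uses the image you just built.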


If you are using a Docker image registry, push the image to your registry. If you are following along on your local machine, you can skip this command.

```bash
docker push iris_analysis:1
```
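
Registries generally expect the image name to be prefixed with the registry address before it can be pushed. As a rough sketch, the Python helper below assembles the `docker tag` and `docker push` commands for such a name. The registry address `registry.example.com/my-team` and the helper name `push_commands` are placeholders, so check your registry's documentation for the exact format it requires.

```python
from typing import List


def push_commands(registry: str, image_name: str, tag: str) -> List[List[str]]:
    """Return `docker tag` and `docker push` commands for a registry-qualified image."""
    local_ref = f"{image_name}:{tag}"
    remote_ref = f"{registry.rstrip('/')}/{local_ref}"
    return [
        ["docker", "tag", local_ref, remote_ref],  # give the local image its registry name
        ["docker", "push", remote_ref],  # upload the image under that name
    ]


# registry.example.com/my-team is a placeholder, not a real registry
for command in push_commands("registry.example.com/my-team", "iris_analysis", "1"):
    print(" ".join(command))
```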

TODO - specific instructions if pushing to an image registry?


## Step 3: Configure `kubectl` to point at a Kubernetes cluster
Before you can deploy Dagster, you need to configure `kubectl` to develop against the Kubernetes cluster where you want Dagster to be deployed.

If you are using Docker Desktop and the included Kubernetes server, you will need to create a context first. If you already have a Kubernetes cluster and context created for your Dagster deployment, you can skip this command.
```bash
kubectl config set-context dagster --namespace default --cluster docker-desktop --user=docker-desktop
```

Ensure that `kubectl` is using the correct context by running:
```bash
kubectl config use-context <context-name>
```
Where `<context-name>` is the name of the context you want to use. For example, if you ran the preceding `kubectl config set-context` command, you will run:

```bash
kubectl config use-context dagster
```
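
If you script your deployment, it can help to confirm the active context before running any Helm commands. The following sketch wraps `kubectl config current-context` in Python. The function names are invented for this example, and the `run` parameter exists only so the check can be exercised without a cluster.

```python
import subprocess
from typing import Callable


def current_kubectl_context(run: Callable = subprocess.run) -> str:
    """Return the context name reported by `kubectl config current-context`."""
    result = run(
        ["kubectl", "config", "current-context"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def assert_context(expected: str, run: Callable = subprocess.run) -> None:
    """Raise if kubectl is not pointed at the expected context."""
    actual = current_kubectl_context(run)
    if actual != expected:
        raise RuntimeError(f"kubectl context is {actual!r}, expected {expected!r}")
```

For the context created in this step, you would call `assert_context("dagster")` before installing the Helm chart.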

## Step 4: Add the Dagster Helm chart repository

Dagster publishes [Helm charts](https://artifacthub.io/packages/helm/dagster/dagster) for deploying Dagster. New Helm charts are published for each version of Dagster. You should use the Helm chart version that matches the version of Dagster you have installed.

To add the Dagster Helm chart repository, run the following command:

```bash
helm repo add dagster https://dagster-io.github.io/helm
```

If you have previously added the Dagster Helm charts, run the following command to update the repository:

```bash
helm repo update
```

## Step 5: Configure the Helm chart for your deployment

You will need to modify some values in Dagster's Helm chart to deploy your Dagster project.

### Step 5.1: Copy the default Helm chart values into a `values.yaml`

Run the following command to copy the values installed from the published Helm charts so that you can modify them:

```bash
helm show values dagster/dagster > values.yaml
```
TODO - where to copy this file to?

### Step 5.2: Modify the `values.yaml` file for your deployment
The `values.yaml` file contains configuration options you can set for your deployment. Comments in the `values.yaml` file explain these options, and you can [learn more about the configuration options](/todo).

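The minimal configuration options you need to set to deploy your project are the `deployments.name`, `deployments.image`, and `deployments.dagsterApiGrpcArgs` values. `deployments.name` should be a unique name for your deployment, and `deployments.image` should be set to match the Docker image you built and pushed in Step 2. `deployments.dagsterApiGrpcArgs` should be set to the list of arguments passed to the `dagster api grpc` command that serves your code, which tells Dagster where to find your `Definitions` object inside the image. For example, use `--python-file` followed by the path to the file that contains your `Definitions` object.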
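
The value of `dagsterApiGrpcArgs` is a list in which each command-line argument for `dagster api grpc` is a separate item. As a small illustration, the Python sketch below splits an option string into that list form. The helper name `grpc_args` is hypothetical, and the file path is the one used by the example project.

```python
import shlex
from typing import List


def grpc_args(command_line_options: str) -> List[str]:
    """Split `dagster api grpc` options into the list form used in values.yaml."""
    return shlex.split(command_line_options)


print(grpc_args("--python-file /iris_analysis/__init__.py"))
```

Each item of the resulting list becomes one entry under `dagsterApiGrpcArgs` in the `values.yaml` file.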

If you are following this guide on your local machine, you will also need to set `pullPolicy: IfNotPresent`. This will use the local version of the image built in Step 2. However, in production use cases when your Docker images are pushed to image registries, this value should remain `pullPolicy: Always`.

<CodeExample filePath="guides/deployment/kubernetes/minimal_values.yaml" language="yaml" title="Minimal changes to make to values.yaml" />



## Step 6: Install the Helm chart
Now that you have modified the Helm `values.yaml` file, you can install the changes in your Kubernetes cluster.

Run the following command to install the Helm chart and create a release. In Helm, a release is an instance of a chart running in a Kubernetes cluster:

```bash
helm upgrade --install dagster dagster/dagster -f /path/to/values.yaml
```
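In this command, `helm upgrade --install` upgrades the release if it already exists and installs it otherwise. The first `dagster` is the name of the release, `dagster/dagster` is the chart to install (the `dagster` chart from the `dagster` repository added in Step 4), and `-f` sets the path to the `values.yaml` file you modified in Step 5.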

:::note
If you are running an older version of Dagster, pass the `--version` flag to `helm upgrade` with the version of Dagster you are running. For example, if you are running `dagster==1.7.4`, run the command `helm upgrade --install dagster dagster/dagster -f /path/to/values.yaml --version 1.7.4`.
:::
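
To keep the chart version in step with your Dagster version, you could assemble the command programmatically. This is a minimal sketch, and the helper name `helm_upgrade_command` is invented for the example. It only builds the argument list and does not run Helm.

```python
from typing import List, Optional


def helm_upgrade_command(values_path: str, dagster_version: Optional[str] = None) -> List[str]:
    """Build the `helm upgrade --install` command, pinning the chart version if given."""
    command = ["helm", "upgrade", "--install", "dagster", "dagster/dagster", "-f", values_path]
    if dagster_version:
        # Pin the chart to the same version as the installed Dagster package
        command += ["--version", dagster_version]
    return command


print(" ".join(helm_upgrade_command("/path/to/values.yaml", "1.7.4")))
```

In an environment where Dagster is installed, the version string could come from `importlib.metadata.version("dagster")` rather than being typed by hand.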

The `helm upgrade` command will launch several pods in your Kubernetes cluster. You can check the status of the pods with the command:

```bash
kubectl get pods
```

It may take a few minutes before all pods are in a `Running` state. If the `helm upgrade` was successful, you should see `kubectl get pods` output similar to this:

```bash
$ kubectl get pods
NAME                                                             READY   STATUS    AGE
dagster-daemon-5787ccc868-nsvsg                                  1/1     Running   3m41s
dagster-dagit-7c5b5c7f5c-rqrf8                                   1/1     Running   3m41s
dagster-dagster-user-deployments-iris-analysis-564cbcf9f-fbqlw   1/1     Running   3m41s
dagster-postgresql-0                                             1/1     Running   3m41s
```
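
If you want to check pod status from a script rather than by eye, one rough approach is to parse the `kubectl get pods` table. The sketch below is an assumption-laden example: the function name `pods_not_running` is hypothetical, the sample text is illustrative rather than real cluster output, and the parsing relies on the `NAME` and `STATUS` header columns being present.

```python
from typing import List


def pods_not_running(kubectl_output: str) -> List[str]:
    """Return names of pods whose STATUS column is not `Running`."""
    lines = [line.split() for line in kubectl_output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    name_col, status_col = header.index("NAME"), header.index("STATUS")
    return [row[name_col] for row in rows if row[status_col] != "Running"]


# Illustrative output only, with one pod deliberately shown in a failing state
sample = """\
NAME                                                             READY   STATUS             AGE
dagster-daemon-5787ccc868-nsvsg                                  1/1     Running            3m41s
dagster-dagster-user-deployments-iris-analysis-564cbcf9f-fbqlw   0/1     CrashLoopBackOff   3m41s
"""
print(pods_not_running(sample))
```

An empty list means every pod in the table reports `Running`.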

<details>
<summary>Debugging failed pods</summary>

If one of the pods is in an error state, you can view the logs using the command:

```bash
kubectl logs <pod-name>
```

For example, if the pod `dagster-dagster-user-deployments-iris-analysis-564cbcf9f-fbqlw` is in a `CrashLoopBackOff` state, the logs can be viewed with the command:

```bash
kubectl logs dagster-dagster-user-deployments-iris-analysis-564cbcf9f-fbqlw
```

</details>

TODO - maybe explain what each of these pods are?

## Step 7: Connect to your Dagster deployment and materialize your assets

### Step 7.1: Start port-forwarding to the webserver pod
Run the following command to set up port forwarding to the webserver pod:

```bash
DAGSTER_WEBSERVER_POD_NAME=$(kubectl get pods --namespace default \
  -l "app.kubernetes.io/name=dagster,app.kubernetes.io/instance=dagster,component=dagster-webserver" \
  -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward "$DAGSTER_WEBSERVER_POD_NAME" 8080:80
```

This command gets the full name of the `webserver` pod from the output of `kubectl get pods`, and then sets up port forwarding with the `kubectl port-forward` command.

### Step 7.2: Visit your Dagster deployment
Port 80 of the webserver pod is now forwarded to port `8080` on your machine, so you can visit the Dagster deployment by going to [http://127.0.0.1:8080](http://127.0.0.1:8080). You should see the Dagster landing page.
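
If the page does not load, a quick scripted check can tell you whether anything is answering on the forwarded port. This is a minimal sketch using only the Python standard library. The function name `webserver_is_up` is made up for the example, and the `opener` parameter exists only so the check can be tested without a running deployment.

```python
from typing import Callable
from urllib.error import URLError
from urllib.request import urlopen


def webserver_is_up(url: str = "http://127.0.0.1:8080", opener: Callable = urlopen) -> bool:
    """Return True if an HTTP request to the forwarded port succeeds."""
    try:
        with opener(url, timeout=5) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error response
        return False
```

A `False` result usually means the `kubectl port-forward` command from Step 7.1 is no longer running or the webserver pod is not ready yet.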

TODO SCREENSHOT


### Step 7.3: Materialize an asset
From the Dagster UI, you can materialize an asset by clicking the "Materialize" button. Dagster will start a Kubernetes job to materialize the asset. You can inspect the Kubernetes cluster to see this job:


```bash
$ kubectl get jobs
NAME                                               COMPLETIONS   DURATION   AGE
dagster-run-5ee8a0b3-7ca5-44e6-97a6-8f4bd86ee630   1/1           4s         11s
```


## Next steps
- Forwarding Dagster logs from a Kubernetes deployment to AWS, Azure, GCP
- Other configuration options for Kubernetes deployments, such as secrets
21 changes: 21 additions & 0 deletions examples/deploy_k8s_beta/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
ARG BASE_IMAGE

FROM "${BASE_IMAGE}"

COPY . /

# This makes sure that logs show up immediately instead of being buffered
ENV PYTHONUNBUFFERED=1

RUN pip install --upgrade pip

RUN \
    pip install \
        dagster \
        dagster-postgres \
        dagster-k8s


WORKDIR /example_project/

EXPOSE 80
14 changes: 14 additions & 0 deletions examples/deploy_k8s_beta/example_project/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
import dagster as dg


@dg.asset
def upstream(context: dg.AssetExecutionContext) -> None:
    context.log.info("Upstream asset")


@dg.asset(deps=[upstream])
def downstream(context: dg.AssetExecutionContext) -> None:
    context.log.info("Downstream asset")


defs = dg.Definitions(assets=[upstream, downstream])
4 changes: 4 additions & 0 deletions examples/deploy_k8s_beta/workspace.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
load_from:
  - python_file:
      relative_path: example_project/__init__.py
      location_name: example_project
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
FROM python:3.11

# Copy your Dagster project. You may need to replace the filepath depending on your project structure
COPY . /

# This makes sure that logs show up immediately instead of being buffered
ENV PYTHONUNBUFFERED=1

RUN pip install --upgrade pip

# Install dagster and any other dependencies your project requires
RUN \
    pip install \
        dagster \
        dagster-postgres \
        dagster-k8s \
        # add any other dependencies here
        pandas


WORKDIR /example_project/

# Expose the port that your Dagster instance will run on
EXPOSE 80
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import pandas as pd

import dagster as dg


@dg.asset
def iris_dataset_size(context: dg.AssetExecutionContext) -> None:
    df = pd.read_csv(
        "https://docs.dagster.io/assets/iris.csv",
        names=[
            "sepal_length_cm",
            "sepal_width_cm",
            "petal_length_cm",
            "petal_width_cm",
            "species",
        ],
    )

    context.log.info(f"Loaded {df.shape[0]} data points.")


defs = dg.Definitions(assets=[iris_dataset_size])
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
... # Preceding configuration omitted for brevity

deployments:
  - name: "iris-analysis" # set to the name of your deployment
    image:
      repository: "iris_analysis" # set to the name of your Docker image
      # When a tag is not supplied, it will default as the Helm chart version.
      tag: 1 # set to the tag of your Docker image

      # Only change this value if you are following the guide on your
      # local machine. If you are pushing images to a registry,
      # leave the value as Always
      pullPolicy: IfNotPresent

    dagsterApiGrpcArgs:
      - "--python-file"
      - "/iris_analysis/__init__.py"

... # Following configuration omitted for brevity