[azure-docs] refine AzureBlobComputeLogManager guide (dagster-io#26407)
## Summary & Motivation

Updates the Azure Blob Storage compute logs documentation to clarify the setup process and fix issues with workload identity configuration. Adds specific instructions for using the Dagster+ Helm chart and includes a new "Next steps" section in the ACR user code guide.

## How I Tested These Changes

- Verified all commands and configurations work in an Azure Kubernetes Service (AKS) environment

## Changelog

- NOCHANGELOG
mlarose authored and pskinnerthyme committed Dec 16, 2024
1 parent d42c696 commit c268d0a
Showing 2 changed files with 56 additions and 31 deletions.
4 changes: 4 additions & 0 deletions docs/content/dagster-plus/deployment/azure/acr-user-code.mdx
@@ -169,3 +169,7 @@ alt="Dagster+ code locations page showing the new code location"
width={1152}
height={320}
/>

## Next steps

Now that you have your code location deployed, you can follow the guide [here](/dagster-plus/deployment/azure/blob-compute-logs) to set up logging in your AKS cluster.
83 changes: 52 additions & 31 deletions docs/content/dagster-plus/deployment/azure/blob-compute-logs.mdx
@@ -25,66 +25,83 @@ First, we'll enable the cluster to use workload identity. This will allow the AK
az aks update --resource-group <resource-group> --name <cluster-name> --enable-workload-identity
```

Then, we'll create a new managed identity for the AKS agent, and a new service account in our AKS cluster.
Then, we'll create a new managed identity for the AKS agent.

```bash
az identity create --resource-group <resource-group> --name agent-identity
kubectl create serviceaccount dagster-agent-service-account --namespace dagster-agent
```

Now we need to federate the managed identity with the service account.
We will need to find the name of the service account used by the Dagster+ Agent. If you used the [Dagster+ Helm chart](/dagster-plus/deployment/agents/kubernetes/configuring-running-kubernetes-agent), it should be `user-cloud-dagster-cloud-agent`. You can confirm by using this command:

```bash
kubectl get serviceaccount -n <dagster-agent-namespace>
```

Now we need to federate the managed identity with the service account used by the Dagster+ Agent.

```bash
az identity federated-credential create \
--name dagster-agent-federated-id \
--identity-name agent-identity \
--resource-group <resource-group> \
--issuer $(az aks show -g <resource-group> -n <aks-cluster-name> --query "oidcIssuerProfile.issuerUrl" -otsv) \
--subject system:serviceaccount:dagster-agent:dagster-agent-service-account
--subject system:serviceaccount:<dagster-agent-namespace>:<dagster-agent-service-account>
```
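
The `--subject` value follows the Kubernetes `system:serviceaccount:<namespace>:<name>` format. A minimal sketch of assembling it from shell values is shown here; the `federated_subject` helper is hypothetical, and the example values assume a namespace called `dagster-agent` and the Helm chart's default service account name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the az or kubectl CLIs): builds the
# federated-credential subject string for a Kubernetes service account.
federated_subject() {
  local namespace="$1" service_account="$2"
  if [ -z "$namespace" ] || [ -z "$service_account" ]; then
    echo "usage: federated_subject <namespace> <service-account>" >&2
    return 1
  fi
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}

# Example values are assumptions; substitute your own namespace and account.
SUBJECT="$(federated_subject dagster-agent user-cloud-dagster-cloud-agent)"
echo "$SUBJECT"
```

The resulting string can be passed as `--subject "$SUBJECT"` to `az identity federated-credential create`.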

Finally, we'll edit our AKS agent deployment to use the new service account.
You will need to obtain the client id of this identity for the next few operations. Make sure to save this value:

```bash
kubectl edit deployment <your-user-cloud-deployment> -n dagster-agent
az identity show -g <resource-group> -n agent-identity --query 'clientId' -otsv
```

In the deployment manifest, add the following lines:
We need to grant access to the storage account.

```bash
az role assignment create \
--assignee <managed-identity-client-id> \
--role "Storage Blob Data Contributor" \
--scope $(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv)
```
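
One way to double-check the grant, sketched under the assumption that the Azure CLI is logged in with permission to read role assignments, is to list the role names assigned to the identity at the storage account scope. The helper names `list_assigned_roles` and `has_blob_contributor_role` are hypothetical; the block only defines functions and runs nothing until they are called.

```bash
# Hypothetical helper: prints the role names assigned to a principal at a scope,
# one per line, using `az role assignment list`.
list_assigned_roles() {
  local assignee="$1" scope="$2"
  az role assignment list \
    --assignee "$assignee" \
    --scope "$scope" \
    --query "[].roleDefinitionName" \
    -o tsv
}

# Hypothetical helper: succeeds only if the exact role name appears in the list.
has_blob_contributor_role() {
  list_assigned_roles "$1" "$2" | grep -Fxq "Storage Blob Data Contributor"
}

# Usage (placeholders as in the commands above):
#   SCOPE="$(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv)"
#   has_blob_contributor_role <managed-identity-client-id> "$SCOPE" && echo "role assigned"
```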

You will need to add new annotations and labels in Kubernetes to enable the use of workload identities. If you're using the Dagster+ Helm Chart, modify your values.yaml to add the following lines:

```yaml
metadata:
  ...
serviceAccount:
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"

dagsterCloudAgent:
  labels:
    azure.workload.identity/use: "true"

workspace:
  labels:
    ...
    azure.workload.identity/use: "true"
spec:
  ...
  template:
    ...
    spec:
      ...
      serviceAccountName: dagster-agent-sa
```
If everything is set up correctly, you should be able to run the following command and see an access token returned:
<Note>
If you need to retrieve the values used by your Helm deployment, you
can run: `helm get values user-cloud > values.yaml`.
</Note>

Finally, update your Helm release with the new values:

```bash
kubectl exec -n dagster-agent -it <pod-in-cluster> -- bash
# in the pod
curl -H "Metadata:true" "http://169.254.169.254/metadata/identity/oauth2/token?resource=https://storage.azure.com/"
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```

## Step 2: Configure Dagster to use Azure Blob Storage

Now, you need to update the helm values to use Azure Blob Storage for logs. You can do this by editing the `values.yaml` file for your user-cloud deployment.

Pull down the current values for your deployment:
If everything is set up correctly, you should be able to run the following command and see an access token returned:

```bash
helm get values user-cloud > current-values.yaml
kubectl exec -n <dagster-agent-namespace> -it <pod-in-cluster> -- bash
# in the pod
apt update && apt install -y curl # install curl if missing, may vary depending on the base image
curl -H "Metadata:true" "http://169.254.169.254/metadata/identity/oauth2/token?resource=https://storage.azure.com/&api-version=2018-02-01"
```
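
As a complementary check, and assuming the Azure Workload Identity webhook is active in the cluster, labeled pods normally receive a set of injected `AZURE_*` environment variables. The sketch below tests for them from inside the pod; `check_workload_identity_env` is a hypothetical helper name, not part of any CLI.

```bash
# Hypothetical helper: reports whether the variables normally injected by the
# Azure Workload Identity webhook are present in the current environment.
# Returns 0 when all are set, 1 when any is missing or empty.
check_workload_identity_env() {
  local missing=0 name
  for name in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from a shell inside the pod:
#   check_workload_identity_env && echo "workload identity variables present"
```

If any variable is reported missing, the `azure.workload.identity/use: "true"` label and the service account's client ID annotation are the first things to re-check.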

Then, edit the `current-values.yaml` file to include the following lines:
## Step 2: Configure Dagster to use Azure Blob Storage

Once again, you need to update the Helm values to use Azure Blob Storage for logs. You can do this by editing the `values.yaml` file for your user-cloud deployment to include the following lines:

```yaml
computeLogs:
# @@ -97,18 +114,22 @@ computeLogs: (unchanged lines collapsed in the diff view)
container: mycontainer
default_azure_credential:
exclude_environment_credential: false
prefix: dagster-logs-
prefix: dagster-logs
local_dir: "/tmp/cool"
upload_interval: 30
```

Finally, update your deployment with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n dagster-agent -f current-values.yaml
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```

## Step 3: Verify logs are being written to Azure Blob Storage
## Step 3: Update your code location to enable the use of the AzureBlobComputeLogManager

- Add `dagster-azure` to your `setup.py` file. This will allow you to import the `AzureBlobComputeLogManager` class.

## Step 4: Verify logs are being written to Azure Blob Storage

It's time to kick off a run in Dagster to test your new configuration. If following along with the quickstart repo, you should be able to kick off a run of the `all_assets_job`, which will generate logs for you to test against. Otherwise, use any job that emits logs. When you go to the stdout/stderr window of the run page, you should see a log file that directs you to the Azure Blob Storage container.
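
The container can also be inspected from the command line. This is a minimal sketch, assuming your own Azure CLI login has a data-plane read role on the storage account (for example Storage Blob Data Reader) and that the `dagster-logs` prefix from the config above is in use; `list_compute_log_blobs` is a hypothetical helper name.

```bash
# Hypothetical helper: lists blob names under the compute log prefix.
# Arguments: <storage-account> <container> [prefix]; prefix defaults to
# "dagster-logs", matching the values.yaml example above.
list_compute_log_blobs() {
  local storage_account="$1" container="$2" prefix="${3:-dagster-logs}"
  az storage blob list \
    --account-name "$storage_account" \
    --container-name "$container" \
    --prefix "$prefix" \
    --auth-mode login \
    --query "[].name" \
    -o tsv
}

# Usage (placeholders are yours to fill in):
#   list_compute_log_blobs <storage-account> mycontainer
```

An empty listing right after a run does not necessarily mean a misconfiguration, since logs are uploaded on the configured `upload_interval`.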
