# [azure-docs] refine AzureBlobComputeLogManager guide #26407

Merged 1 commit on Dec 12, 2024
@@ -169,3 +169,7 @@
alt="Dagster+ code locations page showing the new code location"
width={1152}
height={320}
/>

## Next steps

Now that your code location is deployed, you can follow the [Azure Blob Storage compute logs guide](/dagster-plus/deployment/azure/blob-compute-logs) to set up logging in your AKS cluster.
docs/content/dagster-plus/deployment/azure/blob-compute-logs.mdx (83 changes: 52 additions & 31 deletions)
@@ -25,66 +25,83 @@

First, we'll enable the cluster to use workload identity. This will allow the AKS agent to use a managed identity to access Azure resources.

```bash
az aks update --resource-group <resource-group> --name <cluster-name> --enable-workload-identity
```

Then, we'll create a new managed identity for the AKS agent.

```bash
az identity create --resource-group <resource-group> --name agent-identity
```

We will need to find the name of the service account used by the Dagster+ Agent. If you used the [Dagster+ Helm chart](/dagster-plus/deployment/agents/kubernetes/configuring-running-kubernetes-agent), it should be `user-cloud-dagster-cloud-agent`. You can confirm by using this command:

```bash
kubectl get serviceaccount -n <dagster-agent-namespace>
```
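
With the default Helm release name, the output should include that service account. The listing below is illustrative; names and ages will vary with your installation:

```bash
# NAME                             SECRETS   AGE
# default                          0         45d
# user-cloud-dagster-cloud-agent   0         45d
```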

Now we need to federate the managed identity with the service account used by the Dagster+ Agent.

```bash
az identity federated-credential create \
--name dagster-agent-federated-id \
--identity-name agent-identity \
--resource-group <resource-group> \
--issuer $(az aks show -g <resource-group> -n <aks-cluster-name> --query "oidcIssuerProfile.issuerUrl" -otsv) \
--subject system:serviceaccount:<dagster-agent-namespace>:<dagster-agent-service-account>
```
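
To confirm the federated credential was created as expected, you can list the credentials attached to the identity (an optional check):

```bash
az identity federated-credential list \
  --identity-name agent-identity \
  --resource-group <resource-group> \
  -o table
```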

You will need to obtain the client ID of this identity for the next few operations. Make sure to save this value:

```bash
az identity show -g <resource-group> -n agent-identity --query 'clientId' -otsv
```
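
If you'll be reusing this value across several commands, one option is to capture it in a shell variable (a convenience, not required by the guide; the variable name is arbitrary):

```bash
MANAGED_IDENTITY_CLIENT_ID=$(az identity show -g <resource-group> -n agent-identity --query 'clientId' -otsv)
echo "$MANAGED_IDENTITY_CLIENT_ID"  # prints a GUID, e.g. 11111111-2222-3333-4444-555555555555
```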

Next, we need to grant this identity access to the storage account:

```bash
az role assignment create \
--assignee <managed-identity-client-id> \
--role "Storage Blob Data Contributor" \
--scope $(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv)
```
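
To double-check that the assignment took effect, you can list the role assignments for the identity at that scope (an optional sanity check):

```bash
az role assignment list \
  --assignee <managed-identity-client-id> \
  --scope $(az storage account show -g <resource-group> -n <storage-account> --query 'id' -otsv) \
  --query "[].roleDefinitionName" -otsv
# expected output includes: Storage Blob Data Contributor
```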

You will need to add new annotations and labels in Kubernetes to enable the use of workload identities. If you're using the Dagster+ Helm chart, modify your `values.yaml` to add the following lines:

```yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"

dagsterCloudAgent:
  labels:
    azure.workload.identity/use: "true"

workspace:
  labels:
    azure.workload.identity/use: "true"
```

<Note>
If you need to retrieve the values used by your Helm deployment, you
can run: `helm get values user-cloud > values.yaml`.
</Note>

Finally, update your Helm release with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```
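
Optionally, you can watch the agent deployment roll out with the new settings (the deployment name varies per installation):

```bash
# Find the deployment name with: kubectl get deployments -n <dagster-agent-namespace>
kubectl rollout status deployment/<your-user-cloud-deployment> -n <dagster-agent-namespace>
```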

If everything is set up correctly, you should be able to run the following command and see an access token returned:

```bash
kubectl exec -n <dagster-agent-namespace> -it <pod-in-cluster> -- bash
# in the pod
apt update && apt install -y curl # install curl if missing, may vary depending on the base image
curl -H "Metadata:true" "http://169.254.169.254/metadata/identity/oauth2/token?resource=https://storage.azure.com/&api-version=2018-02-01"
```
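
A successful response is a JSON document containing an `access_token` field, along these lines (truncated and illustrative):

```bash
# {"access_token":"eyJ0eXAiOiJKV1QiLCJhbGciOi...","expires_on":"1734040000",
#  "resource":"https://storage.azure.com/","token_type":"Bearer"}
```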

## Step 2: Configure Dagster to use Azure Blob Storage

Once again, you need to update the Helm values to use Azure Blob Storage for logs. You can do this by editing the `values.yaml` file for your user-cloud deployment to include the following lines:

```yaml
computeLogs:
  # @@ -97,18 +114,22 @@ (lines collapsed in the diff view)
      container: mycontainer
      default_azure_credential:
        exclude_environment_credential: false
      prefix: dagster-logs
      local_dir: "/tmp/cool"
      upload_interval: 30
```

Finally, update your deployment with the new values:

```bash
helm upgrade user-cloud dagster-cloud/dagster-cloud-agent -n <dagster-agent-namespace> -f values.yaml
```

## Step 3: Update your code location to enable the use of the AzureBlobComputeLogManager

- Add `dagster-azure` to your `setup.py` file, as sketched below. This will allow you to import the `AzureBlobComputeLogManager` class.
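
A minimal sketch of what that might look like; the package name and the other dependencies are placeholders for your own project's:

```python
from setuptools import find_packages, setup

setup(
    name="my_dagster_project",  # placeholder: your code location's package name
    packages=find_packages(),
    install_requires=[
        "dagster",
        "dagster-azure",  # provides the AzureBlobComputeLogManager
    ],
)
```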

## Step 4: Verify logs are being written to Azure Blob Storage

It's time to kick off a run in Dagster to test your new configuration. If following along with the quickstart repo, you should be able to kick off a run of the `all_assets_job`, which will generate logs for you to test against. Otherwise, use any job that emits logs. When you go to the stdout/stderr window of the run page, you should see a log file that directs you to the Azure Blob Storage container.
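
You can also confirm from outside Dagster that objects are landing in the container, for example with the Azure CLI (an optional check; the container and prefix must match your Helm values above):

```bash
az storage blob list \
  --account-name <storage-account> \
  --container-name mycontainer \
  --prefix dagster-logs \
  --auth-mode login \
  --query "[].name" -otsv
```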
