Hi all,

I'd like to start a discussion about strategies for deploying Dagster without requiring long-lived credentials for its application database. Using short-lived credentials or access tokens would be preferable for the following reasons:
In the event of a database credential exposure, the credentials would have a limited lifespan, reducing the likelihood of data exfiltration.
Credential rotation may always become necessary, even if it is not planned. It would be better for this to be a regular, validated, automated activity rather than a risky manual one.
At the moment, when deploying Dagster OSS to your own cloud environment, the assumption is that Dagster will use a static username and password for its application database. When deploying to Kubernetes via Helm, the username is written into the pod configuration while the password is written to a Kubernetes secret. Both are provided to Dagster via environment variables.
Typically, databases will be provisioned using managed cloud services that support passwordless authentication via the cloud platform's identity provider. Alternatively, an organization might have a secrets management platform that can generate short-lived credentials on request, e.g. HashiCorp Vault / OpenBao. However, neither of these approaches is fully supported in Dagster.
I've come up with the following six options. My questions are:
Is this an exhaustive list or are there options I haven't considered?
Can any of these be ruled out as impractical?
Can any of the downsides that I've identified be mitigated?
Would Dagster consider supporting any of options 1-4 that require some development effort?
I've limited my thinking here to Dagster's own database. Would it also make sense to think about how to use managed identities / dynamic credentials for data-plane components in the user's data platform?
Option 1: Support cloud managed identities natively
This would use the native authentication libraries for the major cloud providers - e.g. for Microsoft Azure, using MSAL with a Managed Identity or Workload Identity to obtain an access token for the database.
Impact: Requires changes to the Helm chart, new database configuration options and first-party support for major cloud providers.
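To make the scope of option 1 concrete, here is a minimal sketch of the refresh logic that native support would need: cloud access tokens expire, so the connection factory must fetch a fresh token shortly before the cached one lapses. The `fetch` callable would wrap e.g. azure-identity's `DefaultAzureCredential().get_token(...)` in a real deployment; the class and parameter names here are illustrative, not an existing Dagster API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class CachedToken:
    value: str
    expires_at: float  # unix timestamp at which the token expires

class TokenProvider:
    """Caches a short-lived access token and refreshes it near expiry."""

    def __init__(self, fetch: Callable[[], CachedToken], skew: float = 300.0):
        self._fetch = fetch          # e.g. wraps DefaultAzureCredential().get_token(...)
        self._skew = skew            # refresh this many seconds before expiry
        self._cached: Optional[CachedToken] = None

    def password(self) -> str:
        """Return a token to use as the database password, refreshing if needed."""
        if self._cached is None or time.time() >= self._cached.expires_at - self._skew:
            self._cached = self._fetch()
        return self._cached.value
```

A database connection factory would then call `provider.password()` each time it opens a connection, rather than reading a static password from the environment once at startup.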
Option 2: Support generic token exchange mechanisms for cloud identities
Generically support token exchange mechanisms such as the OAuth2 Bearer JWT grant flow, e.g. exchanging a Kubernetes ServiceAccount token for a cloud provider's access token to be used as a database credential.
Sample configurations could be provided for some cloud platforms. Alternatively, these could be community-sourced.
Impact: Requires changes to the Helm chart and new database configuration options.
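As a sketch of what option 2's generic mechanism might look like: POST the projected ServiceAccount token to the provider's OAuth2 token endpoint as a client assertion. The endpoint URL, client ID and scope below are placeholders that would come from configuration, not a working setup for any particular cloud.

```python
import urllib.parse
import urllib.request

# Default path at which Kubernetes projects the ServiceAccount token.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

def build_exchange_request(token_endpoint: str, client_id: str,
                           scope: str, sa_token: str) -> urllib.request.Request:
    """Build an OAuth2 client-credentials request using a JWT client assertion."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": scope,
        "client_assertion_type":
            "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": sa_token,  # the projected ServiceAccount JWT
    }).encode()
    return urllib.request.Request(
        token_endpoint, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Dagster would periodically send this request, read the `access_token` from the JSON response, and use it as the database password; the cloud-specific parts reduce to a handful of configuration values.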
Option 3: Support projecting dynamic credentials as a file
Support reading database credentials from a file instead of an environment variable, and reload those credentials when the contents of the file changes.
This is often used as a strategy for credential rotation in Kubernetes, e.g. writing the credentials to a Kubernetes Secret and projecting that secret into the pod's filesystem. Unlike an environment variable, this does not require the pod to restart when the credentials change.
Impact: Requires changes to the Helm chart and new database configuration options.
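A minimal sketch of the file-based reload in option 3, assuming a projected Secret at a path like /etc/dagster/db-credentials.json (the path and JSON layout are my assumptions, not an existing Dagster convention):

```python
import json
import os
from typing import Tuple

class FileCredentials:
    """Reads (username, password) from a file, re-reading whenever it changes."""

    def __init__(self, path: str):
        self._path = path
        self._mtime = 0.0
        self._creds = ("", "")

    def get(self) -> Tuple[str, str]:
        # Cheap mtime check; a projected Secret update changes the mtime.
        mtime = os.stat(self._path).st_mtime
        if mtime != self._mtime:
            with open(self._path) as f:
                data = json.load(f)
            self._creds = (data["username"], data["password"])
            self._mtime = mtime
        return self._creds
```

Because the credentials are re-read per connection attempt rather than captured once into the environment, rotation never requires a pod restart.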
Option 4: Support dynamically-generated username/password pairs in environment variables
Enable both the database username and password to be read from a secret, rather than just the password.
This requires a third party system to generate the credentials, e.g. HashiCorp Vault / OpenBao.
A drawback of this approach is that it requires a pod restart, which (at least in my testing so far) terminates ongoing jobs. This limits how often the credentials can be refreshed without significant operational impact. Can this be mitigated in some way?
Impact: Requires changes to the Helm chart only.
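For illustration, the pod spec change in option 4 would look something like the fragment below, with both keys sourced from a Secret kept in sync by the Vault Secrets Operator or External Secrets Operator. The Secret and key names are illustrative; today the chart only templates the password from a Secret, and `DAGSTER_PG_USERNAME` is a hypothetical variable, not one the chart currently reads.

```yaml
env:
  - name: DAGSTER_PG_USERNAME        # hypothetical: not read by the current chart
    valueFrom:
      secretKeyRef:
        name: dagster-db-credentials
        key: username
  - name: DAGSTER_PG_PASSWORD
    valueFrom:
      secretKeyRef:
        name: dagster-db-credentials
        key: password
```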
Option 5: Use a third party tool to obtain cloud identity access tokens
Use a third party tool to obtain a short-lived access token for a cloud provider and write it to a Kubernetes secret, which is then provided to the pod as an environment variable. One possible (as yet untested) approach might be to use the External Secrets Operator's Webhook secret generator with the cloud provider's token exchange endpoint.
As with option 4, this requires a pod restart.
Impact: None, (theoretically) achievable with current third party tooling.
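An untested sketch of the idea: an External Secrets Operator Webhook generator calling the provider's token endpoint, with an ExternalSecret writing the result into the Secret the pod reads. All field values below are placeholders, and I have not verified this against a real token endpoint.

```yaml
apiVersion: generators.external-secrets.io/v1alpha1
kind: Webhook
metadata:
  name: db-access-token
spec:
  url: "https://cloud.example.com/oauth2/token"   # provider token-exchange endpoint
  method: POST
  body: "grant_type=client_credentials&scope=database"
  result:
    jsonPath: "$.access_token"
```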
Option 6: Use a third party tool to rotate passwords for a single database account
Use a static username and automatically rotate its password. This currently works with the Vault Secrets Operator / External Secrets Operator.
As with options 4 and 5, this requires a pod restart. It is also likely to cause transient errors in the period between the password being rotated and the successful rollout of the new pod.
Impact: None, achievable with current third party tooling.