[New Feature]: Preserve SPS S3 buckets throughout deployment lifecycle #272
Comments
I dug into this issue a little bit and haven't really come up with a satisfactory solution. Here is what I explored:
Solution 1: The `lifecycle` meta-argument (see the sketch after this comment): https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle
Solution 2: Run a script before destroy that creates a temporary bucket and copies the data over; then run a script after create that copies the data into the new S3 bucket and removes the temporary bucket.
Solution 3: Remove the S3 bucket from the Terraform state before destroying, and import it back into the state when creating.
I don't think solution 2 would take too long to implement, and it seems like the best of the three. But I feel like I might be missing something obvious. @LucaCinquini and @jpl-btlunsfo, any ideas?
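For reference, solution 1 would look roughly like the sketch below (the resource and bucket names are hypothetical). The catch is that `prevent_destroy` must be a literal value, so it can't be toggled by a variable, and it makes `terraform destroy` fail with an error instead of simply skipping the bucket, which is why it doesn't fit this use case.

```hcl
# Sketch only: resource and bucket names are placeholders, not the actual module's.
resource "aws_s3_bucket" "sps_code" {
  bucket = "sps-code-bucket"

  lifecycle {
    # Refuses any plan that would destroy this bucket; `terraform destroy`
    # errors out rather than leaving the bucket in place, and this flag
    # cannot be driven by an input variable.
    prevent_destroy = true
  }
}
```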
Hi @nikki-t: thanks for investigating. I agree that solutions 1 and 3 are no-gos, while 2 seems the most reasonable. I wonder if we can leverage AWS backup and restoration (via Terraform, or manually). Let us investigate some more.
I wonder if it makes more sense to use a single bucket + prefixes for most of these items (code, config, logs?) and have those be requirements on a deployment, e.g. "bring your own bucket" as a variable in a deployment? This is pretty much how AWS does its managed services (choose a bucket...) before deployment. This way the bucket and its contents live outside of the Airflow lifecycle itself.
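A minimal sketch of what that could look like, assuming a single pre-existing bucket plus per-purpose prefixes supplied as deployment variables (all names here are hypothetical, not existing variables in the repo):

```hcl
# "Bring your own bucket": the deployment only looks up a bucket the user
# already owns, so the bucket and its contents live outside this Terraform's
# create/destroy lifecycle.
variable "sps_bucket" {
  description = "Name of a pre-existing S3 bucket supplied by the deployer"
  type        = string
}

variable "sps_code_prefix" {
  description = "Key prefix under which this deployment stores its code"
  type        = string
  default     = "sps/code/"
}

data "aws_s3_bucket" "sps" {
  bucket = var.sps_bucket
}
```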
We certainly want to reduce the number of buckets, and BYOB might be a good idea, actually.
Yeah, I think having the buckets be preconditions rather than "in" the Airflow Terraform could help? I'm not sure BYOB applies to the Airflow metadata, but it makes sense for the data itself. Is SPS-code for deployed apps, or is it something used to manage the Airflow internals? Would these buckets ever get large, like tens or hundreds of GB?
An SPS deployment stores data in 4 S3 buckets, for example:
We need to change the Terraform code such that, by default, these buckets are NOT destroyed when the cluster is taken down and are reused when the cluster is deployed again. Optionally, the user can set a variable to destroy and recreate the S3 buckets.
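One way to express that default, sketched under the assumption that a single boolean input drives the behaviour (variable and resource names are placeholders, not the module's actual ones):

```hcl
variable "recreate_sps_buckets" {
  description = "If true, create the SPS buckets with this deployment and destroy them on teardown; if false (default), reuse pre-existing buckets and leave them untouched"
  type        = bool
  default     = false
}

variable "sps_code_bucket_name" {
  description = "Name of the SPS code bucket (placeholder)"
  type        = string
  default     = "unity-sps-code"
}

# Created, and therefore destroyed, only when the user opts in.
resource "aws_s3_bucket" "sps_code" {
  count         = var.recreate_sps_buckets ? 1 : 0
  bucket        = var.sps_code_bucket_name
  force_destroy = true
}

# Otherwise the existing bucket is only looked up, so `terraform destroy`
# never touches it.
data "aws_s3_bucket" "sps_code" {
  count  = var.recreate_sps_buckets ? 0 : 1
  bucket = var.sps_code_bucket_name
}

locals {
  # Picks whichever of the two exists (exactly one will, given the counts above).
  sps_code_bucket_id = one(concat(aws_s3_bucket.sps_code[*].id, data.aws_s3_bucket.sps_code[*].id))
}
```

The same count/data pattern would apply to the other three buckets.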