feat: Add support for AWS Secrets Manager #151

Merged: 8 commits merged on Nov 13, 2023

andrewtruong (Contributor) commented Nov 3, 2023

Adds support for AWS Secrets Manager, and for W&B Secrets using it as the backend

Tested webhooks e2e on a fresh instance

andrewtruong marked this pull request as ready for review November 3, 2023 14:01
andrewtruong requested a review from gls4 as a code owner November 3, 2023 14:01
andrewtruong requested a review from a team November 3, 2023 14:01
]
effect = "Allow"
resources = var.secret_manager_arn == "" || var.secret_manager_arn == null ? ["arn:aws:secretsmanager:*:${data.aws_caller_identity.current.account_id}:secret:*"] : [var.secret_manager_arn]
}
Contributor

Hmm, is there a better default here than allowing access to all secret managers? It looks like the default we are setting for the other policies is `["arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${aws_iam_role.node.name}"]`.

Contributor Author

I think limitations are based on the secret, not the secret manager (you can think of the secret manager as some service you can't control)

Our secrets are prefixed, so maybe we can change to:

"arn:aws:secretsmanager:*:${data.aws_caller_identity.current.account_id}:secret:${prefix}-*"

from:

"arn:aws:secretsmanager:*:${data.aws_caller_identity.current.account_id}:secret:*"

Contributor

Got it. Yeah, we need some way to make this a no-op so that we don't give installs access to all secrets. Maybe just put in some junk for the prefix? cc: @gls4 if you have any better ideas here. It would be nice to refactor this at some point so the policy is not provisioned at all when no secret manager ARN is present.

Contributor Author (andrewtruong), Nov 7, 2023

For simplicity, maybe we can just enforce that the prefix is non-null (or hardcode it as wandb-secret)

I don't think we use the Secret Manager for anything else atm, but maybe in future?
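
The prefix idea in this thread could look roughly like the sketch below. The variable name `secret_prefix` is hypothetical (the variable actually used in this PR is not shown in the conversation); the `wandb-secret` default and the ARN pattern are taken from the comments above.

```hcl
# Hypothetical variable: the name is an assumption for illustration only.
variable "secret_prefix" {
  type        = string
  description = "Prefix shared by every secret the instance manages."
  default     = "wandb-secret"
  nullable    = false # reject an explicit null (requires Terraform >= 1.1)

  validation {
    condition     = length(var.secret_prefix) > 0
    error_message = "The secret_prefix value must be a non-empty string."
  }
}

data "aws_caller_identity" "current" {}

locals {
  # Matches only secrets in this account whose names start with "<prefix>-".
  secret_arn_pattern = "arn:aws:secretsmanager:*:${data.aws_caller_identity.current.account_id}:secret:${var.secret_prefix}-*"
}
```

With a non-empty prefix enforced, the fallback to `secret:*` is no longer needed, which is the concern raised at the start of the thread.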

Contributor

This is not correct. Please see https://docs.aws.amazon.com/secretsmanager/latest/userguide/reference_iam-permissions.html.

Access to SECRETS needs to be scoped to the cluster, and only for PUT, GET, UPDATE, DELETE operations. There is no need to provide the additional permissions. Moreover, we cannot allow one customer's cluster to read the secrets from another customer, so permissions have got to be scoped correctly from the start.

As for not provisioning the policy in the absence of the arn -- we need to do that now.
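
As a rough illustration of the scoping requested here, the statement below replaces `secretsmanager:*` with an explicit action list and limits resources to one cluster's secrets. The mapping of "PUT, GET, UPDATE, DELETE" to these five IAM actions is an assumption, as is the hypothetical `namespace` variable used as a per-cluster prefix; the merged change may differ.

```hcl
# Hypothetical input: a per-cluster name used to prefix that cluster's secrets.
variable "namespace" {
  type        = string
  description = "Per-cluster name; secrets are assumed to be named <namespace>-*."
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "secrets_manager" {
  statement {
    sid    = "ClusterScopedSecrets"
    effect = "Allow"

    # Explicit actions instead of "secretsmanager:*" (assumed mapping).
    actions = [
      "secretsmanager:CreateSecret",
      "secretsmanager:GetSecretValue",
      "secretsmanager:PutSecretValue",
      "secretsmanager:UpdateSecret",
      "secretsmanager:DeleteSecret",
    ]

    # Only secrets in this account whose names start with the cluster prefix.
    resources = [
      "arn:aws:secretsmanager:*:${data.aws_caller_identity.current.account_id}:secret:${var.namespace}-*",
    ]
  }
}
```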

Contributor Author

modified permission to be more tightly scoped

gls4 (Contributor) left a comment

Need a few changes. We're best off not creating a node role+policy if secrets aren't being used. And the scope of the permissions has to be limited to the node role and only secrets created/managed by that node. More notes in the comments.
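
One common way to skip the policy entirely when secrets are not in use is a `count` guard. This is only a sketch: `node_role_name` is a hypothetical input standing in for the module's node role, the policy name is made up, and the single action shown is a placeholder.

```hcl
variable "secret_manager_arn" {
  type    = string
  default = null
}

# Hypothetical input standing in for the module's node role name.
variable "node_role_name" {
  type = string
}

locals {
  # Nothing below is created unless an ARN was supplied.
  secrets_enabled = var.secret_manager_arn != null
}

data "aws_iam_policy_document" "secrets_manager" {
  count = local.secrets_enabled ? 1 : 0

  statement {
    effect    = "Allow"
    actions   = ["secretsmanager:GetSecretValue"] # placeholder action list
    resources = [var.secret_manager_arn]
  }
}

resource "aws_iam_policy" "secrets_manager" {
  count  = local.secrets_enabled ? 1 : 0
  name   = "${var.node_role_name}-secrets-manager"
  policy = data.aws_iam_policy_document.secrets_manager[0].json
}

# Attach to the node role only, and only when the policy exists.
resource "aws_iam_role_policy_attachment" "secrets_manager" {
  count      = local.secrets_enabled ? 1 : 0
  role       = var.node_role_name
  policy_arn = aws_iam_policy.secrets_manager[0].arn
}
```

Because the value must be known at plan time for `count`, this works when the ARN is passed as a literal or variable rather than computed from another resource.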

"secretsmanager:*",
]
effect = "Allow"
resources = var.secret_manager_arn == "" || var.secret_manager_arn == null ? ["arn:aws:secretsmanager:*:${data.aws_caller_identity.current.account_id}:secret:*"] : [var.secret_manager_arn]
Contributor

Instead of checking for an empty string or null, we should be checking for the appropriate ARN pattern using a regex. This is done in the variable declaration in a validation block. Although validation is not appropriate for some cases, here it is preferred to the proposed solutions because ARNs are of a static and known format. An empty string won't meet the format requirement, which means we only need to check for null.
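
A minimal sketch of the validation block described here. The regex is an assumption about what an acceptable Secrets Manager ARN looks like (it also admits partitions such as `aws-cn`), not the pattern that was merged.

```hcl
variable "secret_manager_arn" {
  type        = string
  default     = null
  description = "ARN (or ARN pattern) of the secrets the cluster may access; null disables secrets."

  validation {
    # null is allowed; any non-null value, including "", must look like a
    # Secrets Manager secret ARN, so callers only need to check for null.
    condition = (
      var.secret_manager_arn == null
      ? true
      : can(regex("^arn:aws[a-z-]*:secretsmanager:[a-z0-9-]+:[0-9]{12}:secret:.+$", var.secret_manager_arn))
    )
    error_message = "The secret_manager_arn value must be null or a valid Secrets Manager secret ARN."
  }
}
```

With this in place, the `== ""` branch in the `resources` expression becomes unnecessary, since an empty string fails validation before the policy is evaluated.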

Contributor Author

modified to just do the regex check with prefix


elainaRenee (Contributor) left a comment

thank you! @gls4 can you take another look?

gls4 (Contributor) commented Nov 8, 2023

Nice work @andrewtruong! Last question: you've applied these changes to a local instance, and they work?

andrewtruong (Contributor Author)

> Nice work @andrewtruong! Last question: you've applied these changes to a local instance, and they work?

Yep, tested on a fresh 0.46.0 instance

andrewtruong merged commit aa64eb1 into main Nov 13, 2023
4 checks passed
andrewtruong deleted the andrew/secrets branch November 13, 2023 17:13
jsbroks pushed a commit that referenced this pull request Nov 13, 2023
## [3.4.0](v3.3.0...v3.4.0) (2023-11-13)

### Features

* Add support for AWS Secrets Manager ([#151](#151)) ([aa64eb1](aa64eb1))
jsbroks (Member) commented Nov 13, 2023

This PR is included in version 3.4.0 🎉
