
🤫 WIP: Secret experiment #1848

Draft: wants to merge 3 commits into main

Conversation

@moskyb (Contributor) commented Nov 24, 2022

This is something that's been on my mind for a little while: how do we make secrets awesome in the agent?

At the moment, we rely on build scripts to fetch secrets at run time, but what if we could make it part of the agent lifecycle and handle everything nicely?

This PR is an experiment to that end. It pretends that there are elements in the pipeline.yml that look like this:

secret_providers:
  - id: ssm
    type: aws-ssm
    config:
      iam_role: arn:aws:iam::123456789012:role/buildkite-role
      assume_via_oidc: true

  - id: vault
    type: vault_kvv2
    config:
      address: https://vault.example.com
      mount_path: secret
      auth_type: kubernetes
      auth_config:
        role: buildkite
        service_account_token_path: /var/run/secrets/kubernetes.io/serviceaccount/token

secrets: # Global to build, every step has these secrets injected
  - env_var: WIDGET_CORP_API_KEY
    key: path/to/widget-corp-api-key
    provider_id: vault

  - file: ~/.ssh/id_rsa
    key: path/to/ssh-key
    provider_id: ssm

# and/or

steps:
  # ...

  - name: deploy
    key: deploy
    command: ".buildkite/steps/deploy.sh"
    secrets: # Local to step, merged with globals serverside
      - env_var: GITHUB_DEPLOY_TOKEN
        key: /path/to/github-deploy-token
        provider_id: vault

and provides the agent with a way to fetch those secrets such that the secret store itself is abstracted away from the Buildkite backend (that is, the backend has no access to customer secrets), and such that multiple stores can be used.

This PR is a super duper POC: there's only one secret backend implemented (SSM Parameter Store), there are no tests, and some things are completely broken, namely using OIDC to log into AWS, and writing secrets to files.

Hopefully this PR shows more or less what I'd like to get to, even if not necessarily soon.
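
For orientation, here's a minimal sketch of the kinds of types that pipeline shape implies. The names, fields, and signatures below are assumptions pieced together from the YAML above and the review threads below, not necessarily what the PR implements:

package secrets

// SecretConfig mirrors one entry under `secrets:` in the pipeline above.
// Exactly one of EnvVar or File would be set per entry.
type SecretConfig struct {
    EnvVar     string `yaml:"env_var,omitempty"` // inject the value as an environment variable...
    File       string `yaml:"file,omitempty"`    // ...or write it to this path on disk
    Key        string `yaml:"key"`               // lookup key/path within the provider
    ProviderID string `yaml:"provider_id"`       // which secret_providers entry to use
}

// Provider abstracts a single backend (aws-ssm, vault_kvv2, ...) so that the
// rest of the agent, and the Buildkite backend, never see store details.
type Provider interface {
    Fetch(config SecretConfig) (Secret, error)
}

// Secret is whatever a provider hands back; per the review threads below, a
// secret knows how to "store" itself (export an env var, write a file, etc.).
type Secret interface {
    Store() error
}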

secrets/provider_registry.go

}

type ProviderRegistryConfig struct {
    Shell *shell.Shell
@moskyb (Contributor, Author):
I don't love passing the bootstrap's shell into the provider registry, but it would be nice for the registry to be able to log things, and when it creates secrets.EnvSecrets it needs to pass the shell's env into them so that they can store themselves. This seems a bit nasty and I'd appreciate some C&C on how to make it better.
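
Roughly, the coupling being described (a sketch with hypothetical names such as EnvSecret and newEnvSecret; it also assumes the bootstrap shell exposes a Commentf log method, an Env field, and an env.Environment type, which may differ from the actual code):

// Sketch only: the two reasons the registry wants the bootstrap shell,
// as described above.
type ProviderRegistry struct {
    Shell *shell.Shell // 1. lets the registry log; 2. source of the job env
}

// EnvSecret is hypothetical here; the idea is that it "stores itself" by
// writing Name=Value into the env it was given.
type EnvSecret struct {
    Name, Value string
    Env         *env.Environment
}

func (pr *ProviderRegistry) newEnvSecret(cfg SecretConfig, value string) *EnvSecret {
    pr.Shell.Commentf("creating env secret for %s", cfg.EnvVar) // logging via the shell
    return &EnvSecret{Name: cfg.EnvVar, Value: value, Env: pr.Shell.Env}
}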

Comment on lines 57 to 82
for _, c := range configs {
    wg.Add(1)
    go func(config SecretConfig) {
        defer wg.Done()

        secret, err := pr.Fetch(config)
        if err != nil {
            errorsCh <- err
            return
        }
        secretsCh <- secret
    }(c)
}

go func() {
    for err := range errorsCh {
        errors = append(errors, err)
    }
}()

go func() {
    for secret := range secretsCh {
        secrets = append(secrets, secret)
    }
}()
@moskyb (Contributor, Author) commented Nov 24, 2022:

Maybe this is getting too goroutine-y (in the sense that the concurrency makes the code harder for humans to parse). Do we really need to fetch things in parallel? "We can, so we might as well" was my justification, but I think the reality is that the number of secrets in each pipeline is going to be pretty small, and they could probably be fetched serially.
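
For comparison, the serialised version being alluded to would look roughly like this (a sketch against the same variables as the excerpt above, not code from this PR):

// Sketch: fetch secrets one at a time instead of in goroutines, still
// collecting every error so the user can see exactly which secrets failed.
var secrets []Secret
var errs []error
for _, config := range configs {
    secret, err := pr.Fetch(config)
    if err != nil {
        errs = append(errs, err)
        continue
    }
    secrets = append(secrets, secret)
}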

A contributor replied:

If you keep the parallelisation, you might want to at least cap the number of worker goroutines at runtime.NumCPU or similar.

Also, is it worth returning each error, or should the whole operation be treated as failed if any of them fail?

Also, while you've been out I've been busy sprinkling context.Context args through the codebase; can requests be cancelled early through use of a context somehow?
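
If the parallel version stays, one common way to apply that cap is a buffered-channel semaphore, sketched here against the excerpt above (assumes runtime is imported; not code from this PR):

// Sketch: cap the number of in-flight fetch goroutines at runtime.NumCPU().
sem := make(chan struct{}, runtime.NumCPU())
for _, c := range configs {
    wg.Add(1)
    sem <- struct{}{} // blocks once NumCPU fetches are already in flight
    go func(config SecretConfig) {
        defer wg.Done()
        defer func() { <-sem }() // release the slot when this fetch finishes

        secret, err := pr.Fetch(config)
        if err != nil {
            errorsCh <- err
            return
        }
        secretsCh <- secret
    }(c)
}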

@moskyb (Contributor, Author) replied:

> If you keep the parallelisation, might want to at least cap the number of worker goroutines at runtime.NumCPU or similar

What would the benefit of limiting concurrency be? Given that calls to secret managers are going to be IO-bound, I'm happy to let green-threading/goroutine magic handle throttling our concurrency.

> Is it worth returning each error, or should it treat the whole operation as failed if any of them fail?

I think it's probably most useful to do both: i.e. any len(errors) > 0 should be treated as a failure, but it's also nice to be able to tell the user exactly which secret failed to be fetched, and for what reason.

> Can requests be cancelled early through use of a context somehow?

Yep, I think it's totally reasonable to change the Provider.Fetch interface to accept a context. I'll make that change now :)
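
That change would presumably make the interface from the earlier sketch look something like this (the exact signature is an assumption):

// Sketch: Provider.Fetch takes a context so in-flight requests can be
// cancelled early (e.g. when the job is cancelled or times out).
type Provider interface {
    Fetch(ctx context.Context, config SecretConfig) (Secret, error)
}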

The contributor replied:

> What would the benefit of limiting concurrency be? Given that calls to secret managers are going to be IO-bound, I'm happy to let green-threading/goroutine magic handle throttling our concurrency.

That's the happy path: what if they each open a connection simultaneously, exhausting TCP source ports? Or there's a config error?
