
Error when OCI Layers uncompressed size is bigger than 100MB #1511

Open
KarstenSiemer opened this issue Jan 21, 2025 · 7 comments · May be fixed by #1519

@KarstenSiemer commented Jan 21, 2025

Hi all,

Thanks for the amazing work.

I want to run the controller in a network-restricted environment and cannot download providers and the like on the fly. This is why I wanted to put everything into OCI and hand that to the runner. Due to the size of the Terraform Kubernetes provider, one OCI layer's compressed size is 44MB while its uncompressed size is 134MB.

This results in the following error on the Terraform CRD:

rpc error: code = Unknown desc = failed to untar artifact, error: tar ".terraform/providers/registry.opentofu.org/hashicorp/kubernetes/2.35.1/linux_amd64/terraform-provider-kubernetes" is bigger than max archive size of 104857600 bytes   13m

The default size of the buffer is set here.
The error is thrown here.
I didn't find a standard way in this controller's code to increase the limit to, let's say, 4GB (given the pod's memory is that high).
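
For reference, the limit seems to come from fluxcd/pkg/tar; a rough sketch of the relevant constants, inferred from the error above (the upstream source may differ):

fluxcd/tar/tar.go

const (
  // DefaultMaxUntarSize caps the uncompressed size of an untar operation.
  DefaultMaxUntarSize = 100 << (10 * 2) // 104857600 bytes, the value in the error above
  // UnlimitedUntarSize disables the size check.
  UnlimitedUntarSize = -1
)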

Is this something that can already be configured and I have overlooked it, or is it something I could work on and open a PR for?

My runner version is ghcr.io/flux-iac/tf-runner:v0.16.0-rc.5

Thanks in advance.

@akselleirv (Collaborator)

Hello @KarstenSiemer, please take a look at https://github.com/flux-iac/tofu-controller/pull/1490/files; it contains a guide on how to create a tf-runner with the Terraform providers cached.

@KarstenSiemer (Author)

Hello @akselleirv,

Thanks for your fast response.
I appreciate your input. This approach has the advantage that the total amount of providers stored across my infrastructure is lower.

However, I also see a few disadvantages. For one, I don't know in advance which providers any future module might need, and different modules could require different versions of the same provider. It is also hard to know which provider versions were used in any past run. Say I want to be 100% sure I can replicate a state in prod: with the providers decoupled from the module version in my Git, that is harder to guarantee. I would have to track versions strictly with constraints and have the plan fail whenever the runner image does not line up.

All in all, I favor putting the providers, including a lock file, into the module in OCI. This would be perfectly possible if I could untar layers >100MB.

What is your opinion? I'd like my module developers to be able to develop modules more freely.

@KarstenSiemer (Author)

Hello @akselleirv,

I am currently working on a fix to support layers bigger than 100MB in tofu-controller.
I will keep the implementation close to what @chanwit did in #301 and make the maximum layer size configurable.
I would say an integer that can be negative or positive makes sense.
The default should stay at 100MB.

It would be nice to be able to set this in MB instead of bytes, as that would be more in line with the gRPC size option that is already implemented.
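
To illustrate the semantics I have in mind, a sketch of the conversion (the flag name --max-untar-size-mb is hypothetical; nothing in tofu-controller defines it today):

package main

import (
  "flag"

  "github.com/fluxcd/pkg/tar"
)

func main() {
  // Hypothetical flag: negative values mean unlimited, mirroring tar.UnlimitedUntarSize.
  maxUntarSizeMB := flag.Int("max-untar-size-mb", 100,
    "maximum uncompressed layer size in MB; negative disables the limit")
  flag.Parse()

  maxUntarSize := tar.UnlimitedUntarSize
  if *maxUntarSizeMB >= 0 {
    maxUntarSize = *maxUntarSizeMB << (10 * 2) // MB -> bytes
  }
  _ = maxUntarSize // would be passed through to tar.Untar via tar.WithMaxUntarSize
}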

Is there a chance this could be merged into the project, or would you prefer not to implement it?

Long term, I would even recommend adopting fluxcd/pkg/http to fetch and untar artifacts, like the kustomize-controller does here. That would keep the implementation closer to the general direction other Flux controllers have taken.
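
For illustration, fetching and untarring via fluxcd/pkg/http could look roughly like this (a sketch from memory; the exact signature of NewArchiveFetcherWithLogger varies between versions, and retries, logger, artifact, and tmpDir are placeholders):

fetcher := fetch.NewArchiveFetcherWithLogger(
  retries,                // number of fetch retries
  tar.UnlimitedUntarSize, // maximum download size
  tar.UnlimitedUntarSize, // maximum untar size
  os.Getenv("SOURCE_CONTROLLER_LOCALHOST"),
  logger,
)
// Fetch downloads the artifact, verifies its digest, and untars it into tmpDir.
if err := fetcher.Fetch(artifact.URL, artifact.Digest, tmpDir); err != nil {
...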

@KarstenSiemer (Author) commented Jan 24, 2025

While implementing this, I've come to ask myself whether it makes sense to add an argument for it at all.

runner/server.go

buf := bytes.NewBuffer(req.TarGz)
if err = tar.Untar(buf, tmpDir); err != nil {
...

This loads a buffer with the contents of the archive that is to be untarred.
Untar() has a default maximum size of 100MB per layer, which can be overridden via:

fluxcd/tar/tar.go

func Untar(r io.Reader, dir string, inOpts ...TarOption) (err error) {
  opts := tarOpts{
    maxUntarSize: DefaultMaxUntarSize,
  }
  opts.applyOpts(inOpts...)
...
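
So, for example, the call in runner/server.go above could raise the cap like this (a sketch; the 4GB value just echoes the number from my first comment):

buf := bytes.NewBuffer(req.TarGz)
// WithMaxUntarSize overrides the 100MB default; 4 << (10 * 3) is 4GB in bytes.
if err = tar.Untar(buf, tmpDir, tar.WithMaxUntarSize(4<<(10*3))); err != nil {
...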

If we now add an argument to the controller so that the runners can be configured with yet another integer for the maximum layer size, we make configuration harder for the end user, and I am unsure what we really gain.
Moreover, the end user defines an artifact for the source-controller that they intend to use with the tofu-controller.
Why should we care whether their whole artifact is a single layer over 100MB?
If the pod's memory does not meet the artifact's requirements, the runner for that artifact will fail. That affects the runner, but not the tofu-controller directly, and it is the same thing that is already happening to me today.

Looking at the official Flux kustomize-controller, for example, it specifies an unlimited maximum layer size.

fluxcd/internal/controller/kustomization_controller.go

if err = fetch.NewArchiveFetcherWithLogger(
  r.artifactFetchRetries,
  tar.UnlimitedUntarSize,
...

That appears to be the most sensible solution here, too. Reading the size of the buffer and setting that as the maximum layer size would amount to the same thing as an unlimited layer size, so we should simply set the maximum to unlimited. From a user's perspective, I think that makes sense: if I want to handle a 1GB archive, it could be a single layer, and if I give the pod less memory than that, the buffer allocation in bytes.NewBuffer fails because it cannot allocate enough memory, which can already happen today.
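
Concretely, the change would then be roughly (a sketch against the call site shown above):

runner/server.go

buf := bytes.NewBuffer(req.TarGz)
// Follow kustomize-controller and drop the per-layer cap; available memory
// becomes the practical limit.
if err = tar.Untar(buf, tmpDir, tar.WithMaxUntarSize(tar.UnlimitedUntarSize)); err != nil {
...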

From a provider's perspective, this change doesn't really add new risks and doesn't make the software any less stable, so it should be fine.

I think that's what I'll raise as a PR.
Please excuse my verbal outpourings.

@akselleirv (Collaborator)

I see. I guess the solution you propose here is required to make it work for your use case, as the Terraform providers used in your modules are not predefined and controlled centrally.

I agree that setting tar.UnlimitedUntarSize would be fine as long as it does not affect other users of the project negatively. From my understanding, it should not.

I'm fine with implementing these changes. @chanwit, do you have any input on this?

@chanwit (Collaborator) commented Jan 27, 2025

We should of course follow any best practices implemented by Flux.

@KarstenSiemer (Author)

Thanks for the feedback.
I have not yet created the PR, as I am testing the code in a lot of different situations to make sure that I am not introducing problems.
So far it is working perfectly for me, and I have not observed any side effects.

I have one question left, however, about writing tests. I have seen a lot of tests written for GitRepository but none for OCIRepository. That is probably because 'github.com/fluxcd/source-controller/api/v1' does not include a spec for it, but v1beta2 does, like this.

Would it be okay with you if I create the tests using v1beta2? And should I replicate some of the GitRepository tests so they also cover the OCIRepository untar handling?
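
To make the question concrete, a minimal OCIRepository object for such a test could look like this (a sketch against the v1beta2 API; the name, namespace, URL, and tag are placeholders):

import (
  "time"

  sourcev1b2 "github.com/fluxcd/source-controller/api/v1beta2"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

ociRepo := &sourcev1b2.OCIRepository{
  ObjectMeta: metav1.ObjectMeta{
    Name:      "test-oci-repo", // placeholder
    Namespace: "flux-system",   // placeholder
  },
  Spec: sourcev1b2.OCIRepositorySpec{
    URL:       "oci://ghcr.io/example/terraform-module", // placeholder
    Interval:  metav1.Duration{Duration: time.Minute},
    Reference: &sourcev1b2.OCIRepositoryRef{Tag: "v1.0.0"},
  },
}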

@KarstenSiemer linked a pull request (#1519) on Feb 6, 2025 that will close this issue.