
Error when OCI Layers uncompressed size is bigger than 100MB #1511

Open
KarstenSiemer opened this issue Jan 21, 2025 · 7 comments · May be fixed by #1519

@KarstenSiemer commented Jan 21, 2025

Hi all,

Thanks for the amazing work.

I want to run the controller in a network-restricted environment and cannot download providers and the like on the fly. This is why I wanted to put everything into OCI and hand that to the runner. Due to the size of the Terraform Kubernetes provider, one OCI layer's compressed size is 44MB while its uncompressed size is 134MB.

This results in the following error on the Terraform CRD:

rpc error: code = Unknown desc = failed to untar artifact, error: tar ".terraform/providers/registry.opentofu.org/hashicorp/kubernetes/2.35.1/linux_amd64/terraform-provider-kubernetes" is bigger than max archive size of 104857600 bytes   13m

The default size of the buffer is set here.
The error is thrown here.
I didn't find a standard way in this controller's code to increase the limit to, let's say, 4GB (given the pod's memory is that high).
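
For reference, the limit seems to come from fluxcd/pkg/tar; a rough sketch of the relevant constants, inferred from the error above (the upstream source may differ):

fluxcd/tar/tar.go

const (
  // DefaultMaxUntarSize caps the uncompressed size of an untar operation.
  DefaultMaxUntarSize = 100 << (10 * 2) // 104857600 bytes, the value in the error above
  // UnlimitedUntarSize disables the size check.
  UnlimitedUntarSize = -1
)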

Is this something that can already be configured and I have overlooked it, or is it something I could work on and open a PR for?

My runner version is ghcr.io/flux-iac/tf-runner:v0.16.0-rc.5

Thanks in advance.

@akselleirv (Collaborator)

Hello @KarstenSiemer, please take a look at https://github.com/flux-iac/tofu-controller/pull/1490/files; it contains a guide on how to create a tf-runner with the Terraform providers cached.

@KarstenSiemer (Author)

Hello @akselleirv,

Thanks for your fast response.
I appreciate your input. This approach has the advantage that the total amount of providers stored across my infrastructure is lower.

However, I also see a few disadvantages. For one, I don't know in advance which providers any future module might need, and different modules could require different versions of the same provider. It is also hard to know which provider versions were used in any past run. Say I want to be 100% sure I can replicate a state in prod: with the providers decoupled from the module version in my Git, that is harder to guarantee. I would have to track versions strictly with constraints and have the plan fail whenever the runner image does not line up.

All in all, I favor putting the providers, including a lock file, into the module in OCI. This would be perfectly possible if I could untar layers >100MB.

What is your opinion? I'd like my module developers to be able to develop modules more freely.

@KarstenSiemer (Author)

Hello @akselleirv,

I am currently working on a fix to support layers bigger than 100MB in tofu-controller.
I will keep the implementation close to what @chanwit did in #301 and make the maximum layer size configurable.
I would say an integer that can be negative or positive makes sense.
The default should stay at 100MB.

It would be nice to be able to set this in MB instead of bytes, as that would be more in line with the gRPC size option that is already implemented.
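
To illustrate the semantics I have in mind, a sketch of the conversion (the flag name --max-untar-size-mb is hypothetical; nothing in tofu-controller defines it today):

package main

import (
  "flag"

  "github.com/fluxcd/pkg/tar"
)

func main() {
  // Hypothetical flag: negative values mean unlimited, mirroring tar.UnlimitedUntarSize.
  maxUntarSizeMB := flag.Int("max-untar-size-mb", 100,
    "maximum uncompressed layer size in MB; negative disables the limit")
  flag.Parse()

  maxUntarSize := tar.UnlimitedUntarSize
  if *maxUntarSizeMB >= 0 {
    maxUntarSize = *maxUntarSizeMB << (10 * 2) // MB -> bytes
  }
  _ = maxUntarSize // would be passed through to tar.Untar via tar.WithMaxUntarSize
}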

Is there a chance this could be merged into the project, or would you prefer not to implement it?

Long term, I would even recommend adopting fluxcd/pkg/http to fetch and untar artifacts, like the kustomize-controller does here. That would keep the implementation closer to the general direction other Flux controllers have taken.
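
For illustration, fetching and untarring via fluxcd/pkg/http could look roughly like this (a sketch from memory; the exact signature of NewArchiveFetcherWithLogger varies between versions, and retries, logger, artifact, and tmpDir are placeholders):

fetcher := fetch.NewArchiveFetcherWithLogger(
  retries,                // number of fetch retries
  tar.UnlimitedUntarSize, // maximum download size
  tar.UnlimitedUntarSize, // maximum untar size
  os.Getenv("SOURCE_CONTROLLER_LOCALHOST"),
  logger,
)
// Fetch downloads the artifact, verifies its digest, and untars it into tmpDir.
if err := fetcher.Fetch(artifact.URL, artifact.Digest, tmpDir); err != nil {
...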

@KarstenSiemer (Author) commented Jan 24, 2025

While implementing this, I've come to ask myself whether it makes sense to add an argument for it at all.

runner/server.go

buf := bytes.NewBuffer(req.TarGz)
if err = tar.Untar(buf, tmpDir); err != nil {
...

This loads a buffer with the contents of the archive that is to be untarred.
Untar() has a default maximum size of 100MB per layer, which can be overridden via:

fluxcd/tar/tar.go

func Untar(r io.Reader, dir string, inOpts ...TarOption) (err error) {
  opts := tarOpts{
    maxUntarSize: DefaultMaxUntarSize,
  }
  opts.applyOpts(inOpts...)
...
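
So, for example, the call in runner/server.go above could raise the cap like this (a sketch; the 4GB value just echoes the number from my first comment):

buf := bytes.NewBuffer(req.TarGz)
// WithMaxUntarSize overrides the 100MB default; 4 << (10 * 3) is 4GB in bytes.
if err = tar.Untar(buf, tmpDir, tar.WithMaxUntarSize(4<<(10*3))); err != nil {
...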

If we now add an argument to the controller so that the runners can be configured with yet another integer for the maximum layer size, we make configuration harder for the end user, and I am unsure what we really gain.
Moreover, the end user defines an artifact for the source-controller that they intend to use with the tofu-controller.
Why should we care whether their whole artifact is a single layer over 100MB?
If the pod's memory does not meet the artifact's requirements, the runner for that artifact will fail. That affects the runner, but not the tofu-controller directly, and it is the same thing that is already happening to me today.

Looking at the official Flux kustomize-controller, for example, it specifies an unlimited maximum layer size.

fluxcd/internal/controller/kustomization_controller.go

if err = fetch.NewArchiveFetcherWithLogger(
  r.artifactFetchRetries,
  tar.UnlimitedUntarSize,
...

That appears to be the most sensible solution here, too. Reading the size of the buffer and setting that as the maximum layer size would amount to the same thing as an unlimited layer size, so we should simply set the maximum to unlimited. From a user's perspective, I think that makes sense: if I want to handle a 1GB archive, it could be a single layer, and if I give the pod less memory than that, the buffer allocation in bytes.NewBuffer fails because it cannot allocate enough memory, which can already happen today.
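
Concretely, the change would then be roughly (a sketch against the call site shown above):

runner/server.go

buf := bytes.NewBuffer(req.TarGz)
// Follow kustomize-controller and drop the per-layer cap; available memory
// becomes the practical limit.
if err = tar.Untar(buf, tmpDir, tar.WithMaxUntarSize(tar.UnlimitedUntarSize)); err != nil {
...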

From a provider's perspective, this change doesn't really add new risks and doesn't make the software any less stable, so it should be fine.

I think that's what I'll raise as a PR.
Please excuse my verbal outpourings.

@akselleirv (Collaborator)

I see. I guess the solution you propose here is required to make it work for your use case, as the Terraform providers used in your modules are not predefined and controlled centrally.

I agree that setting tar.UnlimitedUntarSize would be fine as long as it does not affect other users of the project negatively. From my understanding, it should not.

I'm fine with implementing these changes. @chanwit, do you have any input on this?

@chanwit (Collaborator) commented Jan 27, 2025

We should of course follow any best practices implemented by Flux.

@KarstenSiemer (Author)

Thanks for the feedback.
I have not yet created the PR, as I am testing the code in a lot of different situations to make sure that I am not introducing problems.
So far it is working perfectly for me, and I have not observed any side effects.

I have one question left, however, about writing tests. I have seen a lot of tests written for GitRepository but none for OCIRepository. That is probably because 'github.com/fluxcd/source-controller/api/v1' does not include a spec for it, but v1beta2 does, like this.

Would it be okay with you if I create the tests using v1beta2? And should I replicate some of the GitRepository tests so they also cover the OCIRepository untar handling?
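
To make the question concrete, a minimal OCIRepository object for such a test could look like this (a sketch against the v1beta2 API; the name, namespace, URL, and tag are placeholders):

import (
  "time"

  sourcev1b2 "github.com/fluxcd/source-controller/api/v1beta2"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

ociRepo := &sourcev1b2.OCIRepository{
  ObjectMeta: metav1.ObjectMeta{
    Name:      "test-oci-repo", // placeholder
    Namespace: "flux-system",   // placeholder
  },
  Spec: sourcev1b2.OCIRepositorySpec{
    URL:       "oci://ghcr.io/example/terraform-module", // placeholder
    Interval:  metav1.Duration{Duration: time.Minute},
    Reference: &sourcev1b2.OCIRepositoryRef{Tag: "v1.0.0"},
  },
}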

@KarstenSiemer linked a pull request (#1519) on Feb 6, 2025 that will close this issue.