
Allow disabling redis in registry #1707

Open
roblabla wants to merge 1 commit into main from disable-redis-registry
Conversation

roblabla

There are bugs in the registry with regards to cache invalidation when deleting an image that can lead to some really problematic issues. See distribution/distribution#4269.

To work around that problem, add a new field to allow disabling redis in the registry.
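
For illustration, a hedged sketch of how such a field might look in the chart's values.yaml (the key name used here is hypothetical; the actual field is whatever this PR's commit defines):

```yaml
# Hypothetical values.yaml excerpt -- the key name is illustrative,
# not necessarily the one introduced by this PR.
registry:
  redis:
    # When false, the chart would omit the redis blob-descriptor cache
    # from the generated distribution config, so the registry falls back
    # to uncached lookups against the storage backend.
    enabled: false
```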

@roblabla roblabla force-pushed the disable-redis-registry branch from d29d0fa to 4ab1cb4 Compare February 15, 2024 10:42
@Vad1mo
Member

Vad1mo commented Feb 15, 2024

@roblabla, this seems like a serious bug in Distribution. I approve this PR.
Were you able to verify, first-hand, that this solves the problem?
The reason I am asking is that Harbor has its own garbage-collection logic and does not rely on distribution.
But it might well be that we have the same blind spot as docker distribution.

@Vad1mo Vad1mo left a comment
Member

There are some errors in the CI

[screenshot: CI error output]

@Vad1mo
Member

Vad1mo commented Feb 15, 2024

@roblabla, did you open an issue in Harbor, so we can track that issue in parallel? If not, can you do that?

@roblabla
Author

> @roblabla, did you open an issue in Harbor, so we can track that issue in parallel? If not, can you do that?

Done: goharbor/harbor#19988

> Were you able to verify, first-hand, that this solves the problem?
> The reason I am asking is that Harbor has its own garbage-collection logic and does not rely on distribution.
> But it might well be that we have the same blind spot as docker distribution.

As far as I can tell, harbor reuses distribution's garbage collector logic: https://github.com/goharbor/harbor/blob/main/src/registryctl/api/registry/manifest/manifest.go#L86

Anyways, I haven't validated this exact patch, but I did validate that purging the redis cache fixed the problem and allowed the image to be re-uploaded in Harbor. So disabling redis was the obvious next step.

I'll look into the CI failure tomorrow (I'm not super familiar with helm ^^).

@Vad1mo
Member

Vad1mo commented Feb 15, 2024

By the way, there is a patch in distribution addressing this, distribution/distribution#3323; you might give it a try.

@zyyw
Collaborator

zyyw commented Feb 22, 2024

Hi @roblabla, in the issue distribution/distribution#4269, the problem is reproduced essentially by running the following command:

```shell
# And run garbage collection. This will actually delete the data from the
# blobstore. However, the cache won't be cleaned!
./bin/registry garbage-collect config.yaml -m
```

Could the issue be due to the implementation of ./bin/registry garbage-collect? Would you be able to reproduce it through an e2e scenario in Harbor? For example:

  • deploy a Harbor instance with GC_TIME_WINDOW_HOURS set to 0
  • push an image to Harbor
  • delete the image
  • run GC in Harbor
  • push the image to Harbor again
  • docker pull the image
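
As a sketch of the first step above: GC_TIME_WINDOW_HOURS is an environment variable on Harbor's core component, and assuming a deployment method that allows extra env vars there (the exact mechanism varies by chart or compose setup), it might look like:

```yaml
# Hedged sketch: disable the GC time window so deleted artifacts are
# eligible for collection immediately. The `extraEnvVars` hook shown
# here is an assumption about the deployment tooling, not a documented
# key of this chart.
core:
  extraEnvVars:
    - name: GC_TIME_WINDOW_HOURS
      value: "0"
```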

@Vad1mo
Member

Vad1mo commented Feb 22, 2024

@zyyw here is the related issue on the Harbor side, with reproduction steps.

Related goharbor/harbor#19988

@roblabla roblabla force-pushed the disable-redis-registry branch from 4ab1cb4 to b805ea5 Compare February 22, 2024 12:39
@roblabla
Author

Finally got around to getting this done. I can now confirm that this version works - I have it deployed and running.

@zyyw
Collaborator

zyyw commented Mar 7, 2024

We discussed it here; closing it now.
Redis is a required component for distribution, and GC does rely on redis.

@zyyw zyyw closed this Mar 7, 2024
@roblabla
Author

roblabla commented Mar 7, 2024

I'm... confused by this comment. Redis is an optional component of distribution, it's solely used as a cache to avoid S3 latency, and can be disabled in the config, as evidenced by this PR working... In my use-case, the performance hit of disabling redis is negligible. And on the distribution side, GC does not rely on redis - in fact that's the whole issue, distribution's GC ignores the redis, causing these issues.

So I'm not sure I understand the rationale here?
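
For context, in distribution's own configuration the redis-backed blob-descriptor cache is an opt-in block; a sketch of the relevant part of config.yml (bucket names and addresses are placeholders):

```yaml
# Excerpt of a distribution config.yml (illustrative values).
storage:
  s3:
    region: us-east-1
    bucket: my-registry-bucket
  # The cache block is optional: removing it (or switching
  # blobdescriptor to "inmemory") disables the redis cache entirely.
  cache:
    blobdescriptor: redis
redis:
  addr: redis:6379
```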

@Vad1mo
Member

Vad1mo commented Mar 7, 2024

> discussing it here. closing it now. redis is a required component for distribution and GC does rely on redis.

I would like to reopen this PR.

  1. #19988 correctly describes the problem.
    And it is obvious that the problem can't be resolved in a reasonable amount of time, as it depends on docker distribution.

  2. This is a fix for an issue that can't be fixed otherwise.

  3. Looking at this Helm chart in retrospect, it is not correctly designed (most charts unfortunately aren't). The configuration options are only exposed via variables, without offering an escape hatch to bypass values and set whatever is needed directly in the config file.
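
The escape hatch described in point 3 could, hypothetically, look like a raw-override key that is merged into the rendered registry config after all the typed values are applied (key name and merge semantics are illustrative, not part of this chart):

```yaml
# Hypothetical escape-hatch pattern for the chart's values.yaml.
registry:
  # Merged last into the generated distribution config, letting users
  # set (or unset) options the chart does not expose as variables.
  configOverride:
    storage:
      cache:
        blobdescriptor: inmemory
```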

@Vad1mo Vad1mo reopened this Mar 7, 2024

github-actions bot commented May 7, 2024

This PR is being marked stale due to a period of inactivity. If this PR is still relevant, please comment or remove the stale label. Otherwise, this PR will close in 30 days.

@github-actions github-actions bot added the Stale label May 7, 2024
@roblabla
Author

roblabla commented May 7, 2024

Still relevant.

@github-actions github-actions bot removed the Stale label May 8, 2024
@roblabla
Author

It's been a while, and the issue is still not fixed upstream. Could this get merged in the interim so a workaround is at least available?

@roblabla
Author

Bump?


This PR is being marked stale due to a period of inactivity. If this PR is still relevant, please comment or remove the stale label. Otherwise, this PR will close in 30 days.

@github-actions github-actions bot added the Stale label Sep 21, 2024
@roblabla
Author

roblabla commented Oct 7, 2024

This is the last time I will bump this PR for the stalebot's sake. It's been half a year since I submitted this simple 10-line change. The upstream bug is still present, with no fix in sight. This PR provides a working and simple workaround to the problem. I do not understand what could be controversial about it.

I'll put it frankly: it feels insulting to have a bot claim my PR is "stale" and auto-close it while it's waiting for review by the repo maintainers. I understand wanting to get rid of old PRs whose authors have disappeared, but this is just not a good solution to that problem.

If this PR gets closed by stalebot in whatever the timespan is, then so be it, I'll assume Harbor does not care about external contributions, and move on.

@Vad1mo Vad1mo added never-stale and removed Stale labels Oct 7, 2024
@m1m1x

m1m1x commented Nov 25, 2024

Hi!

I hit the exact same issue, and the workaround mentioned here works perfectly.

Could you consider merging this PR, please?
