Multiple Sregistries / data replication #387
Comments
What quickly comes to mind is K8s, which I've used to deploy other Django apps so it definitely is possible, and then you'd need a particular configuration to handle the different replication logic. Ping @kkaftan, who was working on this! Is this something y'all could share knowledge for? I was thinking if we come up with a good set of recipes it would be good to have a repository alongside here. Let me know your thoughts!
Actually @kkaftan and I are working together on some Sregistry stuff and we also did the K8s stuff together :) But for this use case we can't use K8s, so we are trying to come up with something else. As soon as we make progress and think it's ready to share, we will 👍
Gotcha. So hmm. Keeping different servers in sync, are you wanting to host your own or use a cloud provider? E.g., if you just remove the Minio image from the docker-compose, you could use a hosted object storage that comes with more bells and whistles than vanilla sregistry. Looking at this minio article, the most annoying bit seems to be:
and the instructions there are in a GUI, but it looks like it can be scripted with
Then you'd probably need some kind of notification the "non-main" registries subscribe to in order to get metadata about the containers/collections and create/delete them appropriately. It looks like MinIO has notifications: https://docs.min.io/docs/python-client-api-reference.html#set_bucket_notification.

The (now offline) Singularity Hub used to use Google Cloud Storage, which is probably what I'd use if I wanted to replicate stuff, because they do that really well. E.g., you can set up functions to trigger on changes to one storage https://cloud.google.com/functions/docs/calling/storage and they have different options to transfer / replicate too https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#what_is_data_transfer. You could also consider Spanner, which I haven't used but I've heard does well with scaling. E.g., you could have all the apps essentially sharing the same database, and then a distributed storage alongside it. Heck, it wouldn't even have to be perfect - e.g., if you find that a particular object is in the main bucket (but not replication A yet) then you can do the transfer of the object on demand. And I'm not as experienced but I'd bet AWS has similar services.

Anyhoo, let me know some details about deployment, etc. and we can think further about this! It wouldn't be too terrible to add support for another storage/database backend. It's a pretty neat problem. @opadron have you done anything like this before? I'll definitely keep thinking about it.
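To make the on-demand transfer idea above concrete, here is a minimal sketch. It is not Sregistry or MinIO code: `InMemoryStore` is a hypothetical stand-in for a bucket, and `fetch_with_fallback` is an illustrative name. A real deployment would swap in the storage client's own existence check, download, and upload calls.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for an object storage bucket (assumption, not a real API)."""

    def __init__(self, objects: Optional[Dict[str, bytes]] = None) -> None:
        self._objects: Dict[str, bytes] = dict(objects or {})

    def has(self, key: str) -> bool:
        return key in self._objects

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data


def fetch_with_fallback(key: str, replica: InMemoryStore, main: InMemoryStore) -> bytes:
    """Serve an object from the replica, copying it from main on first miss."""
    if replica.has(key):
        return replica.get(key)
    if not main.has(key):
        # The object does not exist anywhere, so there is nothing to transfer.
        raise KeyError(key)
    data = main.get(key)
    replica.put(key, data)  # later requests are served locally
    return data


# Usage: the replica starts empty and is filled lazily.
main = InMemoryStore({"collection/container:latest": b"image-bytes"})
replica = InMemoryStore()
data = fetch_with_fallback("collection/container:latest", replica, main)
```

The trade-off of this read-through approach is that the first pull of an object still crosses the network to the main bucket, and it only works while the main bucket is reachable, so it complements background replication rather than replacing it.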
I'm not familiar with Sregistry internals, so I can't really speak to that, but generally, architecting a data replication scheme for any application is going to be an involved process. You can get a lot of that architecting for free if you go with a cloud provider's solution or use MinIO's replication features, but as @vsoch alluded to, you still have to decide the how, when, and what for data replication, and those answers are going to depend heavily on your application and requirements. In my experience, you'd want your application (Sregistry, in this case?) to explicitly support running in a replicated environment, because it needs to be aware of the CAP limitations that you run into when replicating. Running multiple replicas of an app that is not replication-aware and handling the data replication out-of-band rarely works out.
Hello everyone, we have an update on this. We used the MinIO and PostgreSQL replication mechanisms to achieve what our use case demanded. On our main Sregistry everything works fine: images can be pushed, users can be managed, etc. The images and container/user data are then replicated to our "satellite" Sregistry. We would use this setup in situations where we have multiple networks, high demand, and big images. That way users can get their images from Sregistries that are closer to them or in the same network, and (big) images are not copied across several networks on every pull. If you are interested in the setup, feel free to take a look at our documentation for it (temporarily hosted there):
This is a cool project! If/when that document is in more final form, if you want to do a writeup to put in the docs for others to try I'd be happy to host it!
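For anyone trying a main/satellite setup like the one described above, one simple sanity check is to compare object listings between the two storages. The sketch below is an assumption-laden illustration, not part of the setup reported in this thread: it assumes each side can produce a mapping of object name to checksum (for example an ETag), and `Drift` and `compare_listings` are hypothetical names.

```python
from typing import Dict, NamedTuple, Set


class Drift(NamedTuple):
    missing: Set[str]  # on main, not yet on the satellite
    stale: Set[str]    # on both, but the checksums differ
    extra: Set[str]    # on the satellite only (e.g. deleted on main)


def compare_listings(main: Dict[str, str], satellite: Dict[str, str]) -> Drift:
    """Compare {object name: checksum} listings from main and a satellite."""
    main_keys, sat_keys = set(main), set(satellite)
    stale = {key for key in main_keys & sat_keys if main[key] != satellite[key]}
    return Drift(missing=main_keys - sat_keys, stale=stale, extra=sat_keys - main_keys)


# Usage with made-up listings.
main_listing = {"a.sif": "111", "b.sif": "222", "c.sif": "333"}
satellite_listing = {"a.sif": "111", "b.sif": "999", "old.sif": "000"}
drift = compare_listings(main_listing, satellite_listing)
```

An empty `Drift` suggests the satellite has caught up. Non-empty `missing` or `stale` sets right after a push may just reflect replication lag rather than a fault.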
Hello,
a colleague and I would like to run multiple instances of Sregistry and have them synchronized with a "main" Sregistry: images are pushed to a main registry and other registries pull the data from it to make it available in their locations. An example would be running Sregistries in different geographical locations that should still be able to operate when the connection to the main registry is interrupted.
We thought about using the replication mechanisms of the PostgreSQL DB and MinIO, as they seem to be the only components relevant for having the same experience/data at every location, but it looks like that needs a lot more configuration than what is delivered with Sregistry, and we are still figuring out how to include that in the relatively automatic setup of a docker-compose environment.
So, to get to our question: Is there anything that speaks against using replication, something that might break the Sregistry?
Maybe you have some better ideas or have already done something similar.
Every input is welcome 👍