Running gitea across multiple servers #2959
@svarlamov Your thoughts match our long-term objective. The SQLite version is only suitable for a single node; for other databases, you can connect to a database proxy for a distributed deployment. Reverse-proxy load balancing and shared sessions (via Macaron's Redis session store) are both ready. As you mentioned above, there are two enhancements to do, and I think the second is more difficult than the first. As far as I know, there is no out-of-the-box Go git library which supports S3 or other cloud storage systems, so we have to implement it ourselves in https://github.com/go-gitea/git. I have sent a PR (go-gitea/git#92) to begin the work. In fact, we also need some work on splitting the SSH server and hooks from the databases. I have opened an issue (#1023) and several PRs for that, but they are not finished. After that, the SSH server could be separated from the HTTP server. If somebody wants to provide a public git service based on Gitea, all of these things may need to be considered.
And what about just hosting a bunch of Docker containers with a shared file system over multiple hosts?
@kolaente but you have to mount the shared file system in every container
@lunny Sounds interesting... So the SSH stuff basically needs to get 'sharded', for lack of a better word? And what about 'locking' repos so that people don't step over each other?
@kolaente That's definitely a possibility; however, I'm wondering if that will still have issues working out of the box, as you will still need a distributed queue and some sort of locking method, I believe... Also, from a scalability perspective, you will likely need to shard those volumes sooner or later anyway
@lunny After doing some more research on this topic, the following items have come to my attention:
My research also highlighted the GitHub engineering team's blog, where they discuss similar issues and echo many of the same concerns:
Their conclusion can be most simply described as 'sharding': the repositories ultimately sit on native filesystems (as Gitea currently stores them) and requests are then proxied to the right servers through a (in their case dynamic) system that knows where each repo is. In their design, each repo is actually stored across 3 replicas for high availability, which is pretty useful, especially if you end up running all of this on commodity hardware... This sort of sharding logic seems to fit pretty naturally into Gitea and is probably the most feasible solution that could be implemented once Gitea's other services are ported to multi-host.

Separately, LFS would also need to be ported. Based on my current understanding, it is pretty straightforward to migrate the LFS store off of the filesystem, and it could be done either through the same sharding method or through a more generic driver-based system where Gitea itself doesn't actually bother with the details of how the objects are stored...

I would also like to raise a new topic that is relevant to multi-host deploys: multi-host content search! It looks like this would need to use a 3rd-party indexing/search platform such as Elasticsearch... And as far as the 'locking' issue is concerned, it seems that, as I mention in point i below, it is managed by a neat custom mutex wrapper that controls these 'pools' -- this would definitely have to be rewritten for multi-host support and should rely on an existing distributed tool.

I'm keen to hear everyone's thoughts on this :)

P.S. Below I've also pasted some notes (they don't aim to be exhaustive by any means) on modules that look to be stateful and would need to be modified to support multi-host deployments...

a. https://github.com/go-gitea/gitea/blob/master/modules/cron/cron.go <- This seems to assume a single-host setup. For multi-host, this sort of stuff is not usually done within the primary application server (except for internal stuff, such as memory-related jobs...). Perhaps this can be optionally turned off and handled by a separate server? Or moved to a generic service? There are lots of wonderful open source options here
b. https://github.com/go-gitea/gitea/blob/master/modules/lfs/content_store.go <- This would need to point to the storage method for LFS, or potentially an LFS driver?
c. https://github.com/go-gitea/gitea/blob/master/modules/ssh/ssh.go <- Assumes a single-host setup
d. https://github.com/go-gitea/gitea/blob/master/modules/cache/cache.go <- Looks like this is good to go, with built-in support for 3rd-party caching systems configured in the options!
e. https://github.com/go-gitea/gitea/blob/master/modules/indexer/repo.go <- Indexing relies on the server's FS...
f. https://github.com/go-gitea/gitea/blob/master/modules/mailer/mailer.go <- Likely needs to use a distributed task-queuing service, and should probably support retries beyond a few seconds... There are some good tools for the job here that are widely supported
g. https://github.com/go-gitea/gitea/tree/master/modules/markup <- Looks like this is bound to the local filesystem
h. https://github.com/go-gitea/gitea/blob/master/modules/notification/notification.go <- Seems to be ready to use memcache, so should be fine
i. https://github.com/go-gitea/gitea/blob/master/modules/sync/exclusive_pool.go <- This seems to perform the 'locking' functionality that we've been discussing... This would essentially have to be moved up to a higher-level distributed service that would actually manage the state, with this file becoming a wrapper?
@svarlamov we need a filesystem or storage abstraction layer.
@lunny How would that fit with a sharding approach? I'm thinking we would be reading from the local FS in the end; it's just a matter of how to get the request onto the right machine
I've quickly tested a multi-container setup with a shared volume (in Rancher). It seems the web service cannot start when running with multiple instances; I'll do some tests later today. @svarlamov Rancher can abstract the FS for you, to AWS for example. For the database you could do something with a Galera instance.
@kolaente Interesting... I will play around with it as well... In regards to the shared-volume topic, please have a look at my comments above, namely the discussion of latency, particularly when coupled with Git's object-store architecture. This is what is discussed in GitHub's engineering blog as well, and they outline why you can't effectively run a git service on top of that sort of shared-FS architecture
So, after some testing I found the following: when Gitea shares the data volume, it cannot start more than one container. But if everything except the

But as soon as you start sending forms (for example when creating an issue), you get CSRF token errors all over the place... There is some work to do. I also created a PR to add Gitea to the Rancher community catalog: rancher/community-catalog#680
Did you try changing caching from memory to memcached? Also, session storage should be changed from file to something else.
Some benchmarks and CPU numbers putting Gitea behind nginx. This is tested on my FreeNAS machine with Gitea & nginx running in a jail.
Using bombardier on another machine to drive HTTP load. Gitea configured to use MySQL and memcached. Nginx configured with 8 worker processes. Baseline idle:
Connecting to gitea directly with 200 connections:
Latency results:
Connecting using nginx as a reverse proxy:
Latency results:
Nginx was able to not just keep up with the load but also serve the results in less time. What was interesting was when you started doing a lot more connections: testing with 2000 connections caused Gitea to choke and start returning 5xx codes, while nginx kept up (even if it did slow down a bit; the 90th-percentile latency was around 600 ms).
@jed-frey Cool, so apparently it would already help to load-balance with some nginx? Have you tried increasing the

Guess I'll try running it in parallel when I have time during the holidays. Also, the PR I mentioned earlier got merged - Gitea is now in the "official" Rancher Community Catalog!
Is there any progress on this?
@imacks Not that I know of. It still seems that NFS/sharding is the best way to make this work at the moment.
It could be that an S3 storage engine would work fine if Gitea worked like a write-through cache: keep a local copy on the server and force-push to an S3-backed git repository every time there's a push to the repository. Keep a write-ahead log so you can account for what has/hasn't been moved into S3 and what is 'cached' on the local host. It might still require some sharding to prevent mutually exclusive pushes being made, but probably less.
@jhackettpps I think that the solution you raise is an interesting one; however -- at least in my interpretation of your solution -- it does not really address the issue of a dataset that's too large to practically/economically fit onto disk. So S3 is great if you have a few hundred GBs and several servers for HA, but once you have a few dozen TBs it starts to become less appealing... My understanding is that the Gitaly solution -- which is basically just sharding, plus caching on the machines to optimize Git itself -- is the industry standard. Similar to what GH does as well...
My current workaround is somewhat like sharding, but more hacky. I run multiple Gitea containers, then use a homebrewed load balancer to redirect based on the user account requested. Thus each container serves a dedicated set of assigned users only, and a user is always served by the same container. A container itself is failsafed by healthcheck restarts only. The caveat is that users on two different containers can't see each other's repos, but that doesn't matter for me because most users here have private repos.
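The routing half of that workaround could look roughly like this sketch (not the actual homebrewed balancer; the backend addresses and the hashing choice are assumptions). It pins each user to one backend by hashing the first path segment of the request:

```go
package main

import (
	"hash/fnv"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// shardFor pins a user to one backend: the first path segment of a Gitea
// URL (/<user>/<repo>...) is hashed, so the same user always lands on the
// same container.
func shardFor(path string, backends []string) string {
	user := strings.SplitN(strings.TrimPrefix(path, "/"), "/", 2)[0]
	h := fnv.New32a()
	h.Write([]byte(user))
	return backends[int(h.Sum32()%uint32(len(backends)))]
}

// newProxy wires the routing into a standard library reverse proxy.
func newProxy(backends []string) *httputil.ReverseProxy {
	return &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			target, _ := url.Parse(shardFor(r.URL.Path, backends))
			r.URL.Scheme = target.Scheme
			r.URL.Host = target.Host
		},
	}
}
```

Note this shares the caveat mentioned above: because routing is per-user, cross-user views (explore pages, another user's repos) break unless those routes are handled specially.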
Is there a tool to stress test Git servers? I've just been using HTTP benchmark tools to hammer home pages to see how it handles load, but a full stress test with cloning and pushing would be helpful.
Yeah, Gitaly is without doubt the best option. I was under the impression it was quite a lot of work to implement anything around it, though (GitLab have quite a lot of people working on it, I expect). Plus, I didn't think Gitea was necessarily aiming to compete at that level.
@jhackettpps I think your last post captures two important points -- Gitea's core objective is not large-scale Git, and the folks at GitLab have already invested a lot into their solution, which is actually already in production... It'd be cool if Gitea could have storage 'drivers' and be able to use filesystem Git or Gitaly for users who need true HA. I haven't dug too far into the Gitaly APIs, but I believe that from early on one of their goals was to make it easy to implement as a replacement for local Git within the GitLab Rails code...
I disagree. GitLab is a resource pig; just one instance is annoying. My script to get it installed in FreeNAS jails was non-trivial. There's a reason I moved to Gitea. Being able to bring that to the enterprise would be an amazing asset.
@svarlamov I sympathize with @jed-frey; GitLab is so resource hungry it makes Oracle look good. I'm not saying that having production-ready Gitea readily available should be a top priority, but it should definitely be proposed and considered as a viable step forward. On a related note, is there anything that can be done to make Gitea cooperate better with S3-compatible services, besides just throwing s3fs-fuse and symlinks at it? (Not trying to say S3 should be the solution for public/large-org instances, I just really wanna see how badly I can abuse my DO droplets.)
In April of 2017 I set up a cluster of systems on AWS (see this cf script) backed by an EFS share (a managed NFS share). The 3-node cluster made use of a MySQL-compatible RDS instance and a Redis instance for session data. Each node runs nginx as a proxy for Gitea, and there was an AWS load balancer in front of the nodes. The load balancer also managed port 22, so you've got 1 IP for the whole setup. I had configured the load balancer to use sticky sessions as well, so a single user is glued to a single node during their session. The system worked "just fine" except that the main repo page, which loads the commit messages for the files, could take a long time to load if there were thousands of commits in the repo. Because of that, we did #3452. Now that 1.5 has that included, I'm looking at testing that AWS cluster again to see if performance is sufficient for production use. Other than that issue, HTTP and git actions operated at acceptable speeds. It's true that they were not as fast as the native filesystem, but they were within the bounds of user expectation.
@jag3773 Your setup is similar to mine. I had an incident on AWS last week where Gitea went mad with disk writes and AWS charged me for it. I couldn't figure out what went wrong, though.
Hi everyone. Has anybody tried GlusterFS? Basically, we always do writes into a single datacenter, but it replicates across different instances in that datacenter and asynchronously replicates into another datacenter. As we have a distributed volume, we only ask "some" instances during the write. With such a setup we can control the number of replicas, have distributed volumes (to increase bandwidth and storage), and have asynchronous replication to other datacenters in case the whole datacenter is shut down. Locking worked pretty well. I've tested scenarios like:
But again, I didn't test it in a real environment and don't have any numbers. Also, I've tried such a setup with Gogs, but I am planning to move to Gitea. Possibly this info may be helpful for someone.
BTW, for that to work, Gitea probably also needs a cache/session storage that works well across environments. Currently Redis is a good way to make it work, but when using k8s with the Helm redis-ha chart it won't work, since the Redis client is not Sentinel-aware. I've raised two issues at Macaron (the web framework that Gitea is using) to support it:
I set up a Docker Swarm in the Hetzner Cloud with this plugin and it works amazingly well. All my workers are in the same region, so when one node dies, the Hetzner volume with the data is attached to another node and Gitea is started there. For the HTTPS part I use Traefik. Stack files can be found here: https://github.com/ntimo/gitea-docker-swarm
Still, I wonder what the benefit of such a setup is, given that the bottleneck will still be your NFS share. You could argue availability is better, but then again: there's no big loss in case of a single-node failure when we talk about a container setup. The expected downtime is minimal (container restart time), so the benefit vs. complexity of a distributed setup as outlined in this topic is probably not a real game changer. In the end we're talking about at what layer to replicate/sync the data. Your current setup uses the filesystem to do the job. This of course is an easy solution, since you do not need to do much and it'll work with nearly every application. But it only scales up, not out. So what I'd like to see for a real scale-out solution is data replication at the application layer.
@M451 I don't think it makes sense for every application to reinvent the wheel and implement its own data replication. Maybe there are libraries for that? Spontaneous idea: if we supported S3, we could use Minio.
I have no experience with this myself, but I think you can scale filesystem throughput horizontally by clustering the filesystem (see GlusterFS, Amazon EFS).
@davidak I did not propose to reinvent the wheel. There are many existing replication and consistency algorithms (e.g. Paxos, Raft) and open source implementations of them available. There are also databases that implement them (e.g. MongoDB). So, no need to reinvent the wheel. All I'm saying is that IF the goal is to provide scale-out, then replication at a higher level would be nice.

@feluxe What I just said is pretty much the answer to what you suggest. Sure, it's possible to replicate data at a lower level; GlusterFS, Ceph and others exist. Maybe it's even the better option in terms of efficiency (to be evaluated). BUT what makes scale-out at layer 7 interesting is that you decouple the application from the backbone infrastructure, and this limits the requirements on the infrastructure. E.g. if you want a fully stateless version of Gitea some day, one possible way would be to dockerize it (I'm aware there are already dockerized versions). IF you implement a feature (e.g. scale-out in this case) at a lower layer than what a Docker container encapsulates, then that becomes a hard requirement on the Docker environment/infrastructure the user operates, which negates many of the use cases we want containers for in the first place (making the application independent of the underlying infrastructure for all of its features). I'm not suggesting one of the options is better. GlusterFS or any other infra dependency could be a supported alternative and short-term solution before anything at a higher layer gets implemented. You just should be aware of the implications.
I think Gitea already supports running across multiple servers, i.e. https://gitea.com, and how to do that is a deployment issue rather than a development one. I will close this one; you can discuss further on Discord or Discourse.
Been working with Gitea for a few days now, and I find the project quite intriguing! Now, I'm wondering what it would take to be able to run Gitea in a multi-host setup, likely behind a single nginx load balancer... I think it would be very useful to enable people to host across servers, first and foremost because of the storage limitations of a single machine, as well as for setups where we don't want to actually keep the files locally but want to hold them on S3 or a similar service... In addition, with larger teams that are online 24/7, availability and CPU begin to become a bottleneck...
Overall, it looks like Gitea's architecture would not be too challenging to distribute, given it already has a central DB and stateless HTTP APIs (please correct me here if I'm wrong...); however, it seems to have two key things holding it back right now:
I am excited to hear the maintainers' thoughts on this, perhaps how it fits into the roadmap for Gitea, and what sort of work this would involve. 😄