Caddy stops working, service shows running but not responsive #6072

Closed
jakanaka opened this issue Jan 29, 2024 · 17 comments

Comments

@jakanaka

Caddy: v2.7.6 h1
OS: Ubuntu 22 LTS

It just stopped working and I don't know why. By "stopped" I mean the service is running but is not receiving or responding to requests. There are a lot of these in the logs:

(screenshot of log output; image not preserved)

@mholt
Member

mholt commented Jan 29, 2024

Can you please post the full logs as a text file (we can't really do anything useful with screenshots)

mholt added the "needs info 📭" (Requires more information) label on Jan 29, 2024
@jakanaka
Author

> Can you please post the full logs as a text file (we can't really do anything useful with screenshots)

There is about 5.5 GB of syslog. The last 10,000 lines come to about 1.5 MB. Is it OK if I share those?

@mholt
Member

mholt commented Jan 29, 2024

Sure, we can start with that. You can upload a txt file I think.

@jakanaka
Author

> Sure, we can start with that. You can upload a txt file I think.

log.txt

@mholt
Member

mholt commented Jan 29, 2024

Thanks.

Huh, there's still no error message in that. It's all the end of a stack trace. I suspect a data race? Can you go back even further in the logs to get the actual error message?

@jakanaka
Author

> Sure, we can start with that. You can upload a txt file I think.

Can you tell me how Caddy handles a large DDoS attack? Can a high volume of DDoS traffic make it stop like this?

@jakanaka
Author

> Thanks.
>
> Huh, there's still no error message in that. It's all the end of a stack trace. I suspect a data race? Can you go back even further in the logs to get the actual error message?

I restarted it and it seems to be working. I will have to dig through that huge syslog to find the actual error.

@mholt
Member

mholt commented Jan 29, 2024

Any server, whether it's apache or nginx or caddy, etc, has resource constraints by the OS. And the OS in turn has resource constraints. If you get flooded with traffic every server will crash because your OS will crash it. To mitigate DDoS you should use a CDN service like Cloudflare.

And thanks, if you can get to the bottom (top?) of it, that would be helpful for us.

@mholt
Member

mholt commented Jan 29, 2024

In the meantime could you please share the config?

@jakanaka
Author

> Any server, whether it's apache or nginx or caddy, etc, has resource constraints by the OS. And the OS in turn has resource constraints. If you get flooded with traffic every server will crash because your OS will crash it. To mitigate DDoS you should use a CDN service like Cloudflare.
>
> And thanks, if you can get to the bottom (top?) of it, that would be helpful for us.

I did not notice any CPU spike today, and my system did not crash. I will try to find it. I will also rate limit Caddy with the third-party plugin I saw and see what happens. I installed Caddy from its official repo on Ubuntu. It's a fresh install of Ubuntu with all the snap stuff removed after installing. I'm wondering whether a dependency is missing. Should I run the Docker image instead?

@jakanaka
Author

> In the meantime could you please share the config?

It's a simple reverse proxy for Jellyfin:

sub.example.com {
	tls [email protected]
	encode gzip
	reverse_proxy http://127.0.0.1:8096 {
		header_up X-Real-IP {remote_host}
	}
}

@jakanaka
Author

> Any server, whether it's apache or nginx or caddy, etc, has resource constraints by the OS. And the OS in turn has resource constraints. If you get flooded with traffic every server will crash because your OS will crash it. To mitigate DDoS you should use a CDN service like Cloudflare.
>
> And thanks, if you can get to the bottom (top?) of it, that would be helpful for us.

Ah, I think I found the issue. As I said, it is a fresh install, so I did not tweak anything aside from removing the snap stuff. Here is the log from where it started:
log2.txt

@jakanaka
Author

jakanaka commented Jan 29, 2024

# ulimit -u
1029786
# ps -eLf | wc -l
2497
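
A note for readers comparing these two numbers: ulimit -u run in an interactive shell reports that shell's limit, which can differ from the limit systemd applies to a service, and ps -eLf counts threads for every user on the machine. Below is a minimal sketch, assuming Linux with /proc mounted, that reads the thread count and the effective "Max processes" soft limit for one running process. The function name proc_thread_usage is made up for illustration.

```shell
# Print the thread count and the effective "Max processes" soft limit
# for a single process, read from /proc (Linux only).
proc_thread_usage() {
    pid="$1"
    # "Threads:" in /proc/PID/status is the number of threads in the process.
    threads=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status") || return 1
    # In /proc/PID/limits, field 3 of the "Max processes" row is the soft limit.
    soft_limit=$(awk '/^Max processes/ {print $3}' "/proc/$pid/limits") || return 1
    echo "pid=$pid threads=$threads nproc_soft_limit=$soft_limit"
}

# Usage (assumes the process is named "caddy"):
#   proc_thread_usage "$(pgrep -o -x caddy)"
# The nproc limit is accounted per user, so the total thread count
# for the caddy user is also worth a look:
#   ps -L -U caddy --no-headers | wc -l
```

If the soft limit printed here is much lower than what ulimit -u shows in a root shell, the service is running under its own, tighter limit.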

@mholt
Member

mholt commented Jan 29, 2024

Yep, so busy servers will use a lot of threads, and you need to allow it in the OS.

mholt closed this as not planned on Jan 29, 2024
@jakanaka
Author

jakanaka commented Jan 29, 2024

> Yep, so busy servers will use a lot of threads, and you need to allow it in the OS.

It looks like Caddy's service file has LimitNPROC=512 hardcoded. I'm increasing it to 4k to see whether that fixes it.
Refs:
https://caddy.community/t/panic-failed-to-create-new-os-thread/17667/3
https://unix.stackexchange.com/a/345596

Are there any other things I should change to handle a high volume of traffic?
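
For anyone hitting the same limit, here is a minimal sketch of raising it with a systemd drop-in instead of editing the packaged unit file, which a package upgrade may overwrite. It assumes the unit is named caddy.service and uses 4096 only because that is the value tried in this thread, not as a recommendation. The helper name write_nproc_override is hypothetical.

```shell
# Write a systemd drop-in that overrides LimitNPROC for a unit.
#   $1: drop-in directory, e.g. /etc/systemd/system/caddy.service.d
#   $2: new limit, e.g. 4096 (the value tried in this thread)
write_nproc_override() {
    dropin_dir="$1"
    limit="$2"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nLimitNPROC=%s\n' "$limit" > "$dropin_dir/override.conf"
}

# Usage (as root), then reload systemd and restart the service:
#   write_nproc_override /etc/systemd/system/caddy.service.d 4096
#   systemctl daemon-reload
#   systemctl restart caddy
#   systemctl show caddy -p LimitNPROC   # confirm the new value is in effect
```

Running systemctl edit caddy creates the same kind of drop-in interactively.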

@francislavoie
Member

Related: caddyserver/dist#107

NPROC limits the processes for the entire caddy user. Are you running other programs under that user?

@jakanaka
Author

jakanaka commented Jan 29, 2024

> Related: caddyserver/dist#107
>
> NPROC limits the processes for the entire caddy user. Are you running other programs under that user?

# ps -U caddy
    PID TTY          TIME CMD
3844211 ?        00:26:53 caddy

I don't think so, at least not at the moment.

By the way, one more thing I should mention. I'm not sure whether this is a Caddy issue or a Jellyfin issue, because I don't have nginx running to check whether it happens there too. When Jellyfin is reverse proxied by Caddy and I connect remotely through the domain (HTTPS), Jellyfin sends around 1,500 requests on the login page and downloads around 50 MB of data in total. You can check this in a private window.

mholt removed the "needs info 📭" (Requires more information) label on Jan 30, 2024