"Key addition failed" blocks web interface #61

Open
ygrek opened this issue Jul 19, 2018 · 5 comments
Labels
bug (Something isn't working), major

Comments

ygrek commented Jul 19, 2018

Original report by diafygi (Bitbucket: diafygi, GitHub: diafygi).


I'm seeing the following logs fairly often in db.log:

2018-07-19 18:31:16 add_keys_merge failed: Eventloop.SigAlarm
2018-07-19 18:31:16 Key addition failed: Eventloop.SigAlarm

When these log lines appear, the web interface becomes unresponsive and the sks db process spikes to 100% CPU. I guess this key merging is blocking the web interface from serving requests. I don't mind the CPU spike (it's only one thread), but the web unresponsiveness is getting me kicked out of the sks-keyservers.net pool frequently.
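
To illustrate the blocking described above, here is a minimal Python sketch (not SKS code) of a single-threaded loop that bounds each job with an alarm signal. The `SigAlarm` class only borrows its name from the log line; `merge_huge_key`, `serve_web_request`, and `event_loop` are hypothetical stand-ins, and the assumption that SKS behaves this way is the reporter's guess, not something confirmed here. It is Unix-only and must run in the main thread.

```python
import signal
import time


class SigAlarm(Exception):
    """Raised by the alarm handler; the name mirrors the log line, nothing more."""


def _raise_sig_alarm(signum, frame):
    raise SigAlarm()


def run_with_timeout(job, seconds):
    """Run job() in the current thread, aborting it with SigAlarm after `seconds`."""
    previous = signal.signal(signal.SIGALRM, _raise_sig_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return job()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)


def merge_huge_key():
    """Hypothetical stand-in for a CPU-bound merge: spins for up to 5 seconds."""
    end = time.monotonic() + 5
    while time.monotonic() < end:
        pass
    return "merged"


def serve_web_request():
    """Hypothetical stand-in for a cheap web request."""
    return "200 OK"


def event_loop(jobs, timeout):
    """Handle jobs strictly one after another, as a single-threaded loop would."""
    results = []
    for name, job in jobs:
        started = time.monotonic()
        try:
            outcome = run_with_timeout(job, timeout)
        except SigAlarm:
            outcome = "failed: SigAlarm"
        results.append((name, outcome, time.monotonic() - started))
    return results


if __name__ == "__main__":
    jobs = [("merge", merge_huge_key), ("web", serve_web_request)]
    for name, outcome, took in event_loop(jobs, timeout=0.5):
        print(f"{name}: {outcome} ({took:.2f}s)")
```

In this sketch the web request is only answered once the merge has been aborted, so a longer timeout means a longer stall for every queued request.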

Possible solutions:

  • Split key merging into a separate process from serving web requests. I don't mind running another process (maybe something like sks keymerge) that will handle the high CPU spikes.
  • Figure out a way to not spike the CPU so high and not block web requests when merging huge keys. No idea if this is possible given the current code architecture.
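
The first option above can be sketched in a few lines of Python. This is only an illustration of the idea, not a proposal for how SKS should implement it: `WORKER`, `start_merge`, and `serve_while_merging` are hypothetical names, the worker is a busy loop standing in for an expensive merge, and the "web requests" are just a counter.

```python
import subprocess
import sys
import time

# Hypothetical worker: burns CPU for the given number of seconds, then reports.
WORKER = (
    "import sys, time\n"
    "end = time.monotonic() + float(sys.argv[1])\n"
    "while time.monotonic() < end:\n"
    "    pass\n"
    "print('merged')\n"
)


def start_merge(cost_seconds):
    """Start the merge in a separate OS process so it cannot stall the caller."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER, str(cost_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )


def serve_while_merging(cost_seconds, budget_seconds, tick=0.05):
    """Keep 'serving' while the worker runs; abandon the merge once over budget."""
    worker = start_merge(cost_seconds)
    served = 0
    deadline = time.monotonic() + budget_seconds
    while worker.poll() is None and time.monotonic() < deadline:
        served += 1       # stand-in for answering one web request
        time.sleep(tick)
    if worker.poll() is None:
        worker.kill()     # merge exceeded its budget
        worker.communicate()
        return served, "abandoned"
    output, _ = worker.communicate()
    return served, output.strip()


if __name__ == "__main__":
    print(serve_while_merging(cost_seconds=30, budget_seconds=0.5))
```

The CPU spike still happens in the worker, but the serving loop keeps running. The sketch says nothing about how a separate merge process would share the key database, which is the hard part of this option.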

Related mailing list threads:

ygrek commented Jul 20, 2018

Original comment by Pascal Levasseur (Bitbucket: pascal_levasseur).


Same issues at sks.bonus-communis.eu.

Can somebody explain the behavior of sks to us?

ygrek commented Jul 20, 2018

Original comment by Kim Minh Kaplan (Bitbucket: kmkaplan, GitHub: kmkaplan).


I believe you can tune this timeout with -wserver_timeout, although raising it will also give clients more time.
https://lists.nongnu.org/archive/html/sks-devel/2018-06/msg00072.html

ygrek commented Oct 9, 2018

Original comment by Fleish (Bitbucket: fleish, GitHub: fleish).


I'm not sure extending the web timeout will solve the SigAlarm issue. I'm currently experiencing it and it seems to be related to trying to process certain keys received during the recon process. I've been trying to sort out which one is the issue and came up with these 2 common hashes that were always mentioned in the logs in the run leading up to when a server reported the failure:

2309DD6AF2606DDCD801307FAC024E28

5699F92E6316448E43FB6D18C6F0943F
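
The kind of log triage described above could be automated roughly as follows. This is a sketch under assumptions: only the two `SigAlarm` lines quoted in the original report are known log output, so the code simply treats any 32-character uppercase hex token as a key hash and any line containing "Key addition failed" as the end of a run. `hashes_common_to_failed_runs` is a hypothetical helper, not part of sks.

```python
import re

# Assumption: key hashes appear in the log as 32 uppercase hex characters.
HASH_RE = re.compile(r"\b[0-9A-F]{32}\b")
FAILURE_MARK = "Key addition failed"


def hashes_common_to_failed_runs(lines):
    """Return the hashes that appear in every run of lines ending in a failure."""
    common = None
    current = set()
    for line in lines:
        current.update(HASH_RE.findall(line))
        if FAILURE_MARK in line:
            # Close the run: intersect with what earlier failed runs had in common.
            common = set(current) if common is None else common & current
            current = set()
    # Lines after the last failure belong to no failed run and are ignored.
    return common or set()


if __name__ == "__main__":
    with open("db.log") as log:
        for key_hash in sorted(hashes_common_to_failed_runs(log)):
            print(key_hash)
```

Hashes that survive the intersection across many failed runs are candidates for the problematic keys. Unrelated keys that happen to be fetched in every run would show up too, so the result still needs a manual check.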

ygrek commented Nov 5, 2018

Original comment by Yegor Timoshenko (Bitbucket: yegortimoshenko, GitHub: yegortimoshenko).


This is an unintended fallout from #60.

ygrek commented Nov 19, 2018

Original comment by Yegor Timoshenko (Bitbucket: yegortimoshenko, GitHub: yegortimoshenko).


Would like to quote https://lists.nongnu.org/archive/html/sks-devel/2018-11/msg00074.html here. If someone wants to fix the issue for now, you can apply the following patch: https://lists.nongnu.org/archive/html/sks-devel/2018-07/msg00053.html

However, the poison key that's causing the issue is just a normal key with a lot of user packets, where 5-10MB chunks of user packets were sent to different keyservers (see #60 for repro). Anyone can generate one.

I'm not really sure how to fix the root cause, which is that recon goes on and on trying to merge this key. In its current form, this issue means we can cause a complete denial of service for any key we want to target: for example, we can make it so that servers won't be able to sync real (i.e. signed) user packets.

Even if we check signatures (see #41), the same attack could use cryptographically sound signature packets and deny the user any further changes to the key, destroying the network at the same time.

If anyone has suggestions for how to fix this that don't devolve into denial of service on any level (be it per-key or whole network), please tell!

ygrek added the major and bug labels on May 7, 2020