LDAP purge operation (cleanup_groups()) times out, so that sssd_be is terminated by internal watchdog #7851
Comments
I should add that sssd_be is at 100% CPU on a single thread while doing the searches, and all commands that depend on sssd are blocked (e.g. ps).
Is there a "Child [...] ('...':'...') was terminated by own WATCHDOG" message in /var/log/sssd.log or in the system journal that corresponds to this moment?
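For reference, a quick way to look for that message, assuming the default sssd log location and a systemd journal:

```sh
# Look for the internal-watchdog termination message mentioned above
grep -F 'WATCHDOG' /var/log/sssd/sssd.log
journalctl -u sssd | grep -F 'WATCHDOG'
```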
Well, this again uses sssd/src/providers/ldap/ldap_id_cleanup.c (line 467 at commit e2408c2).
Probably deref (sysdb_asq_search()) can be used here as well...
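If it helps, the referenced loop can be inspected locally with plain git, assuming a clone of the SSSD repository (the line range is only an approximate window around line 467):

```sh
# Show ldap_id_cleanup.c as it was at commit e2408c2
git show e2408c2:src/providers/ldap/ldap_id_cleanup.c | sed -n '440,480p'
```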
any idea of why:
Thanks. PS: you can get full logs in RH Case 04069463.
See #7851 (comment)
Because
Right, this is it - internal watchdog.
Right, so since we have control over the timeout, and we can get a fix for the db read speed backported like in the other issue, we should be good?
The issue is clear, but the patches to fix it are yet to be written (so there is nothing to backport at the moment). But if I understand correctly, you were using the purge to work around #7793, so hopefully it is not that critical for you once lookup latency is improved.
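As a stopgap, the periodic purge itself can be switched off from sssd.conf. This is only a sketch, assuming the standard ldap_purge_cache_timeout option of the LDAP provider (a value of 0 disables the periodic cleanup task):

```sh
# In the [domain/...] section of /etc/sssd/sssd.conf add:
#
#     ldap_purge_cache_timeout = 0
#
# then restart sssd so the option takes effect:
sudo systemctl restart sssd
```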
> does the backport mean that we'll be able to get #7793 fixed in RHEL 8.10 too?

What do you mean by "the backport" in this context? Anyway, I can't give you any hard promises wrt specific product versions in general. And definitely not now, when the patches haven't even been reviewed upstream. But if the patches get accepted and no regressions are found, we will certainly try our best to deliver it.
Alright thanks appreciate it!
Alexey, if improving the performance of the purge is so problematic, it is acceptable for us to clear the database completely, but we don't want a hard service restart that would cause any requests to fail during the interval the service is down. Essentially, if there were a database reset option that simply purged the whole database with a locking operation (like the current purge) and just kept clients waiting while the db is reset, that would be fine. Thanks
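For what it's worth, here is a rough sketch of the two reset-style options that already exist, assuming a stock sssd installation; neither is the atomic, client-blocking reset described above:

```sh
# 1) Mark every cached entry as expired; sssd keeps serving requests and
#    entries are simply refreshed from LDAP on the next lookup:
sss_cache -E

# 2) Drop the cache database entirely; this does require stopping sssd,
#    so requests arriving in that window will fail:
sudo systemctl stop sssd
sudo rm -f /var/lib/sss/db/cache_*.ldb
sudo systemctl start sssd
```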
This is related to #7793, as discussed with Alexey Tikhonov, using the latest patched version (v3) related to that bug.
Steps to reproduce:
- Set entry_cache_timeout, entry_cache_user_timeout and entry_cache_group_timeout so that the entries expire (see the sketch below)
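For illustration, a hypothetical sssd.conf change matching these steps (the values are examples, not the ones from the report):

```sh
# In the [domain/ad.dneg.com] section of /etc/sssd/sssd.conf:
#
#     entry_cache_timeout = 60
#     entry_cache_user_timeout = 60
#     entry_cache_group_timeout = 60
#
# then restart sssd and wait for the entries to expire before the next purge:
sudo systemctl restart sssd
```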
Observed behaviour:
[be[ad.dneg.com]] [cleanup_groups] (0x1000): Searching with: ...
each search operation takes 1/2 seconds
This is especially bad since, even if it didn't crash, the purge operation seems to block any request to sssd?
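A simple way to confirm the per-search timing from the debug logs, assuming debug_level 9 and the default log location for this domain:

```sh
# The timestamps on consecutive lines show how long each search takes
grep -F '[cleanup_groups]' /var/log/sssd/sssd_ad.dneg.com.log | tail -n 20
```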
I will open a case on the RH support portal and upload the related backend level 9 logs.
Thanks