Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYSDB: perf improvements in sysdb_add_group_member_overrides(), part 2 #7866

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

alexey-tikhonov
Copy link
Member

@alexey-tikhonov alexey-tikhonov commented Mar 6, 2025

  • Most impactful patch is "Replaced sysdb_search_entry() with sysdb_cache_search_entry()
    to avoid sysdb_merge_msg_list_ts_attrs()"
  • Second impactful is "Avoid logging to the backtrace unconditionally in hot paths"
  • All other patches optimize code under sss_nss_protocol_fill_members() (those helpers are also used in other cases), but I admit impact is pretty contained (2..3% of time in my test setup), so if you think some of patches make code readability worse - I'm fine to drop those.

Testing with the same setup as in #7841 (comment) but default debug settings:

time SSS_NSS_USE_MEMCACHE=NO getent -s sss group [email protected] > /dev/null

2.11.0-0.250306.171312 (vanilla) : 2.207 .. 2.434

sssd-9.pr7866-06233 (don't read ts) : 1.142 .. 1.271
sssd-9.pr7866-06235 (debug) : 1.035 .. 1.119
sssd-9.pr7866-06249 (fill-members): 1.007 .. 1.086

@alexey-tikhonov
Copy link
Member Author

@joakim-tjernlund, since you tested previous PR in this ares, would you be interested to try this as well (it should be applied on the top of #7841)?

@alexey-tikhonov
Copy link
Member Author

@marco-kusa, one of original patches in #7841 had to be dropped during review. Would you be able to test this PR in your env ('ignore_group_members = false' code path)?

@joakim-tjernlund
Copy link
Contributor

@joakim-tjernlund, since you tested previous PR in this ares, would you be interested to try this as well (it should be applied on the top of #7841)?

Added an top of master on one machine for now

@joakim-tjernlund
Copy link
Contributor

@joakim-tjernlund, since you tested previous PR in this ares, would you be interested to try this as well (it should be applied on the top of #7841)?

Added an top of master on one machine for now

Now on several machines(5-10)

@alexey-tikhonov
Copy link
Member Author

@joakim-tjernlund, since you tested previous PR in this ares, would you be interested to try this as well (it should be applied on the top of #7841)?

Added an top of master on one machine for now

Now on several machines(5-10)

And what are observations so far?

@aplopez
Copy link
Contributor

aplopez commented Mar 18, 2025

I tested this PR against the current master branch (459cc6b) in the following scenario:

  • Running in the sssd-ci-containers,
  • Using the IPA server,
  • 2000 users, each with its private group,
  • 1 extra group, all 2000 users are members of it,
  • A loop calling id for each user.

I wanted to see how SSSD would behave in this case, not expecting any significative improvement. I ran the loop twice for each case. Before each loop, I deleted the logs and cache, and restarted SSSD.

Master

[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.service
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m24.460s
user	0m4.097s
sys	0m6.317s
[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.service
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m19.424s
user	0m4.402s
sys	0m6.403s

This PR

[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.serviceq
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m38.338s
user	0m4.800s
sys	0m6.633s
[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.service
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m36.469s
user	0m4.819s
sys	0m6.669s

My test is slower with this PR than without it. May I have done something wrong?

@alexey-tikhonov
Copy link
Member Author

alexey-tikhonov commented Mar 18, 2025

Before each loop, I deleted the logs and cache

I'm pretty sure most of time is spent in 'sssd_be', not in 'sssd_nss'...
Let me see if I can reproduce this with LDAP.

@alexey-tikhonov alexey-tikhonov added coverity Trigger a coverity scan and removed coverity Trigger a coverity scan labels Mar 18, 2025
@alexey-tikhonov
Copy link
Member Author

Let me see if I can reproduce this with LDAP.

I can't:

time for ((i=1000001; i <= 1002001; i++)); do id u${i}@ldap.test > /dev/null; done

sssd-2.11.0-0.250314.164005.git459cc6b15.fc42.x86_64

  • power saver:
    • 3m28.476s
    • 3m36.098s (light web browsing in parallel)
  • performance (+perf attached)
    • 3m6.477s

sssd-9.pr7866-06249.fc42.x86_64

  • power saver:
    • 3m27.411s
    • 3m28.687s
  • performance (+perf attached)
    • 3m7.599s

The only case where I got any difference - while I was web-browsing while running the test on the same laptop.
Flamegraphs also look pretty much the same.

This is inline with my expectations: the only large group in this setup is "1 extra group, all 2000 users are members of it". This group is actually read only once, all other reads hit mem-cache.

If you can reliably reproduce this performance degradation (while making sure overall load of the host stays the same), please capture logs with debug level 9 and microseconds, a perf flamegraph and share those.

@alexey-tikhonov
Copy link
Member Author

alexey-tikhonov commented Mar 18, 2025

Still, you can use your setup to actually test code paths being touched in this PR - just resolve this large group in a loop with default debug settings and mem-cache disabled:
time for ((i=1; i < 1000; i++)); do SSS_NSS_USE_MEMCACHE=NO getent -s sss group [email protected] > /dev/null; done

That's what I got with LDAP:

sssd-2.11.0-0.250314.164005.git459cc6b15.fc42.x86_64

  • 1m0.583s
  • 0m59.860s
  • 0m59.597s
  • 1m5.433s
  • 1m0.360s

9.pr7866-06249.fc42

  • 0m22.878s
  • 0m25.566s
  • 0m23.319s
  • 0m23.352s
  • 0m23.477s

@alexey-tikhonov
Copy link
Member Author

Testing with the same setup as in #7841 (comment) but default debug settings:

In the #7793 environment results are much more modest: merely 10% perf gain.

That's somewhat expected because in that env - a lot of users but only a (small) fraction are members of a given group - it is search-by-memberof what takes the time, not resulting list processing (what is being optimized in this PR).

(But, as expected, #7872 makes it blazing fast - ~ x600 faster).

@aplopez
Copy link
Contributor

aplopez commented Mar 19, 2025

Still, you can use your setup to actually test code paths being touched in this PR - just resolve this large group in a loop with default debug settings and mem-cache disabled: time for ((i=1; i < 1000; i++)); do SSS_NSS_USE_MEMCACHE=NO getent -s sss group [email protected] > /dev/null; done

I don't have exactly this setup. In my case (2000 members in the group) I saw no improvement nor degradation.

Copy link
Contributor

@sumit-bose sumit-bose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

there is a typo in the "'tmp_ctx' was removed as it wasn't really used anyway" commit message, 'aboid' vs 'avoid'.

bye,
Sumit

Skip function if group->memberUid is empty.
In this case there are no user objects in the cache that would have
memberOf == group->dn anyway.
Ensure that `get_user_members_recursively()` returns only POSIX users
via search filter. This avoids the need to populate and later check
SYSDB_UIDNUM attr.
Don't read unneeded attributes from override_dn.
Replaced `sysdb_search_entry()` with `sysdb_cache_search_entry()`
to avoid `sysdb_merge_msg_list_ts_attrs()` that isn't needed here
(timestamps aren't used anyway).
@alexey-tikhonov
Copy link
Member Author

there is a typo in the "'tmp_ctx' was removed as it wasn't really used anyway" commit message, 'aboid' vs 'avoid'.

Thank you, fixed.

if requested debug level isn't set. Meant to be used in hot (performance sensitive)
code paths only.
In case of reading a large group (comparable to entire cache) it accounts
for some non trivial CPU time (cca ~6..7%)
'tmp_ctx' was removed as it wasn't really used anyway.

Code could be changed to make a real use of 'tmp_ctx': to avoid
touching '_dom_name' output arg if update of '_shortname' fails.
But this is quite unrealistic case and function is in a hot path,
so better to avoid unneeded memory manipulations.
Don't use sss_parse_internal_fqname() as domain name copy
isn't needed.
Avoid alloc/free tmp_ctx. Not much benefits but a function
is in a hot path.
Avoid unnecessary string copy.
There were no users of those functions that would need a new copy.
Function is unused since 26c722d
Scan format and alloc string once instead of talloc_strndup_append()
for every chunk.
Copy link
Contributor

@sumit-bose sumit-bose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

thank you for the updates, ACK.

bye,
Sumit

@alexey-tikhonov alexey-tikhonov added Accepted coverity Trigger a coverity scan Ready to push Ready to push and removed Waiting for review coverity Trigger a coverity scan labels Mar 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants