SYSDB: perf improvements in sysdb_add_group_member_overrides(), part 2 #7866

alexey-tikhonov · 2025-03-06T18:59:18Z

The most impactful patch is "Replaced sysdb_search_entry() with sysdb_cache_search_entry() to avoid sysdb_merge_msg_list_ts_attrs()".
The second most impactful is "Avoid logging to the backtrace unconditionally in hot paths".
All other patches optimize code under sss_nss_protocol_fill_members() (those helpers are also used in other cases), but I admit their impact is fairly small (2..3% of the time in my test setup), so if you think some of the patches hurt code readability, I'm fine with dropping those.

Testing with the same setup as in #7841 (comment) but default debug settings:

time SSS_NSS_USE_MEMCACHE=NO getent -s sss group [email protected] > /dev/null

2.11.0-0.250306.171312 (vanilla) : 2.207 .. 2.434

sssd-9.pr7866-06233 (don't read ts) : 1.142 .. 1.271
sssd-9.pr7866-06235 (debug) : 1.035 .. 1.119
sssd-9.pr7866-06249 (fill-members): 1.007 .. 1.086

alexey-tikhonov · 2025-03-11T17:54:39Z

@joakim-tjernlund, since you tested the previous PR in this area, would you be interested in trying this one as well (it should be applied on top of #7841)?

alexey-tikhonov · 2025-03-11T17:57:13Z

@marco-kusa, one of the original patches in #7841 had to be dropped during review. Would you be able to test this PR in your environment (the 'ignore_group_members = false' code path)?

joakim-tjernlund · 2025-03-11T18:06:29Z

> @joakim-tjernlund, since you tested the previous PR in this area, would you be interested in trying this one as well (it should be applied on top of #7841)?

Added on top of master on one machine for now.

joakim-tjernlund · 2025-03-14T13:47:02Z

> > @joakim-tjernlund, since you tested the previous PR in this area, would you be interested in trying this one as well (it should be applied on top of #7841)?

> Added on top of master on one machine for now.

Now on several machines (5-10).

alexey-tikhonov · 2025-03-14T14:05:23Z

> > > @joakim-tjernlund, since you tested the previous PR in this area, would you be interested in trying this one as well (it should be applied on top of #7841)?

> > Added on top of master on one machine for now.

> Now on several machines (5-10).

And what are the observations so far?

aplopez · 2025-03-18T15:59:38Z

I tested this PR against the current master branch (459cc6b) in the following scenario:

- Running in the sssd-ci-containers
- Using the IPA server
- 2000 users, each with its private group
- 1 extra group, all 2000 users are members of it
- A loop calling id for each user

I wanted to see how SSSD would behave in this case, not expecting any significant improvement. I ran the loop twice for each case. Before each loop, I deleted the logs and cache, and restarted SSSD.

Master

[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.service
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m24.460s
user	0m4.097s
sys	0m6.317s
[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.service
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m19.424s
user	0m4.402s
sys	0m6.403s

This PR

[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.serviceq
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m38.338s
user	0m4.800s
sys	0m6.633s
[root@client /]# rm -f /var/log/sssd/* /var/lib/sss/db/*; systemctl restart sssd.service
[root@client /]# time for ((i=2001; i <= 4000; i++)); do id u${i}@ipa.test > /dev/null; done

real	3m36.469s
user	0m4.819s
sys	0m6.669s

My test is slower with this PR than without it. Could I have done something wrong?

alexey-tikhonov · 2025-03-18T16:51:52Z

> Before each loop, I deleted the logs and cache

I'm pretty sure most of the time is spent in 'sssd_be', not in 'sssd_nss'...
Let me see if I can reproduce this with LDAP.

alexey-tikhonov · 2025-03-18T18:33:57Z

> Let me see if I can reproduce this with LDAP.

I can't:

time for ((i=1000001; i <= 1002001; i++)); do id u${i}@ldap.test > /dev/null; done

sssd-2.11.0-0.250314.164005.git459cc6b15.fc42.x86_64

power saver:
- 3m28.476s
- 3m36.098s (light web browsing in parallel)
performance (+perf attached)
- 3m6.477s

sssd-9.pr7866-06249.fc42.x86_64

power saver:
- 3m27.411s
- 3m28.687s
performance (+perf attached)
- 3m7.599s

The only case where I saw any difference was when I was browsing the web on the same laptop while the test was running.
The flamegraphs also look pretty much the same.

This is in line with my expectations: the only large group in this setup is the "1 extra group, all 2000 users are members of it". This group is actually read only once; all other reads hit the mem-cache.

If you can reliably reproduce this performance degradation (while making sure the overall load on the host stays the same), please capture logs with debug level 9 and microseconds, plus a perf flamegraph, and share those.

alexey-tikhonov · 2025-03-18T18:46:53Z

Still, you can use your setup to actually test the code paths touched in this PR: just resolve this large group in a loop with default debug settings and the mem-cache disabled:
time for ((i=1; i < 1000; i++)); do SSS_NSS_USE_MEMCACHE=NO getent -s sss group [email protected] > /dev/null; done

That's what I got with LDAP:

sssd-2.11.0-0.250314.164005.git459cc6b15.fc42.x86_64

1m0.583s
0m59.860s
0m59.597s
1m5.433s
1m0.360s

9.pr7866-06249.fc42

0m22.878s
0m25.566s
0m23.319s
0m23.352s
0m23.477s

alexey-tikhonov · 2025-03-19T08:42:28Z

> Testing with the same setup as in #7841 (comment) but default debug settings:

In the #7793 environment the results are much more modest: merely a 10% perf gain.

That's somewhat expected because in that environment (a lot of users, but only a small fraction are members of a given group) it is the search-by-memberof that takes the time, not the processing of the resulting list (which is what this PR optimizes).

(But, as expected, #7872 makes it blazing fast: roughly 600x faster.)

aplopez · 2025-03-19T09:29:59Z

> Still, you can use your setup to actually test code paths being touched in this PR - just resolve this large group in a loop with default debug settings and mem-cache disabled: time for ((i=1; i < 1000; i++)); do SSS_NSS_USE_MEMCACHE=NO getent -s sss group [email protected] > /dev/null; done

I don't have exactly this setup. In my case (2000 members in the group) I saw neither improvement nor degradation.

sumit-bose

Hi,

there is a typo in the "'tmp_ctx' was removed as it wasn't really used anyway" commit message, 'aboid' vs 'avoid'.

bye,
Sumit

alexey-tikhonov · 2025-03-20T16:04:10Z

> there is a typo in the "'tmp_ctx' was removed as it wasn't really used anyway" commit message, 'aboid' vs 'avoid'.

Thank you, fixed.

'tmp_ctx' was removed as it wasn't really used anyway. The code could be changed to make real use of 'tmp_ctx': to avoid touching the '_dom_name' output arg if the update of '_shortname' fails. But this is a quite unrealistic case and the function is in a hot path, so it is better to avoid unneeded memory manipulations.

sumit-bose

Hi,

thank you for the updates, ACK.

bye,
Sumit

alexey-tikhonov added Bugzilla branch: sssd-2-9 labels Mar 6, 2025

alexey-tikhonov force-pushed the perf-add-overrides branch 4 times, most recently from 014b4fe to 1973b50 Compare March 10, 2025 10:49

alexey-tikhonov mentioned this pull request Mar 10, 2025

Disk cache failure with large db sizes #7793

Closed

alexey-tikhonov force-pushed the perf-add-overrides branch from d6089f2 to cfa33a1 Compare March 10, 2025 20:37

alexey-tikhonov marked this pull request as ready for review March 11, 2025 13:26

andreboscatto requested review from sumit-bose and aplopez March 11, 2025 13:34

andreboscatto assigned sumit-bose and aplopez Mar 11, 2025

alexey-tikhonov mentioned this pull request Mar 11, 2025

SYSDB: perf improvements in sysdb_add_group_member_overrides() #7841

Closed

alexey-tikhonov mentioned this pull request Mar 12, 2025

SYSDB: perf improvements in sysdb_add_group_member_overrides(), part 3 #7872

Open

alexey-tikhonov added the coverity and Waiting for review labels Mar 12, 2025

aplopez reviewed Mar 14, 2025

src/util/usertools.c (resolved)

alexey-tikhonov added coverity Trigger a coverity scan and removed coverity Trigger a coverity scan labels Mar 18, 2025

sumit-bose reviewed Mar 20, 2025

src/responder/common/responder_common.c (resolved)

src/db/sysdb_views.c (outdated, resolved)

alexey-tikhonov added 5 commits March 20, 2025 16:32

SYSDB: update in sysdb_add_group_member_overrides()

75fcdf1

Skip function if group->memberUid is empty. In this case there are no user objects in the cache that would have memberOf == group->dn anyway.

SYSDB: update in sysdb_add_group_member_overrides()

48ccf44

Ensure that `get_user_members_recursively()` returns only POSIX users via search filter. This avoids the need to populate and later check SYSDB_UIDNUM attr.

SYSDB: debug message fixed

4c8e01c

SYSDB: update in sysdb_add_group_member_overrides()

1b69517

Don't read unneeded attributes from override_dn.

SYSDB: update in get_user_members_recursively()

8dc4a54

Replaced `sysdb_search_entry()` with `sysdb_cache_search_entry()` to avoid `sysdb_merge_msg_list_ts_attrs()` that isn't needed here (timestamps aren't used anyway).

alexey-tikhonov force-pushed the perf-add-overrides branch from d547038 to 4ca7303 Compare March 20, 2025 16:02

alexey-tikhonov requested a review from sumit-bose March 20, 2025 16:04

alexey-tikhonov added 11 commits March 21, 2025 11:23

DEBUG: a new helper that skips backtrace

594c072

if requested debug level isn't set. Meant to be used in hot (performance sensitive) code paths only.

Avoid logging to the backtrace unconditionally in hot paths.

58e2749

In case of reading a large group (comparable to entire cache) it accounts for some non trivial CPU time (cca ~6..7%)
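A minimal sketch of the idea behind the two logging commits above, assuming debug levels are kept in a bitmask. The names `log_always`, `LOG_IF_ENABLED` and `enabled_levels`, as well as the level values, are hypothetical stand-ins and not SSSD's actual helpers. The only point is that the gated variant never formats the message when the requested level is off, while the ungated one pays for formatting on every call.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins, not SSSD's real symbols. */
static unsigned enabled_levels = 0x0010;  /* bitmask of enabled debug levels */
static int formatted_count = 0;           /* how many messages were formatted */

/* Always formats the message, e.g. to feed an in-memory backtrace buffer. */
static void log_always(unsigned level, const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);  /* cost paid on every call */
    va_end(ap);
    formatted_count++;

    if (level & enabled_levels) {
        fprintf(stderr, "%s\n", buf);
    }
}

/* Hot-path variant: skip the call entirely unless the level is enabled. */
#define LOG_IF_ENABLED(level, ...)             \
    do {                                       \
        if ((level) & enabled_levels) {        \
            log_always((level), __VA_ARGS__);  \
        }                                      \
    } while (0)

/* Simulates a hot loop over group members.
 * Returns the number of messages that were actually formatted. */
static int process_members(int n, int use_gate)
{
    formatted_count = 0;
    for (int i = 0; i < n; i++) {
        if (use_gate) {
            LOG_IF_ENABLED(0x4000, "processing member %d", i);
        } else {
            log_always(0x4000, "processing member %d", i);
        }
    }
    return formatted_count;
}
```

With level 0x4000 disabled, `process_members(n, 0)` formats n messages and `process_members(n, 1)` formats none. The trade-off is that gated messages do not reach the backtrace, which is why the commit message restricts the helper to hot paths.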

UTIL: sss_parse_internal_fqname() optimization

444fd72

Avoid unneeded strlen()'s

UTIL: sized_domain_name() optimization

f217b01

Don't use sss_parse_internal_fqname() as domain name copy isn't needed.
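The two commits above could look roughly like the following sketch, assuming an internal name of the form "name@domain" whose length the caller already knows. `struct str_view` and `domain_view` are hypothetical names used only for illustration. The function returns a pointer and length inside the original string, so nothing is allocated or copied and no extra strlen() is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view type: points into an existing string, owns nothing. */
struct str_view {
    const char *str;
    size_t len;  /* length without the terminating NUL */
};

/* Locate the domain part of "name@domain" without allocating or copying.
 * fq_len is supplied by the caller, so the string is scanned once, from
 * the end, and only as far as the last '@'.
 * Returns 0 on success, -1 if there is no '@' or either part is empty. */
static int domain_view(const char *fqname, size_t fq_len, struct str_view *out)
{
    size_t i = fq_len;

    assert(fqname != NULL && out != NULL);

    /* Walk backwards to the last '@'; it sits at index i - 1 when found. */
    while (i > 0 && fqname[i - 1] != '@') {
        i--;
    }
    if (i == 0) return -1;       /* no '@' at all */
    if (i == 1) return -1;       /* empty name part */
    if (i == fq_len) return -1;  /* empty domain part */

    out->str = fqname + i;
    out->len = fq_len - i;
    return 0;
}
```

For "alice@ldap.test" (15 characters) this yields a view starting at offset 6 with length 9.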

RESPONDER: sized_output_name() optimization

6f9abf8

Avoid alloc/free tmp_ctx. Not much benefits but a function is in a hot path.

UTIL: sss_output_name() optimization

e6c8294

Avoid unnecessary string copy.

RESPONDER: delete sss_resp_create_fqname()

2b49379

Function wasn't used since ed891c0

UTIL: remake sss_*replace_space() to inplace version

88900ab

There were no users of those functions that would need a new copy.
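As an illustration of what an in-place variant means here, a minimal sketch follows. `replace_char_inplace` is a hypothetical name and the real functions may perform additional checks that this sketch omits. It assumes the caller owns a writable buffer and no longer needs the original string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: replace every occurrence of 'from' with 'to'
 * directly in the caller's buffer. No new string is allocated.
 * Returns the number of characters replaced. */
static size_t replace_char_inplace(char *str, char from, char to)
{
    size_t replaced = 0;

    assert(str != NULL);

    /* Replacing with NUL would truncate the string, so treat it as a no-op. */
    if (to == '\0') {
        return 0;
    }

    for (char *p = str; *p != '\0'; p++) {
        if (*p == from) {
            *p = to;
            replaced++;
        }
    }
    return replaced;
}
```

Compared with a version that returns a new copy, this saves one allocation and one full copy per call, which only matters because the function is called once per group member.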

UTIL: delete sss_fqname()

a67cba7

Function is unused since 26c722d

UTIL: sss_tc_fqname2() optimization

3fbeeee

Scan format and alloc string once instead of talloc_strndup_append() for every chunk.
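The "scan once, allocate once" approach can be sketched as two passes over the format: the first computes the exact output length, the second fills a single buffer. This is an illustration under assumptions, not the actual sss_tc_fqname2() code: it uses malloc() instead of talloc, supports only the placeholders `%1$s` (name) and `%2$s` (domain), and `placeholder_at` and `expand_fqname` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* If 'p' starts with "%1$s" or "%2$s", return 1 or 2; otherwise 0.
 * Short-circuit evaluation keeps the reads inside the NUL-terminated string. */
static int placeholder_at(const char *p)
{
    if (p[0] == '%' && (p[1] == '1' || p[1] == '2') &&
        p[2] == '$' && p[3] == 's') {
        return p[1] - '0';
    }
    return 0;
}

/* Expand 'fmt' into a freshly malloc()ed string; the caller frees it.
 * Returns NULL on allocation failure. */
static char *expand_fqname(const char *fmt, const char *name,
                           const char *domain)
{
    size_t name_len, dom_len, total = 0;
    const char *p;
    char *out, *w;

    assert(fmt != NULL && name != NULL && domain != NULL);
    name_len = strlen(name);
    dom_len = strlen(domain);

    /* Pass 1: compute the exact output length. */
    for (p = fmt; *p != '\0'; ) {
        int which = placeholder_at(p);
        if (which != 0) {
            total += (which == 1) ? name_len : dom_len;
            p += 4;
        } else {
            total++;
            p++;
        }
    }

    out = malloc(total + 1);  /* the only allocation */
    if (out == NULL) {
        return NULL;
    }

    /* Pass 2: copy literal characters and substitute placeholders. */
    w = out;
    for (p = fmt; *p != '\0'; ) {
        int which = placeholder_at(p);
        if (which == 1) {
            memcpy(w, name, name_len);
            w += name_len;
            p += 4;
        } else if (which == 2) {
            memcpy(w, domain, dom_len);
            w += dom_len;
            p += 4;
        } else {
            *w++ = *p++;
        }
    }
    *w = '\0';
    return out;
}
```

An append-per-chunk implementation may reallocate and copy the growing string for every chunk; here the cost is one extra read-only scan of a short format string.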

alexey-tikhonov force-pushed the perf-add-overrides branch from 4ca7303 to 3fbeeee Compare March 21, 2025 10:29

sumit-bose approved these changes Mar 21, 2025

alexey-tikhonov added the Accepted, coverity and Ready to push labels and removed the Waiting for review and coverity labels Mar 21, 2025
