Introduce a regex tenant resolver #6713

Open: 1 commit to be merged into base branch master

SungJin1212 (Member) commented Apr 22, 2025:

This PR introduces a regex tenant resolver that allows a regex in the X-Scope-OrgID value when the tenant-federation feature is in use.
It introduces two flags, tenant-federation.regex-matcher-enabled and tenant-federation.user-sync-interval.

  • tenant-federation.regex-matcher-enabled enables the regex resolver, which allows a regex in the X-Scope-OrgID value.
  • tenant-federation.user-sync-interval specifies how frequently users are scanned. The scanned users are used to compute the matching tenant IDs.

The regex matching rule follows the Prometheus regex matcher (=~), as described in the Prometheus documentation.

For example, if there are three tenants with the IDs user-1, user-2, and user-3, we can set X-Scope-OrgID to user-.+ to query all of them.
The existing approach of setting X-Scope-OrgID to user-1|user-2|user-3 also still works.

It reuses userScanner to discover candidate tenant IDs, so only tenants that have uploaded blocks are subject to regex resolution.
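
As a rough illustration of the example above, the following Go sketch filters a list of known tenant IDs with a fully anchored regex, which is how Prometheus =~ matchers behave. It is not the resolver from this PR: matchTenants and the hard-coded tenant list are hypothetical, and the real resolver obtains its tenant IDs from userScanner.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// matchTenants returns the known tenant IDs that fully match pattern.
// Prometheus regex matchers are anchored at both ends, so the pattern is
// wrapped in ^(?:...)$ to mimic that behavior. (Hypothetical helper.)
func matchTenants(pattern string, knownTenants []string) ([]string, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, fmt.Errorf("invalid tenant regex %q: %w", pattern, err)
	}
	matched := make([]string, 0, len(knownTenants))
	for _, id := range knownTenants {
		if re.MatchString(id) {
			matched = append(matched, id)
		}
	}
	sort.Strings(matched) // stable order regardless of scan order
	return matched, nil
}

func main() {
	// Assumed tenant list; in the PR these come from scanning block storage.
	tenants := []string{"user-1", "user-2", "user-3", "team-a"}
	for _, pattern := range []string{"user-.+", "user-1|user-2|user-3", "user"} {
		ids, err := matchTenants(pattern, tenants)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %v\n", pattern, ids)
	}
}
```

Because of the anchoring, a pattern such as user matches nothing in this sketch; user-.+ is needed to match user-1.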

Which issue(s) this PR fixes:
Fixes #6588

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@SungJin1212 SungJin1212 marked this pull request as draft April 22, 2025 01:29
@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch 8 times, most recently from 1b8963e to e6a4222 Compare April 22, 2025 06:02
@SungJin1212 SungJin1212 marked this pull request as ready for review April 22, 2025 06:27
SungJin1212 (Member Author) commented:

@CharlieTLe
Could you take a look when you have time?

@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch from e6a4222 to 63f97d2 Compare April 24, 2025 06:46
// because if the # of matched tenantIDs is only one, `X-Scope-OrgID` header is
// set to input regex.
byPassForSingleQuerier = false
tenant.WithDefaultResolver(tenantfederation.NewRegexResolver(prometheus.DefaultRegisterer, t.Cfg.TenantFederation.UserSyncInterval, util_log.Logger, t.Distributor.AllUserStats))

Contributor:

I wonder if there is a better way to gather all users. Calling t.Distributor.AllUserStats seems a bit expensive just to get user IDs.

It also cannot cover users that no longer ingest but whose data may still be present in long-term storage.

SungJin1212 (Member Author), Apr 25, 2025:

> It also cannot cover users that no longer ingest but whose data may still be present in long-term storage.

Thanks for catching that.
How about using the userScanner?

SungJin1212 (Member Author), Apr 28, 2025:

> Calling t.Distributor.AllUserStats seems a bit expensive just to get user IDs.

Do you have any good ideas?

CharlieTLe (Member) left a comment:

This looks awesome!

@dosubot dosubot bot added the lgtm label (This PR has been approved by a maintainer) May 3, 2025
@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch from 63f97d2 to 605f91a Compare May 7, 2025 02:31
@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch from 605f91a to d1b80e0 Compare May 15, 2025 09:00
SungJin1212 (Member Author) commented:

@CharlieTLe
I changed it so that the __tenant_id__ label is not attached when only one tenant matches (the same behavior as when the user uses the multi-resolver).

@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch 2 times, most recently from b7ef458 to 7321e4b Compare June 12, 2025 12:02
@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch 6 times, most recently from eaeb19e to 1135b06 Compare June 13, 2025 02:27
CHANGELOG.md Outdated
@@ -11,6 +11,9 @@
* [FEATURE] Ingester: Support out-of-order native histogram ingestion. It automatically enabled when `-ingester.out-of-order-time-window > 0` and `-blocks-storage.tsdb.enable-native-histograms=true`. #6626 #6663
* [FEATURE] Ruler: Add support for percentage based sharding for rulers. #6680
* [FEATURE] Ruler: Add support for group labels. #6665
* [FEATURE] Query federation: Introduce a regex tenant resolver to allow regex in `X-Scope-OrgID` value. #6713
- Add a `tenant-federation.regex-matcher-enabled` flag. If it is enabled, users can pass a regex in `X-Scope-OrgID`, and the matching tenant IDs are automatically included.
- Add a `tenant-federation.user-sync-interval` flag that specifies how frequently to scan users. The scanned users are used to compute the matching tenant IDs.

Contributor:

We should document it as an experimental feature in the docs.

Contributor:

Maybe we should document that user discovery is based on scanning block storage, so new users may take up to 2h to become available (assuming blocks are only uploaded every 2h).

return nil, errors.Wrap(err, "failed to create the bucket client")
}

userScanner, err := users.NewScanner(cfg, bucketClient, logger, reg)

Contributor:

We probably need to wrap the registry with the component name. I think you will get a duplicate metrics registration issue, since we initialize multiple user scanners in different components. This can fail when running in single-binary mode.

Member Author:

It's safe, but it's better to add the component name.

Help: "Number of discovered users.",
})

go r.updateUsersLoop(userSyncInterval, userScanTimeout)

Contributor:

Should we maintain this using a service rather than a simple goroutine?
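
One way to read the suggestion above: a bare goroutine has no owner that can stop it or wait for it to exit, whereas a service-like wrapper gives the loop an explicit lifecycle. The sketch below shows that idea with the standard library only. The names syncer, startSyncer, and runDemo are hypothetical, and the project's own service framework would normally take this role.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// syncer owns a periodic refresh loop and can stop it cleanly.
// It is a stand-in for a managed service, not code from this PR.
type syncer struct {
	cancel context.CancelFunc
	done   chan struct{}
}

// startSyncer calls refresh every interval until Stop is called.
func startSyncer(interval, timeout time.Duration, refresh func(ctx context.Context)) *syncer {
	ctx, cancel := context.WithCancel(context.Background())
	s := &syncer{cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(s.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				// Bound each refresh with a timeout; refresh must honor
				// the context for this to have any effect.
				scanCtx, cancelScan := context.WithTimeout(ctx, timeout)
				refresh(scanCtx)
				cancelScan()
			}
		}
	}()
	return s
}

// Stop cancels the loop and waits for the goroutine to exit.
func (s *syncer) Stop() {
	s.cancel()
	<-s.done
}

// runDemo runs the loop briefly and reports how often refresh was called.
func runDemo() int {
	runs := 0
	s := startSyncer(10*time.Millisecond, time.Second, func(context.Context) { runs++ })
	time.Sleep(100 * time.Millisecond)
	s.Stop()
	return runs // safe to read: the goroutine has exited
}

func main() {
	fmt.Println("refresh ran at least once:", runDemo() > 0)
}
```

The point of Stop waiting on done is that shutdown becomes deterministic: once it returns, no further scans can start.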

defer cancel()

// only active users are considered
activeUsers, _, _, err := r.userScanner.ScanUsers(ctx)

Contributor:

I think we should include deleting users as well. The store gateway still loads blocks from deleting users, and those users should be expected to remain queryable within the cleanup delay.
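
A minimal sketch of what this suggestion could look like, assuming the second return value of ScanUsers is the list of users being deleted and the third is the list of deleted users (the snippet above only confirms that the first value holds active users). The userScanner interface, queryableUsers, and staticScanner are hypothetical names used for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// userScanner mirrors the shape of the ScanUsers call shown above:
// three user lists plus an error. The result order is an assumption.
type userScanner interface {
	ScanUsers(ctx context.Context) (active, deleting, deleted []string, err error)
}

// queryableUsers returns active and deleting users, deduplicated and sorted.
// Deleted users are ignored.
func queryableUsers(ctx context.Context, s userScanner) ([]string, error) {
	active, deleting, _, err := s.ScanUsers(ctx)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]struct{}, len(active)+len(deleting))
	users := make([]string, 0, len(active)+len(deleting))
	for _, list := range [][]string{active, deleting} {
		for _, id := range list {
			if _, ok := seen[id]; ok {
				continue
			}
			seen[id] = struct{}{}
			users = append(users, id)
		}
	}
	sort.Strings(users)
	return users, nil
}

// staticScanner is a fake scanner that returns fixed lists.
type staticScanner struct {
	active, deleting, deleted []string
}

func (s staticScanner) ScanUsers(context.Context) ([]string, []string, []string, error) {
	return s.active, s.deleting, s.deleted, nil
}

func main() {
	s := staticScanner{
		active:   []string{"user-2", "user-1"},
		deleting: []string{"user-3"},
		deleted:  []string{"user-4"},
	}
	users, err := queryableUsers(context.Background(), s)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(users)
}
```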

userScanTimeout = time.Second * 30
)

type RegexResolver struct {

Contributor:

Can you please add some comments explaining what RegexResolver and RegexValidator do? I am not too familiar with this code path, so I am unsure of the difference between them.

@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch from 1135b06 to 1a3d2f1 Compare June 13, 2025 11:50
@SungJin1212 SungJin1212 force-pushed the Support-regex-to-tenant-federation branch from 1a3d2f1 to 8737cf6 Compare June 13, 2025 12:07
Labels: component/querier, lgtm (This PR has been approved by a maintainer), size/XXL
Linked issue (may be closed when this pull request is merged): Allow for dynamic tenant selection in query federation
3 participants