-
Notifications
You must be signed in to change notification settings - Fork 829
scheduler: fragment queue and querier pick-up coordination #6968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
scheduler: fragment queue and querier pick-up coordination #6968
Conversation
pkg/cortex/modules.go
Outdated
@@ -414,7 +416,12 @@ func (t *Cortex) initQuerier() (serv services.Service, err error) { | |||
|
|||
t.Cfg.Worker.MaxConcurrentRequests = t.Cfg.Querier.MaxConcurrent | |||
t.Cfg.Worker.TargetHeaders = t.Cfg.API.HTTPRequestHeadersToLog | |||
return querier_worker.NewQuerierWorker(t.Cfg.Worker, httpgrpc_server.NewServer(internalQuerierRouter), util_log.Logger, prometheus.DefaultRegisterer) | |||
ipAddr, err := ring.GetInstanceAddr(t.Cfg.Alertmanager.ShardingRing.InstanceAddr, t.Cfg.Alertmanager.ShardingRing.InstanceInterfaceNames, util_log.Logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using alertmanager config here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gPRC params I needed are under RingConfig
struct, which is called ShardedRing
here, but it doesn't exist under querier
[update] I will add new field (ring configs) for querier 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm I don't think we want to add a Ring for querier. We just need the configurations for the addresses and interface, etc
@@ -0,0 +1,21 @@ | |||
package distributed_execution | |||
|
|||
type FragmentKey struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comments for public exposed types
|
||
type FragmentKey struct { | ||
queryID uint64 | ||
fragmentID uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not a big deal... But fragment ID doesn't need to be a uint64 type. We don't expect to have that many fragments as it is already scoped with query ID
fragmentID uint64 | ||
} | ||
|
||
func MakeFragmentKey(queryID uint64, fragmentID uint64) *FragmentKey { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to return a pointer for this I think.
|
||
type Fragmenter interface { | ||
Fragment(node logicalplan.Node) ([]Fragment, error) | ||
getNewID() uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be part of the interface? It is very weird to have 1 method in the interface to be public and another one is private.
I would remove this from the interface
} | ||
|
||
func (f *DummyFragmenter) getNewID() uint64 { | ||
return 1 // for dummy plan_fragments testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is just for testing, you can just hardcode to 1 in the Fragment
function
f.mappings[*key] = addr | ||
} | ||
|
||
func (f *FragmentTable) GetMappings(queryID uint64, fragmentIDs []uint64) ([]string, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we find a more descriptive name? It took me a while to understand what mapping it is. If it is getting child querier addresses we find a better name
defer f.mu.Unlock() | ||
|
||
keysToDelete := make([]distributed_execution.FragmentKey, 0) | ||
for key := range f.mappings { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the methods you have, is it easier to change mappings from mappings map[distributed_execution.FragmentKey]string
to map[uint64]map[uint64]string
?
You can find the map by just a lookup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I made the FragmentKey
struct so that it is easier to maintain (for example: if we ever want to change the types for the IDs or add more fields, we dont have to go through the codebase to fix it), and the code will be easier to understand (more literal). This fragment key type is also reused for remote nodes and child-root execution accesses to result cache in future PRs.
|
||
import "github.com/thanos-io/promql-engine/logicalplan" | ||
|
||
type Fragmenter interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to move the Fragmenter to distributed_execution
as fragmentation is specific to remote distribution.
The fragment table can be just moved to scheduler
folder
ac9f641
to
4f75333
Compare
Signed-off-by: rubywtl <[email protected]>
4f75333
to
942674c
Compare
What this PR does:
This PR introduces a Fragmenter interface that splits logical query plans into fragments when distributed execution is enabled. The Fragmenter appends metadata to each fragment for tracking, which the scheduler then uses to route fragments to appropriate queriers. The scheduler maintains a mapping between fragments and querier addresses to track fragment locations across the distributed system.
Which issue(s) this PR fixes:
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]