add logical plan distributed optimizer to query frontend #6974
base: master
Conversation
Signed-off-by: rubywtl <[email protected]>
Let's try to find a better name. :) unmarshal is just a method and shouldn't be a file name.
start time.Time
end time.Time
step time.Duration
remoteExecCount int
Let's compare the resulting logical plan instead of remoteExecCount. remoteExecCount can be misleading, as a remote node might be added in the wrong place.
query string
start time.Time
end time.Time
step time.Duration
Nit: we don't have to parameterize start, end, and step if they don't matter much in this test. We can just hardcode them when creating the plan.
@@ -70,7 +72,14 @@ func (d distributedQueryMiddleware) newLogicalPlan(qs string, start time.Time, e
}
optimizedPlan, _ := logicalPlan.Optimize(logicalplan.DefaultOptimizers)
After #6873, we should use the configured optimizers instead of the default optimizers.
return unmarshalNode(data)
}

func unmarshalNode(data []byte) (logicalplan.Node, error) {
Let's add a comment explaining why we need to copy the deserialization logic from the Thanos engine into Cortex.
)

type NodeType = logicalplan.NodeType
type Node = logicalplan.Node
Those seem unneeded. When you return from the function, you can just specify the return types as logicalplan.NodeType and logicalplan.Node.
type Remote struct {
Op parser.ItemType
Expr Node `json:"-"`
Do we need Op?
FragmentAddr string
}

func NewRemoteNode() Node {
This might need to take expr as a parameter
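A minimal sketch of what taking expr as a parameter could look like. All type names here (Node, Remote, leaf) are simplified stand-ins for the diff's types, which build on logicalplan.Node from the Thanos promql-engine; this is not the PR's actual code.

```go
package main

import "fmt"

// Node is a stand-in for logicalplan.Node.
type Node interface{ String() string }

// leaf is a hypothetical concrete node used only for this example.
type leaf struct{ s string }

func (l leaf) String() string { return l.s }

// Remote mirrors the node in the diff (Op omitted here for brevity).
type Remote struct {
	Expr         Node
	FragmentAddr string
}

func (r *Remote) String() string { return fmt.Sprintf("remote(%s)", r.Expr.String()) }

// NewRemoteNode takes the wrapped expression up front, so callers do not
// have to patch Children()[0] after construction.
func NewRemoteNode(expr Node) Node {
	return &Remote{Expr: expr}
}

func main() {
	// The clone-then-patch sequence from the diff collapses to one call.
	child := Node(leaf{"sum(up)"})
	child = NewRemoteNode(child)
	fmt.Println(child) // remote(sum(up))
}
```

With this shape, the optimizer's rewrite step becomes a single assignment instead of mutating the new node's child slot afterwards.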
return []*Node{&r.Expr}
}
func (r *Remote) String() string {
return fmt.Sprintf("%s%s", r.Op.String(), r.Expr.String())
We need to mention the node name remote. Maybe similar to what Thanos has: fmt.Sprintf("remote(%s)", r.Expr.String())
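A self-contained sketch of the suggested String method. The Node and leaf types are simplified stand-ins for the real logicalplan.Node types, assumed only for this example.

```go
package main

import "fmt"

// Node is a stand-in for logicalplan.Node.
type Node interface{ String() string }

// leaf is a hypothetical concrete node used only for this example.
type leaf struct{ s string }

func (l leaf) String() string { return l.s }

type Remote struct{ Expr Node }

// String names the node explicitly, so a rendered plan shows where the
// fragmentation points are.
func (r *Remote) String() string {
	return fmt.Sprintf("remote(%s)", r.Expr.String())
}

func main() {
	r := &Remote{Expr: leaf{"sum(up)"}}
	fmt.Println(r) // remote(sum(up))
}
```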
temp := (*child).Clone()
*child = NewRemoteNode()
*(*child).Children()[0] = temp
}
Even though it is just a dummy optimizer, we should probably add a constraint to only mark a child as a remote node if it contains an aggregation. We don't want to optimize queries like up + up, where each child returns raw data instead of aggregated data.
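One way to express that constraint is a recursive check over the subtree before inserting a remote node. The node types below (VectorSelector, Aggregation, Binary) are simplified stand-ins for the logicalplan node types in the Thanos promql-engine, used only to illustrate the guard:

```go
package main

import "fmt"

// Node is a stand-in for logicalplan.Node.
type Node interface{ String() string }

type VectorSelector struct{ Name string }

func (v *VectorSelector) String() string { return v.Name }

type Aggregation struct{ Expr Node }

func (a *Aggregation) String() string { return "sum(" + a.Expr.String() + ")" }

type Binary struct{ LHS, RHS Node }

func (b *Binary) String() string { return b.LHS.String() + " + " + b.RHS.String() }

// hasAggregation reports whether the subtree contains an aggregation, so
// the optimizer can skip raw-data children like the sides of up + up.
func hasAggregation(n Node) bool {
	switch t := n.(type) {
	case *Aggregation:
		return true
	case *Binary:
		return hasAggregation(t.LHS) || hasAggregation(t.RHS)
	default:
		return false
	}
}

func main() {
	raw := &Binary{&VectorSelector{"up"}, &VectorSelector{"up"}}
	agg := &Binary{&Aggregation{&VectorSelector{"up"}}, &Aggregation{&VectorSelector{"up"}}}
	fmt.Println(hasAggregation(raw), hasAggregation(agg)) // false true
}
```

The optimizer would call a check like this on each binary-expression child and insert the remote node only when it returns true.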
What this PR does:
Implements a distributed optimizer in the distributed execution middleware that introduces remote nodes into the logical plan to mark fragmentation points. It focuses on binary aggregation queries and includes unmarshal functionality to maintain remote-node integrity in the processing pipeline.
Which issue(s) this PR fixes:
For the distributed query execution feature.
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]