add logical plan distributed optimizer to query frontend #6974

rubywtl · 2025-08-15T16:52:41Z

What this PR does:
Implements a distributed optimizer in the distributed execution middleware. The optimizer introduces remote nodes into the logical plan to mark fragmentation points. It focuses on binary aggregation queries and includes unmarshalling support to maintain remote node integrity in the processing pipeline.

Which issue(s) this PR fixes:
Part of the distributed query execution feature.

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: rubywtl <[email protected]>

yeya24 · 2025-08-15T17:42:49Z

pkg/distributed_execution/unmarshal.go

Let's try to find a better name. :) unmarshal is just a method name and shouldn't be a file name.

yeya24 · 2025-08-15T17:43:53Z

pkg/distributed_execution/distributed_optimizer_test.go

+		start           time.Time
+		end             time.Time
+		step            time.Duration
+		remoteExecCount int


Let's compare the resulting logical plan instead of remoteExecCount. remoteExecCount can be misleading, as a remote node might be added in the wrong place.
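To illustrate the concern, here is a minimal sketch of how a count-based assertion can pass on a wrong plan. The plan strings and the countRemote helper are hypothetical and only stand in for whatever the real logical plan renders to; they are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// countRemote is a hypothetical helper that counts remote(...) wrappers
// in a rendered plan string.
func countRemote(plan string) int {
	return strings.Count(plan, "remote(")
}

func main() {
	// Assumed expected plan: each aggregation is wrapped in a remote node.
	want := "remote(sum(up)) + remote(sum(up))"
	// Assumed buggy plan: the remote nodes ended up inside the aggregations.
	got := "sum(remote(up)) + sum(remote(up))"

	// The count-based check passes even though the placement is wrong.
	fmt.Println("counts match:", countRemote(got) == countRemote(want))
	// Comparing the rendered plans catches the misplacement.
	fmt.Println("plans match:", got == want)
}
```

Both plans contain two remote nodes, so only the full plan comparison tells them apart.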

yeya24 · 2025-08-15T17:46:03Z

pkg/distributed_execution/distributed_optimizer_test.go

+		query           string
+		start           time.Time
+		end             time.Time
+		step            time.Duration


Nit: we don't have to parameterize start, end and step if they don't matter much in this test. We can just hardcode them when creating the plan.

yeya24 · 2025-08-15T17:49:26Z

pkg/querier/tripperware/distributed_query.go

@@ -70,7 +72,14 @@ func (d distributedQueryMiddleware) newLogicalPlan(qs string, start time.Time, e
 	}
 	optimizedPlan, _ := logicalPlan.Optimize(logicalplan.DefaultOptimizers)


After #6873, we should use the configured optimizers instead of the default optimizers.

yeya24 · 2025-08-15T17:50:19Z

pkg/distributed_execution/unmarshal.go

+	return unmarshalNode(data)
+}
+
+func unmarshalNode(data []byte) (logicalplan.Node, error) {


Let's add a comment explaining why we need to copy the deserialization logic from the Thanos engine into Cortex.

yeya24 · 2025-08-15T17:52:56Z

pkg/distributed_execution/remote_node.go

+)
+
+type NodeType = logicalplan.NodeType
+type Node = logicalplan.Node


These don't seem to be needed. In the function signatures you can just specify the return types as logicalplan.NodeType and logicalplan.Node.

yeya24 · 2025-08-15T17:58:57Z

pkg/distributed_execution/remote_node.go

+
+type Remote struct {
+	Op   parser.ItemType
+	Expr Node `json:"-"`


Do we need Op?

yeya24 · 2025-08-15T18:00:50Z

pkg/distributed_execution/remote_node.go

+	FragmentAddr string
+}
+
+func NewRemoteNode() Node {


This might need to take expr as a parameter.

yeya24 · 2025-08-15T18:02:25Z

pkg/distributed_execution/remote_node.go

+	return []*Node{&r.Expr}
+}
+func (r *Remote) String() string {
+	return fmt.Sprintf("%s%s", r.Op.String(), r.Expr.String())


We need to mention the node name remote. Maybe something similar to what Thanos has: fmt.Sprintf("remote(%s)", r.Expr.String())
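A minimal sketch of what the two suggestions on remote_node.go could look like together: a constructor that takes the wrapped expression, and a String method that names the node. The Node interface and Selector type here are simplified stand-ins, not the real logicalplan types, and the field set of Remote is reduced for illustration.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for logicalplan.Node, reduced to String().
type Node interface {
	String() string
}

// Selector is a toy leaf node used only for this illustration.
type Selector struct{ Name string }

func (s Selector) String() string { return s.Name }

// Remote marks an expression that should be executed remotely.
type Remote struct {
	Expr Node
}

// NewRemoteNode takes the wrapped expression as a parameter, so callers
// do not have to patch the child in after construction.
func NewRemoteNode(expr Node) *Remote {
	return &Remote{Expr: expr}
}

// String includes the node name so a remote node is visible in a rendered plan.
func (r *Remote) String() string {
	return fmt.Sprintf("remote(%s)", r.Expr.String())
}

func main() {
	fmt.Println(NewRemoteNode(Selector{Name: "up"})) // prints "remote(up)"
}
```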

yeya24 · 2025-08-15T21:57:46Z

pkg/distributed_execution/distributed_optimizer.go

+				temp := (*child).Clone()
+				*child = NewRemoteNode()
+				*(*child).Children()[0] = temp
+			}


Even though it is just a dummy optimizer, we should probably add a constraint to only mark a child as a remote node if it contains an aggregation. We don't want to optimize queries like up + up, as each child returns raw data instead of aggregated data.
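The constraint could be sketched roughly as follows. This uses a toy plan tree rather than the real logicalplan package: Node, Selector, Aggregation, Binary, Remote, hasAggregation and markRemote are all hypothetical names for illustration, and the sketch only looks at a top-level binary expression.

```go
package main

import "fmt"

// Node is a simplified stand-in for logicalplan.Node.
type Node interface {
	String() string
	Children() []*Node
}

type Selector struct{ Name string }

func (s *Selector) String() string    { return s.Name }
func (s *Selector) Children() []*Node { return nil }

type Aggregation struct {
	Op   string
	Expr Node
}

func (a *Aggregation) String() string    { return fmt.Sprintf("%s(%s)", a.Op, a.Expr) }
func (a *Aggregation) Children() []*Node { return []*Node{&a.Expr} }

type Binary struct {
	Op       string
	LHS, RHS Node
}

func (b *Binary) String() string    { return fmt.Sprintf("%s %s %s", b.LHS, b.Op, b.RHS) }
func (b *Binary) Children() []*Node { return []*Node{&b.LHS, &b.RHS} }

type Remote struct{ Expr Node }

func (r *Remote) String() string    { return fmt.Sprintf("remote(%s)", r.Expr) }
func (r *Remote) Children() []*Node { return []*Node{&r.Expr} }

// hasAggregation reports whether n or any of its descendants is an aggregation.
func hasAggregation(n Node) bool {
	if _, ok := n.(*Aggregation); ok {
		return true
	}
	for _, c := range n.Children() {
		if hasAggregation(*c) {
			return true
		}
	}
	return false
}

// markRemote wraps each child of a top-level binary expression in a Remote
// node, but only if that child contains an aggregation.
func markRemote(root Node) Node {
	b, ok := root.(*Binary)
	if !ok {
		return root
	}
	for _, child := range b.Children() {
		if hasAggregation(*child) {
			*child = &Remote{Expr: *child}
		}
	}
	return root
}

func main() {
	up := func() Node { return &Selector{Name: "up"} }
	agg := &Binary{Op: "+",
		LHS: &Aggregation{Op: "sum", Expr: up()},
		RHS: &Aggregation{Op: "sum", Expr: up()}}
	raw := &Binary{Op: "+", LHS: up(), RHS: up()}

	fmt.Println(markRemote(agg)) // prints "remote(sum(up)) + remote(sum(up))"
	fmt.Println(markRemote(raw)) // prints "up + up"
}
```

With this check, up + up is left untouched because neither side aggregates, while sum(up) + sum(up) gets a remote node on each side.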

add logical plan distributed optimizer to query frontend

46f063d

Signed-off-by: rubywtl <[email protected]>

pull-request-size bot added the size/XL label Aug 15, 2025

dosubot bot added the component/query-frontend label Aug 15, 2025

yeya24 reviewed Aug 15, 2025
