feat(Optimizer): eliminate common sub-expressions #659

lokax · 2022-06-06T16:58:45Z

Signed-off-by: lokax <m632656684@gmail.com>

Implements the feature requested in #658 for projections and aggregates.

I removed the projection merge rule from LogicalProjection::new(), because in some cases we need a projection below the current projection.

For example:

> explain select l_extendedprice * (1 - l_discount), l_extendedprice * (1 - l_discount) * (1 + l_tax) from lineitem;
PhysicalProjection:
    InputRef #0
    (InputRef #0 * (1 + InputRef #1))
  PhysicalProjection:
      (InputRef #0 * (1 - InputRef #1))
      InputRef #2
    PhysicalTableScan:
        table #64,
        columns [5, 6, 7],
        with_row_handler: false,
        is_sorted: false,
        expr: None

In the above example, before optimization the query evaluates (1 - l_discount) twice, and l_extendedprice * (1 - l_discount) twice as well. This is not a problem as long as the expressions themselves are cheap. However, evaluating an expensive scalar function multiple times could incur a noticeable performance overhead. After optimization, each common sub-expression is evaluated only once.
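To make the "evaluated twice" claim concrete, here is a minimal Rust sketch of the counting step: it tallies how often each non-leaf sub-expression occurs across a projection list when every expression is computed independently. `Op`, `Expr`, `bin` and `count_evaluations` are hypothetical simplifications for illustration, not RisingLight's actual `BoundExpr` API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `BoundExpr`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Op {
    Add,
    Sub,
    Mul,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    InputRef(usize),
    Const(i64),
    Binary(Op, Box<Expr>, Box<Expr>),
}

fn bin(op: Op, l: Expr, r: Expr) -> Expr {
    Expr::Binary(op, Box::new(l), Box::new(r))
}

/// How many times each non-leaf sub-expression is evaluated when every
/// projection expression is computed on its own (i.e. without CSE).
fn count_evaluations(exprs: &[Expr]) -> HashMap<Expr, usize> {
    fn visit(e: &Expr, counts: &mut HashMap<Expr, usize>) {
        // Leaves (columns, constants) are cheap to re-read, so only operators are counted.
        if let Expr::Binary(_, l, r) = e {
            *counts.entry(e.clone()).or_insert(0) += 1;
            visit(l, counts);
            visit(r, counts);
        }
    }
    let mut counts = HashMap::new();
    for e in exprs {
        visit(e, &mut counts);
    }
    counts
}
```

With InputRef #0, #1 and #2 standing for l_extendedprice, l_discount and l_tax, both `(1 - l_discount)` and `l_extendedprice * (1 - l_discount)` get a count of 2; those are the candidates for hoisting into a child projection.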

TennyZhuang · 2022-06-06T17:04:07Z

@st1page PTAL

st1page · 2022-06-06T17:25:47Z

src/optimizer/rules/eliminate_cse_rule.rs

+impl Rule for ProjectEliminateCSE {
+    fn apply(&self, plan: PlanRef) -> Result<PlanRef, ()> {
+        let projection = plan.as_logical_projection()?;
+        let mut cse_eliminator = CSEEliminator::new();
+        let mut proj_exprs = projection.project_expressions().to_vec();
+        // Search Phase


It might be better to add the projection merge rule here, and we could also merge the nested projection input in AggregateEliminateCSE.

> It might be better to add the projection merge rule here, and we could also merge the nested projection input in AggregateEliminateCSE.

Sorry, I don't quite understand what you mean. To keep the implementation simple, it needs multiple consecutive projection operators that act as holders for temporary variables. In addition, it seems that consecutive, identical projection operators are produced during column pruning.
Maybe we could use this order:
column pruning --> projection merge (remove identical projections) --> eliminate CSE
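Projection merge, the middle step of that ordering, can be sketched as substitution: every InputRef(i) in the upper projection is replaced by the lower projection's i-th expression. A minimal Rust sketch; `Expr`, `inline_child`, `merge_projections` and `is_identity` are hypothetical names, not RisingLight's actual rule.

```rust
/// Hypothetical, simplified stand-in for `BoundExpr`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Add,
    Mul,
}

#[derive(Clone, Debug, PartialEq, Eq)]
enum Expr {
    InputRef(usize),
    Const(i64),
    Binary(Op, Box<Expr>, Box<Expr>),
}

/// Replaces every `InputRef(i)` in `e` with a copy of `child[i]`.
fn inline_child(e: &Expr, child: &[Expr]) -> Expr {
    match e {
        Expr::InputRef(i) => child[*i].clone(),
        Expr::Const(c) => Expr::Const(*c),
        Expr::Binary(op, l, r) => Expr::Binary(
            *op,
            Box::new(inline_child(l, child)),
            Box::new(inline_child(r, child)),
        ),
    }
}

/// Projection merge: `Projection(parent)` over `Projection(child)` becomes one projection.
fn merge_projections(parent: &[Expr], child: &[Expr]) -> Vec<Expr> {
    parent.iter().map(|e| inline_child(e, child)).collect()
}

/// A projection that forwards columns `0..input_width` unchanged is a no-op
/// (e.g. one left behind by column pruning) and can be removed outright.
fn is_identity(proj: &[Expr], input_width: usize) -> bool {
    proj.len() == input_width
        && proj.iter().enumerate().all(|(i, e)| *e == Expr::InputRef(i))
}
```

The sketch also shows why the order matters: merging a parent `[#0 * #1, #0]` into a child `[#0 + 1, #2]` yields `[(#0 + 1) * #2, #0 + 1]`, i.e. the shared `#0 + 1` is duplicated again. Running such a merge after CSE would undo the elimination, which is consistent with no longer merging unconditionally in LogicalProjection::new().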

I think dummy projections (with only InputRefs) are necessary to support CSE for now. To eliminate them, we would have to introduce local references inside the operator, or output_indices. cc @st1page

st1page · 2022-06-06T17:49:59Z

src/optimizer/rules/eliminate_cse_rule.rs

+    exprs_maps: Vec<BoundExpr>, // TODO: HashMap
+    counts: Vec<(usize, bool)>,


Please add some comments on the fields too.
And why aren't they in the same data structure, e.g. Vec<(BoundExpr, usize, bool)>? I think even when we switch to a HashMap they should stay in one data structure, like

struct SubExpr { count: usize, new_projection_index: Option<usize> }
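A minimal sketch of that single-structure layout, assuming the expression type can be hashed. `Expr` is a hypothetical stand-in for `BoundExpr`, and `record` and `hoist` are illustrative helper names, not functions from this PR. Keeping both fields in one `SubExpr` means each phase needs only one map lookup per expression.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `BoundExpr`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    InputRef(usize),
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Everything tracked for one distinct sub-expression, kept in a single map entry.
#[derive(Debug, Default)]
struct SubExpr {
    /// How many times this sub-expression occurs in the projection list.
    count: usize,
    /// Output column of the child projection that computes it, once hoisted.
    new_projection_index: Option<usize>,
}

/// Search phase: bump the occurrence count with a single hash lookup.
fn record(map: &mut HashMap<Expr, SubExpr>, e: &Expr) {
    map.entry(e.clone()).or_default().count += 1;
}

/// Collect phase: return the child-projection column for `e`,
/// pushing `e` into `child` the first time it is hoisted.
fn hoist(map: &mut HashMap<Expr, SubExpr>, e: &Expr, child: &mut Vec<Expr>) -> usize {
    let sub = map.entry(e.clone()).or_default();
    *sub.new_projection_index.get_or_insert_with(|| {
        child.push(e.clone());
        child.len() - 1
    })
}
```

Hoisting the same expression twice returns the same column index and pushes it into the child projection only once.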

Also, please add some comments on the struct to describe the general idea of the implementation!

> Also, please add some comments on the struct to describe the general idea of the implementation!

ok 😇

> Please add some comments on the fields too. And why aren't they in the same data structure, e.g. Vec<(BoundExpr, usize, bool)>? I think even when we switch to a HashMap they should stay in one data structure, like
>
> struct SubExpr { count: usize, new_projection_index: Option<usize> }

It looks better

st1page · 2022-06-06T17:51:39Z

src/optimizer/rules/eliminate_cse_rule.rs

+enum EliminatorState {
+    Search,
+    Collect,
+    Replace,
+    End,
+}


Could you please add some comments here, similar to the comments on the corresponding functions? Also, please remember to use /// instead of // so they are picked up by rustdoc.

I don't think the state is necessary. What about using different structs for the different phases instead of state transitions?

> I don't think the state is necessary. What about using different structs for the different phases instead of state transitions?

It seems that expr_maps: HashMap<BoundExpr, SubExpr> will be accessed during both the search phase and the collect phase. Wouldn't using different structs to access and modify expr_maps look weird? But the Replace and End states can be removed; they are only there to keep me from doing something wrong.

> It seems that expr_maps: HashMap<BoundExpr, SubExpr> will be accessed during both the search phase and the collect phase. Wouldn't using different structs to access and modify expr_maps look weird?

IMHO it makes sense for each phase to process some data and then pass the data to the next phase. Using a state machine to model the processing pipeline mixes the logic of different tasks together and makes it hard to read (and I haven't fully understood it yet) :(

What about:

/// output balabala, modify balabala
fn eliminate(exprs: &mut Vec<BoundExpr>) -> Option<Vec<BoundExpr>> {
    let expr_maps = search(exprs)?;
    let child_projection_exprs = collect(exprs, expr_maps);
    rewrite(exprs, &child_projection_exprs);
    Some(child_projection_exprs)
}

/// do balabala, output balabala
fn search(exprs: &[BoundExpr]) -> Option<HashMap<BoundExpr, SubExpr>> {
    struct Visitor { expr_maps }
    impl ExprVisitor for Visitor {...}
    let visitor = ...;
    ...
    visitor.expr_maps
}

/// do balabala
fn collect(exprs: &[BoundExpr], subexprs: ...) -> Vec<BoundExpr> {
    struct Visitor {...}
    impl ExprVisitor for Visitor {...}
    ...
}

/// do balabala
fn rewrite(exprs: &mut Vec<BoundExpr>, child_projection_exprs: &[BoundExpr]) {
    struct Rewriter {...}
    ...
}
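The three-phase shape suggested above can be sketched as a small runnable toy in Rust. This is an illustration under assumptions, not the PR's code: `Op`/`Expr` stand in for `BoundExpr`, the phases are plain recursive functions rather than `ExprVisitor`/rewriter structs, and `eliminate` returns new parent and child expression lists instead of mutating in place. Only the outermost repeated sub-expressions are hoisted, and input columns the parent still references directly are passed through the child projection.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `BoundExpr`: input columns, constants, binary ops.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Op {
    Add,
    Sub,
    Mul,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    InputRef(usize),
    Const(i64),
    Binary(Op, Box<Expr>, Box<Expr>),
}

fn bin(op: Op, l: Expr, r: Expr) -> Expr {
    Expr::Binary(op, Box::new(l), Box::new(r))
}

/// Search: count every non-leaf sub-expression.
/// Returns `None` if nothing occurs more than once (no CSE to do).
fn search(exprs: &[Expr]) -> Option<HashMap<Expr, usize>> {
    fn visit(e: &Expr, counts: &mut HashMap<Expr, usize>) {
        if let Expr::Binary(_, l, r) = e {
            *counts.entry(e.clone()).or_insert(0) += 1;
            visit(l, counts);
            visit(r, counts);
        }
    }
    let mut counts = HashMap::new();
    for e in exprs {
        visit(e, &mut counts);
    }
    if counts.values().any(|&c| c > 1) { Some(counts) } else { None }
}

/// Collect: build the child projection from the outermost repeated
/// sub-expressions plus the input columns the parent still needs.
/// Also returns each child expression's output column index.
fn collect(exprs: &[Expr], counts: &HashMap<Expr, usize>) -> (Vec<Expr>, HashMap<Expr, usize>) {
    fn visit(e: &Expr, counts: &HashMap<Expr, usize>, child: &mut Vec<Expr>, index: &mut HashMap<Expr, usize>) {
        if index.contains_key(e) {
            return; // already produced by the child projection
        }
        match e {
            Expr::Const(_) => {}
            // Occurs once: not hoisted itself, but its operands may be.
            Expr::Binary(_, l, r) if counts[e] < 2 => {
                visit(l, counts, child, index);
                visit(r, counts, child, index);
            }
            // An input column or a repeated sub-expression: compute it once in the child.
            _ => {
                index.insert(e.clone(), child.len());
                child.push(e.clone());
            }
        }
    }
    let (mut child, mut index) = (Vec::new(), HashMap::new());
    for e in exprs {
        visit(e, counts, &mut child, &mut index);
    }
    (child, index)
}

/// Rewrite: replace everything the child computes with a reference to its column.
fn rewrite(e: &Expr, index: &HashMap<Expr, usize>) -> Expr {
    if let Some(&i) = index.get(e) {
        return Expr::InputRef(i);
    }
    match e {
        Expr::Binary(op, l, r) => bin(*op, rewrite(l, index), rewrite(r, index)),
        // `collect` registered every reachable InputRef, so only constants get here.
        other => other.clone(),
    }
}

/// Runs the three phases; returns `(parent_exprs, child_exprs)`.
fn eliminate(exprs: &[Expr]) -> Option<(Vec<Expr>, Vec<Expr>)> {
    let counts = search(exprs)?;
    let (child, index) = collect(exprs, &counts);
    let parent = exprs.iter().map(|e| rewrite(e, &index)).collect();
    Some((parent, child))
}
```

On the query from the PR description, this toy produces the same two-level shape as the plan shown there: the child computes `[#0 * (1 - #1), #2]` and the parent becomes `[#0, #0 * (1 + #1)]`. Each phase only consumes the previous phase's output, so no state enum is needed.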

st1page · 2022-06-06T17:55:27Z

src/optimizer/rules/eliminate_cse_rule.rs

+}
+
+struct CSEEliminator {
+    exprs_maps: Vec<BoundExpr>, // TODO: HashMap


Maybe the hash map is necessary, because the matching algorithm seems to be O(n^2) now. We can do it in a future PR.

+1. And I don't think a Vec is any simpler?


xxchan · 2022-06-06T19:22:18Z

src/optimizer/rules/eliminate_cse_rule.rs

+enum EliminatorState {
+    Search,
+    Collect,
+    Replace,
+    End,
+}


I don't think the state is necessary. What about using different structs for the different phases instead of state transitions?

xxchan · 2022-06-06T19:23:03Z

src/optimizer/rules/eliminate_cse_rule.rs

+}
+
+struct CSEEliminator {
+    exprs_maps: Vec<BoundExpr>, // TODO: HashMap


+1. And I don't think a Vec is any simpler?

xxchan

We need #593! 🥵

skyzh · 2022-06-07T02:25:18Z

> We need #593! 🥵

Next time for sure!

skyzh · 2022-06-09T11:41:40Z

I'm working on a new project, https://github.com/risinglightdb/sqlplannertest-rs. With planner tests, we can merge such optimizer PRs with more confidence!

skyzh · 2022-06-09T11:41:53Z

Hopefully I can finish it this weekend :)

skyzh · 2022-06-09T15:30:28Z

It's here! #661

skyzh · 2022-06-09T15:31:06Z

Please add planner test cases for affected queries after #661 gets merged :)

lokax · 2022-06-15T16:48:54Z

> It's here! #661

Thanks. 😇😇

Signed-off-by: lokax <m632656684@gmail.com>

lokax · 2022-08-24T12:59:23Z

I've made some changes.

xxchan

Please add some planner tests ❤️

Signed-off-by: lokax <m632656684@gmail.com>

xxchan · 2022-08-26T16:00:55Z

src/optimizer/rules/eliminate_cse_rule.rs

+impl Rule for ProjectEliminateCSE {
+    fn apply(&self, plan: PlanRef) -> Result<PlanRef, ()> {
+        let projection = plan.as_logical_projection()?;
+        let mut cse_eliminator = CSEEliminator::new();
+        let mut proj_exprs = projection.project_expressions().to_vec();
+        // Search Phase


I think dummy projections (with only InputRefs) are necessary to support CSE for now. To eliminate them, we would have to introduce local references inside the operator, or output_indices. cc @st1page

xxchan · 2022-08-26T18:16:28Z

src/optimizer/rules/eliminate_cse_rule.rs

+
+/// This is a helper struct for eliminate common sub-expressions
+/// Does not eliminate common aggregate functions
+struct CSEEliminator {


Eliminate CS, not Eliminate CSE 😄

xxchan · 2022-08-26T19:02:32Z

src/optimizer/rules/eliminate_cse_rule.rs

+enum EliminatorState {
+    Search,
+    Collect,
+    Replace,
+    End,
+}


> It seems that expr_maps: HashMap<BoundExpr, SubExpr> will be accessed during both the search phase and the collect phase. Wouldn't using different structs to access and modify expr_maps look weird?

IMHO it makes sense for each phase to process some data and then pass the data to the next phase. Using a state machine to model the processing pipeline mixes the logic of different tasks together and makes it hard to read (and I haven't fully understood it yet) :(

What about:

/// output balabala, modify balabala
fn eliminate(exprs: &mut Vec<BoundExpr>) -> Option<Vec<BoundExpr>> {
    let expr_maps = search(exprs)?;
    let child_projection_exprs = collect(exprs, expr_maps);
    rewrite(exprs, &child_projection_exprs);
    Some(child_projection_exprs)
}

/// do balabala, output balabala
fn search(exprs: &[BoundExpr]) -> Option<HashMap<BoundExpr, SubExpr>> {
    struct Visitor { expr_maps }
    impl ExprVisitor for Visitor {...}
    let visitor = ...;
    ...
    visitor.expr_maps
}

/// do balabala
fn collect(exprs: &[BoundExpr], subexprs: ...) -> Vec<BoundExpr> {
    struct Visitor {...}
    impl ExprVisitor for Visitor {...}
    ...
}

/// do balabala
fn rewrite(exprs: &mut Vec<BoundExpr>, child_projection_exprs: &[BoundExpr]) {
    struct Rewriter {...}
    ...
}

xxchan · 2022-08-26T19:08:28Z

tests/planner_test/eliminate_cse.planner.sql

+-- keep short circuit
+EXPLAIN SELECT x + 5 < y AND x + y < 3, x + y FROM test;


Could you elaborate a bit on what "short circuit" means here? I think x + y should always be computed, so there's no short circuit?
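One possible reading of "keep short circuit" (an assumption; the test file does not say) is the following. If AND evaluates its right operand lazily, a sub-expression that appears only there runs only for rows where the left operand is true. Hoisting it into a child projection would evaluate it for every row. The Rust sketch below simply counts evaluations in both variants. `x_plus_y`, `filter_lazy` and `filter_hoisted` are hypothetical helpers, and Rust's own `&&` plays the role of the lazy AND.

```rust
use std::cell::Cell;

/// Computes `x + y` and records that it was evaluated.
fn x_plus_y(x: i64, y: i64, evals: &Cell<usize>) -> i64 {
    evals.set(evals.get() + 1);
    x + y
}

/// `x + 5 < y AND x + y < 3` with lazy AND: `x + y` runs only when the left side is true.
fn filter_lazy(rows: &[(i64, i64)], evals: &Cell<usize>) -> Vec<bool> {
    rows.iter()
        .map(|&(x, y)| x + 5 < y && x_plus_y(x, y, evals) < 3)
        .collect()
}

/// Same predicate with `x + y` precomputed for every row, as a child projection would do.
fn filter_hoisted(rows: &[(i64, i64)], evals: &Cell<usize>) -> Vec<bool> {
    let sums: Vec<i64> = rows.iter().map(|&(x, y)| x_plus_y(x, y, evals)).collect();
    rows.iter()
        .zip(&sums)
        .map(|(&(x, y), &s)| x + 5 < y && s < 3)
        .collect()
}
```

Both variants return the same results, but the hoisted one evaluates `x + y` for every row. In the planner test query, `x + y` is also in the select list, so it is computed for every row anyway, which matches the point raised in the question above.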

lokax · 2023-02-01T15:16:20Z

Thanks for the review. I no longer have the motivation to finish it, and the optimizer has since been refactored, so I'm closing this PR.

wangrunji0408 · 2023-02-01T16:40:41Z

Feel free to retry it on the new optimizer if you are interested. It would be easier than this one in my view.
Thank you anyway! 🥰

TennyZhuang requested review from st1page and xxchan June 6, 2022 17:04

st1page reviewed Jun 6, 2022


xxchan reviewed Jun 6, 2022


lokax force-pushed the main branch from a2cafa5 to 833ecb8 August 2, 2022 06:33

feat(Optimizer): eliminate common sub-expressions

a31a587

Signed-off-by: lokax <m632656684@gmail.com>

lokax force-pushed the main branch from 833ecb8 to 4b3279a August 2, 2022 08:21

feat(Optimizer): eliminate cse

148b0ed

Signed-off-by: lokax <m632656684@gmail.com>

lokax force-pushed the main branch from 4b3279a to 148b0ed August 2, 2022 08:25

lokax requested review from xxchan and st1page August 24, 2022 12:59

xxchan reviewed Aug 24, 2022


test(optimizer): planner test for eliminate cse

73f80da

Signed-off-by: lokax <m632656684@gmail.com>

xxchan reviewed Aug 26, 2022


lokax closed this Feb 1, 2023
