You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In cascades framework, physical properties have two features:
derived
required
Derived are the physical properties derived from children, for example, scan from a table whose column 1 is ordered ascend makes the scan node have the derived property as column 1 ordered ascend.
Required are the physical properties required by the query, for example, select * from t1 order by t1.v1 makes the root node having the required physical properties as v1 is required to be ordered.
Once derived properties cannot support all required properties, operator created by enforcer rule is inserted in the operator tree.
And where to insert the operator created by enforcer rule will depend on the calculated costs. (Determine select v1, v2 from t1 order by v1 is transferred to Sort->Projection->TableScan or Projection->Sort->TableScan, more complicated cases include join, filter, subqueries...)
Design
Derived properties are for expression base, which means multiple expressions in one group in optd can have different derived physical properties. (For example, sort-merge join provide the column in order, while hash join not, but they are logically equivalent.)
Required properties are for group base, imagining a query like select * from t1 order by v1. The root node group should have the required properties (ordering = v1).
Problems to be solved
A critical issue is that we should calculate the cost to determine where to put enforcer rule, for example, Select node can choose children which already have the required properties in their derived and the cost is the children expression cost, it can also choose children which not have the derived properties and insert an enforcer rule on top of that, the cost then becomes the enforcer operator + the children cost.
However, as the children representations are placeholder for group id in optd, we cannot directly pick best children (winner) with/without required physical properties. And as the derived properties are for expression based, we might need to traverse all the expressions in the child group.
This change of logic hugely interferes the optimizeGroup task(we need to provide an interface for required physical properties) and optimizeInput task(we need to calculate different winners for different physical properties requirement) and the main memo structure(we need to add physical properties storage for each expr/exprId and add infer derived physical properties when adding new expr to group).
References
CockroachDB's doc(https://github.com/cockroachdb/cockroach/blob/fe36467047fbbf88569fc03cff97fb4d728122f3/pkg/sql/opt/xform/optimizer.go#L319) provides a good example of how to develop physical properties and enforcer rule. It launches two optimizeGroup tasks, one with required properties, one without. The task without required properties need to add the cost of enforcer operator. And it then compares the cost of their result. For the children, it is separating the child group in different properties, so these plans are different: (select G2="ordering: y" G3) and (select G2 G3).
Columnbia's paper provides definition of the winner structure with physical properties. It states as
In Columbia, a winner is also used to store the temporary result of a search. While the costs of physical multi-expressions of a group are being calculated, the cheapest yet found expression is stored as a winner. During the optimization process, the winner is improved and finally the best (cheapest) plan is found. Sometimes, when no physical multi-expression can be found with the required physical property, we store the multi-expression pointer as NULL to indicate no winner for that physical property. Since no winner is also a solution for a search of this sub-problem, this information is memoized and will be useful in the future optimizing process. The following is the definition of data members in a WINNER class in Columbia.
change the winner in groupInfo into an array of winner with its physical properties (derived properties).
Adds context(required properties) to optimizeGroup, optimizeInput, OptimizeExpression.
For group with required properties, call optimizeGroup twice to get two kinds of winner, winner with required properties and winner without required properties + cost of enforcer operator, and storing winner for the required properties for current group.
for each newly added expressions, call infer_physical_properties to created derived physical properties.
The text was updated successfully, but these errors were encountered:
Concept
In cascades framework, physical properties have two features:
Derived are the physical properties derived from children, for example, scan from a table whose column 1 is ordered ascend makes the scan node have the derived property as column 1 ordered ascend.
Required are the physical properties required by the query, for example, select * from t1 order by t1.v1 makes the root node having the required physical properties as v1 is required to be ordered.
Once derived properties cannot support all required properties, operator created by enforcer rule is inserted in the operator tree.
And where to insert the operator created by enforcer rule will depend on the calculated costs. (Determine
select v1, v2 from t1 order by v1
is transferred toSort->Projection->TableScan
orProjection->Sort->TableScan
, more complicated cases include join, filter, subqueries...)Design
Derived properties are for expression base, which means multiple expressions in one group in optd can have different derived physical properties. (For example, sort-merge join provide the column in order, while hash join not, but they are logically equivalent.)
Required properties are for group base, imagining a query like
select * from t1 order by v1.
The root node group should have the required properties (ordering = v1).Problems to be solved
A critical issue is that we should calculate the cost to determine where to put enforcer rule, for example, Select node can choose children which already have the required properties in their derived and the cost is the children expression cost, it can also choose children which not have the derived properties and insert an enforcer rule on top of that, the cost then becomes the enforcer operator + the children cost.
However, as the children representations are placeholder for group id in optd, we cannot directly pick best children (winner) with/without required physical properties. And as the derived properties are for expression based, we might need to traverse all the expressions in the child group.
This change of logic hugely interferes the optimizeGroup task(we need to provide an interface for required physical properties) and optimizeInput task(we need to calculate different winners for different physical properties requirement) and the main memo structure(we need to add physical properties storage for each expr/exprId and add infer derived physical properties when adding new expr to group).
References
CockroachDB's doc(https://github.com/cockroachdb/cockroach/blob/fe36467047fbbf88569fc03cff97fb4d728122f3/pkg/sql/opt/xform/optimizer.go#L319) provides a good example of how to develop physical properties and enforcer rule. It launches two optimizeGroup tasks, one with required properties, one without. The task without required properties need to add the cost of enforcer operator. And it then compares the cost of their result. For the children, it is separating the child group in different properties, so these plans are different:
(select G2="ordering: y" G3)
and(select G2 G3)
.Columnbia's paper provides definition of the winner structure with physical properties. It states as
Calcite is creating subGroups within groups. subGroups are expressions that logically equivalent and have same physical properties. It then optimizeSubGroups and finds winner within each subgroup. (https://github.com/apache/calcite/blob/4823cb7760913f236e7f0f2cb149325b55a3f124/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L84)
Implementation
Generally, my idea is to
The text was updated successfully, but these errors were encountered: