Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enhance VirtualTable to have expression as value #711

Merged
merged 4 commits into from
Oct 7, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -73,10 +73,11 @@ message ReadRel {

// Definition of which type of scan operation is to be performed
oneof read_type {
VirtualTable virtual_table = 5;
VirtualTable virtual_table = 5 [deprecated = true];
LocalFiles local_files = 6;
NamedTable named_table = 7;
ExtensionTable extension_table = 8;
VirtualExpressionTable virtual_expression_table = 9;
}

// A base table. The list of string is used to represent namespacing (e.g., mydb.mytable).
Expand All @@ -91,6 +92,11 @@ message ReadRel {
repeated Expression.Literal.Struct values = 1;
Copy link

@acruise acruise Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than deprecate VirtualTable and create VirtualExpressionTable, would we consider deprecating the values field and adding a new expressions field?

edit: I'm not a fan of Protobuf quirks becoming load-bearing, but this seems to indicate a field can be replaced with a oneof while maintaining binary compatibility. 😅

Copy link
Member

@vbarua vbarua Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: consider this retracted

Expanding on this, I'm almost inclined to have both? Like

 message VirtualTable {
    repeated Expression.Literal.Struct values = 1;
    repeated Expression expression = 2;
}

which emits first the values and then expressions. Makes the common case really easy to produce (literals only).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this. It creates this confusing situation where you have something like

literal1, literal2, expression1, literal3

and it gets stored as

values: [literal1, literal2]
expression: [expresion1, literal3]


> Rather than deprecate VirtualTable and create VirtualExpressionTable, would we consider deprecating the values field and adding a new expressions field?

I don't see how that is better (or clearer for someone new to the format).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we have a protobuf check set up so if we can't use the oneof it will let us know. I do know that it is less permissive than I'm used to:

https://github.com/substrait-io/substrait/blob/main/.github/workflows/pr.yml#L35

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protobuf oneof can't have repeated fields link
A oneof cannot be repeated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add oneof as mentioned above so both fields can contain values
I too share concern with @jacques-n having both fields where initial literals are in Expression_Literal and rest in Expression (assuming there is one expression) doesn't seem intuitive .

}

// A table composed of expressions.
message VirtualExpressionTable {
repeated Expression values = 1;
}

// A stub type that can be used to extend/introduce new table types outside
// the specification.
message ExtensionTable {
Expand Down
12 changes: 12 additions & 0 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,18 @@ is encoded as records consisting of literal values.
| -------- | ----------- | -------- |
| Data | Required | Required |

#### Virtual Expression Table

A virtual expression table is a table whose contents are embedded in the plan itself.
Each column is an expression that can be resolved without referencing any input data.
For example, a literal, a function call involving literals, or any other expression that does
not require input.

| Property | Description | Required |
| -------- | ----------- | -------- |
| Data | Required | Required |


#### Named Table

A named table is a reference to data defined elsewhere. For example, there may be a catalog
Expand Down
Loading