Fix(optimizer)!: Preserve struct-column parentheses for RisingWave dialect #5376

MisterWheatley · 2025-07-08T09:11:08Z

This PR adds logic to the optimizer to preserve (possibly) semantically significant parentheses in the RisingWave dialect, see docs here

A minimal example looks like follows: Assume you have a column struct_col (STRUCT<nested_col INT>) and wish to access the nested column to bring it up to a top-level column. The select statement in RW would be (with quoting):

SELECT
  ("table"."struct_col")."nested_col" AS top_level_col
FROM
  "table"

The logic added checks if the dialect is RisingWave and the given parentheses follow the pattern (<exp>).<identifier> to preserve semantically significant parentheses.

In addition this PR contains a minor refactor of the complicated conditional to instead use 2 guard clauses for better clarity.

Edit: Relevant Slack discussion: https://tobiko-data.slack.com/archives/C0448SFS3PF/p1751957408688319

…lect Added logic to preserve (possible) semantically significant parentheses in RisingWave dialect. Refactoring of logic in `simplify.simplify_parens` to use guard-clauses. BREAKING CHANGE: Added dialect as argument to `simplify_parens` function

…rens

sqlglot/optimizer/simplify.py

Co-authored-by: Jo <[email protected]>

MisterWheatley · 2025-07-08T11:26:11Z

Hi @georgesittas

As per our discussion in Slack, I did some testing and dug around in both the docs and source code of RisingWave and decided to move my findings here for better archiving/visibility

It appears that RisingWave are trying to align (partly) with Postgres in accessing Composite Types, docs here in terms of syntax for accessing structs.
As far as I am able to determine, RisingWave follows a BigQuery like syntax with the exception that you need a set of parentheses around each nested level.

Relevant issues from RisingWave are:

Tracking: struct (aka row or record) DataType Support risingwavelabs/risingwave#9694

I am going to do a bit more work by adding more tests that work directly in RisingWave and then ensuring this PR covers those.

I'll let you know if I run into any issues and/or get stuck and could use some help. Thanks again for the fast responses.

georgesittas · 2025-07-08T11:36:36Z

Do you mean there's other issues besides simplifying parentheses? Did you check out this comment by any chance?

MisterWheatley · 2025-07-08T11:56:50Z

Do you mean there's other issues besides simplifying parentheses? Did you check out this comment by any chance?

Partially. It is still fundamentally down to parentheses having semantic meaning for struct types in RisingWave.

However to support e.g. struct expansion and correct qualification of column(s) there is a need for more work.

In essence I (ideally) want to be able to (for a column struct_col STRUCT<int_col INT, vchar_col VARCHAR>) to be able to do the following optimization

SELECT
  (struct_col).*
FROM
  t
)

to the following:

SELECT
  ("t"."struct_col")."int_col" AS "int_col",
  ("t"."struct_col")."vchar_col" AS "vchar_col"
FROM
  "t" as "t"

i.e. we also need to handle these parentheses correctly in the qualify_columns logic somewhat equivalent to BigQuery struct expansion.

Edit: I am unsure if we have any other dependencies in terms of just adding the exp.Dot to the list of exclusions as you suggested above. I will get back to you with regards to that, but I am wary that it would have unintended consequences somewhere in e.g. BigQuery

georgesittas · 2025-07-08T17:23:28Z

Just to make sure– have you verified that there are issues with the qualification logic as it is today? Or was that next on your list, i.e., testing? I don't see anything that is obviously unsupported in your example, e.g., for this query:

SELECT (struct_col).* FROM t

I expect that we're already parsing struct_col as a Column, so its resolution should just work today. The rest of the identifiers in the dot path aren't Column instances, so they should also be left untouched, leading to the form you suggested.

MisterWheatley · 2025-07-09T06:34:52Z

Just to make sure– have you verified that there are issues with the qualification logic as it is today? Or was that next on your list, i.e., testing? I don't see anything that is obviously unsupported in your example, e.g., for this query:
SELECT (struct_col).* FROM t
I expect that we're already parsing struct_col as a Column, so its resolution should just work today. The rest of the identifiers in the dot path aren't Column instances, so they should also be left untouched, leading to the form you suggested.

Glad to confirm and double test. I just tested on a clean install of v27.0.0 with the following snippet:

from sqlglot import parse_one
from sqlglot.optimizer import optimize
import sqlglot
sql = """
SELECT
  (struct_col).*
FROM
  t
"""

schema = {
    "t" : {
        "struct_col" : "struct<nested_int INT, nested_char VARCHAR>"
    }
}

rw_ast = parse_one(sql,dialect='risingwave')
rw_opt = optimize(rw_ast,schema=schema,dialect='risingwave')
print(rw_opt)

gives the following result

SELECT "t"."struct_col".* FROM "t" AS "t"

So the qualification sort of works, but there are two issues still that I am working to solve:

Preserve the parentheses as they have semantic meaning here.
Do star expansion equivalent to what is currently done in BigQuery. If you run my toy example with dialect = "bigquery" it gives: SELECT "t"."struct_col"."nested_int" AS "nested_int", "t"."struct_col"."nested_char" AS "nested_char" FROM "t" AS "t"

…t test for RW

… RisingWave

…al function

MisterWheatley · 2025-07-09T11:20:25Z

Hi @georgesittas

I did a lot more testing, but I think I managed to get it to a working stage that I am happy with for another review round.

I am aware of the comment on an old PR here about introducing dialect specifics into the optimizer, when some of the logic should be shareable between e.g. BigQuery and RisingWave in regards to expanding struct.* columns. I deliberately didn't attempt to do a refactor, since that would have increased the complexity and I wanted to keep it as simple as possible

Let me know if you have more feedback and/or need more test cases added and I will get on it asap

georgesittas

@MisterWheatley thanks for the update– left another round of comments.

sqlglot/optimizer/simplify.py

sqlglot/optimizer/qualify_columns.py

tests/test_optimizer.py

sqlglot/optimizer/qualify_columns.py

Co-authored-by: Jo <[email protected]>

…orrect level for RisingWave Updated logic for expanding (struct_col).* expressions in RisingWave to correctly handle the level of nesting. Moved struct expansion tests to tests/fixtures/qualify_columns.sql on behest of maintainers.

MisterWheatley · 2025-07-10T13:46:19Z

Hi again @georgesittas

I have tried to update according to your comments and questions.

Based on your question regarding unpacking, I uncovered a problem with the previous way of doing it, hence why the code is now quite a bit different. High-level the new logic walks "up" the AST from the exp.Column and down the exp.DataType.kind simultaneously to check if they are in alignment to be expanded between AST and STRUCT definition

The re-added casts/type annotations were partly to fix a few mypy errors when running make check. If this is still incorrect, let me know and I can try and refactor to not use casts. I am not an expert in mypy, so I would appreciate a bit of guidance or a pointer to some relevant docs in that case.

I moved all the struct.* tests into the fixture file as you suggested to keep them together, including the existing BigQuery ones.

Again, thanks for the quick response times and looking forward to your feedback on the latest changes

georgesittas

Hey @MisterWheatley, thanks for addressing comments and sorry for the delayed response– I was OOO. The PR looks good to go; I will address the last couple of comments that I left after merging.

georgesittas · 2025-07-15T07:49:46Z

sqlglot/optimizer/qualify_columns.py

+    # find column definition to get data-type
+    dot_column = t.cast(exp.Column, expression.find(exp.Column))


I think it'd be safer if we did an instance check to ensure dot_column is indeed a Column in L612.

georgesittas · 2025-07-15T07:57:50Z

sqlglot/optimizer/qualify_columns.py

+    outer_paren = expression.this
+
+    for struct_field_def in t.cast(exp.DataType, current_struct).expressions:
+        new_identifier = exp.Identifier(this=struct_field_def.name)


We should respect the identifier found in the type here, otherwise we may drop quotes when we shouldn't.

MisterWheatley · 2025-07-15T08:27:53Z

Hey @MisterWheatley, thanks for addressing comments and sorry for the delayed response– I was OOO. The PR looks good to go; I will address the last couple of comments that I left after merging.

I assumed as much. Thanks for the follow up and taking it the last mile, very much appreciated.

georgesittas · 2025-07-15T08:32:10Z

Just a few minor changes on my end: 8f43c5f. Nice work!

MisterWheatley added 2 commits July 8, 2025 08:47

style: Ran formatter and tests. Fixed type annotation for simplify_pa…

575f942

…rens

georgesittas reviewed Jul 8, 2025

View reviewed changes

sqlglot/optimizer/simplify.py Outdated Show resolved Hide resolved

sqlglot/optimizer/simplify.py Show resolved Hide resolved

Fix: Make dialect in simplify_parens optional.

67cb4d1

Co-authored-by: Jo <[email protected]>

MisterWheatley marked this pull request as draft July 8, 2025 11:26

MisterWheatley added 5 commits July 9, 2025 09:28

Fix(optimizer): Tweaks to make simple non-nested star expand pass uni…

f4c4c9f

…t test for RW

Fix(optimizer): Added test for deep nested unpacking for BigQuery and…

1027a4f

… RisingWave

style: Ran formatting check

211c57c

fix: Remove unuses function from RisingWave dialect test

300ced5

docs: updated docstring of new _expand_struct_stars_risingwave intern…

12bf26a

…al function

MisterWheatley requested a review from georgesittas July 9, 2025 11:20

MisterWheatley marked this pull request as ready for review July 9, 2025 11:21

georgesittas reviewed Jul 9, 2025

View reviewed changes

MisterWheatley and others added 2 commits July 10, 2025 08:31

fix: apply suggestions from code review 2

24b16d0

Co-authored-by: Jo <[email protected]>

MisterWheatley requested a review from georgesittas July 10, 2025 13:46

georgesittas reviewed Jul 15, 2025

View reviewed changes

georgesittas merged commit 9e8d3ab into tobymao:main Jul 15, 2025
6 checks passed

		# find column definition to get data-type
		dot_column = t.cast(exp.Column, expression.find(exp.Column))

Fix(optimizer)!: Preserve struct-column parentheses for RisingWave dialect #5376

Fix(optimizer)!: Preserve struct-column parentheses for RisingWave dialect #5376

Conversation

MisterWheatley commented Jul 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MisterWheatley commented Jul 8, 2025

Uh oh!

georgesittas commented Jul 8, 2025

Uh oh!

MisterWheatley commented Jul 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georgesittas commented Jul 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MisterWheatley commented Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MisterWheatley commented Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georgesittas left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MisterWheatley commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georgesittas left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

georgesittas Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

georgesittas Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MisterWheatley commented Jul 15, 2025

Uh oh!

georgesittas commented Jul 15, 2025

Uh oh!

Uh oh!

MisterWheatley commented Jul 8, 2025 •

edited

Loading

MisterWheatley commented Jul 8, 2025 •

edited

Loading

georgesittas commented Jul 8, 2025 •

edited

Loading

MisterWheatley commented Jul 9, 2025 •

edited

Loading

MisterWheatley commented Jul 9, 2025 •

edited

Loading

MisterWheatley commented Jul 10, 2025 •

edited

Loading

georgesittas left a comment •

edited

Loading