Write NDVs for Iceberg table to the Puffin files. #56274

Open
amoghmargoor opened this issue Feb 25, 2025 · 1 comment
Labels
type/enhancement Make an enhancement to StarRocks


amoghmargoor commented Feb 25, 2025

Enhancement

Currently, statistics computed for an Iceberg table are written to the native table external_column_statistics. This is not ideal for sharing across clusters: every cluster that needs to access the same Iceberg table has to compute the statistics again. Ideally, in addition to writing statistics to external_column_statistics, StarRocks would also write them to a Puffin file. StarRocks can already read NDVs from Puffin files (enable_read_iceberg_puffin_ndv), so it would be great to write them as well. Other engines such as Trino already do this for the Iceberg format, so it would also help with standardisation.

Because NDVs are stored in Puffin files as data sketches, incremental updates of NDVs become possible when inserting data.
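To illustrate why sketches make incremental updates possible, here is a minimal sketch of the idea using a toy k-minimum-values (KMV) structure. It is not the Apache DataSketches theta sketch that the Puffin spec names as its `apache-datasketches-theta-v1` blob type, and it is not StarRocks code; the class name `KMVSketch` and its methods are hypothetical, chosen only to show the merge property.

```python
import hashlib


class KMVSketch:
    """Toy k-minimum-values sketch (illustrative only, hypothetical API).

    Keeps the k smallest 64-bit hash values seen so far.
    """

    def __init__(self, k=256):
        self.k = k
        self.mins = set()  # at most k smallest hash values

    @staticmethod
    def _hash(value):
        # Deterministic 64-bit hash of the value's string form.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def _trim(self):
        # Drop everything except the k smallest hashes.
        if len(self.mins) > self.k:
            self.mins = set(sorted(self.mins)[: self.k])

    def update(self, value):
        self.mins.add(self._hash(value))
        self._trim()

    def merge(self, other):
        # Union of two sketches: no need to rescan the underlying data.
        merged = KMVSketch(min(self.k, other.k))
        merged.mins = self.mins | other.mins
        merged._trim()
        return merged

    def estimate(self):
        if len(self.mins) < self.k:
            # Below capacity every distinct hash is retained, so the count
            # is exact (ignoring hash collisions).
            return float(len(self.mins))
        # Standard KMV estimator: (k - 1) / (k-th smallest normalized hash).
        kth = max(self.mins) / 2**64
        return (self.k - 1) / kth
```

Under these assumptions, an insert would only need to build a sketch over the newly written rows and merge it with the sketch already stored for the column. The merged result holds the same retained hashes as a sketch built from scratch over all rows, so the old data never has to be rescanned.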

@amoghmargoor amoghmargoor added the type/enhancement Make an enhancement to StarRocks label Feb 25, 2025
amoghmargoor (author) commented

I am taking an initial look at the code changes needed for it.

@dirtysalt dirtysalt self-assigned this Feb 25, 2025