Currently, the statistics computed for an Iceberg table are written to the native table `external_column_statistics`, which is not ideal for sharing them between clusters: every cluster that needs to access the same Iceberg table has to recompute them. Ideally, in addition to writing statistics to `external_column_statistics`, StarRocks would also write them to a Puffin file. StarRocks can already read NDVs from Puffin files (via `enable_read_iceberg_puffin_ndv`), so writing them as well would be a natural next step. Other engines such as Trino already do this for the Iceberg format, so it would also help standardization.
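For illustration, here is a rough sketch of what the write path could look like using Iceberg's Java Puffin API together with Apache DataSketches. This is not a proposed design for StarRocks internals; the file location, the `createdBy` string, and the integration point are placeholders:

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.datasketches.theta.UpdateSketch;
import org.apache.iceberg.BlobMetadata;
import org.apache.iceberg.GenericBlobMetadata;
import org.apache.iceberg.GenericStatisticsFile;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.StatisticsFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.OutputFile;
import org.apache.iceberg.puffin.Blob;
import org.apache.iceberg.puffin.Puffin;
import org.apache.iceberg.puffin.PuffinWriter;
import org.apache.iceberg.puffin.StandardBlobTypes;

public class PuffinNdvWriter {

  /** Writes a theta-sketch NDV blob for one column and registers it on the table. */
  static void writeNdvStats(Table table, int fieldId, UpdateSketch sketch) throws Exception {
    Snapshot snapshot = table.currentSnapshot();
    // Placeholder location; a real implementation would derive this from the
    // table's metadata directory and statistics naming scheme.
    OutputFile outputFile = table.io().newOutputFile(
        table.location() + "/metadata/" + snapshot.snapshotId() + "-ndv.stats");

    List<BlobMetadata> blobsMetadata;
    long fileSize;
    long footerSize;
    try (PuffinWriter writer = Puffin.write(outputFile).createdBy("starrocks").build()) {
      writer.add(new Blob(
          StandardBlobTypes.APACHE_DATASKETCHES_THETA_V1,
          List.of(fieldId),                       // column the sketch was built over
          snapshot.snapshotId(),
          snapshot.sequenceNumber(),
          ByteBuffer.wrap(sketch.compact().toByteArray())));
      writer.finish();
      blobsMetadata = writer.writtenBlobsMetadata().stream()
          .map(GenericBlobMetadata::from)
          .collect(Collectors.toList());
      fileSize = writer.fileSize();
      footerSize = writer.footerSize();
    }

    // Attach the statistics file to the snapshot so readers (including
    // enable_read_iceberg_puffin_ndv) can discover it from table metadata.
    StatisticsFile statisticsFile = new GenericStatisticsFile(
        snapshot.snapshotId(), outputFile.location(), fileSize, footerSize, blobsMetadata);
    table.updateStatistics()
        .setStatistics(snapshot.snapshotId(), statisticsFile)
        .commit();
  }
}
```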
Since data sketches are used to capture NDVs in the Puffin file, incremental updates of NDVs also become possible when inserting data.
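Because theta sketches are mergeable, the stored sketch can be unioned with a sketch built over only the newly inserted rows, instead of rescanning the whole table. A rough illustration with datasketches-java 3.x+ (assuming the blob bytes have already been read from the Puffin file):

```java
import org.apache.datasketches.memory.Memory;
import org.apache.datasketches.theta.CompactSketch;
import org.apache.datasketches.theta.SetOperation;
import org.apache.datasketches.theta.Sketch;
import org.apache.datasketches.theta.Sketches;
import org.apache.datasketches.theta.Union;
import org.apache.datasketches.theta.UpdateSketch;

public class NdvMerge {

  /** Merges the stored NDV sketch with a sketch built from newly inserted rows. */
  static CompactSketch mergeNdv(byte[] storedSketchBytes, Iterable<String> newValues) {
    // Deserialize the sketch previously stored in the Puffin blob.
    Sketch existing = Sketches.wrapSketch(Memory.wrap(storedSketchBytes));

    // Build a sketch over only the newly inserted values.
    UpdateSketch delta = UpdateSketch.builder().build();
    for (String v : newValues) {
      delta.update(v);
    }

    // Union the two sketches; the NDV is updated without touching old data.
    Union union = SetOperation.builder().buildUnion();
    union.union(existing);
    union.union(delta);
    CompactSketch merged = union.getResult();

    System.out.printf("estimated NDV after insert: %.0f%n", merged.getEstimate());
    return merged; // serialize with merged.toByteArray() and rewrite the blob
  }
}
```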