From v0.6.0 onwards, a Table is composed of files that contain multiple blocks, each belonging to the same or to different cubes. This is part of the Multiblock format, which allows Qbeast to balance the file layout without losing indexing benefits.
Blocks help us locate a particular cube within a file, but a single block is not addressable or retrievable from the Spark reader. Although we use Delta File Skipping to discard data based on min/max statistics, we do not support an equally fine-grained search when Sampling is applied.
This change requires some work regarding #175. Datasource V2 is more extensible and allows us to implement our own reader. In this case, the reader should be designed to skip entire groups of rows based on the block number.
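The skipping behaviour described above can be illustrated with a minimal sketch. It is plain Python rather than Spark or Datasource V2 code, and it does not reflect the actual Qbeast reader: `RowGroup`, `read_blocks`, and the assumption that each group of rows belongs to exactly one block are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for a group of rows stored contiguously in a file."""
    block_number: int  # assumption: each group of rows belongs to exactly one block
    rows: List[dict]


def read_blocks(row_groups: Iterable[RowGroup], wanted: Set[int]) -> Iterator[dict]:
    """Yield rows only from the requested blocks, skipping other groups whole."""
    for group in row_groups:
        if group.block_number not in wanted:
            continue  # the entire group is skipped; its rows are never iterated
        yield from group.rows


# Hypothetical file layout with three blocks.
file_layout = [
    RowGroup(0, [{"id": 1}, {"id": 2}]),
    RowGroup(1, [{"id": 3}]),
    RowGroup(2, [{"id": 4}, {"id": 5}]),
]

selected = list(read_blocks(file_layout, wanted={0, 2}))
print([row["id"] for row in selected])  # [1, 2, 4, 5]
```

The decision is made once per group of rows, never per row, which is what would make a block addressable. How block numbers would map onto physical Parquet structures is exactly the open question in the first TODO.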
PS: @alexeiakimov tried this in previous issues, but other priorities came up.
TODOs:
Analyze how to make blocks addressable from a Parquet File.
Implement Datasource V2 for Qbeast
Make a PoC
Develop the feature and test