Fail to skip files based on partitions alone #263

Open
samansmink opened this issue Jun 21, 2024 · 2 comments

Comments

@samansmink

In the DuckDB delta extension, I'm not seeing file skipping based on partitions.

First of all, file skipping based on stats does seem to work correctly. For example, in the following case kernel correctly skips files based on the predicate, so only one of the two files is passed to DuckDB for scanning:

FROM delta_scan('${DAT_PATH}/out/reader_tests/generated/basic_append/delta')
WHERE number > 4
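
To make the stats-based case concrete, here is a minimal sketch of min/max skipping for a predicate like `number > 4`. It is only an illustration, not kernel's implementation: the `FileStats` type, the file names, and the min/max values are all made up for the example and are not read from the basic_append table.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for the `number` column."""
    path: str
    min_number: int
    max_number: int


def may_match_greater_than(stats: FileStats, threshold: int) -> bool:
    # A file can only hold a row with number > threshold
    # if its largest value is above the threshold.
    return stats.max_number > threshold


# Made-up files and stats, for illustration only.
files = [
    FileStats("file-a.parquet", min_number=1, max_number=3),
    FileStats("file-b.parquet", min_number=4, max_number=6),
]

to_scan = [f.path for f in files if may_match_greater_than(f, 4)]
print(to_scan)
```

The check is conservative: a file is dropped only when its stats prove that no row can match, so a file whose range merely overlaps the predicate is still scanned.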

I have added some test data to the DuckDB delta extension that aims to test file skipping for every type we can currently push down. To do so, I generate a few tables at paths of the form /generated/test_file_skipping/{type}/delta_lake. See the line generating these tables here.

What I would expect is to be able to skip files in this table using:

FROM delta_scan('./data/generated/test_file_skipping/bigint/delta_lake')
WHERE part=0

However, when I instrument DuckDB to print the files that kernel passes to it, I can see that both files are passed even though the filter is pushed down:

 Pushing down filter part = 0
 Scanning path file:///Users/sam/Development/delta-kernel-testing/data/generated/test_file_skipping/bigint/delta_lake/part=0/0-00900a4a-99cf-4d43-993c-41950d6ed025-0.parquet
 Scanning path file:///Users/sam/Development/delta-kernel-testing/data/generated/test_file_skipping/bigint/delta_lake/part=1/0-00900a4a-99cf-4d43-993c-41950d6ed025-0.parquet
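
For comparison, a rough sketch of the pruning one would expect for `part = 0`. It assumes that each file's add action in the Delta log carries a `partitionValues` map with string-serialized values. The `prune_by_partition` helper and the literal `partitionValues` entries are hypothetical; only the two relative paths are taken from the output above.

```python
def prune_by_partition(add_actions, column, value):
    """Return the paths of files whose partition value may satisfy column = value."""
    wanted = str(value)  # assumption: partition values are stored as strings
    kept = []
    for action in add_actions:
        partition_values = action.get("partitionValues", {})
        # Keep the file when the column is not a known partition column:
        # without a value there is nothing that proves the file can be skipped.
        if column not in partition_values or partition_values[column] == wanted:
            kept.append(action["path"])
    return kept


# Hypothetical add actions; the partitionValues entries are assumed.
add_actions = [
    {
        "path": "part=0/0-00900a4a-99cf-4d43-993c-41950d6ed025-0.parquet",
        "partitionValues": {"part": "0"},
    },
    {
        "path": "part=1/0-00900a4a-99cf-4d43-993c-41950d6ed025-0.parquet",
        "partitionValues": {"part": "1"},
    },
]

files_to_scan = prune_by_partition(add_actions, "part", 0)
print(files_to_scan)
```

Under these assumptions only the `part=0` file remains, which is the behaviour the query above is expected to trigger.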
@hntd187
Collaborator

hntd187 commented Jun 21, 2024

Hey Sam, I'm currently working on this. Right now data skipping doesn't take Hive-style partition paths like this into account. I have to upstream a few expression changes so that this is also compatible with delta-rs, but just so you're aware, it's on my radar.
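
As an illustration of what "Hive-style partition paths" refers to, the sketch below pulls `key=value` directory segments out of a file path. The `hive_partition_values` helper is hypothetical and is not how kernel or delta-rs handle this; it only shows the naming convention visible in the scanned paths above.

```python
from urllib.parse import unquote


def hive_partition_values(path: str) -> dict:
    """Collect key=value directory segments from a Hive-style path."""
    values = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            # Segments may be percent-encoded, so decode both sides.
            values[unquote(key)] = unquote(value)
    return values


path = (
    "file:///Users/sam/Development/delta-kernel-testing/data/generated/"
    "test_file_skipping/bigint/delta_lake/part=0/"
    "0-00900a4a-99cf-4d43-993c-41950d6ed025-0.parquet"
)
print(hive_partition_values(path))
```

The extracted values are strings, so comparing them against a typed literal such as `0` would still need a cast to the partition column's type.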

@santosh-d3vpl3x

Just ran into this, would be great to have partition pushdown work!
