[design] Decide on other functions to implement #5

Open
dacort opened this issue Feb 22, 2023 · 1 comment

Comments

@dacort
Owner

dacort commented Feb 22, 2023

athena_scan is the most basic function to implement, but it scans an entire table. Unfortunately, because of how Athena works, that will be difficult to optimize for large tables. In most cases I assume people will want only a small slice of a table, so at the very least we'll need a pushdown function. It could also be interesting to use UNLOAD and then let DuckDB load the resulting Parquet files from S3.

  • athena_scan - returns all the data from a single table
  • athena_scan_pushdown - similar to the Postgres scanner; returns only the data that matches the given predicates/partitions
  • athena_unload - runs an UNLOAD query in Athena to write the results to Parquet in S3, so DuckDB can load the Parquet files directly
  • athena_query - runs an Athena query and returns the results
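
To make the pushdown and UNLOAD ideas a bit more concrete, here is a rough Python sketch of the SQL such functions might generate. Nothing here comes from the extension itself: the helper names (`build_pushdown_query`, `build_unload_query`, `build_duckdb_read`), the predicate tuple format, and the bucket path are all hypothetical, and it only builds strings, without calling Athena or DuckDB. It assumes Athena's `UNLOAD (...) TO '...' WITH (format = 'PARQUET')` syntax and DuckDB's `read_parquet` with a glob.

```python
from typing import Optional, Sequence, Tuple, Union

Value = Union[str, int, float]
Predicate = Tuple[str, str, Value]  # hypothetical format: (column, operator, value)

_ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def _quote_ident(name: str) -> str:
    # Double-quote an identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def _quote_value(value: Value) -> str:
    # Numbers are emitted as-is; strings are single-quoted and escaped.
    if isinstance(value, bool):
        raise TypeError("boolean literals are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + value.replace("'", "''") + "'"


def build_pushdown_query(
    table: str,
    columns: Optional[Sequence[str]] = None,
    predicates: Sequence[Predicate] = (),
) -> str:
    # Build a SELECT that pushes column selection and filters down to Athena.
    select_list = ", ".join(_quote_ident(c) for c in columns) if columns else "*"
    sql = f"SELECT {select_list} FROM {_quote_ident(table)}"
    clauses = []
    for column, op, value in predicates:
        if op not in _ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{_quote_ident(column)} {op} {_quote_value(value)}")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


def build_unload_query(select_sql: str, s3_prefix: str) -> str:
    # Wrap a SELECT in an Athena UNLOAD that writes Parquet under s3_prefix.
    return f"UNLOAD ({select_sql}) TO '{s3_prefix}' WITH (format = 'PARQUET')"


def build_duckdb_read(s3_prefix: str) -> str:
    # DuckDB query that reads every file Athena wrote under s3_prefix.
    return f"SELECT * FROM read_parquet('{s3_prefix}*')"


if __name__ == "__main__":
    query = build_pushdown_query(
        "events", ["id", "ts"], [("year", "=", 2023), ("region", "=", "us-east-1")]
    )
    print(build_unload_query(query, "s3://my-bucket/unload/"))
    print(build_duckdb_read("s3://my-bucket/unload/"))
```

In this sketch athena_scan_pushdown would submit the generated SELECT directly, while athena_unload would submit the UNLOAD statement, wait for it to finish, and then hand the `read_parquet` query to DuckDB. Actually running either one needs an Athena client and S3 credentials, which are left out here.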
@ghalimi

ghalimi commented Feb 23, 2023

Using Athena to expose Iceberg's metadata API would dramatically simplify DuckDB's integration with Iceberg. The most useful part of this API would be TableScan, which would make it possible to retrieve the Iceberg partitions of a table that match a given set of filter predicates. As far as I know, Athena's API unfortunately does not support that yet, but it should not be too difficult to add, since I'm sure the Iceberg Java API is used internally.

2 participants