[design] Decide on other functions to implement #5

dacort · 2023-02-22T00:17:46Z

athena_scan is the most basic function to implement, but it scans an entire table. Unfortunately, given how Athena works, that will be difficult to optimize for large tables. In most cases, I assume people will want only a small slice of a table, so at the very least we'll need a pushdown function. It could also be interesting to use UNLOAD and then let DuckDB load the resulting Parquet files from S3.

athena_scan - returns all the data from a single table
athena_scan_pushdown - similar to the Postgres scanner; returns the data from a table filtered by given predicates/partitions
athena_unload - uses an UNLOAD query in Athena to write results as Parquet to S3, so DuckDB can load the Parquet files directly
athena_query - runs an arbitrary Athena query and returns the results
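
To make the pushdown and UNLOAD ideas above a bit more concrete, here is a rough Python sketch of the SQL strings such functions might generate. It is only an illustration, not the extension's implementation: every helper name (`quote_literal`, `build_pushdown_query`, `build_unload_query`, `build_duckdb_read`), the table `logs`, and the bucket path are made up for this example. It assumes equality-only predicates, trusted table and column names (identifiers are not quoted), and that DuckDB can reach S3 (for example via its httpfs extension).

```python
def quote_literal(value):
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape embedded single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def build_pushdown_query(table, filters):
    """Hypothetical helper: SELECT restricted by equality predicates."""
    query = f"SELECT * FROM {table}"
    if filters:
        clauses = [f"{col} = {quote_literal(val)}" for col, val in filters.items()]
        query += " WHERE " + " AND ".join(clauses)
    return query


def build_unload_query(select_sql, s3_prefix):
    """Hypothetical helper: wrap a SELECT in an Athena UNLOAD to Parquet."""
    return f"UNLOAD ({select_sql}) TO '{s3_prefix}' WITH (format = 'PARQUET')"


def build_duckdb_read(s3_prefix):
    """Hypothetical helper: DuckDB query over the files UNLOAD wrote."""
    return f"SELECT * FROM read_parquet('{s3_prefix}*')"


# Example with a made-up table and bucket.
pushdown = build_pushdown_query("logs", {"year": 2023, "region": "us-east-1"})
unload = build_unload_query(pushdown, "s3://example-bucket/unload/run1/")
duck = build_duckdb_read("s3://example-bucket/unload/run1/")
print(pushdown)
print(unload)
print(duck)
```

In this sketch, athena_scan_pushdown would correspond to running the first query directly, while athena_unload would run the second in Athena and then hand the third to DuckDB once the query finishes.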

ghalimi · 2023-02-23T00:51:03Z

Using Athena to expose Iceberg's metadata API would dramatically simplify DuckDB's integration with Iceberg. The most useful part of this API would be TableScan, which would make it possible to retrieve the Iceberg partitions for a table given a set of filtering predicates. As far as I know, Athena's API unfortunately does not support that yet, but it should not be too difficult to add, since I assume the Iceberg Java API is used internally.

dacort mentioned this issue Sep 15, 2023

I am wondering what's the point of this extension #7

Closed
