`athena_scan` is the most basic thing to implement, but it scans an entire table. Unfortunately, given the way Athena works, that will be difficult to optimize for large tables. And in most cases, I'm assuming folks will want only a small slice of a table, so at the very least we'll need a pushdown function. It could also be interesting to use `UNLOAD` and then let DuckDB load the resulting Parquet files from S3.
- `athena_scan`: returns all the data from a single table
- `athena_scan_pushdown`: similar to the Postgres scanner, returns the data filtered by certain predicates/partitions
- `athena_unload`: uses an `UNLOAD` query in Athena to write results as Parquet to S3, after which DuckDB can simply load the Parquet files
- `athena_query`: runs an Athena query and returns the results
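To make `athena_scan_pushdown` and `athena_unload` a bit more concrete, here is a minimal sketch of the SQL they might generate. It assumes the pushed-down filters arrive as simple `column op value` triples combined with `AND`. The `Predicate` class, the helper function names, and the bucket/prefix values are all hypothetical, not an existing DuckDB or Athena API. The strings would still need to be submitted through Athena (e.g. via its `StartQueryExecution` API), which is left out here.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical shape of a filter pushed down from DuckDB; not a real API.
@dataclass
class Predicate:
    column: str
    op: str
    value: Union[str, int, float]


ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def quote_ident(name: str) -> str:
    """Quote an identifier with double quotes, as Athena's query dialect expects."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: Union[str, int, float]) -> str:
    """Render a value as a SQL literal; strings are single-quoted and escaped."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_pushdown_query(database: str, table: str,
                         columns: Optional[List[str]] = None,
                         predicates: Optional[List[Predicate]] = None) -> str:
    """Build a SELECT that pushes the projection and the filters into Athena."""
    cols = ", ".join(quote_ident(c) for c in columns) if columns else "*"
    sql = f"SELECT {cols} FROM {quote_ident(database)}.{quote_ident(table)}"
    clauses = []
    for p in predicates or []:
        if p.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {p.op}")
        clauses.append(f"{quote_ident(p.column)} {p.op} {quote_literal(p.value)}")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


def build_unload_query(select_sql: str, s3_prefix: str) -> str:
    """Wrap a SELECT in an Athena UNLOAD that writes Parquet files under s3_prefix."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    if not s3_prefix.endswith("/"):
        s3_prefix += "/"
    return f"UNLOAD ({select_sql}) TO {quote_literal(s3_prefix)} WITH (format = 'PARQUET')"


def duckdb_read_sql(s3_prefix: str) -> str:
    """DuckDB query reading every file UNLOAD wrote (s3:// needs the httpfs extension)."""
    if not s3_prefix.endswith("/"):
        s3_prefix += "/"
    return f"SELECT * FROM read_parquet({quote_literal(s3_prefix + '*')})"
```

For example, `build_pushdown_query("analytics", "events", ["id", "ts"], [Predicate("dt", "=", "2023-01-01")])` yields `SELECT "id", "ts" FROM "analytics"."events" WHERE "dt" = '2023-01-01'`. If `dt` is a partition column, Athena only has to read the matching partition. Wrapping that in `UNLOAD` would let DuckDB read the Parquet output directly instead of paging results through the Athena API.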
Using Athena to expose Iceberg's metadata API would dramatically simplify DuckDB's integration with Iceberg. The most useful part of this API would be `TableScan`, which would make it possible to retrieve the Iceberg partitions of a table that match a given set of filter predicates. As far as I know, Athena's API unfortunately does not support that yet, but it should not be too difficult to add, since I'm sure Athena uses the Iceberg Java API internally.
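As a rough illustration of what such a scan would give DuckDB, here is a toy model of partition pruning. It assumes each partition is described only by its partition-column values and that filters are `(column, operator, value)` triples. It is not the Iceberg `TableScan` API; the function and type names are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Toy model only: a partition is a dict of partition-column values,
# and a predicate is a (column, operator, value) triple.
Partition = Dict[str, Any]
Filter = Tuple[str, str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def may_match(part: Partition, filters: List[Filter]) -> bool:
    """True unless some filter on a partition column rules this partition out.

    Filters on non-partition columns are skipped: partition values alone
    cannot prove that no row in the partition satisfies them.
    """
    return all(OPS[op](part[col], val) for col, op, val in filters if col in part)


def prune_partitions(partitions: List[Partition], filters: List[Filter]) -> List[Partition]:
    """Keep only the partitions that could contain matching rows."""
    return [p for p in partitions if may_match(p, filters)]
```

With partitions `dt=2023-01-01/region=us`, `dt=2023-01-02/region=eu` and `dt=2023-01-03/region=us`, the filters `dt >= '2023-01-02' AND region = 'us'` keep only the last one. DuckDB could then read just that partition's files.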