This is my plan for reviews, etc. this week. I am putting it here to make it visible and to keep myself organized.
I am very much focused on the following projects:
- [EPIC] Faster performance for parquet predicate evaluation for non selective filters arrow-rs#7456 with @zhuqi-lucas (to enable `filter_pushdown`, Enable parquet filter pushdown by default #3463)
- Parquet Variant: Merge initial read API in Initial API for reading Variant data and metadata arrow-rs#7535 from @mkarbo and @scovich
- Parquet Variant: follow up tasks / start working on the Write / Builder API
I hope to push along the following:
- DataFusion Feature: merge `async` user defined functions and file follow up with @goldmedal: Introduce Async User Defined Functions #14837
- DataFusion performance: reduce the size of `Expr` and rerun benchmarks: Reduce size of `Expr` struct #14366 / Reduce size of `Expr` struct #16199
Stretch Goals
- DataFusion Feature: Update example of using multiple threadpools with object store Example for using a separate threadpool for CPU bound work (try 2) #14286 (comment)
- DataFusion perf script from @logan-keede: Shell script to collect benchmarks for multiple versions #15144
- DataFusion perf script draft: feat(benchmark): collect benchmarks for last 5 versions in line protocol format #15846
- DataFusion PR about pruning ordering: pipe column orderings into pruning predicate creation #15821
Nice to have (really would be great to have someone help review):
- DataFusion: Aggregate UDFs in FFI: feat: Add Aggregate UDF to FFI crate #14775
- Arrow: Avro cleanup: Avro codec enhancements arrow-rs#6965