This is an attempt to organize myself and make what I plan to work on more visible
Weekly High Level Goals
- Make arrow release: Release arrow-rs / parquet minor version 54.3.0 (Mar 2025) arrow-rs#7107
- Complete v1 of tree explain plans [EPIC] Complete SQL EXPLAIN Tree Rendering #14914 ready for wider review with @irenjj
- Get tpch data generator screaming fast with @clflushopt: Make it easier to run TPCH queries with datafusion-cli #14608
- Work with @XiangpengHao to get Enable parquet filter pushdown by default #3463 ready for merge
Other projects I think are strategically important
For a list of projects, see
These are the ones I plan to look at / review PRs in order
- Anything performance related
- Change mapping of SQL VARCHAR from Utf8 to Utf8View #15096 (comment)
- Spark functions: feat: Add datafusion-spark crate #15168
- Hardening external sorts: A complete solution for stable and safe sort with spill #14692 with @2010YOUY01
- Pushing expressions down into scans: Support Push down expression evaluation in TableProviders #14993
Weekly Stretch / Nice to have goals
- Try to move an example or two from datafusion-examples
Background
I am putting this list on github because:
- I like how github renders checklists w/ PR titles so it is easy to track (I currently have a local text file...)
- I thought others might be interested in seeing what I am doing / planning to do
- It makes me feel better that I don't have time to review all the PRs 😭
The way I am trying to prioritize PRs is in the following order
- Bug fixes
- Documentation / UX / API improvements (things that make DataFusion easier/better to work with)
- Performance improvements
- New features with wide appeal
- New functions
Note that new features and functions are deliberately at the bottom.