Description
Is your feature request related to a problem or challenge?
DataFusion is growing by almost all measures: community, features, and codebase size, which is good. However, this growth is causing challenges such as:
- Lengthy review cycles (especially for new features). For example, the PR for lateral subqueries took 5 weeks to review and merge.
- PRs that are written but then not merged because they seem too large in scope (e.g. Hugging Face from @xinlifoobar, FlightSQLDriver from @ccciudatu, etc.)
- Uncertainty on feature scope -- for example, should we add all the (very cool) DuckDB SQL extensions / aggregates to make the default SQL engine as easy to use as possible, or should those be implemented as extension packages?
As described in the Design Goals, it is important for DataFusion to:
- Work "out of the box": Provide a very fast, world class query engine with minimal setup or required configuration.
- Customizable everything: All behavior should be customizable by implementing traits.
However, this description doesn't offer any specific criteria for deciding which features should be in the core (to work "out of the box") and which should be implemented as extensions.
I am worried that if we take all possibly useful features, the DataFusion core will become unmanageable / unmaintainable. We are already struggling with review capacity (it takes days or weeks to review new feature PRs).
Describe the solution you'd like
I would like a clearly articulated set of criteria for when features should be added to the core vs. when they should live in downstream projects / crates built with the extension APIs.
Describe alternatives you've considered
No response
Additional context
No response