[DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) #12357

Closed
@alamb

Description

Is your feature request related to a problem or challenge?

DataFusion is growing by almost all measures: community 🤗, features 🪶, and codebase size ✅, which is good 🎉. However, this growth is causing challenges such as:

  1. Lengthy review cycles (especially for new features). For example, the PR for lateral subqueries took 5 weeks to review and merge.
  2. PRs that are written but then not merged because they seem too large in scope (e.g. Hugging Face from @xinlifoobar, FlightSQLDriver from @ccciudatu, etc).
  3. Uncertainty about feature scope. For example, should we add all the (very cool) DuckDB SQL extensions / aggregates to make the default SQL engine as easy to use as possible, or should those be implemented as extension packages?

As described in the Design Goals, it is important for DataFusion to:

  1. Work “out of the box”: Provide a very fast, world class query engine with minimal setup or required configuration.
  2. Customizable everything: All behavior should be customizable by implementing traits.

However, this description doesn't offer any specific criteria for deciding which features should be in the core (to work "out of the box") and which should be implemented as extensions.

I am worried that if we take all possibly useful features, the DataFusion core will become unmanageable / unmaintainable. We are already struggling with review capacity (it takes days / weeks to review new feature PRs).

Describe the solution you'd like

I would like a clearly articulated set of criteria for when features should be added to the core vs. when they should live in downstream projects / crates built with the extension APIs.

Describe alternatives you've considered

No response

Additional context

No response

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)