chore(engine): Wireframe basic prototype of query executor #16935
Conversation
This commit introduces a basic prototype wireframe for the new engine's query executor, with initial support for the Limit plan node. The wireframe also sketches a general methodology for writing tests for plan nodes that transform a stream of Arrow records.
```go
// Result denotes a single result from the executor. A result can either be an
// [arrow.Record], or an error.
type Result struct {
	val arrow.Record
	err error
}
```
How do you feel about calling this `Batch` or `ResultBatch`?
```go
iters := make([]iter.Seq[Result], len(children))
for i, child := range children {
	iters[i] = e.processNode(ctx, child)
}
```
From my experience with the old engine, this is a little tricky because it's not clear when nodes are evaluated and results are created. Is it lazy or not? E.g. the underlying method could aggregate all results into a slice and return that, or it could create a result on each yield call.
```go
switch n := n.(type) {
case *physical.DataObjScan:
	e.processDataObjScan(n, iters)(yield)
case *physical.Limit:
	e.processLimit(n, iters)(yield)
default:
```
How do you feel about using a visitor pattern like in logsl/syntax/visit.go instead?
Originally I thought pattern matching would do, as with the walk implementation. However, there were a few issues with that approach:
- The Go compiler does not check for exhaustiveness in pattern matches. This led to bugs.
- The walk order would be somewhat random.
- The visitor would give a structure for keeping intermediate results.
I have to admit that I'm not totally convinced the visitor is a great choice, especially when one is only interested in specific node types. It did work well for the clone method, though.
It just crossed my mind: how do you feel about a stack-based iteration to remove the recursion?
```go
	limit = int64(n.Limit)
)

for r := range input[0] {
```
Should this iterate over all inputs, or maybe over flattened inputs?
```go
for r := range input[0] {
	rec, err := r.Value()
	if err != nil {
		yield(errorResult(fmt.Errorf("error reading record: %w", err)))
```
What does the call stack look like? Isn't it inside out, or is Go reversing it?
```go
"fmt"
"iter"

"github.com/apache/arrow-go/v18/arrow/memory"
```
I've mentioned it to @chaudum in a 1:1: I'm a little critical of Apache Arrow and its Go implementation. We did a hackathon with it and came to two conclusions:
- The implementation, and especially its SIMD-based compute kernels, wasn't ready (that was at the end of 2023).
- Loki's main bottlenecks are string processing and allocations. The numeric vectorization that Arrow and similar libraries bring has little impact. I do not know how well Arrow handles strings and allocations, though.
That said, I do not want to impose, and I'm pretty sure you've considered more options than I have. I just wanted to share my insight 🙂
```go
	iters[i] = e.processNode(ctx, child)
}

return func(yield func(Result) bool) {
```
I personally find it very hard to reason about how iterators work :/
```go
if len(input) != 1 {
	yield(errorResult(errors.New("limit nodes must have exactly one input")))
	return
}
```
I find it non-intuitive that the validation happens only when the iterator is actually iterated over.
This PR introduces an extremely basic wireframe for the query executor. The query executor uses a stream of Apache Arrow records.
Currently, it only supports the Limit plan node.
Additionally, the PR includes an exploration of how to write tests for plan nodes which accept Arrow streams and transform them.
To make the code easier to review on GitHub, the first commit only contains updates to vendor; see the second commit for the prototype.