feat(query): add partial query solver engine #2536

fubuloubu · 2025-03-04T00:13:14Z

What I did

This PR adds (provisional) support for "partial queries", e.g. breaking large user queries through the Query Management System into sub-queries according to the most efficient manner of provisioning them (based on installed plugins, plugin state, service access, environment, etc.)

Most of the time, when running some long-term query over a range of history, often certain providers are not enough to fully provision the entire query as some parts may be missing. Examples of this include disk cache with cryo (or the not-fully-functional ape-cache 1st party plugin), indexing services who might be a few minutes behind head w/ their API, or other such application-specific limitations (rate limits? free tier limits? etc.)

Thanks to this plugin, it is now possible to better support these types of queries efficiently, and also allow plugins to reduce overhead cost of conversion of their outputs, thanks to additions to the API to allow specifying different manners of providing raw Ape model access, or dataframes using narwhals's dataframe-agnostic API

fixes: #702
fixes: #1079
fixes: #2013

How I did it

First off, all built-in queries were modified using a new baseclass approach to add the concept of "query indexes" as an abstract API (and then implemented across relevant queries). Then the concept of a "cursor" was added to represent a provider-specific subset of the query without evaluating it. The cursors aided in the development of a new algorithm that finds the most efficient subset of all provider's cursor objects that covers the original query. Lastly, this new algorithm and all machinery to make best use of it was implemented into query subsystem API objects as well as the query manager, gated behind an environment variable APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND

How to verify it

NOTE: To use the new solver, first enable it using APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND=1

All regular functionality of the framework as well as any query-capable plugins should continue to function properly. Support for new query functionality needs to be added prior to v0.9 where the changes will be made breaking.

Also note that while it should function without any breaking changes to current v0.8, it is intended to replace the functionality of the query system when v0.9 rolls around, dropping support for the using the old method of querying from 2nd/3rd party plugins. The drop of support should be transparent to them however, as the old API methods will no longer be called, and the new ones have defaults which should not hinder proper operation.

Checklist

All changes are completed
Change is covered in tests
Documentation is complete

- update all `QueryType`s and add `BaseCursorAPI` objects to `ape.api.query` - add support to `QueryManager` for using the new partial query solver system - add `APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND` environment variable to enable the new (experimental) partial query solver system - build support for default RPC queries through new partial query system NOTE: Contains notes for v0.9 breaking changes *not made yet*

fubuloubu · 2025-03-04T00:19:24Z

maintainer's note: the idea behind this PR is to begin adding support for the new experimental query API to downstream 2nd/3rd party plugins in anticipation of v0.9, where legacy support will be dropped.

…ctor)

antazoey

nice, consider my feedback then ill approve
also i would like to make sure this doesnt break ape-titanoboa

antazoey · 2025-03-04T21:52:08Z

src/ape/api/config.py

+class QueryConfig(PluginConfig):
+    """Add 'query:' key to your config."""
+
+    backend: DataframeImplementation = DataframeImplementation.PANDAS


What if happens is the query manager gets used w/o Pandas installed?

So, narwhals is an abstraction package that has code to translate operations on your choice of dataframe implementation, so if you don't have pandas installed but pick pandas then it will raise (hence the config option)

Many people prefer polars over pandas, and this would allow us to support both without having to ship with either package (meaning you would install your preferred package and set the config to use it as default)

on option, we could have a dynamic default that prefers polars if it is installed but uses pandas otherwise

Yeah we can do that, because it's unlikely that someone would have 2 or more installed, and if they do they can just set it via config (or kwarg)

antazoey · 2025-03-04T21:53:32Z

src/ape/api/providers.py

        if self.hash is None:
-            # Unable to query transactions.
+            # NOTE: Only "unsealed" blocks do not have a hash
+            raise ProviderError("Unable to find block transactions: not sealed yet")


Would it be bad to logger.error() here and just return an empty list?

Not sure if there's a use case to work with that, but potentially thatmigjt be better

antazoey · 2025-03-04T21:55:56Z

src/ape/api/query.py

+        end_index: Optional[int] = None,
+    ) -> "Self":
+        """
+        Create a copy of this object with the query window shrunk inwards to `start_index` and/or


you have to use two tics for it to work in doc-strings

Suggested change

Create a copy of this object with the query window shrunk inwards to `start_index` and/or

Create a copy of this object with the query window shrunk inwards to ``start_index`` and/or

antazoey · 2025-03-04T22:04:05Z

tests/functional/test_block.py

-    expected: list = []
-    assert actual == expected
-
-    # Ensure still works when hash is None (was a bug where this crashed).


Hashes are not always needed for blocks, such as Boa, which get you a block's transaction regardless of it has a made-up hash or not.

I was trying to figure out why this test existed, thank you for commenting

antazoey · 2025-03-04T22:05:10Z

src/ape/api/providers.py

        if self.hash is None:
-            # Unable to query transactions.
+            # NOTE: Only "unsealed" blocks do not have a hash


Not every provider Ape talks to is a blockchain. Made up dev blocks from providers like boa don't necessarily need hashes even though they have associated transactions.

As an aside, boa could also not use a real hash for block hash (e.g. block 1 is hash 0x1, etc)

I think we already made this change, but it was during development that led to the discovery of the bug leading to the test.

I think I am good with adjusting the test to catch an exception (instead of just completely deleting the test), as it was found with purpose.

fubuloubu · 2025-03-04T22:26:05Z

nice, consider my feedback then ill approve
also i would like to make sure this doesnt break ape-titanoboa

In general, I want to check all the other plugins by having a PR queued up before merging this.

fubuloubu added 3 commits March 3, 2025 18:31

refactor(query)!: add .num_transactions to BlockTransactionQuery

ea7de2d

test: refactor old test that did not make sense

0f52699

fubuloubu added category: feature New feature or request 0.8 0.9 labels Mar 4, 2025

fubuloubu requested review from antazoey and johnson2427 March 4, 2025 00:13

fix: Self is not available <3.11

806f572

fubuloubu force-pushed the feat/query/add-partial-queries branch from 1a9d54e to 8caae73 Compare March 4, 2025 01:20

fix: remove Pandas as core dependency, add Narwhals (needed some refa…

dece7c9

…ctor)

fubuloubu force-pushed the feat/query/add-partial-queries branch from 8caae73 to dece7c9 Compare March 4, 2025 01:45

fubuloubu added 3 commits March 3, 2025 21:40

fix: used Py3.10+ Union type

a646e10

feat(config): add ability to configure default query backend type

7cdc734

fix(query): forgot to set backend class because of temporary API issues

a675de0

fubuloubu force-pushed the feat/query/add-partial-queries branch from 7d31a92 to a675de0 Compare March 4, 2025 03:08

fubuloubu added 3 commits March 3, 2025 22:27

fix: add backport for pairwise when <3.10

30b69fa

fix: still using py3.10 union type

782fb61

refactor: fix type heirarchy and other small bugs

8f6f24c

antazoey reviewed Mar 4, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(query): add partial query solver engine #2536

feat(query): add partial query solver engine #2536

fubuloubu commented Mar 4, 2025

fubuloubu commented Mar 4, 2025

antazoey left a comment •

edited

Loading

antazoey Mar 4, 2025

fubuloubu Mar 4, 2025

antazoey Mar 5, 2025

fubuloubu Mar 5, 2025

antazoey Mar 4, 2025

fubuloubu Mar 4, 2025

antazoey Mar 4, 2025

antazoey Mar 4, 2025

fubuloubu Mar 4, 2025

antazoey Mar 4, 2025

fubuloubu Mar 4, 2025

antazoey Mar 10, 2025

fubuloubu commented Mar 4, 2025

	Create a copy of this object with the query window shrunk inwards to `start_index` and/or
	Create a copy of this object with the query window shrunk inwards to ``start_index`` and/or

feat(query): add partial query solver engine #2536

Are you sure you want to change the base?

feat(query): add partial query solver engine #2536

Conversation

fubuloubu commented Mar 4, 2025

What I did

How I did it

How to verify it

Checklist

fubuloubu commented Mar 4, 2025

antazoey left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fubuloubu commented Mar 4, 2025

antazoey left a comment •

edited

Loading