Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): add partial query solver engine #2536

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

fubuloubu
Copy link
Member

What I did

This PR adds (provisional) support for "partial queries", e.g. breaking large user queries through the Query Management System into sub-queries according to the most efficient manner of provisioning them (based on installed plugins, plugin state, service access, environment, etc.)

Most of the time, when running some long-term query over a range of history, often certain providers are not enough to fully provision the entire query as some parts may be missing. Examples of this include disk cache with cryo (or the not-fully-functional ape-cache 1st party plugin), indexing services who might be a few minutes behind head w/ their API, or other such application-specific limitations (rate limits? free tier limits? etc.)

Thanks to this plugin, it is now possible to better support these types of queries efficiently, and also allow plugins to reduce overhead cost of conversion of their outputs, thanks to additions to the API to allow specifying different manners of providing raw Ape model access, or dataframes using narwhals's dataframe-agnostic API

fixes: #702
fixes: #1079
fixes: #2013

How I did it

First off, all built-in queries were modified using a new baseclass approach to add the concept of "query indexes" as an abstract API (and then implemented across relevant queries). Then the concept of a "cursor" was added to represent a provider-specific subset of the query without evaluating it. The cursors aided in the development of a new algorithm that finds the most efficient subset of all provider's cursor objects that covers the original query. Lastly, this new algorithm and all machinery to make best use of it was implemented into query subsystem API objects as well as the query manager, gated behind an environment variable APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND

How to verify it

NOTE: To use the new solver, first enable it using APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND=1

All regular functionality of the framework as well as any query-capable plugins should continue to function properly. Support for new query functionality needs to be added prior to v0.9 where the changes will be made breaking.

Also note that while it should function without any breaking changes to current v0.8, it is intended to replace the functionality of the query system when v0.9 rolls around, dropping support for the using the old method of querying from 2nd/3rd party plugins. The drop of support should be transparent to them however, as the old API methods will no longer be called, and the new ones have defaults which should not hinder proper operation.

Checklist

  • All changes are completed
  • Change is covered in tests
  • Documentation is complete

- update all `QueryType`s and add `BaseCursorAPI` objects to
  `ape.api.query`
- add support to `QueryManager` for using the new partial query
  solver system
- add `APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND` environment variable to
  enable the new (experimental) partial query solver system
- build support for default RPC queries through new partial query system

NOTE: Contains notes for v0.9 breaking changes *not made yet*
@fubuloubu fubuloubu added category: feature New feature or request 0.8 0.9 labels Mar 4, 2025
@fubuloubu
Copy link
Member Author

maintainer's note: the idea behind this PR is to begin adding support for the new experimental query API to downstream 2nd/3rd party plugins in anticipation of v0.9, where legacy support will be dropped.

@fubuloubu fubuloubu force-pushed the feat/query/add-partial-queries branch from 1a9d54e to 8caae73 Compare March 4, 2025 01:20
@fubuloubu fubuloubu force-pushed the feat/query/add-partial-queries branch from 8caae73 to dece7c9 Compare March 4, 2025 01:45
@fubuloubu fubuloubu force-pushed the feat/query/add-partial-queries branch from 7d31a92 to a675de0 Compare March 4, 2025 03:08
Copy link
Member

@antazoey antazoey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, consider my feedback then ill approve
also i would like to make sure this doesnt break ape-titanoboa

class QueryConfig(PluginConfig):
"""Add 'query:' key to your config."""

backend: DataframeImplementation = DataframeImplementation.PANDAS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if happens is the query manager gets used w/o Pandas installed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, narwhals is an abstraction package that has code to translate operations on your choice of dataframe implementation, so if you don't have pandas installed but pick pandas then it will raise (hence the config option)

Many people prefer polars over pandas, and this would allow us to support both without having to ship with either package (meaning you would install your preferred package and set the config to use it as default)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on option, we could have a dynamic default that prefers polars if it is installed but uses pandas otherwise

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can do that, because it's unlikely that someone would have 2 or more installed, and if they do they can just set it via config (or kwarg)

if self.hash is None:
# Unable to query transactions.
# NOTE: Only "unsealed" blocks do not have a hash
raise ProviderError("Unable to find block transactions: not sealed yet")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be bad to logger.error() here and just return an empty list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there's a use case to work with that, but potentially thatmigjt be better

end_index: Optional[int] = None,
) -> "Self":
"""
Create a copy of this object with the query window shrunk inwards to `start_index` and/or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to use two tics for it to work in doc-strings

Suggested change
Create a copy of this object with the query window shrunk inwards to `start_index` and/or
Create a copy of this object with the query window shrunk inwards to ``start_index`` and/or

expected: list = []
assert actual == expected

# Ensure still works when hash is None (was a bug where this crashed).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hashes are not always needed for blocks, such as Boa, which get you a block's transaction regardless of it has a made-up hash or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to figure out why this test existed, thank you for commenting

if self.hash is None:
# Unable to query transactions.
# NOTE: Only "unsealed" blocks do not have a hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not every provider Ape talks to is a blockchain. Made up dev blocks from providers like boa don't necessarily need hashes even though they have associated transactions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, boa could also not use a real hash for block hash (e.g. block 1 is hash 0x1, etc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already made this change, but it was during development that led to the discovery of the bug leading to the test.

I think I am good with adjusting the test to catch an exception (instead of just completely deleting the test), as it was found with purpose.

@fubuloubu
Copy link
Member Author

nice, consider my feedback then ill approve
also i would like to make sure this doesnt break ape-titanoboa

In general, I want to check all the other plugins by having a PR queued up before merging this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.8 0.9 category: feature New feature or request
Projects
None yet
2 participants