-
-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(query): add partial query solver engine #2536
base: main
Are you sure you want to change the base?
Conversation
- update all `QueryType`s and add `BaseCursorAPI` objects to `ape.api.query` - add support to `QueryManager` for using the new partial query solver system - add `APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND` environment variable to enable the new (experimental) partial query solver system - build support for default RPC queries through new partial query system NOTE: Contains notes for v0.9 breaking changes *not made yet*
maintainer's note: the idea behind this PR is to begin adding support for the new experimental query API to downstream 2nd/3rd party plugins in anticipation of v0.9, where legacy support will be dropped. |
1a9d54e
to
8caae73
Compare
8caae73
to
dece7c9
Compare
7d31a92
to
a675de0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, consider my feedback then ill approve
also i would like to make sure this doesnt break ape-titanoboa
class QueryConfig(PluginConfig): | ||
"""Add 'query:' key to your config.""" | ||
|
||
backend: DataframeImplementation = DataframeImplementation.PANDAS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if happens is the query manager gets used w/o Pandas installed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, narwhals is an abstraction package that has code to translate operations on your choice of dataframe implementation, so if you don't have pandas installed but pick pandas then it will raise (hence the config option)
Many people prefer polars over pandas, and this would allow us to support both without having to ship with either package (meaning you would install your preferred package and set the config to use it as default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on option, we could have a dynamic default that prefers polars if it is installed but uses pandas otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can do that, because it's unlikely that someone would have 2 or more installed, and if they do they can just set it via config (or kwarg)
if self.hash is None: | ||
# Unable to query transactions. | ||
# NOTE: Only "unsealed" blocks do not have a hash | ||
raise ProviderError("Unable to find block transactions: not sealed yet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be bad to logger.error()
here and just return an empty list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there's a use case to work with that, but potentially thatmigjt be better
end_index: Optional[int] = None, | ||
) -> "Self": | ||
""" | ||
Create a copy of this object with the query window shrunk inwards to `start_index` and/or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have to use two tics for it to work in doc-strings
Create a copy of this object with the query window shrunk inwards to `start_index` and/or | |
Create a copy of this object with the query window shrunk inwards to ``start_index`` and/or |
expected: list = [] | ||
assert actual == expected | ||
|
||
# Ensure still works when hash is None (was a bug where this crashed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashes are not always needed for blocks, such as Boa, which get you a block's transaction regardless of it has a made-up hash or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to figure out why this test existed, thank you for commenting
if self.hash is None: | ||
# Unable to query transactions. | ||
# NOTE: Only "unsealed" blocks do not have a hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every provider Ape talks to is a blockchain. Made up dev blocks from providers like boa don't necessarily need hashes even though they have associated transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, boa could also not use a real hash for block hash (e.g. block 1 is hash 0x1, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already made this change, but it was during development that led to the discovery of the bug leading to the test.
I think I am good with adjusting the test to catch an exception (instead of just completely deleting the test), as it was found with purpose.
In general, I want to check all the other plugins by having a PR queued up before merging this. |
What I did
This PR adds (provisional) support for "partial queries", e.g. breaking large user queries through the Query Management System into sub-queries according to the most efficient manner of provisioning them (based on installed plugins, plugin state, service access, environment, etc.)
Most of the time, when running some long-term query over a range of history, often certain providers are not enough to fully provision the entire query as some parts may be missing. Examples of this include disk cache with
cryo
(or the not-fully-functionalape-cache
1st party plugin), indexing services who might be a few minutes behind head w/ their API, or other such application-specific limitations (rate limits? free tier limits? etc.)Thanks to this plugin, it is now possible to better support these types of queries efficiently, and also allow plugins to reduce overhead cost of conversion of their outputs, thanks to additions to the API to allow specifying different manners of providing raw Ape model access, or dataframes using
narwhals
's dataframe-agnostic APIfixes: #702
fixes: #1079
fixes: #2013
How I did it
First off, all built-in queries were modified using a new baseclass approach to add the concept of "query indexes" as an abstract API (and then implemented across relevant queries). Then the concept of a "cursor" was added to represent a provider-specific subset of the query without evaluating it. The cursors aided in the development of a new algorithm that finds the most efficient subset of all provider's cursor objects that covers the original query. Lastly, this new algorithm and all machinery to make best use of it was implemented into query subsystem API objects as well as the query manager, gated behind an environment variable
APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND
How to verify it
NOTE: To use the new solver, first enable it using
APE_ENABLE_EXPERIMENTAL_QUERY_BACKEND=1
All regular functionality of the framework as well as any query-capable plugins should continue to function properly. Support for new query functionality needs to be added prior to v0.9 where the changes will be made breaking.
Also note that while it should function without any breaking changes to current v0.8, it is intended to replace the functionality of the query system when v0.9 rolls around, dropping support for the using the old method of querying from 2nd/3rd party plugins. The drop of support should be transparent to them however, as the old API methods will no longer be called, and the new ones have defaults which should not hinder proper operation.
Checklist