-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serialization: implement new, type-safe variants of BatchValues #881
Conversation
We will introduce types with the same name but different interface, and we want to move off the current ones gradually. Rename the existing ones as the first step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After seeing new definitions of BatchValuesIterator
/ RawBatchValuesIterator
I wonder - isn't it basically a re-implementation of LendingIterator
? Maybe we could use it?
Add a method to RowWriter which allows copying all contents of some SerializedValues object to it. Without it, the only method would be to parse the values in the SerializedValues via iter() and write them to the RowWriter, which is unnecessarily ineffective.
We will need to pass an empty row serialization context when we serialize values for unprepared queries in a batch.
It's not needed to use a block to make sure that `writer` is dropped before `data` is used, non-lexical lifetimes already take care of that.
It will be used to serialize data produced by upcoming batch values iterators.
A small refactor in order to improve the clarity of the later commits.
A small refactor to improve the clarity of later commits.
This refactor will improve clarity of the next change, but will also prevent an obscure issue from happening that causes futures not to be `Send` when an instance of a GAT exists across an await point.
If we changed the code to use the new BatchValues API first, the compiler would complain about some lifetime issues of temporary objects passed to the `match` block.
The PreparedStatement::calculate_token method takes an object that implements SerializeRow, serializes it to SerializedValues and returns the token. For the sake of an optimization in the code that handles batches we would like to provide a SerializedValues object directly. Extract the part that calculates token from SerializedValues to a new, separate method, but keep it pub(crate) - we'd like to avoid exposing type-unsafe interfaces to the users.
`SerializableRequest::serialize` accepts `impl BufMut`, but in reality we only pass `Vec<u8>` as an argument. Change the type of the argument to Vec<u8>, as in further commits we will need the argument to be precisely Vec<u8>.
59534dd
to
afde4a1
Compare
AFAIK there is not standard library definition of a lending iterator. Not sure whether there are "industry standard" crates that would define such a trait. Besides, this issue is irrelevant in the newest version of the PR. |
afde4a1
to
c861937
Compare
Marking as "ready", I'll update the docs in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one small comment
/// An iterator-like for `ValueList` | ||
/// | ||
/// An instance of this can be easily obtained from `IT: Iterator<Item: ValueList>`: that would be | ||
/// `BatchValuesIteratorFromIterator<IT>` | ||
/// | ||
/// It's just essentially making methods from `ValueList` accessible instead of being an actual iterator because of | ||
/// compiler limitations that would otherwise be very complex to overcome. | ||
/// (specifically, types being different would require yielding enums for tuple impls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this comment you probably meant SerializedValues
, not ValueList
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this comment was copied over from the legacy BatchValuesIterator
but not updated. I'll update in a second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cfd9e97
to
fcdeb08
Compare
fn batch_values_iter(&self) -> Self::RawBatchValuesIter<'_>; | ||
} | ||
|
||
/// An iterator-like for `ValueList` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more value list hides here @piodul
Introduces the successors of the previous BatchValues trait and its friends. The structure of the new traits is similar to the old traits, but they come in two flavors: - BatchValue, BatchValueIterator - those are user-facing traits. They allow iterating over the sets of values for batch's statements, but need to have the information about the names and types of the columns/bind markers supplied from the outside. - RawBatchValues, RawBatchValueIterator, RawSerializeRow - those serve as a glue between the logic in `scylla` and `scylla-cql`. They are analogous to `BatchValues`, `BatchValueIterator` and `SerializeRow`, but do not need the type information to be able to serialize themselves into the request. Co-authored-by: Karol Baryła <[email protected]>
Implement an adapter layer which takes a `BatchValues` object (which needs type information to serialize), pairs it with an iterator over `RowSerializationContext` objects and returns something which implements `RawBatchValues` (which don't require type information to serialize). It will be used by the `scylla` crate to pass batch data to `scylla-cql` in a type-erased form.
The purpose of the new struct is to enable token-aware routing of batches - which already exists in the old API - in the new API. Batches are routed according to the token calculated based on the first statement in the batch (if the first statement is a prepared statement). Calculation of the token must happen before the load balancing policy computes a plan and chooses the first connection, but serialization only happens after a connection is chosen. In order not to repeat serialization work, BatchValuesFirstSerialized wrapper can be used to transform a BatchValues into another BatchValues which caches the result of the first serialization. The types are put into a module in the `scylla` crate and hidden inside it. The wrapping functionality is exposed via a function which constructs the BatchValuesFirstSerialized object but returns it as `impl BatchValues`.
The driver is updated to use the new BatchValues API on all layers at once. Fortunately, there aren't many changes and they are mostly simple.
Simplify the logic of `Session::batch` by moving the parts responsible for token calculation and wrapping the `BatchValues` argument into a separate function in the `batch_values` module.
In case somebody has a custom implementation of BatchValues, they can use the adapter.
This way can reuse the existing tests for impls of the new traits and see that they behave in the same way.
Migration of batches to the new API is complete, so this method is no longer needed and can finally be removed.
fcdeb08
to
fbeafd7
Compare
This PR replaces the existing BatchValues API with one that leverages the recently introduced
SerializeCql
/SerializeRow
infrastructure and is type-safe.Refs: #801
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.