Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialization: implement new, type-safe variants of BatchValues #881

Merged
merged 19 commits into from
Dec 15, 2023

Conversation

piodul
Copy link
Collaborator

@piodul piodul commented Dec 14, 2023

This PR replaces the existing BatchValues API with one that leverages the recently introduced SerializeCql/SerializeRow infrastructure and is type-safe.

Refs: #801

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@piodul piodul requested a review from Lorak-mmk December 14, 2023 15:00
We will introduce types with the same name but different interface, and
we want to move off the current ones gradually. Rename the existing ones
as the first step.
Copy link
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After seeing new definitions of BatchValuesIterator / RawBatchValuesIterator I wonder - isn't it basically a re-implementation of LendingIterator? Maybe we could use it?

Add a method to RowWriter which allows copying all contents of some
SerializedValues object to it. Without it, the only method would be to
parse the values in the SerializedValues via iter() and write them to
the RowWriter, which is unnecessarily ineffective.
We will need to pass an empty row serialization context when we
serialize values for unprepared queries in a batch.
It's not needed to use a block to make sure that `writer` is dropped
before `data` is used, non-lexical lifetimes already take care of that.
It will be used to serialize data produced by upcoming batch values
iterators.
A small refactor in order to improve the clarity of the later commits.
A small refactor to improve the clarity of later commits.
This refactor will improve clarity of the next change, but will also
prevent an obscure issue from happening that causes futures not to be
`Send` when an instance of a GAT exists across an await point.
If we changed the code to use the new BatchValues API first, the
compiler would complain about some lifetime issues of temporary objects
passed to the `match` block.
The PreparedStatement::calculate_token method takes an object that
implements SerializeRow, serializes it to SerializedValues and returns
the token. For the sake of an optimization in the code that handles
batches we would like to provide a SerializedValues object directly.

Extract the part that calculates token from SerializedValues to a new,
separate method, but keep it pub(crate) - we'd like to avoid exposing
type-unsafe interfaces to the users.
`SerializableRequest::serialize` accepts `impl BufMut`, but in reality
we only pass `Vec<u8>` as an argument. Change the type of the argument
to Vec<u8>, as in further commits we will need the argument to be
precisely Vec<u8>.
@piodul piodul force-pushed the new-serialize-api-batches branch from 59534dd to afde4a1 Compare December 15, 2023 05:54
@piodul
Copy link
Collaborator Author

piodul commented Dec 15, 2023

After seeing new definitions of BatchValuesIterator / RawBatchValuesIterator I wonder - isn't it basically a re-implementation of LendingIterator? Maybe we could use it?

AFAIK there is not standard library definition of a lending iterator. Not sure whether there are "industry standard" crates that would define such a trait. Besides, this issue is irrelevant in the newest version of the PR.

@piodul piodul force-pushed the new-serialize-api-batches branch from afde4a1 to c861937 Compare December 15, 2023 06:08
@piodul
Copy link
Collaborator Author

piodul commented Dec 15, 2023

Marking as "ready", I'll update the docs in a separate PR.

@piodul piodul marked this pull request as ready for review December 15, 2023 11:28
Copy link
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small comment

Comment on lines 23 to 30
/// An iterator-like for `ValueList`
///
/// An instance of this can be easily obtained from `IT: Iterator<Item: ValueList>`: that would be
/// `BatchValuesIteratorFromIterator<IT>`
///
/// It's just essentially making methods from `ValueList` accessible instead of being an actual iterator because of
/// compiler limitations that would otherwise be very complex to overcome.
/// (specifically, types being different would require yielding enums for tuple impls)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this comment you probably meant SerializedValues, not ValueList, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this comment was copied over from the legacy BatchValuesIterator but not updated. I'll update in a second.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@piodul piodul force-pushed the new-serialize-api-batches branch 2 times, most recently from cfd9e97 to fcdeb08 Compare December 15, 2023 13:51
fn batch_values_iter(&self) -> Self::RawBatchValuesIter<'_>;
}

/// An iterator-like for `ValueList`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more value list hides here @piodul

piodul and others added 7 commits December 15, 2023 15:31
Introduces the successors of the previous BatchValues trait and its
friends. The structure of the new traits is similar to the old traits,
but they come in two flavors:

- BatchValue, BatchValueIterator - those are user-facing traits. They
  allow iterating over the sets of values for batch's statements, but
  need to have the information about the names and types of the
  columns/bind markers supplied from the outside.

- RawBatchValues, RawBatchValueIterator, RawSerializeRow - those serve
  as a glue between the logic in `scylla` and `scylla-cql`. They are
  analogous to `BatchValues`, `BatchValueIterator` and `SerializeRow`,
  but do not need the type information to be able to serialize
  themselves into the request.

Co-authored-by: Karol Baryła <[email protected]>
Implement an adapter layer which takes a `BatchValues` object (which
needs type information to serialize), pairs it with an iterator over
`RowSerializationContext` objects and returns something which implements
`RawBatchValues` (which don't require type information to serialize).

It will be used by the `scylla` crate to pass batch data to `scylla-cql`
in a type-erased form.
The purpose of the new struct is to enable token-aware routing of
batches - which already exists in the old API - in the new API.

Batches are routed according to the token calculated based on the first
statement in the batch (if the first statement is a prepared statement).
Calculation of the token must happen before the load balancing policy
computes a plan and chooses the first connection, but serialization only
happens after a connection is chosen. In order not to repeat
serialization work, BatchValuesFirstSerialized wrapper can be used to
transform a BatchValues into another BatchValues which caches the result
of the first serialization.

The types are put into a module in the `scylla` crate and hidden inside
it. The wrapping functionality is exposed via a function which
constructs the BatchValuesFirstSerialized object but returns it as `impl
BatchValues`.
The driver is updated to use the new BatchValues API on all layers at
once. Fortunately, there aren't many changes and they are mostly simple.
Simplify the logic of `Session::batch` by moving the parts responsible
for token calculation and wrapping the `BatchValues` argument into a
separate function in the `batch_values` module.
In case somebody has a custom implementation of BatchValues, they can
use the adapter.
This way can reuse the existing tests for impls of the new traits and
see that they behave in the same way.
Migration of batches to the new API is complete, so this method is no
longer needed and can finally be removed.
@piodul piodul force-pushed the new-serialize-api-batches branch from fcdeb08 to fbeafd7 Compare December 15, 2023 14:31
@piodul piodul requested a review from Lorak-mmk December 15, 2023 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants