serialization: implement new, type-safe variants of BatchValues #881

piodul · 2023-12-14T15:00:49Z

This PR replaces the existing BatchValues API with one that leverages the recently introduced SerializeCql/SerializeRow infrastructure and is type-safe.

Refs: #801

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass tests.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

Lorak-mmk

After seeing new definitions of BatchValuesIterator / RawBatchValuesIterator I wonder - isn't it basically a re-implementation of LendingIterator? Maybe we could use it?

scylla-cql/src/frame/request/mod.rs

scylla-cql/src/types/serialize/batch.rs

scylla-cql/src/types/serialize/raw_batch.rs

scylla/src/transport/session.rs

Add a method to RowWriter which allows copying all contents of a SerializedValues object into it. Without it, the only option would be to parse the values in the SerializedValues via iter() and write them to the RowWriter one by one, which is unnecessarily inefficient.
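To illustrate the idea of the commit message above, here is a minimal sketch of a bulk-copy method. The types are simplified stand-ins rather than the driver's real `RowWriter` and `SerializedValues`, and the method name `append_serialized_values` and the length-prefixed layout are assumptions made for this example.

```rust
/// Simplified stand-in for `SerializedValues`: a flat buffer of
/// length-prefixed values plus a value count (illustrative layout only).
pub struct SerializedValues {
    bytes: Vec<u8>,
    count: u16,
}

impl SerializedValues {
    pub fn new() -> Self {
        Self { bytes: Vec::new(), count: 0 }
    }

    /// Appends one value as a 4-byte big-endian length followed by the bytes.
    pub fn add_value(&mut self, value: &[u8]) {
        self.bytes.extend_from_slice(&(value.len() as i32).to_be_bytes());
        self.bytes.extend_from_slice(value);
        self.count += 1;
    }
}

/// Simplified stand-in for `RowWriter`: appends to a borrowed buffer.
pub struct RowWriter<'buf> {
    buf: &'buf mut Vec<u8>,
    count: usize,
}

impl<'buf> RowWriter<'buf> {
    pub fn new(buf: &'buf mut Vec<u8>) -> Self {
        Self { buf, count: 0 }
    }

    /// Hypothetical name: copies all already-serialized values in one go,
    /// without parsing them back into individual values first.
    pub fn append_serialized_values(&mut self, values: &SerializedValues) {
        self.buf.extend_from_slice(&values.bytes);
        self.count += values.count as usize;
    }

    pub fn value_count(&self) -> usize {
        self.count
    }
}
```

The bulk copy is a single `extend_from_slice` call, whereas the iterate-and-rewrite approach would decode every length prefix only to encode it again.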

The PreparedStatement::calculate_token method takes an object that implements SerializeRow, serializes it to SerializedValues and returns the token. As an optimization in the code that handles batches, we would like to provide a SerializedValues object directly. Extract the part that calculates the token from SerializedValues into a new, separate method, but keep it pub(crate): we'd like to avoid exposing type-unsafe interfaces to users.
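The split described above could look roughly like the following sketch. Everything here is simplified: the traits and structs are stand-ins, the method name `calculate_token_untyped` is an assumption, and `DefaultHasher` merely takes the place of the driver's real partitioner.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for already-serialized, type-erased values.
pub struct SerializedValues(pub Vec<u8>);

/// Stand-in for the type-safe serialization trait.
pub trait SerializeRow {
    fn serialize(&self, out: &mut Vec<u8>);
}

impl SerializeRow for (i32, i32) {
    fn serialize(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_be_bytes());
        out.extend_from_slice(&self.1.to_be_bytes());
    }
}

pub struct PreparedStatement;

impl PreparedStatement {
    /// Public, type-safe entry point: serializes, then delegates.
    pub fn calculate_token(&self, values: &impl SerializeRow) -> u64 {
        let mut buf = Vec::new();
        values.serialize(&mut buf);
        self.calculate_token_untyped(&SerializedValues(buf))
    }

    /// Crate-private part (hypothetical name) that works on serialized data,
    /// so callers that already hold `SerializedValues` skip re-serialization.
    pub(crate) fn calculate_token_untyped(&self, values: &SerializedValues) -> u64 {
        // Toy hash used only for illustration, not the real token function.
        let mut hasher = DefaultHasher::new();
        values.0.hash(&mut hasher);
        hasher.finish()
    }
}
```

Keeping the second method `pub(crate)` means users can only reach the token calculation through the typed entry point.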

piodul · 2023-12-15T05:54:31Z

> After seeing new definitions of BatchValuesIterator / RawBatchValuesIterator I wonder - isn't it basically a re-implementation of LendingIterator? Maybe we could use it?

AFAIK there is no standard library definition of a lending iterator. I'm not sure whether there are "industry standard" crates that define such a trait. Besides, this issue is irrelevant in the newest version of the PR.

piodul · 2023-12-15T11:28:52Z

Marking as "ready", I'll update the docs in a separate PR.

Lorak-mmk

Just one small comment

Lorak-mmk · 2023-12-15T13:14:25Z

scylla-cql/src/types/serialize/raw_batch.rs

+/// An iterator-like for `ValueList`
+///
+/// An instance of this can be easily obtained from `IT: Iterator<Item: ValueList>`: that would be
+/// `BatchValuesIteratorFromIterator<IT>`
+///
+/// It's just essentially making methods from `ValueList` accessible instead of being an actual iterator because of
+/// compiler limitations that would otherwise be very complex to overcome.
+/// (specifically, types being different would require yielding enums for tuple impls)


In this comment you probably meant SerializedValues, not ValueList, right?

Right, this comment was copied over from the legacy BatchValuesIterator but not updated. I'll update in a second.

Lorak-mmk · 2023-12-15T14:28:12Z

scylla-cql/src/types/serialize/raw_batch.rs

+    fn batch_values_iter(&self) -> Self::RawBatchValuesIter<'_>;
+}
+
+/// An iterator-like for `ValueList`


One more value list hides here @piodul

Introduces the successors of the previous BatchValues trait and its friends. The structure of the new traits is similar to the old traits, but they come in two flavors:

- BatchValues, BatchValuesIterator - those are user-facing traits. They allow iterating over the sets of values for the batch's statements, but need to have the information about the names and types of the columns/bind markers supplied from the outside.
- RawBatchValues, RawBatchValuesIterator, RawSerializeRow - those serve as glue between the logic in `scylla` and `scylla-cql`. They are analogous to `BatchValues`, `BatchValuesIterator` and `SerializeRow`, but do not need the type information to be able to serialize themselves into the request.

Co-authored-by: Karol Baryła <[email protected]>

Implement an adapter layer which takes a `BatchValues` object (which needs type information to serialize), pairs it with an iterator over `RowSerializationContext` objects and returns something which implements `RawBatchValues` (which doesn't require type information to serialize). It will be used by the `scylla` crate to pass batch data to `scylla-cql` in a type-erased form.
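The relationship between the typed flavor, the raw flavor and the adapter can be sketched as follows. This is a heavily simplified model and not the driver's real trait definitions: `RowContext` stands in for `RowSerializationContext`, the names `RawAdapter` and `Rows` are hypothetical, and errors are plain strings.

```rust
/// Stand-in for `RowSerializationContext`: here just an expected column count.
pub struct RowContext {
    pub columns: usize,
}

/// User-facing flavor: needs type information (the context) to serialize.
pub trait BatchValuesIterator {
    /// Serializes the next statement's values; `None` when exhausted.
    fn serialize_next(&mut self, ctx: &RowContext, out: &mut Vec<u8>)
        -> Option<Result<(), String>>;
}

/// Raw flavor: can serialize itself without any external type information.
pub trait RawBatchValuesIterator {
    fn serialize_next(&mut self, out: &mut Vec<u8>) -> Option<Result<(), String>>;
}

/// Hypothetical adapter: pairs typed values with an iterator of contexts.
pub struct RawAdapter<BVI, CTX> {
    pub values: BVI,
    pub contexts: CTX,
}

impl<BVI, CTX> RawBatchValuesIterator for RawAdapter<BVI, CTX>
where
    BVI: BatchValuesIterator,
    CTX: Iterator<Item = RowContext>,
{
    fn serialize_next(&mut self, out: &mut Vec<u8>) -> Option<Result<(), String>> {
        // Each statement consumes one context; stop when contexts run out.
        let ctx = self.contexts.next()?;
        self.values.serialize_next(&ctx, out)
    }
}

/// Toy typed implementation: each row is a list of single-byte values.
pub struct Rows(pub std::vec::IntoIter<Vec<u8>>);

impl BatchValuesIterator for Rows {
    fn serialize_next(&mut self, ctx: &RowContext, out: &mut Vec<u8>)
        -> Option<Result<(), String>> {
        let row = self.0.next()?;
        if row.len() != ctx.columns {
            return Some(Err(format!(
                "expected {} values, got {}", ctx.columns, row.len()
            )));
        }
        out.extend_from_slice(&row);
        Some(Ok(()))
    }
}
```

In this model the caller of the raw trait never sees a context; the adapter supplies one per statement, which is the type-erasure step the commit message describes.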

The purpose of the new struct is to enable token-aware routing of batches in the new API, as the old API already does. Batches are routed according to the token calculated from the first statement in the batch (if the first statement is a prepared statement). The token must be calculated before the load balancing policy computes a plan and chooses the first connection, but serialization only happens after a connection is chosen. In order not to repeat serialization work, the BatchValuesFirstSerialized wrapper can be used to transform a BatchValues into another BatchValues which caches the result of serializing the first statement's values.

The types are put into a module in the `scylla` crate and hidden inside it. The wrapping functionality is exposed via a function which constructs the BatchValuesFirstSerialized object but returns it as `impl BatchValues`.
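A minimal sketch of the caching idea, under simplifying assumptions: the names `BatchRows` and `FirstSerialized` are hypothetical, rows are plain byte lists, and the toy `serialize_row` function stands in for real CQL serialization.

```rust
/// Stand-in for user-provided batch values: one byte list per statement.
pub struct BatchRows(pub Vec<Vec<u8>>);

/// Toy serialization: a one-byte length prefix followed by the row bytes.
fn serialize_row(row: &[u8]) -> Vec<u8> {
    let mut out = vec![row.len() as u8];
    out.extend_from_slice(row);
    out
}

/// Hypothetical wrapper that serializes the first row eagerly and keeps it.
pub struct FirstSerialized<'a> {
    first: Option<Vec<u8>>,
    rest: &'a [Vec<u8>],
}

impl<'a> FirstSerialized<'a> {
    pub fn new(batch: &'a BatchRows) -> Self {
        match batch.0.split_first() {
            Some((first, rest)) => Self { first: Some(serialize_row(first)), rest },
            None => Self { first: None, rest: &[] },
        }
    }

    /// Cached bytes of the first row, available before a connection is
    /// chosen (for example, to compute a routing token).
    pub fn first_serialized(&self) -> Option<&[u8]> {
        self.first.as_deref()
    }

    /// Writes the whole batch; the first row is copied from the cache
    /// instead of being serialized a second time.
    pub fn write_all(&self, out: &mut Vec<u8>) {
        if let Some(first) = &self.first {
            out.extend_from_slice(first);
        }
        for row in self.rest {
            out.extend_from_slice(&serialize_row(row));
        }
    }
}
```

Only the first row is serialized up front; the remaining rows are still serialized lazily when the request is written.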

piodul requested a review from Lorak-mmk December 14, 2023 15:00

treewide: add Legacy- prefix to BatchValues and its friends

7fe1ef8

We will introduce types with the same name but different interface, and we want to move off the current ones gradually. Rename the existing ones as the first step.

Lorak-mmk reviewed Dec 15, 2023

piodul added 10 commits December 15, 2023 06:41

serialize/row: add RowSerializationContext::empty

8715498

We will need to pass an empty row serialization context when we serialize values for unprepared queries in a batch.

serialize/row: simplify SerializedValues::from_serializable

daa5266

It's not needed to use a block to make sure that `writer` is dropped before `data` is used, non-lexical lifetimes already take care of that.
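The point about non-lexical lifetimes can be seen in a small standalone sketch. `Writer` is a hypothetical stand-in for the real writer type; the example assumes, as here, that the writer has no `Drop` implementation (with one, the borrow would last until the end of the scope).

```rust
/// Stand-in for a writer that mutably borrows the output buffer.
struct Writer<'buf> {
    data: &'buf mut Vec<u8>,
}

impl Writer<'_> {
    fn write(&mut self, bytes: &[u8]) {
        self.data.extend_from_slice(bytes);
    }
}

/// Old style: an explicit block ends the writer's borrow of `data`.
fn serialize_with_block() -> Vec<u8> {
    let mut data = Vec::new();
    {
        let mut writer = Writer { data: &mut data };
        writer.write(b"ab");
    }
    data
}

/// With non-lexical lifetimes the borrow ends at the writer's last use,
/// so `data` can be moved out without the extra block.
fn serialize_without_block() -> Vec<u8> {
    let mut data = Vec::new();
    let mut writer = Writer { data: &mut data };
    writer.write(b"ab");
    // `writer` is not used past this point, so `data` is no longer borrowed.
    data
}
```

Both functions compile and produce the same bytes.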

serialize/row: add SerializedValues::from_closure method

cc83cb6

It will be used to serialize data produced by upcoming batch values iterators.
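As a rough sketch of what a closure-based constructor might look like: the caller gets a writer, fills it, and receives the finished values together with the closure's return value. The signature, the simplified `RowWriter`, and the `String` error type below are assumptions for illustration, not the driver's actual definitions.

```rust
/// Simplified stand-in for `RowWriter`.
pub struct RowWriter<'buf> {
    buf: &'buf mut Vec<u8>,
    count: u16,
}

impl RowWriter<'_> {
    /// Appends one value as a 4-byte big-endian length followed by the bytes.
    pub fn add(&mut self, value: &[u8]) {
        self.buf.extend_from_slice(&(value.len() as i32).to_be_bytes());
        self.buf.extend_from_slice(value);
        self.count += 1;
    }
}

/// Simplified stand-in for `SerializedValues`.
pub struct SerializedValues {
    pub bytes: Vec<u8>,
    pub count: u16,
}

impl SerializedValues {
    /// Assumed shape: runs `f` against a fresh writer and packages the result.
    pub fn from_closure<F, R>(f: F) -> Result<(Self, R), String>
    where
        F: FnOnce(&mut RowWriter<'_>) -> Result<R, String>,
    {
        let mut bytes = Vec::new();
        let mut writer = RowWriter { buf: &mut bytes, count: 0 };
        let ret = f(&mut writer)?;
        let count = writer.count;
        Ok((Self { bytes, count }, ret))
    }
}
```

A caller could use it like `SerializedValues::from_closure(|w| { w.add(&[1]); Ok(()) })`, which suits producers such as iterators that write rows through a writer rather than returning them.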

session: construct RoutingInfo outside the match

1e61f9c

A small refactor in order to improve the clarity of the later commits.

session: inline the .as_deref() call

e90975e

A small refactor to improve the clarity of later commits.

session: construct first_serialized_value in a narrower scope

ccd60b3

This refactor will improve clarity of the next change, but will also prevent an obscure issue from happening that causes futures not to be `Send` when an instance of a GAT exists across an await point.
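The general technique of narrowing a value's scope so that it is not alive across an await point can be sketched as below. This is an analogy rather than a reproduction of the GAT issue from the commit: an `Rc` is used as the problematic non-`Send` value, and all function names are hypothetical.

```rust
use std::future::{ready, Future};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Compiles only if the given future is `Send`.
fn assert_send<F: Future + Send>(fut: F) -> F {
    fut
}

async fn narrow_scope() -> usize {
    // The non-`Send` value lives only inside this block, so it is gone
    // before the await point and does not become part of the future's state.
    let len = {
        let local = Rc::new(vec![1u8, 2, 3]);
        local.len()
    };
    ready(()).await;
    len
}

/// Waker that does nothing; enough to poll an immediately-ready future.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future once; it also proves at compile time that it is `Send`.
pub fn poll_once() -> Poll<usize> {
    let mut fut = Box::pin(assert_send(narrow_scope()));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let result = fut.as_mut().poll(&mut cx);
    result
}
```

If `local` were declared in the function's outer scope and kept alive past `ready(()).await`, the `assert_send` call would fail to compile.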

session: work around lifetime issues with temporaries

5ffd3ab

If we changed the code to use the new BatchValues API first, the compiler would complain about some lifetime issues of temporary objects passed to the `match` block.
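The underlying language rule is that temporaries created in a `match` scrutinee live until the end of the whole `match`. The sketch below demonstrates that rule with a mutex guard, which is an illustrative stand-in and not the code from the driver; binding the value with `let` first is the usual workaround.

```rust
use std::sync::Mutex;

/// The guard created in the scrutinee is a temporary that lives until the
/// end of the `match`, so the mutex is still locked inside the arm.
pub fn guard_held_in_arm(m: &Mutex<Vec<u8>>) -> bool {
    match m.lock().unwrap().len() {
        _len => m.try_lock().is_err(),
    }
}

/// Binding the value with `let` first drops the temporary guard at the end
/// of that statement, before the `match` starts.
pub fn guard_released_before_match(m: &Mutex<Vec<u8>>) -> bool {
    let len = m.lock().unwrap().len();
    match len {
        _len => m.try_lock().is_ok(),
    }
}
```

Both functions return `true`, showing that the temporary outlives the scrutinee expression in the first case and not in the second.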

frame: use Vec<u8> for request serialization, not BufMut

51bf7ab

`SerializableRequest::serialize` accepts `impl BufMut`, but in reality we only pass `Vec<u8>` as an argument. Change the type of the argument to Vec<u8>, as in further commits we will need the argument to be precisely Vec<u8>.

piodul force-pushed the new-serialize-api-batches branch from 59534dd to afde4a1 Compare December 15, 2023 05:54

piodul force-pushed the new-serialize-api-batches branch from afde4a1 to c861937 Compare December 15, 2023 06:08

piodul marked this pull request as ready for review December 15, 2023 11:28

Lorak-mmk mentioned this pull request Dec 15, 2023

WIP: Switch batches to new serialization traits #870

Closed

Lorak-mmk requested changes Dec 15, 2023

piodul force-pushed the new-serialize-api-batches branch 2 times, most recently from cfd9e97 to fcdeb08 Compare December 15, 2023 13:51

Lorak-mmk reviewed Dec 15, 2023

piodul and others added 7 commits December 15, 2023 15:31

scylla-cql/frame, scylla/{statement,transport}: switch to new API

6b88c7e

The driver is updated to use the new BatchValues API on all layers at once. Fortunately, there aren't many changes and they are mostly simple.

session: move the batch token awareness logic to a separate function

e6c2aa5

Simplify the logic of `Session::batch` by moving the parts responsible for token calculation and wrapping the `BatchValues` argument into a separate function in the `batch_values` module.

serialize/batch: add compatibility layer for the legacy API

666ca25

In case somebody has a custom implementation of the legacy BatchValues trait, they can use the adapter.

value_tests: adapt batch tests to both new and legacy API

2ea431c

This way we can reuse the existing tests for impls of the new traits and see that they behave in the same way.

serialize/row: remove SerializedValues::to_old_serialized_values

fbeafd7

Migration of batches to the new API is complete, so this method is no longer needed and can finally be removed.

piodul force-pushed the new-serialize-api-batches branch from fcdeb08 to fbeafd7 Compare December 15, 2023 14:31

piodul requested a review from Lorak-mmk December 15, 2023 14:36

Lorak-mmk approved these changes Dec 15, 2023

Lorak-mmk merged commit 99330c8 into scylladb:main Dec 15, 2023
8 checks passed

This was referenced Dec 21, 2023

Serialization refactor: adjust the codebase to use SerializeRow and SerializeCql in the public API #822

Closed

Serialization refactor: add new serialization traits #801

Closed
