Description
Currently the iterator syntax cannot really represent key-value separation.
For instance, if we wanted to get the list of keys of a prefix, using something like tree.prefix(p).map(...) would end up loading all blobs as well because prefix() eagerly loads all KVs.
There would need to be a separate API for prefix_keys, that is something like -> impl Iterator<Item = lsm_tree::Result<UserKey>>. Same for prefix_sizes, prefix_values, range_keys, ... you get the idea:
let kvs = db.prefix("file#").collect::<Result<_>>();
// -----------v specialized impls in BlobTree
let size = db.prefix_size("file1#").map(Result::unwrap).sum();
let keys = db.prefix_keys("file#").collect::<Result<_>>();
let values = db.prefix_values("file#").collect::<Result<_>>();
// same for range_*
This increases the API surface a lot because we suddenly have 8 different range and prefix functions.
Instead, iterators could return a Guard struct that is opaque to the user, but provides the following methods:
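trait Guard {
    // Every accessor consumes the guard, so the key is returned owned;
    // a borrowed `&UserKey` could not outlive a consumed `self`.
    fn key(self) -> crate::Result<UserKey>; // TODO: separate `into_key` method??
    fn value(self) -> crate::Result<UserValue>;
    fn into_inner(self) -> crate::Result<(UserKey, UserValue)>;
    fn size(self) -> crate::Result<u32>;
}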
Notably, when using key-value separation, accessing key() or size() never loads blobs because vHandles are not resolved.
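The lazy behaviour described above can be sketched with a toy model. This is not the lsm-tree implementation: ToyTree, BlobStore, IndexEntry, sample_tree and the u64 handle are hypothetical stand-ins, Guard is a plain struct rather than a trait, and the sketch assumes the value size is stored next to the vHandle in the index. The blob store counts reads so the effect of each accessor is observable.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-ins for the crate's types (assumptions, not the real API).
type UserKey = Vec<u8>;
type UserValue = Vec<u8>;
type Result<T> = std::result::Result<T, String>;

/// Toy blob store that counts how often a blob is actually read.
struct BlobStore {
    blobs: HashMap<u64, UserValue>,
    loads: Cell<usize>,
}

impl BlobStore {
    fn load(&self, handle: u64) -> Result<UserValue> {
        self.loads.set(self.loads.get() + 1);
        self.blobs
            .get(&handle)
            .cloned()
            .ok_or_else(|| format!("missing blob {handle}"))
    }
}

/// Index entry: the key, a value handle, and (assumed) the value size.
struct IndexEntry {
    key: UserKey,
    handle: u64,
    size: u32,
}

/// Opaque item yielded by the iterator; the handle is resolved only on demand.
struct Guard {
    key: UserKey,
    handle: u64,
    size: u32,
    store: Rc<BlobStore>,
}

impl Guard {
    // Answered from the index alone: no blob read.
    fn key(self) -> Result<UserKey> {
        Ok(self.key)
    }

    // Answered from the index alone: no blob read.
    fn size(self) -> Result<u32> {
        Ok(self.size)
    }

    // Resolves the handle, which reads the blob.
    fn value(self) -> Result<UserValue> {
        self.store.load(self.handle)
    }

    // Resolves the handle and returns key and value together.
    fn into_inner(self) -> Result<(UserKey, UserValue)> {
        let value = self.store.load(self.handle)?;
        Ok((self.key, value))
    }
}

struct ToyTree {
    index: Vec<IndexEntry>,
    store: Rc<BlobStore>,
}

impl ToyTree {
    /// Yields one guard per matching key without touching the blob store.
    fn prefix<'a>(&'a self, prefix: &'a [u8]) -> impl Iterator<Item = Guard> + 'a {
        self.index
            .iter()
            .filter(move |e| e.key.starts_with(prefix))
            .map(move |e| Guard {
                key: e.key.clone(),
                handle: e.handle,
                size: e.size,
                store: Rc::clone(&self.store),
            })
    }
}

/// Small fixture: two keys under "file1#", one under "file2#".
fn sample_tree() -> ToyTree {
    let mut blobs = HashMap::new();
    blobs.insert(0, b"hello".to_vec());
    blobs.insert(1, b"world!".to_vec());
    blobs.insert(2, b"other".to_vec());
    let store = Rc::new(BlobStore { blobs, loads: Cell::new(0) });
    let index = vec![
        IndexEntry { key: b"file1#a".to_vec(), handle: 0, size: 5 },
        IndexEntry { key: b"file1#b".to_vec(), handle: 1, size: 6 },
        IndexEntry { key: b"file2#a".to_vec(), handle: 2, size: 5 },
    ];
    ToyTree { index, store }
}
```

In this model, mapping sample_tree().prefix(b"file1#") through Guard::key or Guard::size leaves the load counter at zero, while Guard::value and Guard::into_inner read one blob per item.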
This allows semantically richer queries that stay performant, without increasing the API surface:
let kvs = db.prefix("file1#")
    .map(lsm_tree::Guard::into_inner) // the (key, value) accessor from the trait above
    .collect::<Result<_>>();
let size = db.prefix("file1#")
    .map(lsm_tree::Guard::size)
    .map(Result::unwrap)
    .sum();
let keys = db.prefix("file1#")
    .map(lsm_tree::Guard::key)
    .collect::<Result<_>>();
let values = db.prefix("file1#")
    .map(lsm_tree::Guard::value)
    .collect::<Result<_>>();
// same for range_*
Using for-loops:
// Old
//
// This would not perform well for blobs because prefix() always resolves vHandles
for kv in db.prefix("file1#") {
    let (k, _) = kv?;
    eprintln!("found key: {k:?}");
}
// New
//
// Because we only access key(), blobs are never loaded
for guard in db.prefix("file1#") {
    // key() returns a Result, so propagate the error as in the old loop
    let k = guard.key()?;
    eprintln!("found key: {k:?}");
}