RFC: New Iterators API #110

Open
@marvin-j97

Currently, the iterator API cannot really represent key-value separation.

For instance, if we wanted to list the keys under a prefix, something like tree.prefix(p).map(...) would end up loading all blobs as well, because prefix() eagerly loads every KV pair, values included.

There would need to be a separate API such as prefix_keys, returning something like impl Iterator<Item = lsm_tree::Result<UserKey>>. The same goes for prefix_sizes, prefix_values, range_keys, ... you get the idea:

let kvs = db.prefix("file#").collect::<Result<_>>();
// -----------v specialized impls in BlobTree
let size: u32 = db.prefix_sizes("file#").map(Result::unwrap).sum();
let keys = db.prefix_keys("file#").collect::<Result<_>>();
let values = db.prefix_values("file#").collect::<Result<_>>();

// same for range_*

This increases the API surface a lot, because we suddenly have eight different functions: four variants each for prefix and range.


Instead, iterators could return a Guard struct that is opaque to the user, but provides the following methods:

trait Guard {
    fn key(self) -> crate::Result<UserKey>; // consumes the guard, so it returns an owned key; TODO: separate `into_key` method??
    fn value(self) -> crate::Result<UserValue>;
    fn into_inner(self) -> crate::Result<(UserKey, UserValue)>;
    fn size(self) -> crate::Result<u32>;
}

Notably, with key-value separation enabled, calling key() or size() never loads a blob, because the vHandle is not resolved.
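To make the laziness concrete, here is a minimal, self-contained sketch using mock types. Everything in it is an assumption for illustration and not lsm-tree's actual implementation: `BlobStore`, `Slot`, and `Tree` are hypothetical stand-ins, the error type is a plain `String`, and `size()` is assumed to report the value length only. The blob store counts how often a blob is read, so the effect of each accessor can be observed.

```rust
use std::cell::Cell;
use std::collections::{BTreeMap, HashMap};

type UserKey = Vec<u8>;
type UserValue = Vec<u8>;
type Result<T> = std::result::Result<T, String>; // simplified error type

/// Hypothetical stand-in for the blob files; counts every blob read.
struct BlobStore {
    blobs: HashMap<u64, UserValue>,
    loads: Cell<usize>,
}

impl BlobStore {
    fn load(&self, id: u64) -> Result<UserValue> {
        self.loads.set(self.loads.get() + 1);
        self.blobs.get(&id).cloned().ok_or_else(|| format!("blob {id} not found"))
    }
}

/// What the index stores per key: the value itself, or a handle to a blob.
#[derive(Clone)]
enum Slot {
    Inline(UserValue),
    Handle { blob_id: u64, size: u32 }, // mock vHandle; value size is kept in the index
}

struct Guard<'a> {
    key: UserKey,
    slot: Slot,
    blobs: &'a BlobStore,
}

impl Guard<'_> {
    /// Never touches the blob store.
    fn key(self) -> Result<UserKey> {
        Ok(self.key)
    }

    /// Never touches the blob store: the size is read from the handle.
    fn size(self) -> Result<u32> {
        Ok(match self.slot {
            Slot::Inline(v) => v.len() as u32,
            Slot::Handle { size, .. } => size,
        })
    }

    /// Resolves the handle, which is the only place a blob is read.
    fn value(self) -> Result<UserValue> {
        match self.slot {
            Slot::Inline(v) => Ok(v),
            Slot::Handle { blob_id, .. } => self.blobs.load(blob_id),
        }
    }

    fn into_inner(self) -> Result<(UserKey, UserValue)> {
        let value = match self.slot {
            Slot::Inline(v) => v,
            Slot::Handle { blob_id, .. } => self.blobs.load(blob_id)?,
        };
        Ok((self.key, value))
    }
}

struct Tree {
    index: BTreeMap<UserKey, Slot>,
    blobs: BlobStore,
}

impl Tree {
    /// Yields one guard per matching key; no blob is read while iterating.
    fn prefix<'a>(&'a self, prefix: &'a [u8]) -> impl Iterator<Item = Guard<'a>> + 'a {
        self.index
            .iter()
            .filter(move |(k, _)| k.starts_with(prefix))
            .map(move |(k, slot)| Guard { key: k.clone(), slot: slot.clone(), blobs: &self.blobs })
    }
}

/// Two keys under "file1#" (one inline, one separated) plus an unrelated key.
fn sample_tree() -> Tree {
    let mut index = BTreeMap::new();
    index.insert(b"file1#a".to_vec(), Slot::Inline(b"tiny".to_vec()));
    index.insert(b"file1#b".to_vec(), Slot::Handle { blob_id: 7, size: 1_000 });
    index.insert(b"file2#a".to_vec(), Slot::Inline(b"other".to_vec()));
    let mut blobs = HashMap::new();
    blobs.insert(7, vec![0u8; 1_000]);
    Tree { index, blobs: BlobStore { blobs, loads: Cell::new(0) } }
}
```

In this mock, collecting keys or summing sizes over `sample_tree().prefix(b"file1#")` leaves the load counter at zero; only `value()` or `into_inner()` on the separated entry increments it.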

This allows semantically richer queries that stay performant, without increasing the API surface:

let kvs = db.prefix("file1#")
  .map(lsm_tree::Guard::into_inner)
  .collect::<Result<_>>();

let size: u32 = db.prefix("file1#")
  .map(lsm_tree::Guard::size)
  .map(Result::unwrap)
  .sum();

let keys = db.prefix("file1#")
  .map(lsm_tree::Guard::key)
  .collect::<Result<_>>();

let values = db.prefix("file1#")
  .map(lsm_tree::Guard::value)
  .collect::<Result<_>>();

// same for range_*

Using for-loops:

// Old
//
// This would not perform well for blobs because prefix() always resolves vHandles
for kv in db.prefix("file1#") {
  let (k, _) = kv?;
  eprintln!("found key: {k:?}");
}

// New
//
// Because we only access key(), blobs are never loaded
for guard in db.prefix("file1#") {
  let k = guard.key()?;
  eprintln!("found key: {k:?}");
}
