fix: race when bumping items while loading a snapshot #4564

kostasrim · 2025-02-05T12:17:19Z

The original issue was reported in #4497, and we supplied a fix in #4507. However, that fix overlooked that RdbLoader::Load() runs once per flow/shard thread, so the "poison pill" of updating the loading state at the end of RdbLoader::Load() introduced a race condition:

shard_set->Add(i, [bc]() mutable {
  namespaces->GetDefaultNamespace().GetCurrentDbSlice().SetLoadInProgress(false);
  bc->Dec();
});

Any flow F that finishes loading its own snapshot first (relative to the rest of the flows) will call SetLoadInProgress(false) on ALL shard threads. As a consequence, while the other flows are not yet done (their respective RdbLoader::Load() is still processing), the next time they use the db slice API it will start bumping up items, because load in progress is now false.

The fix is to update the state only after all shard flows are done and, similarly, to update the state on all shard threads before we start Load(), which provides a consistent state/view among all shard threads.
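To make the interleaving concrete, here is a minimal single-threaded sketch that replays the two orderings. `ShardModel`, `TouchWhileLoading` and `CountUnwantedBumps` are hypothetical names invented for illustration; they are not Dragonfly's DbSlice API, and the model ignores real threading entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-threaded model of one shard; not Dragonfly's DbSlice.
struct ShardModel {
  bool load_in_progress = false;
  int bumps = 0;  // bumps performed while a loader was still working on this shard

  // Models a loader touching the db slice: items are bumped only when the
  // shard believes no load is in progress.
  void TouchWhileLoading() {
    if (!load_in_progress) ++bumps;
  }
};

// Replays one interleaving. If reset_per_flow is true, flow 0 clears the flag
// on every shard as soon as it finishes (the ordering described above);
// otherwise the flag is cleared only after all flows are done.
int CountUnwantedBumps(std::size_t num_shards, bool reset_per_flow) {
  std::vector<ShardModel> shards(num_shards);
  for (auto& s : shards) s.load_in_progress = true;

  if (reset_per_flow) {
    for (auto& s : shards) s.load_in_progress = false;  // flow 0 finished first
  }

  // Flows 1..N-1 are still loading and keep using their db slice.
  for (std::size_t i = 1; i < num_shards; ++i) shards[i].TouchWhileLoading();

  // "Gather" step: every flow is done, so the flag can be cleared safely.
  for (auto& s : shards) s.load_in_progress = false;

  int total = 0;
  for (const auto& s : shards) total += s.bumps;
  return total;
}
```

With four shards, the per-flow reset lets every still-running flow bump once, while deferring the reset until all flows are done yields no bumps.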

Should resolve #4554

P.S. We might be able to simplify the new db slice state via the global loading state. That's something I will need to follow up on, but I won't do it as part of this PR.

Signed-off-by: kostas <[email protected]>

romange

Maybe I misunderstand something, but why do we need to orchestrate SetLoadInProgress on all the shards? Can we do it locally on each shard, i.e. independently for each shard?

kostasrim · 2025-02-05T15:04:38Z

Maybe I misunderstand something, but why do we need to orchestrate SetLoadInProgress on all the shards? Can we do it locally on each shard, i.e. independently for each shard?

Good question! Because master and replica might have a different number of proactors. So imagine the following case:

Flow 1 -> Sets shard 1's loading state to true.
Flow 2 -> Still hasn't set its local loading state to true (the flow fiber did not even start because the proactor was busy).

Flow 1 had only one item, so Load finished instantly, and now it calls FinishLoad, which calls FlushShardAsync, which unfortunately calls LoadItemsBuffer on shard number 2. Boom, the loading state there is false. In other words, each flow at the end can dispatch to multiple shards which might not yet have had their state updated. And we can't really rely on the order of submissions to the task/shard queue (since we don't have a guarantee of sequential task execution, from what you have said in the past, and I could be wrong here).

However, having said this, I think there is a better solution! When we call FlushShardAsync we can first check whether the loading state is updated. If it's not, we can set it to true. That way we "save" the first scatter part of the operation (dispatching to all shard threads and updating the state). We only keep the "gather" step (updating the state to false after all the loaders have completed).
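The "check before dispatch" idea can be sketched as follows. This is only an illustration under stated assumptions: `ShardState`, `FlushNaive` and `FlushLazy` are hypothetical names, the model is single-threaded, and it does not reproduce what FlushShardAsync or LoadItemsBuffer actually do.

```cpp
// Hypothetical model of a target shard; illustrative only.
struct ShardState {
  bool load_in_progress = false;
  int bumped_items = 0;

  // Stand-in for applying a buffer of loaded items on this shard.
  // Items get bumped only when the shard thinks no load is running.
  void LoadItems(int count) {
    if (!load_in_progress) bumped_items += count;
  }
};

// Dispatches without looking at the target's state: if the target flow has
// not started yet, its flag is still false and the items get bumped.
void FlushNaive(ShardState& target, int count) { target.LoadItems(count); }

// Lazy variant from the comment above: mark the target as loading first if
// nobody has done so yet, then apply the items.
void FlushLazy(ShardState& target, int count) {
  if (!target.load_in_progress) target.load_in_progress = true;
  target.LoadItems(count);
}
```

In this model the lazy variant removes the need for an up-front "scatter" that sets the flag everywhere; a final "gather" that clears it is still required and is left out of the sketch.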

romange · 2025-02-05T17:16:14Z

I did not analyse the code, but maybe using an unsigned counter instead of a boolean in db_slice will simplify things?
i.e. every time we start we increase it, and when we stop we decrease it.
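A minimal sketch of the counter idea, assuming a hypothetical `LoadState` class with `BeginLoad`, `EndLoad` and `IsLoading` methods (none of these names are taken from the PR's code). Each loader increments on start and decrements on finish, so the shard only leaves the loading state when the last loader is done, regardless of the order in which flows finish.

```cpp
#include <cassert>

// Hypothetical reference-counted loading state; illustrative only.
class LoadState {
 public:
  // Called when a loader starts working against this shard.
  void BeginLoad() { ++load_ref_cnt_; }

  // Called when a loader is done; must be paired with a prior BeginLoad().
  void EndLoad() {
    assert(load_ref_cnt_ > 0);
    --load_ref_cnt_;
  }

  // True while at least one loader is still active.
  bool IsLoading() const { return load_ref_cnt_ > 0; }

 private:
  unsigned load_ref_cnt_ = 0;
};
```

Unlike a boolean, an early finisher cannot flip the state for loaders that are still running: its decrement only undoes its own increment.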

kostasrim · 2025-02-06T10:22:19Z

src/server/rdb_test.cc

@@ -735,4 +735,19 @@ TEST_F(RdbTest, HugeKeyIssue4497) {
  EXPECT_EQ(Run({"flushall"}), "OK");
 }

+TEST_F(RdbTest, HugeKeyIssue4554) {


This fails without my changes.

Ideally, I wanted to reproduce this via a replication test, because only replication is used in the issue. However, it proved to be a little more difficult than expected, for reasons that are hard for me to explain in a few lines. I am fairly positive that this PR will address all the problems, but I would like to have a binary ASAP to send out, so we can verify that the case seen in the issue is the same as (or similar to) the one here.

kostasrim · 2025-02-06T10:23:12Z

I did not analyse the code, but maybe using an unsigned counter instead of a boolean in db_slice will simplify things? i.e. every time we start we increase it, and when we stop we decrease it.

Now that we have a test I will polish. Let me think and I will get back to you soon

kostasrim · 2025-02-06T11:07:23Z

I did not analyse the code, but maybe using an unsigned counter instead of a boolean in db_slice will simplify things? i.e. every time we start we increase it, and when we stop we decrease it.

Now that we have a test I will polish. Let me think and I will get back to you soon.

@romange sequence numbers worked like a charm! No more extra dispatches :)

kostasrim · 2025-02-06T11:12:31Z

src/server/rdb_load.cc

@@ -2554,6 +2562,8 @@ void RdbLoader::LoadItemsBuffer(DbIndex db_ind, const ItemsBuf& ib) {
  DbContext db_cntx{&namespaces->GetDefaultNamespace(), db_ind, GetCurrentTimeMs()};
  DbSlice& db_slice = db_cntx.GetDbSlice(es->shard_id());

+  DCHECK(!db_slice.IsCacheMode());


Maybe CHECK instead?

romange · 2025-02-06T21:18:14Z

src/server/db_slice.cc

@@ -283,7 +283,6 @@ DbSlice::DbSlice(uint32_t index, bool cache_mode, EngineShard* owner)
      cache_mode_(cache_mode),
      owner_(owner),
      client_tracking_map_(owner->memory_resource()) {
-  load_in_progress_ = false;


Please remove the CHECK at db_slice.cc:783 (Check failed: fetched_items_.empty())
and replace it with DFATAL.

Oh, I forgot this; you mentioned it in the standup.

romange · 2025-02-06T21:19:29Z

src/server/db_slice.h

@@ -598,6 +601,7 @@ class DbSlice {
  size_t soft_budget_limit_ = 0;
  size_t table_memory_ = 0;
  uint64_t entries_count_ = 0;
+  size_t load_in_progress_ = 0;


nit: use size_t for sizes, unsigned for counters. Also, maybe change the name to load_ref_cnt_?

Sure, no issue with that! Out of curiosity, why unsigned for a counter? (size_t is also unsigned 😄)

Yeah, it's very subjective. size_t and uint64_t are the same type, but size_t tells me it's sizes, lengths, etc.
Counters are usually uintXX; that's how I try to declare types (for readability).

Sounds good, I will keep it in mind and apply the notation. Cheers

romange

lgtm

kostasrim · 2025-02-07T08:26:06Z

src/server/db_slice.cc

@@ -794,7 +794,8 @@ void DbSlice::FlushDbIndexes(const std::vector<DbIndex>& indexes) {
    std::swap(db_arr_[index]->trans_locks, flush_db_arr[index]->trans_locks);
  }

-  CHECK(fetched_items_.empty());
+  LOG_IF(DFATAL, fetched_items_.empty())


Since we no longer crash in release, I will ask for the logs (just to make sure that we don't have any gaps).
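One detail worth spelling out when swapping CHECK for LOG_IF: in glog, CHECK(cond) fires when cond is false, whereas LOG_IF(severity, cond) fires when cond is true, and DFATAL is fatal only in debug builds (it logs an error in release). The sketch below emulates those semantics with plain functions so that it stays self-contained; `Action`, `EmulateCheck` and `EmulateLogIfDfatal` are hypothetical stand-ins, not glog APIs.

```cpp
// Hypothetical outcome of evaluating an assertion-style macro.
enum class Action { kNone, kLogError, kAbort };

// Emulates CHECK(cond): aborts when cond is FALSE, in every build type.
Action EmulateCheck(bool cond) {
  return cond ? Action::kNone : Action::kAbort;
}

// Emulates LOG_IF(DFATAL, cond): fires when cond is TRUE; it aborts in debug
// builds and only logs an error in release builds.
Action EmulateLogIfDfatal(bool cond, bool debug_build) {
  if (!cond) return Action::kNone;
  return debug_build ? Action::kAbort : Action::kLogError;
}
```

Under this emulation, carrying the CHECK condition over unchanged makes the log fire in the healthy case (container empty), so the condition has to be negated to preserve the original meaning. That reading is consistent with the later commit titled "revert log if condition".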

fix: race when bumping items while loading a snapshot

d8554a0

Signed-off-by: kostas <[email protected]>

kostasrim self-assigned this Feb 5, 2025

romange reviewed Feb 5, 2025

add test

ab451c2

Signed-off-by: kostas <[email protected]>

kostasrim commented Feb 6, 2025

use sequence numbers

ac9943c

kostasrim commented Feb 6, 2025

kostasrim requested a review from romange February 6, 2025 11:12

romange reviewed Feb 6, 2025

comments

da658d5

Signed-off-by: kostas <[email protected]>

kostasrim requested a review from romange February 7, 2025 08:24

kostasrim commented Feb 7, 2025

revert log if condition

b31f119
