feat(event cache): unload a linked chunk whenever we get a limited sync #4694
base: main
Conversation
That's really exciting. Thank you for having worked on this! That's exactly what I had in mind and what we've talked about together. Super happy we are aligned on this.

The novelty, compared to what I was imagining, is the `replace_with` API, which I find pretty elegant. Kudos for that.

I left a couple of comments about possible unsafety. The way your patch is implemented doesn't create unsafety, I think, but marking one or two methods `unsafe` is essential, I believe.
Yes, tests are missing, but I know it's a first shot and I know you'll write them.
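To illustrate the point about `unsafe`, here is a minimal, self-contained sketch (all names here are hypothetical stand-ins, not the actual SDK API): a method whose correctness depends on an invariant the compiler cannot check gets marked `unsafe`, documents the invariant under a `# Safety` section, and every call site states why the invariant holds.

```rust
/// Hypothetical stand-in for a linked chunk; not the real SDK type.
struct LinkedChunk {
    chunk_count: usize,
}

impl LinkedChunk {
    /// Replace the whole in-memory chain with its last chunk only.
    ///
    /// # Safety
    ///
    /// The caller must ensure every unloaded chunk has already been
    /// persisted to the store, otherwise events would be lost.
    unsafe fn replace_with_last_chunk(&mut self) {
        self.chunk_count = 1;
    }
}

fn main() {
    let mut linked_chunk = LinkedChunk { chunk_count: 10 };
    // SAFETY: in this toy example there is no store, so nothing to persist.
    unsafe { linked_chunk.replace_with_last_chunk() };
    assert_eq!(linked_chunk.chunk_count, 1);
}
```

The `unsafe` keyword here signals a logical contract rather than memory unsafety, which is exactly the kind of "possible unsafety" the review is flagging.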
The `EventCache`, and thus the `Timeline`, will see an `Update::Clear`, then an `Update::NewItemsChunk`. Translated by `linked_chunk::AsVector`, this gives a `VectorDiff::Clear`, then a `VectorDiff::PushBack`. Basically, the timeline will “blink”/“flash”. This is not ideal at all, knowing that it can happen pretty often…
I see two solutions here:
- Either we write a heuristic in `AsVector`:
  - when a `VectorDiff::Clear` is followed by a `VectorDiff::PushBack { values }` or other insertions, it can be folded/merged into a `VectorDiff::Reset { values }`
  - however, the `Timeline` will re-create the timeline items, with new unique IDs, so the renderer on the app side will not be able to make a clean diff, and… “blink”/“flash” again (all timeline items will be dropped, and new items will be re-created)
  - we could optimise that on the `Timeline` side by re-using the same unique ID for items that have been removed and re-inserted, based on their `$event_id`, but I think it starts to create many complications
- Either, instead of emitting an `Update::Clear`, we emit a bunch of `Update::RemoveChunk` until one chunk remains. It slightly changes the approach, because instead of having a `replace_with`, we get a `remove_all_except_last`. The underlying code remains the same, but the `Update`s are different.
  - ⚠️ Note that `AsVector` expects `RemoveChunk` to remove… an empty chunk! It emits zero `VectorDiff`. If we go down that path, we must update `AsVector` accordingly; nothing fancy, but it must be done (edit: a draft here: feat(common): `Update::RemoveChunk` emits `VectorDiff::Remove` #4696).
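To make the first option concrete, here is a minimal, self-contained sketch of the folding heuristic. The enum below only mirrors the shape of the real `VectorDiff` from `eyeball-im`; the function name and the fold itself are hypothetical, and for brevity only `PushBack` is folded, not the other insertion kinds.

```rust
/// Simplified stand-in for `eyeball_im::VectorDiff`; hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum VectorDiff<T> {
    Clear,
    PushBack { value: T },
    Reset { values: Vec<T> },
}

/// Fold a `Clear` immediately followed by insertions into a single
/// `Reset`, so subscribers see one atomic replacement instead of an
/// empty intermediate state.
fn fold_clear_then_inserts<T>(diffs: Vec<VectorDiff<T>>) -> Vec<VectorDiff<T>> {
    let mut out = Vec::new();
    for diff in diffs {
        match (out.pop(), diff) {
            // A `Clear` then a `PushBack` becomes a `Reset` with one value.
            (Some(VectorDiff::Clear), VectorDiff::PushBack { value }) => {
                out.push(VectorDiff::Reset { values: vec![value] });
            }
            // Further pushes extend the pending `Reset`.
            (Some(VectorDiff::Reset { mut values }), VectorDiff::PushBack { value }) => {
                values.push(value);
                out.push(VectorDiff::Reset { values });
            }
            // Anything else is kept as-is.
            (previous, diff) => {
                if let Some(previous) = previous {
                    out.push(previous);
                }
                out.push(diff);
            }
        }
    }
    out
}

fn main() {
    let diffs = vec![
        VectorDiff::Clear,
        VectorDiff::PushBack { value: 'e' },
        VectorDiff::PushBack { value: 'f' },
    ];
    assert_eq!(
        fold_clear_then_inserts(diffs),
        vec![VectorDiff::Reset { values: vec!['e', 'f'] }]
    );
}
```

Even with this fold, the comment's second caveat still applies: a `Reset` makes the `Timeline` re-create items with fresh unique IDs, so the renderer cannot diff them against the old items.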
I am not inclined to approve this PR until we have a consensus around this question. I know you understand that. It doesn't mean your work is not good: it is excellent and I couldn't do better myself. Congrats for that. I think however we must answer these fundamental questions before moving forward.
This patch updates `Update::RemoveChunk` to emit `VectorDiff::Remove`. Until now, `RemoveChunk` expected the chunk to be empty, because that is how it has been used so far. However, with #4694, this can change rapidly.
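A toy model of that behaviour, with simplified hypothetical types (a chunk is just a `Vec` of items, and only the `Remove` variant is modeled): removing a non-empty chunk surfaces as one `VectorDiff::Remove` per item, all at the chunk's offset in the flattened vector, since items shift left after every removal.

```rust
/// Simplified, hypothetical stand-in for the relevant diff variant.
#[derive(Debug, PartialEq)]
enum VectorDiff {
    Remove { index: usize },
}

/// Remove a whole chunk and report the item removals it implies.
fn remove_chunk(chunks: &mut Vec<Vec<char>>, chunk_index: usize) -> Vec<VectorDiff> {
    // Offset of the chunk's first item in the flattened vector.
    let offset: usize = chunks[..chunk_index].iter().map(Vec::len).sum();
    let removed = chunks.remove(chunk_index);
    // Every removed item is reported at the same flattened index,
    // because each removal shifts the remaining items left.
    removed.iter().map(|_| VectorDiff::Remove { index: offset }).collect()
}

fn main() {
    let mut chunks = vec![vec!['a', 'b'], vec!['c'], vec!['d', 'e']];
    let diffs = remove_chunk(&mut chunks, 1);
    assert_eq!(diffs, vec![VectorDiff::Remove { index: 2 }]);
    assert_eq!(chunks, vec![vec!['a', 'b'], vec!['d', 'e']]);
}
```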
…orage updates And rename it accordingly to `RoomEvents::store_updates`. Note: no changelog, because this is an internal API only.
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #4694      +/-  ##
==========================================
- Coverage   85.89%   85.89%    -0.01%
==========================================
  Files         292      292
  Lines       33907    33948       +41
==========================================
+ Hits        29124    29159       +35
- Misses       4783     4789        +6
```

View full report in Codecov by Sentry.
For what it's worth, we've discussed this offline, and came to the conclusion that correctness is more important than performance here. In the absence of this crucial fix, it might look like there are missing messages in a timeline. I also suspect that the batching at the output of the timeline's subscription would mostly hide the problem described here (or result in a timeline "flash", if the timeline happened to be opened while a new gappy sync happens), but let's see in multiple steps.
It's even better! Well done.
I suspect we have a bug, and that's why I can't approve the PR for the moment; please see my feedback.
```rust
    Item: Clone,
    Gap: Clone,
{
    // Now that all checks have been done, do the actual replacement.
```
Which checks?
```rust
        return Err(LazyLoaderError::ChunkIsNotLast { id: chunk.identifier });
    }

    // The last chunk is valid. Perform all checks now.
```
```diff
- Perform all checks now.
+
```
```rust
// Check the linked chunk is the default empty events chunk.
assert_matches!(linked_chunk.chunks().next(), Some(chunk) => {
    assert_eq!(chunk.identifier(), ChunkIdentifier::new(0));
    assert!(chunk.is_items());
    assert_matches!(chunk.content(), ChunkContent::Items(items) => {
        assert!(items.is_empty());
    });
});
```
Let's check there is no chunk after this chunk (`chunk.next().is_none()`).

Let's also check `linked_chunk.chunks().next().next()` is `None` too, i.e. let's check there is only one chunk.

If possible, check the `Ends::last` too.
```rust
// Nothing more.
assert!(it.next().is_none());
```
This! (for the previous comment)
```rust
assert_eq!(chunk.identifier(), chunk_id);
assert_matches!(chunk.content(), ChunkContent::Items(items) => {
    assert_eq!(*items, vec!['e', 'f']);
});
```
Can you check `chunk.next().is_none()` too, please?
```rust
// When reading the events, we do get only the last one.
let (events, _) = room_event_cache.subscribe().await;
assert_eq!(events.len(), 1);
assert_eq!(events[0].event_id().as_deref(), Some(evid2));

// But if we back-paginate, we don't need access to network to find out about
// the previous event.
let outcome = room_event_cache.pagination().run_backwards_once(20).await.unwrap();
assert_eq!(outcome.events.len(), 1);
assert_eq!(outcome.events[0].event_id().as_deref(), Some(evid1));
assert!(outcome.reached_start.not());
```
👌
```rust
// Run pagination once: it will consume prev-batch2 first, which is the most
// recent token, which returns an empty batch, thus indicating the start of the
// room.
let pagination = room_event_cache.pagination();

let outcome = pagination.run_backwards_once(20).await.unwrap();
assert!(outcome.reached_start);
assert!(outcome.events.is_empty());
assert!(stream.is_empty());

// Next, we lazy-load a next chunk from the store, and get the initial, empty
// default events chunk.
let outcome = pagination.run_backwards_once(20).await.unwrap();
assert!(outcome.reached_start.not());
assert!(outcome.events.is_empty());
assert!(stream.is_empty());
```
What? We are reaching the start of the timeline, then we paginate again, and we are not reaching the start of the timeline?

How is the `Timeline` supposed to know it has to paginate once again if `reached_start` is set to `true`? Is it a bug?
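The concern can be shown with a self-contained, synchronous stand-in for the pagination loop (all names are hypothetical, and the scripted second outcome adds an event for illustration): a consumer naturally stops paginating once `reached_start` is `true`, so if a later call would have returned more events, they are never fetched.

```rust
/// Hypothetical, simplified pagination outcome.
struct Outcome {
    events: Vec<&'static str>,
    reached_start: bool,
}

fn main() {
    // Scripted outcomes mirroring the surprising test: the first call
    // claims the start was reached; a second call would return more.
    let outcomes = vec![
        Outcome { events: vec![], reached_start: true },
        Outcome { events: vec!["older event"], reached_start: false },
    ];

    let mut seen: Vec<&'static str> = Vec::new();
    for outcome in outcomes {
        seen.extend(outcome.events);
        // The natural consumer loop: stop once the start is reached.
        if outcome.reached_start {
            break;
        }
    }

    // The second outcome is never consumed, so its event is lost.
    assert!(seen.is_empty());
}
```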
```diff
@@ -461,8 +461,20 @@ impl RoomEventCacheInner {
         })
         .await?;

     if prev_batch.is_some() && !all_duplicates {
```
Maybe explain that this is how we compute the `limited` flag here?
```rust
// We must do this *after* the above call to `.with_events_mut`, so the new
// events and gaps are properly persisted to storage.
if let Some(new_events_diffs) = state.shrink_to_last_chunk().await? {
    sync_timeline_events_diffs = new_events_diffs;
```
I guess it's fine to forget the previous `sync_timeline_events_diffs` **for the moment**, because `new_events_diffs` contains a `Clear`, but imagine this is no longer the case? We would have a big bug here. Can we do `sync_timeline_events_diffs.extend(new_events_diffs);` instead?
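A tiny self-contained illustration of the suggestion (the `Diff` enum and variable names are hypothetical stand-ins): extending the diff list preserves earlier updates in order, while overwriting silently drops them, which only happens to be harmless today because the new batch starts with a `Clear`.

```rust
/// Hypothetical, simplified diff type.
#[derive(Debug, Clone, PartialEq)]
enum Diff {
    Clear,
    PushBack(char),
}

fn main() {
    let mut sync_timeline_events_diffs = vec![Diff::PushBack('a')];
    let new_events_diffs = vec![Diff::Clear, Diff::PushBack('b')];

    // Suggested: keep both batches, in order. The leading `Clear`
    // makes the earlier diff moot for subscribers, but nothing is lost
    // if a future batch ever stops starting with a `Clear`.
    sync_timeline_events_diffs.extend(new_events_diffs);

    assert_eq!(
        sync_timeline_events_diffs,
        vec![Diff::PushBack('a'), Diff::Clear, Diff::PushBack('b')]
    );
}
```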
This implements unloading the linked chunk, so as to free memory on the one hand, and avoid some weird corner cases like #4684 on the other hand.
Unloading a linked chunk happens in two steps:
Then, we make use of that functionality whenever we receive a gap via sync. This resolves the situation where we start with a hot cache store that has one old event, E1; the room's state is actually [E1, E2, E3], and the last sync returns [Gap, E3]. In this case, since we don't render gaps yet in the timeline, the timeline would show [E1, E3], making it look like we missed event E2, although the next pagination would make it appear. Instead, we here unload the linked chunk to its last chunk (E3), so that [E1] is cleared from rendering, and the next paginations will start from the latest gap.
Fixes #4684.
Part of #3280.
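The scenario above can be simulated with a toy model (all types and function names are hypothetical; the real linked chunk is far richer than this): without unloading, rendering skips the gap and shows [E1, E3], which looks like E2 is missing; after shrinking to the last chunk, only [E3] is shown until a back-pagination fills the gap.

```rust
/// Toy chunk: either a gap or a run of events.
#[derive(Debug, Clone, PartialEq)]
enum Chunk {
    Gap,
    Items(Vec<&'static str>),
}

/// Rendering skips gaps, as the timeline does today.
fn render(chunks: &[Chunk]) -> Vec<&'static str> {
    chunks
        .iter()
        .flat_map(|chunk| match chunk {
            Chunk::Gap => vec![],
            Chunk::Items(items) => items.clone(),
        })
        .collect()
}

/// Keep only the last chunk, as the unloading step does.
fn shrink_to_last_chunk(chunks: &mut Vec<Chunk>) {
    if let Some(last) = chunks.pop() {
        chunks.clear();
        chunks.push(last);
    }
}

fn main() {
    // Hot cache with [E1], then a limited sync appends [Gap, E3].
    let mut chunks = vec![
        Chunk::Items(vec!["E1"]),
        Chunk::Gap,
        Chunk::Items(vec!["E3"]),
    ];

    // Without unloading, E2 looks missing between E1 and E3.
    assert_eq!(render(&chunks), vec!["E1", "E3"]);

    // After unloading, only the freshest events are shown.
    shrink_to_last_chunk(&mut chunks);
    assert_eq!(render(&chunks), vec!["E3"]);
}
```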