convert Repliear to use the row versioning strategy #158
Conversation
Still thinking about the big picture here, will review more tomorrow.
I like the representation of the CVR in the database, very clever. I wonder how much you thought about its performance as it grows more and more spread out. Does it ever need to be compacted?
Some other comments about the design of pull inside.
On Thu, Nov 16, 2023 at 4:54 AM, Matt Wonlaw commented, in server/src/pull/pull.ts:
> + clientGroupID,
+ );
+ const prevCVR = pull.cookie
+ ? await getCVR(executor, pull.cookie.clientGroupID, pull.cookie.order)
+ : undefined;
+ const baseCVR: CVR = prevCVR ?? {
+ clientGroupID,
+ clientVersion: 0,
+ order: 0,
+ };
+
+ // Drop any cvr entries greater than the order we received.
+ // Getting an old order from the client means that the client is possibly missing the data
+ // from future orders.
+ //
+ // The other possibility is that multiple tabs are syncing (why are tabs not coordinating who does sync?)
I thought you'd have to do leader election for my work on cr-sqlite in the
browser as well but it seems like WebLocks just solve this? Have each tab
try to acquire the lock as its first operation, once the lock is acquired
that tab starts to sync. When tabs die, this'll free the lock for other
tabs to take over sync.
We spent a ton of time trying to decide if we could trust web locks and
couldn't convince ourselves.
Here are some starting points:
https://x.com/aboodman/status/1549590251558936577?s=20 (***@***.*** in that thread is ex-Mozilla)
w3c/web-locks#81
I would love to be convinced, but there are so many weird edge cases in
browsers around bfcache, frozen states – especially in mobile where it's
even harder to test, that it's very hard to feel confident – particularly
since the spec doesn't even say what should happen yet.
Every time we get anywhere near here I just give up and decide it's easier
to engineer something that works right without locks than try to understand
if these things work across all the browsers and platforms we want to
support.
Luckily, I don't think web locks are necessary here -- I believe your
big-picture design does work without them:
#158 (comment)
Am I missing something?
> The only question is if a tab can be put to sleep while holding a lock. I haven't seen this in practice and that sounds like bad design if browsers could sleep tabs that hold shared resources.
No, I don't think so. You've definitely done more research on web locks than I have.
await executor(/*sql*/ `CREATE TABLE replicache_client_group (
  id VARCHAR(36) PRIMARY KEY NOT NULL,
  cvrversion INTEGER null,
  clientversion INTEGER NOT NULL,
I think with your system here we could actually implement the client records using the CVR too, which would simplify things! The client table could have a rowversion field and we could send updates to lastMutationIDs using the same system.
I didn't originally want to do this in todo_row_versioning because it would explode memory, since client entries only grow. But with your system it will properly use indexes, so I think it is fine.
You lost me here. Are you saying to replace replicache_client_group by moving its data into client_view? That sounds right, since both tables already have exactly the same data. I'd need to stop dropping client_view records when getting an old client_view version from the cookie, however.
The patch in Replicache contains basically three pieces of data:
- A patch to the client view
- A patch to lastMutationIDs
- A cookie that encodes current state of backend db
todo-row-versioning calculates 1 and 2 using two different systems. For (1) it uses row versioning. For (2) it uses, basically, per-space versioning. Each ReplicacheClientGroup has a version and the clients have the version they updated at.
It would be more elegant to use row-versioning for calculating both patches. This would entail changing the way the version on the ReplicacheClient rows works. Instead of taking the latest value of the RCG.version, it would increment on its own independently, and then be stored as part of the CVR like other data.
The reason I didn't do this originally in todo-row-versioning is that every page load generates a new client. The set of clients only grows, and every CVR would contain the entire set of them. The in-memory diff would eventually get out of control.
But with your implementation here the diff is done inside the database using indexes. I am guessing that it could stay fast indefinitely.
The only problem with my proposal is that with it, the CVR table would grow with O(number-of-clients) rather than as it does now with O(number-of-rows-in-primary-data). But I believe that is addressable. Right now Replicache doesn't completely support deleting old client records (rocicorp/replicache#1033). But if it did, then we could delete them, and they'd automatically get deleted from this CVR table too.
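As a sketch of how this could look (hypothetical types and field names, not the actual schema): client rows get their own version column that increments with each mutation, and the pull diff compares each client row's version against what the CVR recorded, emitting a lastMutationIDChanges entry for anything newer.

```typescript
// Hypothetical sketch: treat ReplicacheClient rows like any other
// row-versioned entity. Each client's `version` increments on every
// mutation it applies; the CVR records the version of each client row
// that the client group has already seen.

type ClientRow = {id: string; lastMutationID: number; version: number};
type CVRClientEntries = Record<string, number>; // clientID -> version seen

// Diff client rows against the CVR exactly as we diff data rows: any
// client whose version is newer than the CVR's record contributes a
// lastMutationIDChanges entry to the pull response.
function lastMutationIDChanges(
  clients: ClientRow[],
  cvr: CVRClientEntries,
): Record<string, number> {
  const changes: Record<string, number> = {};
  for (const c of clients) {
    const seen = cvr[c.id];
    if (seen === undefined || seen < c.version) {
      changes[c.id] = c.lastMutationID;
    }
  }
  return changes;
}
```

In a real implementation the diff would run inside the database using an index on the version column, as described above; the in-memory loop is just to illustrate the comparison.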
Given all the new updates, un-addressed items:

EDIT: Given we're removing coordination (weblocks) of pulls from the client, and the visibility trick doesn't work if two tabs are visible (the user might have both tabs visible at once: #158 (comment)), I need to revise the CVR approach. The revised approach will not mutate data in place, in order to solve the sync problem. So no, we're not ready to merge until I commit the CVR updates to solve this on the server.
Replied in-line: #158 (comment)
Didn't do this / waiting clarification. I think we could also merge and do this later.
9f1cb55 fixes things so many tabs can make progress if they're all trying to sync at once. At this point it warrants some proper integration tests and compaction (CVRs now grow without bound). The basic idea is that we no longer drop client_views nor mutate them in place. Each client view refers to its parent and we use that linked list of client views to construct the full view. If tabs branch off they get their own view that is not interfered with by other tabs.
OK, I want to understand why it didn't work without this. Going to try and load it into my brain today.
> 9f1cb55 fixes things so many tabs can make progress if they're all trying to sync at once.
Thank you for doing this, but let's please pause and figure out the design
of client_view_entry first. I've grown to really like your original design
where it's mutated in-place and doesn't grow. If we do do an immutable
design I think there's a more relational-friendly design that's possible
and doesn't require compaction.
Greg and I are meeting tomorrow midday PST to figure out our opinion and
will share it after that. Maybe we could even all meet together.
In the meantime...
> I think with your system here we could actually implement the client records using cvr too, which would simplify things! (#158 (comment))
>
> Didn't do this / waiting clarification. I think we could also merge and do this later.
Can you look into this? It seems orthogonal to above. I think it's likely
Greg may come back with some feedback tomorrow too.
And one last thing:
The original Repliear implementation created a unique "space" for each
visitor so that they didn't interfere with each others' writes. You can see
this if you visit production:
[image: CleanShot 2023-11-30 screenshot]
This was a large amount of complexity both in the code and schema though.
Perhaps we should do something simpler like reset the data every 24h. This
could be done after merge, but registering it here as a TODO.
On Thu, Nov 30, 2023 at 8:08 PM, Aaron Boodman wrote:

> If we do do an immutable design I think there's a more relational-friendly design that's possible and doesn't require compaction.

To avoid leaving the begged question hanging, here's what I was thinking:
client_view_entry
-----------------------------------
client_group_id string
entity (enum)
entity_id string
entity_version integer
valid_from_client_view_version integer
valid_to_client_view_version nullable integer
To read the current client view for a cg: `select entity_id, entity_version from client_view_entry where client_group_id = $client_group_id and valid_from_client_view_version <= $client_group_version and coalesce(valid_to_client_view_version, $client_group_version) >= $client_group_version`
To update the client view for a cg (handwaving):
- Pick the new client_view version
- For each entity row either not in the client view or whose version is higher in the entity row:
  - If there is an existing valid row for that entity, set its valid_to_client_view_version to the previous client_view version
  - Insert a new client_view_entry row with valid_to_client_view_version = NULL
(this idea basically stolen from "temporal tables" but using
valid_from/to_version rather than valid_from/to_timestamp)
Sure. I should be available if it's before 5pm EST.
Let me know if you'd like to hop on zoom again to discuss any of my comments.
Thanks!
In server/src/push.ts (outdated):
console.error(
  `Error executing mutation: ${JSON.stringify(mutation)}: ${e}`,
);
return {error: String(e)};
I believe this code needs to throw an error in this case so that the transact logic doesn't commit any partial writes from the mutate call above.
Yep. Also looks like this bug exists in todo-row-versioning: https://github.com/rocicorp/todo-row-versioning/blob/main/server/src/push.ts#L120-L123
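A minimal sketch of why rethrowing matters, using a toy transact() wrapper (illustrative, not the project's actual helper): the wrapper commits only if the body returns normally, so returning an {error} object from inside the body would commit the partial write, while throwing aborts it.

```typescript
// Toy transaction wrapper: work on a copy, commit only on success.
async function transact<T>(
  db: {data: Map<string, string>},
  body: (tx: Map<string, string>) => Promise<T>,
): Promise<T> {
  const tx = new Map(db.data);
  const result = await body(tx); // a throw here skips the commit below
  db.data = tx; // commit
  return result;
}

async function processMutation(
  db: {data: Map<string, string>},
): Promise<{error?: string}> {
  try {
    await transact(db, async tx => {
      tx.set('partial', 'write'); // partial write before the failure
      throw new Error('mutation failed'); // rethrow aborts the commit
    });
  } catch (e) {
    // The partial write above was discarded with the transaction.
    // Report the error to the client from out here instead.
    return {error: String(e)};
  }
  return {};
}
```

Catching the error outside transact (as sketched in processMutation) still lets the push handler return {error: ...} to the client, without committing the partial writes.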
In server/src/pull/pull.ts (outdated):
clientGroupID,
sinceClientVersion: baseCVR.clientVersion,
}),
readNextPage(executor, clientGroupID, baseCVR.order),
This paginates not only the initial loading of entries, but also changes to entries. The existing Repliear incremental sync implementation only pages the initial loading of entries, never changes. This is because Repliear's incremental sync tries to guarantee that the client view is consistent aside from potentially being partial (partial meaning some entries are missing entirely). That is, it ensures there are never inconsistencies between entries. For example, if on the server we have entries [[a, 1], [b, 1]] and a mutation updates them to [[a, 2], [b, 2]], the possible valid client views are [], [[a, 1]], [[b, 1]], [[a, 2]], [[b, 2]], [[a, 1], [b, 1]], and [[a, 2], [b, 2]], but never [[a, 1], [b, 2]] or [[a, 2], [b, 1]].
Another type of inconsistency that can result if we paginate entry changes: a pull response may say it includes the effects of client c1's mutation 5 by including lastMutationIDChanges { c1: 5 }, yet the effects of that mutation may not actually be included in the pull's patch due to pagination.
Repliear's current implementation always includes all entries changed since the client's previous sync:
repliear/pages/api/replicache-pull.ts, line 107 (at 2e6bc65): `await getChangedEntries(executor, spaceID, requestCookie.version);`
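The invariant described above can be stated compactly: a (possibly partial) client view is valid only if it is a subset of some single atomic server state, never a mix of two. A small illustrative checker (hypothetical types, not project code):

```typescript
// A view maps entity IDs to the version the client holds.
type View = Map<string, number>;

// True if every entry in `view` matches the corresponding entry in `state`.
function isSubset(view: View, state: View): boolean {
  for (const [k, v] of view) {
    if (state.get(k) !== v) return false;
  }
  return true;
}

// A view is consistent if it is a subset of some atomic state the
// server actually passed through.
function isConsistent(view: View, history: View[]): boolean {
  return history.some(state => isSubset(view, state));
}
```

Using the example above, [[a, 1]] and [[a, 2], [b, 2]] pass this check, while the mixed view [[a, 1], [b, 2]] fails it, which is exactly the state that paginating entry changes could produce.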
Let's match Repliear's current implementation. It's too hard to think about paginating mutations.
It doesn't seem that I can pass an arg to pull in order to differentiate the reason why I'm pulling, i.e., pulling in response to a poke vs. pulling for incremental sync.
OK let's talk about this tomorrow.
Force-pushed from bb632b9 to 73c9548.
This was a bit of a scorched earth swap. Swapped from nextjs -> vite and based this off of https://github.com/rocicorp/todo-row-versioning/tree/main
… out `useExclusiveEffect` (this reverts commit 9f1cb55)
see rocicorp#159 which we're merging into this fork with this commit
Force-pushed from f1521b5 to 3bb9399.
@tantaman - remind me why we have both (line 26 in 3bb9399) and (line 75 in 3bb9399) client_view. But as long as we do have the table, then it seems like we can track the latest version for the…
Basically it was from my ignorance when starting the conversion. I think we can consolidate them now. The rest of the updates we talked about are about done. Writing some tests and tracking down one last bug.
…cy by pulling all updates & deletes (#11)
Force-pushed from fff3d91 to cfefc45.
The old value provided by Replicache is not always accurate. What I've seen in practice: the old value as reported by Replicache is ahead of the value in `allIssuesMap` in terms of modification time. My theory is that two modifications can happen to the same row back to back: one tab pulls and gets the first mutation, another tab pulls and gets the second. I can't reproduce the error when only a single tab syncs.
this cfefc45 has everything we talked about + tests
LMK if you'd rather this be a new PR stacked on top of the old PR. That is my preferred workflow (small, stacked, atomic PRs that are easily reviewed) although GitHub doesn't seem to support it well. I did see some weird behavior with…
Thanks @tantaman - I reviewed this and overall it looks great. I'm so stoked to get this landed. There are a bunch of tiny comments I have, though, and I think it would make more sense for me to take it from here for two reasons: (a) it's going to be more efficient for me to just make all the tiny changes, and (b) it's hard to review a change this size really thoroughly in code review. I will feel more confident if I've "had my hands on it," so to speak. So I think I'll just take the PR over and add the finishing touches. Is that OK with you?
You could move on to integrating materialite. |
Yep, that's fine with me. Yeah, big PRs are a big pain to deal with. Hopefully we can keep them small for future work.
Based on todo-row-versioning. This won't diff well since it restructured the app and switched to Vite. Most relevant files:
Meta
Server
Note that past CVRs contribute to the current CVR. I.e., all data the client currently has is represented by all data we've ever sent. If any CVR is dropped we'll just re-fetch the rows that the CVR used to have.
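As an illustrative model of this property (hypothetical names, not the actual implementation): the client's state is the union of all CVRs ever sent, and a pull sends whatever the server has beyond that union, so dropping a CVR merely widens the next pull rather than breaking correctness.

```typescript
// Each CVR records which row versions a pull sent: rowID -> version.
type Entries = Map<string, number>;

// The client's state is the union of every CVR ever sent to it,
// keeping the newest version seen for each row.
function unionCVRs(cvrs: Entries[]): Entries {
  const all: Entries = new Map();
  for (const cvr of cvrs) {
    for (const [id, v] of cvr) {
      const prev = all.get(id);
      if (prev === undefined || v > prev) all.set(id, v);
    }
  }
  return all;
}

// The next pull must send every server row the client lacks or holds
// at an older version.
function rowsToSend(server: Entries, clientHas: Entries): string[] {
  const out: string[] = [];
  for (const [id, v] of server) {
    const have = clientHas.get(id);
    if (have === undefined || have < v) out.push(id);
  }
  return out;
}
```

Dropping one CVR from the union only enlarges rowsToSend's output: the rows that CVR covered get re-fetched, and nothing else changes.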
Client
Note that I'm currently only allowing one tab to pull at once. This is explained in pull.ts in the comment above dropCVREntries. This is something I could fix if we want to keep the behavior of allowing tabs to sync the same data at the same time.