convert Repliear to use the row versioning strategy #158

Open · tantaman wants to merge 25 commits into main
Conversation

@tantaman (Collaborator) commented Nov 13, 2023

Based on todo-row-versioning. This won't diff well, since it restructured the app and switched to Vite.

Most relevant files:

Meta

Server

Note that past CVRs contribute to the current CVR; i.e., everything the client currently has is represented by the union of all data we've ever sent. If any CVR is dropped, we'll simply re-fetch the rows that CVR used to cover.

Client

Note that I'm currently only allowing one tab to pull at a time. This is explained in pull.ts, in the comment above dropCVREntries. It's something I could fix if we want to keep the behavior of allowing tabs to sync the same data at the same time.

@aboodman (Contributor) left a comment

Still thinking about the big picture here, will review more tomorrow.

I like the representation of the CVR in the database, very clever. I wonder how much you thought about its performance as it grows more and more spread out. Does it ever need to be compacted?

Some other comments about the design of pull inside.

Resolved review threads: server/src/pgconfig/pgconfig.ts, server/src/schema.ts, server/src/pull/pull.ts, server/src/pull/next-page.ts
@tantaman (Collaborator, Author) commented Nov 16, 2023

I wonder how much you thought about its performance as it grows more and more spread out. Does it ever need to be compacted?

I believe that `SELECT rowid, rowversion ... WHERE ... NOT IN (...)` will slow down as the CVR gets larger. I don't know by how much, or where the limit is, so that's something that:

a. I can dig into more, or
b. we can ignore for now, since we're going to replace this "sync the world" approach with UI-driven sync, which may end up changing the type of query we issue.

Side note: I was very surprised that `WHERE NOT IN (...)` performed better than `WHERE NOT EXISTS (...)`.

Update: `EXCEPT` is probably the best option here.

@aboodman (Contributor) commented Nov 17, 2023 via email

@tantaman (Collaborator, Author) replied:

Am I missing something?

No, I don't think so. You've definitely done more research on Web Locks than I have.

Resolved review thread: server/src/schema.ts
await executor(/*sql*/ `CREATE TABLE replicache_client_group (
    id VARCHAR(36) PRIMARY KEY NOT NULL,
    cvrversion INTEGER null,
    clientversion INTEGER NOT NULL,
@aboodman (Contributor) commented:

I think with your system here we could actually implement the client records using cvr too, which would simplify things! The client table could have a rowversion field and we could send updates to lastMutationIDs using the same system.

I didn't originally want to do this in todo_row_versioning because it will explode memory since client entries only grow. But with your system it will properly use indexes, so I think it is fine.

@tantaman (Collaborator, Author) replied:

You lost me here. Are you saying to replace replicache_client_group by moving its data into client_view?

That sounds right since both tables already have exactly the same data. I'd need to stop dropping client_view records when getting an old client_view version from the cookie, however.

@aboodman (Contributor) replied:

The patch in Replicache contains basically three pieces of data:

  1. A patch to the client view
  2. A patch to lastMutationIDs
  3. A cookie that encodes the current state of the backend db

todo-row-versioning calculates (1) and (2) using two different systems. For (1) it uses row versioning. For (2) it uses, basically, per-space versioning: each ReplicacheClientGroup has a version, and the clients have the version they were updated at.

It would be more elegant to use row versioning to calculate both patches. This would entail changing the way the version on the ReplicacheClient rows works: instead of taking the latest value of RCG.version, it would increment on its own, independently, and then be stored as part of the CVR like other data.

The reason I didn't do this originally in todo-row-versioning is that every page load generates a new client. The set of clients only grows, and every CVR would contain the entire set of them. The in-memory diff would eventually get out of control.

But with your implementation here, the diff is done inside the database using indexes. I am guessing that it could stay fast indefinitely.

The only problem with my proposal is that with it, the CVR table would grow as O(number-of-clients) rather than, as it does now, O(number-of-rows-in-primary-data). But I believe that is addressable. Right now Replicache doesn't completely support deleting old client records (rocicorp/replicache#1033), but if it did, then we could delete them, and they'd automatically get deleted from this CVR table too.

Resolved review threads: server/src/schema.ts, server/src/pull/pull.ts, server/src/pull/cvr.ts
@tantaman (Collaborator, Author) commented Nov 30, 2023

Given all the new updates, are we fine where things stand and ready to merge?

EDIT: Given that we're removing coordination (Web Locks) of pulls from the client, and that the visibility trick doesn't work if two tabs are visible at once (#158 (comment)), I need to revise the CVR approach. So no, we're not ready to merge until I commit the CVR updates that solve this problem on the server.

Un-addressed items:

Yeah, I agree this is a problem now. A design for the cvr_entry table that would be "compatible" with the current design of Replicache would be one where older rows are retained instead of mutated in place. #158 (comment)

The inefficiency is addressed by not pulling from a tab that is inactive. "Not mutating in place" is not addressed, since there are other downsides there, mainly the client view becoming unbounded in size.

EDIT: the visibility trick doesn't work, since the user might have both tabs visible at once. The revised CVR approach will not mutate data in place, in order to solve the sync problem.

Why build up these two huge arrays

Replied in-line: #158 (comment)

I think with your system here we could actually implement the client records using cvr too, which would simplify things! #158 (comment)

Didn't do this / awaiting clarification. I think we could also merge and do this later.

@tantaman (Collaborator, Author) commented:

9f1cb55 fixes things so that many tabs can make progress when they're all trying to sync at once.

At this point it warrants some proper integration tests, and compaction (CVRs now grow without bound).

The basic idea is that we no longer drop client_views, nor mutate them in place. Each client view refers to its parent, and we use that linked list of client views to construct the full view. If tabs branch off, they get their own view that other tabs don't interfere with.


@aboodman (Contributor) commented:

9f1cb55 fixes things so many tabs can make progress if they're all trying to sync at once.

OK, I want to understand why it didn't work without this. Going to try and load it into my brain today.

@aboodman (Contributor) commented Dec 1, 2023 via email

@aboodman (Contributor) commented Dec 1, 2023 via email

@tantaman (Collaborator, Author) commented Dec 1, 2023

Greg and I are meeting tomorrow midday PST to figure out our opinion and will share it after that. Maybe we could even all meet together.

Sure. I should be available if it's before 5pm EST.

@grgbkr (Contributor) left a comment

Let me know if you'd like to hop on Zoom again to discuss any of my comments.

Thanks!

Resolved review threads: client/src/util/useLock.ts, client/index.html, client/src/index.tsx
console.error(
  `Error executing mutation: ${JSON.stringify(mutation)}: ${e}`,
);
return {error: String(e)};
@grgbkr (Contributor) commented:

I believe this code needs to throw an error in this case so that the transact logic doesn't commit any partial writes from the mutate call above.

@tantaman (Collaborator, Author) replied:

Yep. Also looks like this bug exists in todo-row-versioning: https://github.com/rocicorp/todo-row-versioning/blob/main/server/src/push.ts#L120-L123

Resolved review threads: client/src/index.tsx, server/src/pull/pull.ts
clientGroupID,
sinceClientVersion: baseCVR.clientVersion,
}),
readNextPage(executor, clientGroupID, baseCVR.order),
@grgbkr (Contributor) commented:

This paginates not only the initial loading of entries, but also changes to entries. The existing Repliear incremental-sync implementation only pages the initial loading of entries, never changes. This is because Repliear's incremental sync tries to guarantee that the client view is consistent, aside from potentially being partial (partial meaning some entries are missing entirely); that is, it ensures there are never inconsistencies between entries. For example, if on the server we have entries [[a, 1], [b, 1]] and a mutation updates them to [[a, 2], [b, 2]], the possible valid client views are [], [[a, 1]], [[b, 1]], [[a, 2]], [[b, 2]], [[a, 1], [b, 1]], and [[a, 2], [b, 2]], but never [[a, 1], [b, 2]] or [[a, 2], [b, 1]].

Another type of inconsistency that can result if we paginate entry changes: a pull response may say it includes the effects of client c1's mutation 5 by including lastMutationIDChanges { c1: 5 }, yet the effects of this mutation may not actually be included in the pull's patch due to pagination.

Repliear's current implementation always includes all entries changed since the client's previous sync (`await getChangedEntries(executor, spaceID, requestCookie.version)`), though to achieve the desired consistency semantics it only really needs to send all changes to entries already synced to the client.

Contributor replied:

Let's match Repliear's current implementation. It's too hard to think about paginating mutations.

@tantaman (Collaborator, Author) replied:

It doesn't seem that I can pass an arg to pull in order to differentiate the reason I'm pulling, i.e., pulling in response to a poke vs. pulling for incremental sync.

Contributor replied:

OK let's talk about this tomorrow.

Resolved review thread: server/src/pull/cvr.ts
@tantaman force-pushed the main branch 2 times, most recently from bb632b9 to 73c9548 (December 5, 2023)
@tantaman force-pushed the main branch 2 times, most recently from f1521b5 to 3bb9399 (December 5, 2023)
@aboodman (Contributor) commented Dec 6, 2023

@tantaman - remind me why we have both `cvrversion INTEGER null` (on replicache_client_group) and `"version" INTEGER NOT NULL` (on client_view)? It seems like they move together, and like we only need the one on client_view.

client_view has a 1:1 relationship with replicache_client_group, right? The only real reason for client_view at all right now is that we need to track the client_version that was last sent to this group. If we can remove that, then this whole table can go away.

But as long as we do have the table, it seems like we can track the latest version for the client_view on its own row, rather than on the somewhat unrelated replicache_client_group.

@tantaman (Collaborator, Author) commented Dec 7, 2023

@tantaman - remind me why we have both

Basically, it was from my ignorance when starting the conversion: replicache_client_group existed in todo-row-versioning, and I wanted to minimize the moving pieces until everything was working.

I think we can consolidate them now. The rest of the updates we talked about are about done; I'm writing some tests and tracking down one last bug.

@tantaman force-pushed the main branch 2 times, most recently from fff3d91 to cfefc45 (December 8, 2023)
The old value provided by Replicache is not always accurate.

What I've seen in practice:
- The old value as reported by Replicache is ahead of the value in `allIssuesMap` in terms of modification time

My theory is that two modifications can happen to the same row back to back: one tab pulls and gets the first mutation, another tab pulls and gets the second.

I can't reproduce the error when only a single tab syncs.
@tantaman (Collaborator, Author) commented Dec 8, 2023

cfefc45 has everything we talked about, plus tests:

  • fast forwards when receiving an old cookie
  • handles deletes and re-insertions of previously deleted rows
  • pulls all data that may have been updated or deleted since the last pull in order to preserve consistency semantics for mutations

LMK if you'd rather this be a new PR stacked on top of the old one. That's my preferred workflow (small, stacked, atomic PRs that are easy to review), although GitHub doesn't seem to support it well.

I did see some weird behavior with experimentalWatch and 4 tabs syncing at once. You can read more in the commit message here: a3a11ac

Resolved review thread: notes.md
@aboodman (Contributor) commented:

Thanks @tantaman - I reviewed this and overall it looks great. I'm so stoked to get this landed.

I have a bunch of tiny comments, though, and I think it would make more sense for me to take it from here, for two reasons: (a) it's going to be more efficient for me to just make all the tiny changes myself, and (b) it's hard to review a change this size really thoroughly in code review; I'll feel more confident once I've "had my hands on it", so to speak.

So I think I'll just take the PR over and add the finishing touches. Is that OK with you?

@aboodman (Contributor) commented:

You could move on to integrating materialite.

@tantaman (Collaborator, Author) commented:

So I think I'll just take the PR over and add the finishing touches. Is that OK with you?

Yep, that's fine with me.

Yeah, big PRs are a big pain to deal with. Hopefully we can keep them small for future work.
