EBT replication broken #229

Open
KyleMaas opened this issue Nov 25, 2022 · 18 comments
Labels
bug Something isn't working

Comments

@KyleMaas
Contributor

When I attach Manyverse to go-ssb as a pub, I get the following error in Manyverse's log:

peer @[[pub ID]].ed25519 does not support RPC ebt.replicate
@KyleMaas
Contributor Author

Oh, and I should note that the pub's config.toml has this:

# Enable syncing by using epidemic-broadcast-trees (EBT)
enable-ebt = true

@decentral1se
Member

@KyleMaas Yep, that is a known issue (see https://github.com/ssbc/go-ssb/blob/master/docs/faq.md#can-go-ssb-replicate-with-manyverse for the full deets). The hope is that we can work towards fixing this in the near future, it would be a great boost for interop with Manyverse which a lot of folks want/need/expect. Opened #230 to converge more faq/bugs docs into one page, hopefully more discoverable...

@decentral1se
Member

Actually better to keep this open and name it more generally? Helpful for other folks & is an outstanding issue...

@decentral1se decentral1se reopened this Nov 25, 2022
@decentral1se decentral1se changed the title from "Manyverse desktop cannot seem to replicate" to "Manyverse cannot seem to replicate" Nov 25, 2022
@decentral1se decentral1se added the bug Something isn't working label Nov 25, 2022
@KyleMaas
Contributor Author

Ah, yep, I didn't see that. Thanks!

@KyleMaas
Contributor Author

So, looking through the docs for EBT/Manyverse, I'm not seeing a whole lot of changes that have been made in the Planetary fork relating to EBT. So if it works there, is there any reason some of those changes couldn't be cherry-picked and brought over here?

@decentral1se
Member

@KyleMaas The EBT changes in the Planetary fork were experimental and I did try to merge them in via #184 (comment) but ran out of steam with more broken tests (we have ~15 skipped tests and ~5 flaky tests in the test suite already). I still don't understand exactly how EBT is broken in go-ssb but I do intend to find out. More news as I have it. If you do any experiments, please share what you find 👍

@mycognosist
Member

I am also planning on deepening my understanding of EBT. Hopefully we can figure this out together and get it replicating reliably.

@KyleMaas
Contributor Author

@decentral1se

Is there an issue filed listing the skipped and flaky tests so they could be debugged? Might be something I could work on if I knew what needed to be done.

@decentral1se
Member

@KyleMaas thanks, the skipped tests are being tracked on #169 and the flaky tests have still to be identified. I've been avoiding cataloguing so far but if you want to take a run at this, it'd be great. Any failure listed on https://github.com/ssbc/go-ssb/actions in the recent past is most likely a flaky test. I'll open up an issue for this now.

@decentral1se decentral1se mentioned this issue Nov 29, 2022
25 tasks
@decentral1se decentral1se pinned this issue Dec 18, 2022
@stevenroose

When I attach Manyverse to go-ssb as a pub, I get the following error in Manyverse's log:

peer @[[pub ID]].ed25519 does not support RPC ebt.replicate

@KyleMaas how do you access Manyverse's debug log?

@KyleMaas
Contributor Author

It's been quite a while, so I don't remember for sure, but I believe that was showing up on the console when I ran the Linux Manyverse client from a terminal.

@staltz
Member

staltz commented Dec 30, 2022

@stevenroose I got your email:

I'm trying to debug/fix the interaction between Manyverse and the go-ssb based pub server. The go-ssb maintainer has mentioned that the issue might be on their side, but it's really hard to know what's going on on the Manyverse side without debug logging.

Could you maybe take a quick look at the issues I created related to this or just here confirm that Manyverse is expected to work with pub servers? (I have tried to use go-ssb and the ssb-server JS implementation and both without success.)

I'm a Go dev but not really a JS dev anymore. But who knows. If I can pinpoint what's going wrong, I might be able to fix something. Or at least find out what's going on.

There are 2 issues going on:

Manyverse replicating with go-ssb pub

There are 2 replication mechanisms in SSB: createHistoryStream RPC calls, and EBT (Epidemic Broadcast Trees). In JS, the former is implemented by https://github.com/ssbc/ssb-replicate/blob/master/legacy.js and the latter is implemented by https://github.com/ssbc/ssb-ebt . EBT is the modern method, and is vastly better for network performance, see this talk by Dominic Tarr some 5 years ago: https://www.youtube.com/watch?v=GN57bs1eAck Also, ssb-ebt was implemented some 5 years ago as well.

Some months (years?) ago we dropped support for createHistoryStream replication in Manyverse, the primary reason being that createHistoryStream was truly horrible for performance and user experience, and that most apps had EBT already enabled, such as Patchwork and Oasis. So using only EBT in Manyverse worked well since it could replicate with Patchwork and Oasis, and there aren't a lot of other implementations of SSB used in production.

go-ssb began receiving support for EBT during the ssb-ngi-pointer project (2020–2021), but there are still a few rough edges in go-ssb's EBT that need to be fixed before it's good for production. I believe those are the issues you're encountering.

To enable logs in Manyverse to debug EBT, do the following:

  1. git clone manyverse
  2. nvm use 14
  3. npm install
  4. Modify src/backend/ssb.ts to add a config field ebt: {logging: true}, above the line replicationScheduler: {
  5. npm run build-desktop
  6. npm run desktop

Manyverse replicating with ssb-server (JS pub)

That would be https://gitlab.com/staltz/manyverse/-/issues/1824 and I suspect it's because ssb-server is running an outdated version of ssb-ebt (a previous version of ssb-ebt has notoriously been sometimes "stuck" not replicating feeds it should).

@KyleMaas
Contributor Author

@staltz

Thanks for the clarification!

@decentral1se
Member

@staltz tysm 😌

@stevenroose If you do end up diving in and running into various issues, I'd point you at #237 which is my current focus - trying to understand why various tests are failing and how to fix them. Some of the broken-ness here might overlap, so if you have any insights, please do share.

@decentral1se decentral1se moved this from Need Help to TODO in 🚧 go-ssb maintenance 🚧 Jan 10, 2023
@decentral1se decentral1se changed the title from "Manyverse cannot seem to replicate" to "EBT replication broken" Feb 1, 2023
@decentral1se
Member

decentral1se commented Feb 1, 2023

Notes from P2P Basel & chatting with @boreq about how to fix stuff:

  • Couple of commits from https://github.com/planetary-social/ssb/commits/fork made it work again: removing the caching logic, replacing the local frontier logic with "get everything from the social graph", removing the block on retrieving own feed and maybe something else I'm forgetting. Those commits seem to be planetary-social@471bad0 planetary-social@c7dc092 planetary-social@05ca91c - FYI I have tried to backport this stuff but tests started breaking and we already have a lot of flaky tests, so I backed off. It might be worth a try re-working these commits or else diving deeper to understand the real causes and fixing the code as it was originally intended.

  • Caching logic seems to have an issue which can cause data corruption

  • New peers are somehow not always included in the EBT logic and a work-around was to disconnect everyone and re-connect to make it work

  • "negotiation" may be broken (can't remember exact details)

  • Once scuttlego goes into production, we can learn from the EBT replication implementation https://github.com/planetary-social/scuttlego/tree/main/service/domain/replication/ebt and see how to coordinate. Work is ongoing to make it all work and roll things out Planetary side.

  • EBT docs so far on http://dev.planetary.social/replication 👏

  • The updating of the EBT matrix seems to happen in https://github.com/decentral1se/ssb/blob/master/multilogs/combined.go which is connected with when indexing happens. The update only happens when indexing is triggered? That might be an issue.

  • mix/matt recently were working on a fix in JS ebt implementation which might be relevant to ask about

@boreq
Contributor

boreq commented Feb 1, 2023

Caching logic seems to have an issue which can cause data corruption

Yeah, I saw situations where something was wrongly cached and then the logic which loads cached data and determines what to ask for would always decide that we don't need to ask for anything.

New peers are somehow not always included in the EBT logic and a work-around was to disconnect everyone and re-connect to make it work

Basically there is no code that sends new notes when the social graph changes. The notes are sent only once, when the EBT session is being created. They are not updated afterwards if e.g. we follow a new feed and we want to replicate it.
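
To make the missing piece concrete, here is a minimal sketch of what "send new notes when the social graph changes" could look like. It is not go-ssb code: `Note`, `Session` and `Update` are hypothetical names made up for illustration, and the real EBT wire encoding of notes is not modelled. The idea is only that a session remembers what it already announced to a peer and, whenever the set of wanted feeds changes, emits notes for the difference.

```go
package main

import "fmt"

// Note is a simplified stand-in for one entry of an EBT vector clock.
// Hypothetical type: the real wire encoding is not modelled here.
type Note struct {
	Replicate bool  // do we want this feed at all?
	Seq       int64 // latest sequence number we hold locally
}

// Session remembers which notes were already announced to one peer.
// Hypothetical type, not part of go-ssb.
type Session struct {
	sent map[string]Note
}

func NewSession() *Session { return &Session{sent: make(map[string]Note)} }

// Update compares the feeds we currently want against what this session
// has already announced and returns only the notes that still need sending.
func (s *Session) Update(wanted map[string]int64) map[string]Note {
	out := make(map[string]Note)
	// Feeds that are newly wanted (or were unfollowed and followed again).
	for feed, seq := range wanted {
		if prev, ok := s.sent[feed]; !ok || !prev.Replicate {
			out[feed] = Note{Replicate: true, Seq: seq}
		}
	}
	// Feeds we announced earlier but no longer want.
	for feed, prev := range s.sent {
		if _, ok := wanted[feed]; !ok && prev.Replicate {
			out[feed] = Note{Replicate: false}
		}
	}
	// Record what is about to be sent.
	for feed, n := range out {
		s.sent[feed] = n
	}
	return out
}

func main() {
	s := NewSession()
	first := s.Update(map[string]int64{"@alice": 3})
	// Later the social graph changes: we start following @bob.
	second := s.Update(map[string]int64{"@alice": 3, "@bob": 0})
	fmt.Println(first, second)
}
```

Calling `Update` once at session creation reproduces the behaviour described above; calling it again on every follow or unfollow is the part that appears to be missing.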

The updating of the EBT matrix seems to happen in https://github.com/decentral1se/ssb/blob/master/multilogs/combined.go which is connected with when indexing happens. The update only happens when indexing is triggered? That might be an issue.

The combined index seems to only update the ebt state (or whatever that was called) when we actually get a message from a particular feed. This means that if that combined index is used for determining which feeds to replicate in ebt logic then we only pull in feeds that we received messages from. I think the edge case was that when starting with an empty repo we would never replicate any feeds using EBTs?

This seems wrong to me as I understand that we want to replicate based on social graph? That is why I completely dropped using that combined index for EBT and use the social graph instead when trying to fix EBTs in go-ssb.
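
A small sketch of the difference between the two approaches, under the assumption that the "frontier" is just a map from feed ID to the latest sequence number we hold. The function names `frontierFromIndex` and `frontierFromGraph` are hypothetical and do not exist in go-ssb; they only illustrate why an empty repo ends up asking for nothing in one case and for everything it follows in the other.

```go
package main

import "fmt"

// frontierFromIndex mimics the behaviour described above: only feeds that
// already have at least one stored message appear in the result.
func frontierFromIndex(stored map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(stored))
	for feed, seq := range stored {
		out[feed] = seq
	}
	return out
}

// frontierFromGraph starts from the feeds the social graph says we should
// replicate and uses sequence 0 for feeds we hold nothing from yet.
func frontierFromGraph(wanted []string, stored map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(wanted))
	for _, feed := range wanted {
		out[feed] = stored[feed] // a missing key yields 0
	}
	return out
}

func main() {
	stored := map[string]int64{} // empty repo: no messages received yet
	wanted := []string{"@alice", "@bob"}

	fmt.Println(len(frontierFromIndex(stored)))         // feeds requested via the index
	fmt.Println(len(frontierFromGraph(wanted, stored))) // feeds requested via the graph
}
```

With an empty `stored` map the index-based frontier is empty, so no feed is ever requested, while the graph-based one still lists every followed feed.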

@KyleMaas
Contributor Author

KyleMaas commented Feb 1, 2023

Great to see some progress on this!

@boreq As I've discovered with #274, the social graph system is really, really broken. I'm working on trying to find the core issue of that. So far it seems it's not due to a race condition but seems to be stemming from problems with indexing within the builder. Sounds like that may also be helpful for your EBT work.

@decentral1se
Member

Related #72

Also %Okyc+tVgyep+1ccI8nUZbBpYiXUvUBgQPOpfnZFRXQQ=.sha256 is a new EBT spec writing effort from @gpicron and friends!


6 participants