
Redundant write on EigenDA failure #242

Open · Inkvi wants to merge 1 commit into main
Conversation

@Inkvi (Contributor) commented Jan 14, 2025

Changes proposed

The current proxy implementation lacks protection against write failures. If EigenDA fails for any reason, we still want to write our commitment to the DA layer if caches or fallbacks are enabled. Otherwise our DA batches will be postponed until EigenDA recovers, which might overwhelm the batcher if that period is prolonged.

Since a certificate is not available at this point, a keccak hash of the payload has to be used as the commitment.
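For illustration, a minimal sketch of the intended behavior, assuming hypothetical interface and function names (`Primary`, `Secondary`, `putWithFallback`); the actual change lives in store/manager.go and is excerpted further down:

```go
package store

import (
	"context"

	"github.com/ethereum/go-ethereum/crypto"
)

// Primary and Secondary are hypothetical stand-ins for the proxy's EigenDA
// backend and its secondary (cache/fallback) backends.
type Primary interface {
	Put(ctx context.Context, value []byte) (cert []byte, err error)
}

type Secondary interface {
	Enabled() bool
	HandleRedundantWrites(ctx context.Context, key, value []byte) error
}

// putWithFallback sketches the proposed behavior: disperse to EigenDA first;
// if that fails and secondary backends are enabled, store the payload there
// and return its keccak256 hash as the commitment, since no EigenDA
// certificate exists at this point.
func putWithFallback(ctx context.Context, eigenda Primary, secondary Secondary, value []byte) ([]byte, error) {
	cert, err := eigenda.Put(ctx, value)
	if err == nil {
		return cert, nil
	}
	if !secondary.Enabled() {
		return nil, err // no fallback configured, so surface the dispersal error
	}
	// use the payload itself as the key so the data can still be retrieved later
	if redundantErr := secondary.HandleRedundantWrites(ctx, value, value); redundantErr != nil {
		return nil, redundantErr
	}
	return crypto.Keccak256(value), nil
}
```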

Note to reviewers

I am aware of your PR that achieves a similar result with Eth DA as the backend instead of S3, but that PR is a few months old and there has been no traction from the OP team.

@samlaf (Collaborator) commented Jan 16, 2025

@Inkvi thanks for this. Bit overwhelmed at the moment but will review asap. Ping me if I forget.

store/manager.go (Outdated) · comment on lines 128 to 147

```go
log.Error("Failed to write to EigenDA backend", "err", err)
// write to EigenDA failed, which shouldn't happen if the backend is functioning properly
// use the payload as the key to avoid data loss
if m.secondary.Enabled() && !m.secondary.AsyncWriteEntry() {
	redundantErr := m.secondary.HandleRedundantWrites(ctx, value, value)
	if redundantErr != nil {
		log.Error("Failed to write to redundant backends", "err", redundantErr)
		return nil, redundantErr
	}

	return crypto.Keccak256(value), nil
}
```
@epociask (Collaborator) commented:
IIUC the idea here is to use secondary backends as the primary in the event of a failed dispersal to EigenDA. This will create complications for some of our integrations (e.g., in Arbitrum x EigenDA the commitment is posted/verified against the inbox directly, which would now fail with failover). Would prefer if this was opt-in and feature-guarded by some dangerous config flag...

cc @samlaf
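Something along these lines, with an intentionally alarming name: a sketch of an opt-in guard, assuming a urfave/cli v2 flag definition; the flag name and env var are placeholders, not what this PR ends up using.

```go
package flags

import "github.com/urfave/cli/v2"

// Hypothetical opt-in guard for the failover path; disabled by default so
// existing deployments keep the current fail-fast behavior.
var FallbackOnDispersalFailureFlag = &cli.BoolFlag{
	Name:    "routing.fallback-on-dispersal-failure",
	Usage:   "DANGEROUS: on EigenDA dispersal failure, write the payload to secondary backends keyed by its keccak256 hash instead of returning an error",
	Value:   false,
	EnvVars: []string{"EIGENDA_PROXY_FALLBACK_ON_DISPERSAL_FAILURE"},
}
```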

@samlaf (Collaborator) commented:

Overall this seems like an approach that could work, but the PR in its current form would require a lot more work:

  1. we would want to revert to a keccak commitment mode when this kind of failover happens, so that the derivation pipeline knows the failover happened (this would also be a more robust approach than the ad-hoc reading you currently have)
  2. as @epociask said, this failover behavior should be feature-guarded
  3. this failover should only happen if the blob's size is < 128 KiB, so we don't hit that size limit in the derivation pipeline (a sketch of how 2 and 3 might compose follows this list)
  4. we would need to add some tests
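To make items 2 and 3 concrete, here is a rough sketch; the `fallbackConfig` struct, the `maxKeccakBlobSize` constant, and the `writeToSecondary` callback are assumptions for illustration, not this PR's actual code:

```go
package store

import (
	"context"
	"fmt"
)

// maxKeccakBlobSize mirrors the ~128 KiB calldata limit in the derivation
// pipeline mentioned above; the exact constant used by the proxy may differ.
const maxKeccakBlobSize = 128 * 1024

// fallbackConfig carries the hypothetical opt-in switch from item 2.
type fallbackConfig struct {
	FallbackEnabled bool
}

// maybeFailover decides whether a failed dispersal may fall back to the
// secondary backends; writeToSecondary is assumed to store the payload and
// return its keccak256 hash as the commitment.
func maybeFailover(ctx context.Context, cfg fallbackConfig, value []byte, dispersalErr error,
	writeToSecondary func(context.Context, []byte) ([]byte, error)) ([]byte, error) {
	if !cfg.FallbackEnabled { // item 2: opt-in feature guard
		return nil, dispersalErr
	}
	if len(value) > maxKeccakBlobSize { // item 3: stay under the derivation pipeline limit
		return nil, fmt.Errorf("blob too large (%d bytes) for keccak failover: %w", len(value), dispersalErr)
	}
	// item 1 (switching the commitment mode so the derivation pipeline knows a
	// failover happened) would be handled by the caller and is out of scope here.
	return writeToSecondary(ctx, value)
}
```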

@Inkvi (Contributor, Author) commented:

@epociask thanks for highlighting the integration problems with Arbitrum. I am not familiar with the Arbitrum stack and how alt-DA is handled there. Could you shed more light on the failure mechanism? Are you saying that Arbitrum's inbox contract verifies the EigenDA certificate directly on-chain? That would imply a direct integration between the Arbitrum stack and EigenDA, which seems unlikely to me.

@Inkvi (Contributor, Author) commented:

@samlaf the last time I looked at the op-batcher (v1.9.5), it couldn't revert to a different commitment mode: the generic commitment mode is used by the batcher whenever a DA service is specified. If that still holds true, then the 128 KiB blob size limitation no longer applies.

@ethenotethan (Collaborator) commented Jan 23, 2025:

> That would imply a direct integration between the Arbitrum stack and EigenDA, which seems unlikely to me.

This is what we've done in our Arbitrum fork; feel free to look at the code here where the on-chain cert verification is performed.

> If that still holds true, then the 128 KiB blob size limitation no longer applies.

In a world with fraud proofs enabled, wouldn't you need to one-step prove the reading of blob contents from a preimage oracle? Curious how that would work with a hash-based commitment scheme, where opcode resolution would require uploading the entire preimage or blob contents. Agree the size limitation is irrelevant for insecure integrations, but it presents dramatic security implications for stage 1-2 rollups.

@Inkvi (Contributor, Author) commented:

I wasn't aware of your Arbitrum fork; now it makes total sense why it won't work for a Nitro stack.

@ian-shim commented:

Oh I see, the limit is from the stock DAChallenge contract, not the preimage oracle? It should be possible to implement a DAChallenge contract that uses a mechanism similar to the large preimage proposal to circumvent this limit there too; of course that doesn't exist at the moment and would be a significant amount of work.

@ethenotethan (Collaborator) commented:

Hey @ian-shim! It should theoretically work, but I do see some complications if the preimage is very large. E.g., EigenDA supports up to 16 MiB/s on mainnet. In the worst case this limit could be hit by a single blob dispersal:

16 MiB blob / 128 KB per calldata tx ≈ 131 txs

In this worst case, during a dispute game you'd have to stream 131 txs to prove the execution of a READPREIMAGE operation. IIUC this could have implications for core game mechanics/constraints (e.g., challenger bond sizes, resource fractionalization between challenge states). I can see this opening up DoS vectors, which could be mitigated by either finding some max blob size threshold that ensures dispute safety or reworking core challenge params to support this 16-32 MiB case.

One thing we could also explore is mitigating this within the proxy itself; e.g., we could update the service to partition blobs > 128 KiB into "sub-blobs", each committed to individually using keccak256 hashing, and form some multi-commitment DA cert. This would require changes in the node software, though, to process this one-to-many commitment.
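For illustration, a rough sketch of that partitioning idea; the function, the chunk size constant, and the cert shape are assumptions, not an existing proxy API:

```go
package store

import "github.com/ethereum/go-ethereum/crypto"

// subBlobSize is the per-chunk limit discussed above (~128 KiB per keccak
// commitment in the derivation pipeline).
const subBlobSize = 128 * 1024

// partitionAndCommit splits a large blob into <=128 KiB sub-blobs, commits to
// each with keccak256, and returns the sub-blobs together with the
// concatenated hashes as a naive multi-commitment "cert". How the node
// software would consume such a one-to-many commitment is the open question
// raised above.
func partitionAndCommit(blob []byte) (subBlobs [][]byte, multiCommitment []byte) {
	for start := 0; start < len(blob); start += subBlobSize {
		end := start + subBlobSize
		if end > len(blob) {
			end = len(blob)
		}
		chunk := blob[start:end]
		subBlobs = append(subBlobs, chunk)
		multiCommitment = append(multiCommitment, crypto.Keccak256(chunk)...)
	}
	return subBlobs, multiCommitment
}
```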

@ian-shim commented:

Thanks @ethenotethan, that makes sense. I like the idea in the last paragraph; it is similar to something I asked about in our shared Slack channel: instead of committing a single cert per tx, pack a bunch into a single tx's calldata. Originally I was thinking about this as a way to further reduce gas costs, but if it could also be used to make a fallback like this work in the fault proof system, that would be awesome.

@Inkvi reopened this Jan 29, 2025

@Inkvi (Contributor, Author) commented Jan 29, 2025:

Added unit tests and hid the feature behind a flag.

@Inkvi requested a review from samlaf on January 29, 2025.