Redundant write on EigenDA failure #242
base: main
Conversation
@Inkvi thanks for this. Bit overwhelmed at the moment but will review ASAP. Ping me if I forget.
store/manager.go (outdated)
log.Error("Failed to write to EigenDA backend", "err", err) | ||
// write to EigenDA failed, which shouldn't happen if the backend is functioning properly | ||
// use the payload as the key to avoid data loss | ||
if m.secondary.Enabled() && !m.secondary.AsyncWriteEntry() { | ||
redundantErr := m.secondary.HandleRedundantWrites(ctx, value, value) | ||
if redundantErr != nil { | ||
log.Error("Failed to write to redundant backends", "err", redundantErr) | ||
return nil, redundantErr | ||
} | ||
|
||
return crypto.Keccak256(value), nil | ||
} |
IIUC the idea here is to use secondary backends as primary in the event of a failed dispersal to EigenDA. This will create complications for some of our integrations (e.g., in Arbitrum x EigenDA the commitment is posted/verified against the inbox directly, which would now fail with failover). Would prefer if this was opt-in and feature-guarded by some dangerous config flag...
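For concreteness, a minimal sketch of what such an opt-in guard could look like; the flag name `DangerousEigenDAFailover` and the surrounding types are illustrative stand-ins, not the proxy's actual API:

```go
package store

import (
	"context"
	"errors"
)

// SecondaryStore mirrors the calls used in the diff above; the interface
// itself is assumed here for illustration.
type SecondaryStore interface {
	Enabled() bool
	AsyncWriteEntry() bool
	HandleRedundantWrites(ctx context.Context, key, value []byte) error
}

type Config struct {
	// DangerousEigenDAFailover opts in to the keccak-keyed failover writes
	// proposed in this PR. Off by default.
	DangerousEigenDAFailover bool
}

type Manager struct {
	cfg       Config
	secondary SecondaryStore
}

var ErrFailoverDisabled = errors.New("EigenDA write failed and failover is not enabled")

// failover performs the redundant write only when the operator has opted in.
func (m *Manager) failover(ctx context.Context, value []byte) error {
	if !m.cfg.DangerousEigenDAFailover {
		return ErrFailoverDisabled
	}
	if !m.secondary.Enabled() || m.secondary.AsyncWriteEntry() {
		return ErrFailoverDisabled
	}
	return m.secondary.HandleRedundantWrites(ctx, value, value)
}
```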
cc @samlaf
Overall this seems like an approach that could work, but the PR in its current form would require a lot more work:
- we would want to revert to a keccak commitment mode when this kind of failover happens, so that the derivation pipeline knows the failover happened (this would also be a more robust approach than the ad-hoc reading you currently have)
- as @epociask said, this failover behavior should be feature-guarded
- this failover should only happen if the blob's size is <128KiB, to avoid running into that size limit in the derivation pipeline (see the sketch after this list)
- we would need to add some tests
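A rough sketch of the first and third bullets taken together, assuming the OP alt-da convention of a one-byte commitment-type prefix (0x00 = keccak256, 0x01 = generic); the names and exact limit handling are illustrative:

```go
package store

import (
	"fmt"

	"github.com/ethereum/go-ethereum/crypto"
)

const (
	// OP alt-da prefixes commitments with a type byte; 0x00 marks a
	// keccak256 commitment, which the derivation pipeline handles natively.
	keccakCommitmentByte = 0x00

	// Keccak commitments travel as calldata through the derivation
	// pipeline, so failover blobs must stay under this limit.
	maxKeccakBlobSize = 128 * 1024
)

// keccakFailoverCommitment builds the failover commitment, refusing blobs
// that would not fit through the keccak/calldata path.
func keccakFailoverCommitment(blob []byte) ([]byte, error) {
	if len(blob) >= maxKeccakBlobSize {
		return nil, fmt.Errorf("blob of %d bytes exceeds keccak failover limit", len(blob))
	}
	return append([]byte{keccakCommitmentByte}, crypto.Keccak256(blob)...), nil
}
```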
@epociask thanks for highlighting the integration problems with Arbitrum. I am not familiar with the Arbitrum stack and how alt-DA is handled there. Could you shed more light on the failure mechanism? Are you saying that Arbitrum's inbox contract verifies the EigenDA certificate directly on-chain? That would imply a direct integration between the Arbitrum stack and EigenDA, which seems unlikely to me.
@samlaf the last time I looked at the op-batcher (v1.9.5), it couldn't revert to a different commitment mode, and the generic commitment mode is used by the batcher whenever a DA service is specified. If that still holds true, then the 128 KiB blob size limitation no longer applies.
> That would imply a direct integration between the Arbitrum stack and EigenDA, which seems unlikely to me.

This is what we've done in our Arbitrum fork; feel free to look at the code here where the on-chain cert verification is performed.

> If that still holds true, then the 128 KiB blob size limitation no longer applies.

In a world with fraud proofs enabled, wouldn't you need to one-step prove the reading of blob contents from a preimage oracle? Curious how that would work using a hash-based commitment scheme where opcode resolution would require uploading the entire preimage, i.e. the full blob contents. Agree the size limitation is irrelevant for insecure integrations, but it presents dramatic security implications for stage 1-2 rollups.
I wasn't aware of your Arbitrum fork; now it makes total sense why it won't work for a Nitro stack.
Hey @ethenotethan, is it possible to circumvent the 128 KiB tx size limit using the large-preimage-proposal functionality exposed by the preimage oracle? E.g. https://github.com/ethereum-optimism/optimism/blob/develop/packages/contracts-bedrock/src/cannon/PreimageOracle.sol#L530 and https://github.com/ethereum-optimism/optimism/blob/develop/op-challenger/game/fault/preimages/large.go
Oh I see, the limit is from the stock DAChallenge contract, not the preimage oracle? It should be possible to implement a DAChallenge contract that uses a mechanism similar to the large preimage proposal in order to circumvent this limit there too. Of course, that doesn't exist at the moment and would be a significant amount of work.
Hey @ian-shim! It should theoretically work, but I do see some complications if the preimage is very large. E.g. EigenDA supports up to 16 MiB/s on mainnet, so in the worst case this limit could be hit by a single blob dispersal:

16 MiB blob / 128 kB per calldata tx ≈ 131 txs

In this worst case, during a dispute game you'd have to stream 131 txs to prove the execution of a single READPREIMAGE operation. IIUC this could have implications for core game mechanics/constraints (e.g., challenger bond sizes, resource fractionalization between challenge states). I can see this opening up DoS vectors, which could be mitigated by either finding some max blob size threshold that ensures dispute safety or reworking core challenge params to support the 16-32 MiB case.
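Reproducing the tx count above, as a back-of-the-envelope check (assuming 128 kB of usable calldata per tx):

```go
package main

import "fmt"

func main() {
	const blobBytes = 16 * 1024 * 1024 // 16 MiB worst-case blob
	const txBytes = 128 * 1000         // ~128 kB of calldata per tx

	// Integer division reproduces the ~131-tx figure cited above.
	fmt.Println(blobBytes / txBytes) // 131
}
```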
One thing we could also explore is mitigating this within the proxy itself; e.g., we could update the service to partition blobs >128 KiB into "sub-blobs" which are committed to individually using keccak256 hashing and form some multi-commitment DA cert (a rough sketch follows below). This would require changes in the node software, though, to process this one-to-many commitment.
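A rough sketch of that partitioning step; the multi-commitment encoding is invented for illustration, and nothing like it exists in the proxy today:

```go
package store

import "github.com/ethereum/go-ethereum/crypto"

// subBlobSize is the per-chunk ceiling imposed by the keccak/calldata path.
const subBlobSize = 128 * 1024

// multiCommit splits an oversized blob into sub-blobs and returns the ordered
// keccak256 commitment for each; together these would form the hypothetical
// multi-commitment DA cert described above.
func multiCommit(blob []byte) [][]byte {
	var commitments [][]byte
	for start := 0; start < len(blob); start += subBlobSize {
		end := start + subBlobSize
		if end > len(blob) {
			end = len(blob)
		}
		commitments = append(commitments, crypto.Keccak256(blob[start:end]))
	}
	return commitments
}
```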
Thanks @ethenotethan, that makes sense. I like the idea in the last paragraph; it is similar to something I asked about in our shared Slack channel: instead of committing a single cert per tx, pack a bunch into a single tx's calldata. Originally I was thinking about this as a way to further reduce gas costs, but if it could also be used to make a fallback like this work in the fault proof system, that would be awesome.
Added unit tests and hid the feature behind a flag.
Changes proposed
The current proxy implementation lacks protection against write failures. If EigenDA fails for any reason, we still want to write our commitment to the DA layer if we have caches or fallbacks enabled; otherwise our DA batches will be postponed until EigenDA recovers, which might overwhelm the batcher if that period is prolonged.
Since a certificate is not available at this point, the keccak256 hash of the payload has to be used as the commitment.
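To make that keying concrete, a minimal illustration of the write/read roundtrip; the map is a stand-in for the proxy's real secondary store:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/ethereum/go-ethereum/crypto"
)

func main() {
	// Stand-in for the secondary backend (cache/fallback), keyed by hash.
	cache := map[string][]byte{}

	payload := []byte("rollup batch data")
	key := crypto.Keccak256(payload) // commitment == keccak256(payload)
	cache[string(key)] = payload     // redundant write keyed by the hash

	// A later read resolves the commitment back to the payload and can
	// re-hash to verify integrity.
	got := cache[string(key)]
	fmt.Println(bytes.Equal(got, payload)) // true
}
```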
Note to reviewers
I am aware of your PR that achieves a similar result with Eth DA as the backend instead of S3. But that PR is a few months old and there has been no traction from the OP team.