add data availability docs #1995

Open: wants to merge 11 commits into `master`
Changes from 4 commits
`arbitrum-docs/how-arbitrum-works/12-data-availability.mdx`: 108 additions, 0 deletions
@@ -0,0 +1,108 @@
---
title: Data Availability
description: 'Learn how data availability works on Arbitrum'
author: JasonWan
sme: JasonWan
user_story: As a current or prospective Arbitrum user, I need to learn more about how data availability works on Arbitrum.
content_type: get-started
---

# How Arbitrum data availability works

## How does data flow through Arbitrum?

Arbitrum currently supports two primary data availability mechanisms:

**Rollup mode:** In this mode, all transaction data is posted to the parent chain (for example, Ethereum mainnet), either as calldata of the batch-posting transactions or in blobs attached to those transactions. Because the data lives on the parent chain, anyone can download and verify it.

**AnyTrust mode:** In AnyTrust mode, transaction data is first sent to the Data Availability Committee (DAC), a group of nodes that run Data Availability Servers (DAS) to store and serve the data. Instead of posting the full data on-chain, only a cryptographic proof that the data has been stored by the DAC (the Data Availability Certificate, or DACert) is submitted to the parent chain. This significantly reduces the amount of data posted on-chain, and therefore costs.
Contributor

Does it make sense to say Data Availability Committee (DAC) instead? It involves the committee as a whole, instead of the individual servers.

Contributor

Also, I'd say "only a cryptographic proof that the data has been stored by the DAC (Data Availability Certificate...)", so it is not confused with other types of proof.

Contributor Author

Sorry, I'm a bit confused here: do you mean DACert or DAC?

Contributor

Oh, my comment is a bit confusing, you're right. I meant that the data is stored by the DAC, and the cryptographic proof is the Data Availability Certificate. So the text would read "only a cryptographic proof that the data has been stored by the DAC (Data Availability Certificate, or DACert)...". I think we've introduced the DAC before, so in this case we are just saying that the data is stored by the DAC, and they return the DACert, which is posted to the parent chain.
Let me know if it's clearer now. If not, maybe we can try to find a different way to describe the process 🙏
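The AnyTrust trust model described above can be made concrete with a small calculation. Assuming a committee of N members of which at least H are honest, a DACert signed by N + 1 - H members must include at least one honest signer, who will serve the data. The Go sketch below checks a signer bitmask against that threshold. The `DACert` struct, its fields, and the `HasQuorum` and `requiredSigners` helpers are hypothetical simplifications for illustration, not Nitro's actual types; a real certificate also carries an aggregated signature that must be cryptographically verified, which this sketch omits.

```go
package main

import (
	"fmt"
	"math/bits"
)

// DACert is a hypothetical, simplified view of a Data Availability Certificate.
// A real certificate also carries an aggregated signature (not modeled here).
type DACert struct {
	DataHash    [32]byte // hash of the batch data stored by the committee
	SignersMask uint64   // bit i set => committee member i signed
}

// requiredSigners returns the minimum number of signatures such that, if at
// least `honest` of `members` are honest, at least one signer is honest.
func requiredSigners(members, honest int) int {
	return members + 1 - honest
}

// HasQuorum reports whether enough distinct committee members signed the cert.
// Bits at or above `members` are ignored because they name no real member.
func HasQuorum(cert DACert, members, honest int) bool {
	mask := cert.SignersMask
	if members < 64 {
		mask &= (uint64(1) << members) - 1
	}
	return bits.OnesCount64(mask) >= requiredSigners(members, honest)
}

func main() {
	// A 5-member committee assuming at least 2 honest members needs 4 signatures.
	cert := DACert{SignersMask: 0b01111} // members 0-3 signed
	fmt.Println(HasQuorum(cert, 5, 2))   // true
	cert.SignersMask = 0b00111           // only members 0-2 signed
	fmt.Println(HasQuorum(cert, 5, 2))   // false
}
```

With two assumed-honest members, the threshold works out to N - 1 signatures, so a single unavailable member does not block batch posting while a single signer can never vouch for data alone.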


Because of these data availability mechanisms, Arbitrum Nitro nodes synchronize their data differently from Ethereum nodes and other layer-1 nodes. Go-Ethereum (Geth) nodes use a P2P network to discover other nodes, exchange data, and participate in consensus as they sync with the Ethereum blockchain; Arbitrum nodes diverge from this approach.

Unlike many other blockchains, Arbitrum nodes do **not** primarily rely on a traditional peer-to-peer (P2P) mechanism to sync their state.
Contributor

I'd remove this paragraph. It doesn't add value to what has already been said, and it says that it somehow relies on a P2P mechanism (not primarily, but still...)

Contributor Author

I added this paragraph because I wanted to say that Arbitrum syncs and processes data in a much more trustless way, but I forgot to include that word. What do you think about adding "trustless" to this paragraph?

Contributor

Maybe in the paragraph above that compares Ethereum to Arbitrum: "However, Arbitrum nodes do not primarily rely on a traditional peer-to-peer (P2P) mechanism for syncing their state as many other blockchains do."
What do you think?


**Here's how Arbitrum data flow works:**

1. Batching and submission:
   1. The Sequencer queues transactions and batches them together.
   2. The Sequencer submits these batches to the parent chain:
      1. In AnyTrust mode, the Sequencer sends the batch to the Data Availability Server (DAS), which stores the data and returns a Data Availability Certificate (DACert). The Sequencer then submits the DACert to the parent chain.
      2. In Rollup mode, the Sequencer submits the batch directly to the Sequencer Inbox contract on the parent chain, as either calldata or blobs.
2. Node synchronization:
   1. Upon joining the network, a full node reads batches from the parent chain:
      1. In Rollup mode, it reads the data directly from parent chain calldata or blobs, depending on how the Sequencer posted it.
      2. In AnyTrust mode, it reads the DACert from the parent chain and uses it to retrieve the data from the DAS and verify its availability.
   2. The node repeats this process until it catches up with the latest chain height.
   3. Once caught up, the node receives new Sequencer-queued messages directly from the Sequencer feed (described in detail in the section on syncing from the Sequencer feed).
3. Catching up:
   1. If a node falls behind the chain, it reverts to the process in step 2 to resynchronize with the latest state.
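The synchronization steps above boil down to one recurring decision: read from the parent chain or follow the feed. The Go sketch below models that decision as a single height comparison. The `source` type, `chooseSource`, and the "posted height" notion are hypothetical simplifications for illustration, not Nitro APIs; a real node tracks considerably more state.

```go
package main

import "fmt"

// source is a hypothetical label for where a node reads its next messages from.
type source int

const (
	parentChain   source = iota // batches posted to the parent chain (calldata, blobs, or DACert + DAS)
	sequencerFeed               // real-time messages broadcast by the Sequencer feed
)

func (s source) String() string {
	if s == parentChain {
		return "parent chain"
	}
	return "sequencer feed"
}

// chooseSource returns where a node should read from next. postedHeight is the
// number of messages already posted to the parent chain; localHeight is the
// number of messages the node has processed so far.
func chooseSource(localHeight, postedHeight uint64) source {
	if localHeight < postedHeight {
		// Joining the network (step 2) or having fallen behind (step 3):
		// catch up from data on the parent chain.
		return parentChain
	}
	// Caught up: follow the Sequencer feed for real-time updates.
	return sequencerFeed
}

func main() {
	scenarios := []struct{ local, posted uint64 }{
		{0, 50},  // fresh node
		{50, 50}, // caught up with posted batches
		{60, 50}, // ahead of posted batches thanks to the feed
		{55, 80}, // fell behind, e.g. after losing the feed connection
	}
	for _, s := range scenarios {
		fmt.Printf("local=%d posted=%d -> %v\n", s.local, s.posted, chooseSource(s.local, s.posted))
	}
}
```

Because the same comparison drives both the initial sync and the "catching up" case, a node that loses its feed connection falls back to parent chain reads without any special recovery path.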

In essence, Arbitrum nodes prioritize data retrieval from the parent chain and rely on the Sequencer for real-time updates, deviating from the traditional P2P synchronization approach used by Ethereum nodes.

## How full nodes decode data from the parent chain

Arbitrum full nodes decode data received from the parent chain (for example, Ethereum) to update their local state. This process involves querying events, parsing and decoding batch data, and processing the resulting messages.

1. Event querying:
   1. Full nodes subscribe to the `SequencerBatchDelivered` event emitted by the Sequencer Inbox contract on the parent chain. This event signals the arrival of a new batch of transactions.
2. Event parsing:
   1. Upon receiving a `SequencerBatchDelivered` event, the node parses the event data into a `SequencerInboxBatch` struct. This struct typically includes:
      1. `BlockHash`: The hash of the parent chain block containing the batch.
      2. `ParentChainBlockNumber`: The number of that parent chain block.
      3. `SequenceNumber`: The sequence number of the batch.
      4. `TimeBounds`: Timestamp and block number bounds for the batch.
      5. `AfterDelayedAcc`: The delayed inbox accumulator hash after processing delayed messages.
      6. `AfterDelayedCount`: The count of delayed messages after processing this batch.
      7. `rawLog`: The raw event log data.
3. Data serialization:
   1. The `SequencerInboxBatch` struct is serialized into a byte array.
   2. The serialized data has the following format:
      1. `TimeBounds.MinTimestamp` (8 bytes)
      2. `TimeBounds.MaxTimestamp` (8 bytes)
      3. `TimeBounds.MinBlockNumber` (8 bytes)
      4. `TimeBounds.MaxBlockNumber` (8 bytes)
      5. `AfterDelayedCount` (8 bytes)
      6. `payload` (variable length), which contains:
         1. **Type:** The kind of payload (e.g., DAS message, blob message).
         2. **Content:** The data for that payload type (e.g., a DACert, blob hashes, or Brotli-compressed data).
4. Data decoding and retrieval:
   1. Based on the `payload` type:
      1. **DAS type:** The node queries the Data Availability Server (DAS) to retrieve the raw data.
      2. **Blob message type:** The node decodes the blob message to obtain the raw data.
      3. **Brotli message type:** No extra retrieval step is needed; continue to the next step.
   2. **Data decompression:** If the raw data is Brotli-compressed, the node decompresses it. Note that the raw data obtained in the DAS and blob cases above may also be Brotli-compressed.
5. Message processing:
   1. After decoding and decompressing the data, the node obtains a series of messages.
   2. Message types:

      | Message type | Description |
      | --- | --- |
      | `BatchSegmentKindL2Message` | Contains raw data for a series of transactions, usually a single block. |
      | `BatchSegmentKindL2MessageBrotli` | Same as `BatchSegmentKindL2Message`, but Brotli-compressed. |
      | `BatchSegmentKindDelayedMessages` | Tells the node to process the next message from the parent chain's delayed inbox. |
      | `BatchSegmentKindAdvanceTimestamp` | Tells the STF to advance the timestamp. |
      | `BatchSegmentKindAdvanceL1BlockNumber` | Tells the STF to advance the parent chain block number. |

   3. **State transition:** Finally, the State Transition Function (STF) processes these messages, executing them according to its rules and updating the node's local state.

**Conclusion:**

This process lets Arbitrum nodes sync an accurate view of the chain trustlessly, without relying on other full nodes on the network.

## How full nodes sync data from the Sequencer feed

Once Arbitrum full nodes have caught up with the chain, they switch from initial synchronization to a real-time update mode. This switch involves receiving data from the sequencer feed, which continuously broadcasts updates about newly queued transactions.

1. **Data acquisition:**
   1. Full nodes maintain a connection to the Sequencer feed or to a private feed relay. To run your own relay, see [How to run a feed relay](https://docs.arbitrum.io/run-arbitrum-node/sequencer/run-feed-relay).
   2. The Sequencer feed transmits data packets containing information about the latest queued transactions.
2. **Data decoding:**
   1. Full nodes decode the received data packets using the methods described in [How to read the sequencer feed](https://docs.arbitrum.io/run-arbitrum-node/sequencer/read-sequencer-feed).
3. **Message processing:**
   1. After decoding, full nodes obtain the same types of messages as in step 5 of the previous section.
   2. They send these messages to the State Transition Function (STF) for execution, exactly as in step 5 of the previous section.
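As a rough illustration of the decoding step above, the Go sketch below unmarshals one feed broadcast with `encoding/json` and base64-decodes each message's L2 payload. The JSON shape (`version`, `messages`, `sequenceNumber`, `message.message.header`, `l2Msg`, `delayedMessagesRead`) is our reading of the example in the linked feed documentation and should be checked against it; the Go type names and `decodeFeed` are hypothetical. Turning the L2 message bytes into transactions is out of scope here.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Hypothetical Go types mirroring the feed's JSON shape as we understand it.
type feedBroadcast struct {
	Version  int           `json:"version"`
	Messages []feedMessage `json:"messages"`
}

type feedMessage struct {
	SequenceNumber uint64 `json:"sequenceNumber"`
	Message        struct {
		Message struct {
			Header struct {
				Kind        uint8  `json:"kind"`
				BlockNumber uint64 `json:"blockNumber"`
				Timestamp   uint64 `json:"timestamp"`
			} `json:"header"`
			L2Msg string `json:"l2Msg"` // base64-encoded L2 message bytes
		} `json:"message"`
		DelayedMessagesRead uint64 `json:"delayedMessagesRead"`
	} `json:"message"`
}

// decodeFeed parses one feed broadcast and returns the raw L2 message bytes
// of each message, keyed by sequence number.
func decodeFeed(raw []byte) (map[uint64][]byte, error) {
	var b feedBroadcast
	if err := json.Unmarshal(raw, &b); err != nil {
		return nil, err
	}
	out := make(map[uint64][]byte, len(b.Messages))
	for _, m := range b.Messages {
		l2, err := base64.StdEncoding.DecodeString(m.Message.Message.L2Msg)
		if err != nil {
			return nil, fmt.Errorf("message %d: %w", m.SequenceNumber, err)
		}
		out[m.SequenceNumber] = l2
	}
	return out, nil
}

func main() {
	// Illustrative payload, not real chain data.
	raw := []byte(`{"version":1,"messages":[{"sequenceNumber":42,
		"message":{"message":{"header":{"kind":3,"blockNumber":100,"timestamp":1700000000},
		"l2Msg":"aGVsbG8="},"delayedMessagesRead":1}}]}`)
	msgs, err := decodeFeed(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("seq 42 -> %q\n", msgs[42]) // seq 42 -> "hello"
}
```

Keying the result by sequence number makes it straightforward to hand messages to the STF in order and to spot gaps that would require falling back to the parent chain.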
`website/sidebars.js`: 5 additions, 0 deletions
@@ -871,6 +871,11 @@ const sidebars = {
id: 'how-arbitrum-works/l2-to-l1-messaging',
label: 'L2 to L1 messaging',
},
{
type: 'doc',
id: 'how-arbitrum-works/data-availability',
label: 'Data Availability',
},
{
type: 'link',
href: 'https://github.com/OffchainLabs/nitro/blob/master/docs/Nitro-whitepaper.pdf',