BIP draft: BIPs for Utreexo #1923
9b3eafb to a94f643
some typos
Thank you for proposing these drafts. They already look quite complete with respect to the editorial requirements (BIPs 2 and 3). I've done a cursory first pass. No immediate conceptual feedback. A few editorial comments follow; feel free to ignore them during conceptual review until they are applicable.
a94f643 to cb2993c
cb2993c to d1d0342
You need to justify why you're using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.
I strongly recommend replacing SHA-256 with SHAKE256 (from the SHA-3 standard) for the following reasons:
1. Security Advantages
2. Comparative Analysis: SHA-256 vs SHAKE256
3. Functional Example
   Input:
   SHAKE256 (512-bit output):
   SHAKE256 (256-bit output):
4. Implementation Benefits
5. Technical Reference
   For detailed cryptographic differences:
Sure, we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256. But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512?
SHAKE256 is not used in Bitcoin and introduces a new hash, which increases the trust assumptions. We do not want to do this.
This comment was marked as off-topic.
Some friendly moderation to keep the discussion focused on technical review -- thanks.
SHA256 and SHA512 are quantum resistant.
Ok, but this has nothing to do with this BIP.
@1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw, please cut out the LLM-generated comments. If any of us were interested in seeing an LLM’s prediction of what might be said about a topic, we could prompt one ourselves.
On Mon, Aug 18, 2025 at 04:06:51AM -0700, Calvin Kim wrote:
> kcalvinalvin left a comment (bitcoin/bips#1923)
> > You need to justify why you're using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.
> Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256.
> But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512?

No part of the Bitcoin consensus protocol uses SHA512.
Ok, but you stated in your previous comment: "You need to justify why you're using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol". It would be very helpful to see what type of justifications the other protocols have made. Second, I don't think it matters that SHA512 isn't used in the Bitcoin consensus protocol. SHA512 is used in BIP32, and the argument that SHA512 is safe for generating private keys but not safe for Bitcoin consensus isn't sound. I think our original justification (better performance with SHA512/256) mentioned in the BIP is sound. Happy to provide the benchmarks; they're being worked on at the moment.
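As a rough idea of how such a comparison could be measured, here is a minimal sketch using only Python's standard library. It is not the authors' benchmark. `hashlib.sha512` stands in for SHA-512/256 on the assumption that the two cost about the same per call, since they share a compression function and differ only in initial values and output truncation. The helper name `time_hash` and the round count are illustrative choices.

```python
import hashlib
import timeit


def time_hash(name: str, payload: bytes, rounds: int = 100_000) -> float:
    """Return the total seconds spent hashing `payload` `rounds` times."""
    constructor = getattr(hashlib, name)
    return timeit.timeit(lambda: constructor(payload).digest(), number=rounds)


# Two 32-byte child hashes concatenated, as in parent_hash(left, right).
payload = bytes(64)

for name in ("sha256", "sha512"):
    print(f"{name}: {time_hash(name, payload):.3f} s")
```

Results depend heavily on whether the CPU and the linked crypto library provide SHA-256 hardware acceleration, which is the point raised above, so numbers from one machine should not be generalized.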
| Name | Type | Description |
| ----------------- | ------------------------ | ----------------------------------------- |
| Utreexo_Tag_V1 | 64 byte array | The version tag to be prepended to the leafhash. |
| Utreexo_Tag_V1 | 64 byte array | The version tag to be prepended to the leafhash. |
For clarification, is the `Utreexo_Tag_V1` really used twice in the preimage to the hash?
My guess would be that this duplication is unintended.
- | Name | Type | Description |
- | ----------------- | ------------------------ | ----------------------------------------- |
- | Utreexo_Tag_V1 | 64 byte array | The version tag to be prepended to the leafhash. |
- | Utreexo_Tag_V1 | 64 byte array | The version tag to be prepended to the leafhash. |
+ | Name | Type | Description |
+ | ----------------- | ------------------------ | ----------------------------------------- |
+ | Utreexo_Tag_V1 | 64 byte array | The version tag to be prepended to the leafhash. |
The question is 1) why are we adding a new dependency to consensus implementations, and 2) is this actually a performance increase, given that dedicated SHA256 hardware is becoming common? Length-extension attacks are not relevant for this use case, as we are only committing to public data.
I had a look at most of the Accumulator Specification for the first helping. Looks very good already. I only reviewed the function definitions up to `root_position`, then skimmed the rest, before reading on from Rationale.
Davidson Souza <[email protected]>
Comments-URI: TBD
Status: Draft
Type: Specification
Nit: BIP 2 is still active, so this should be "Standards Track" for the time being.

## Abstract

This BIP describes the Utreexo accumulator and it's operations. It lays down how to update the
- This BIP describes the Utreexo accumulator and it's operations. It lays down how to update the
+ This BIP describes the Utreexo accumulator and its operations. It lays down how to update the
To accommodate this, Utreexo changes the storage requirement from the accumulator design in [^1] to $O(log_2(N))$,
where N is the number of elements ever added to the set, while still keeping proof sizes small and verification efficient.
In case this doesn’t get discussed later, it might be interesting to compare how O(log2(N)) for all transaction outputs ever created compares to the current UTXO set size.
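To put the O(log2(N)) bound in concrete terms, here is a rough sketch. It assumes one 32-byte root per tree in the forest and at most one tree per bit of N. The function name `max_root_bytes` is hypothetical, and `n_added` is a made-up placeholder rather than a measured count of outputs.

```python
HASH_SIZE = 32  # bytes per root hash (hashes in the drafts are 32 bytes)


def max_root_bytes(n_added: int) -> int:
    """Upper bound on root storage after n_added elements were ever added."""
    # A forest of perfect trees holds at most one tree per bit of n_added.
    return n_added.bit_length() * HASH_SIZE


n_added = 3_000_000_000  # hypothetical placeholder, not real chain data
print(max_root_bytes(n_added))  # 32 bits * 32 bytes = 1024
```

Under these assumptions the root storage stays around a kilobyte even for billions of additions. That covers only the roots, not any proofs a node chooses to cache.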

The following utility functions are required for performing accumulator operations:

**parent_hash(left, right):** Returns the hash of the concatenation of two child hashes (`left` and `right`).
Does this ambiguity regarding the depth of the leaf in the tree not introduce similar weaknesses as the original Merkle tree construction? Why would we float up leaf-hashes rather than create a tagged hash at each level?
Is this fully mitigated due to the number of leaves being known?
    return sha512_256(left + right)
```

**treerows(numleaves):** Returns the minimum number of bits required to represent `numleaves - 1`. This corresponds to the height of the largest tree in the forest. Returns `0` if `numleaves` is `0`.
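The `treerows` definition above can be sketched as follows. This is an illustration written from the prose description, not the BIP's own implementation, and it assumes Python's `int.bit_length`. The third printed column shows what the function would return if it counted the bits of `numleaves` itself.

```python
def treerows(numleaves: int) -> int:
    """Bits needed to represent numleaves - 1; 0 when numleaves is 0."""
    if numleaves == 0:
        return 0
    return (numleaves - 1).bit_length()


# Columns: numleaves, treerows(numleaves), bit length of numleaves itself.
for n in (1, 2, 3, 4, 5, 8, 9):
    print(n, treerows(n), n.bit_length())
```

The two columns differ exactly when `numleaves` is a power of two: four leaves fit under a root two rows up, but the number 4 needs three bits. Subtracting one first avoids that overshoot.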
The `numleaves - 1` throws me off here. It’s not obvious to me why the function would be defined that way rather than as the "minimum number of bits required to represent `numleaves`". Perhaps a bit more context would help?
**parent(position, total_rows):** Returns the parent position of the given `position` in an accumulator with `total_rows` tree rows.

Implementation:

```python
def parent(position: int, total_rows: int) -> int:
    return (position >> 1) | (1 << total_rows)
```
I could have used a little more explanation why this returns the parent, but staring at it for a bit, it seems to me that a fully filled tree with 2^n leaves would have 2^n - 1 inner nodes, meaning that all leaves start with a zero in the first position and all inner nodes start with a one.
E.g. for four leaves, the leaves are 000, 001, 010, and 011, and the inner nodes would be 100, 101, 110.
For 000 and 001, shifting to the right gives 00 and setting the top bit makes the parent 100. For 010 and 011, it works out to be 101. For 100 and 101, it works out to 110.
Gotcha, cool.
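The four-leaf walk-through can be checked mechanically. The sketch below reuses the `parent` function as quoted in the diff. The constant `TOTAL_ROWS` and the loop are illustrative additions.

```python
def parent(position: int, total_rows: int) -> int:
    # Drop the lowest bit (which sibling) and set the top bit (one row up).
    return (position >> 1) | (1 << total_rows)


TOTAL_ROWS = 2  # a forest with four leaves has two rows above the leaves

# Positions 0-3 are leaves, 4-5 are their parents, 6 is the root.
for pos in range(6):
    print(f"{pos:03b} -> {parent(pos, TOTAL_ROWS):03b}")
```

This prints 100 for positions 000 and 001, 101 for 010 and 011, and 110 for 100 and 101, matching the reasoning above.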
substantial. In RSA-based designs, creating a proof for any given UTXO at arbitrary times can be computationally
intensive, especially as the number of UTXOs grows.

Utreexo's design is driven by the need for Bridge Nodes: nodes that maintain backward compatibility with existing
New jargon is usually italicized on introduction, perhaps consider:
- Utreexo's design is driven by the need for Bridge Nodes: nodes that maintain backward compatibility with existing
+ Utreexo's design is driven by the need for *bridge nodes*: nodes that maintain backward compatibility with existing
This time I took a look at the "Validation Layer" BIP. Also looks very good already. I noticed that there is no Rationale section, and the title seemed a little less informative than it could be.
```
BIP: TBD
Layer: Peer Services
Title: Utreexo - Validation Layer
The title feels a bit odd to me. It could be a bit more descriptive; I was thinking "Utreexo - Transaction and block validation" or something?
Davidson Souza <[email protected]>
Comments-URI: TBD
Status: Draft
Type: Specification
Nit: Until BIP 3 activates, this should be Standards Track.

This BIP defines the rules for validating blocks and transactions using the
Utreexo accumulator. It is important to note that this BIP does not define the
Utreexo accumulator itself, for that see BIP-????. This document is only concerned with
Maybe for the time being:
- Utreexo accumulator itself, for that see BIP-????. This document is only concerned with
+ Utreexo accumulator itself, for that see [BIP Utreexo Accumulator](utreexo-accumulator-bip.md). This document is only concerned with
### Node Hashes

During a node's normal operation, it will need to compute the leaf hash for UTXOs
being added or removed from the accumulator. The leaf hash is a 32 byte hash that
- being added or removed from the accumulator. The leaf hash is a 32 byte hash that
+ being added or removed from the accumulator. The leaf hash is a 32-byte hash that

#### UTXO Hash Preimages

Individual UTXOs are represented as 32 byte hashes in the Utreexo accumulator. To obtain this
- Individual UTXOs are represented as 32 byte hashes in the Utreexo accumulator. To obtain this
+ Individual UTXOs are represented as 32-byte hashes in the Utreexo accumulator. To obtain this
do not have outputs that overwrites an existing UTXO.

`BIP-0034` was a rule where the block height was included in the script signature
of the coinbase transaction. One of the reason for the change was to make
As far as I can tell, the rest of BIP 34 explains the activation mechanism of BIP 34, so I would claim that this is the main reason.
- of the coinbase transaction. One of the reason for the change was to make
+ of the coinbase transaction. The main reason for the change was to make
random bytes that could be interpreted as block heights. The lowest block
heights are: 209,921, 490,897, and 1,983,702.
- random bytes that could be interpreted as block heights. The lowest block
- heights are: 209,921, 490,897, and 1,983,702.
+ random bytes that could be interpreted as block heights. The lowest implicated block
+ heights are: 209,921, 490,897, and 1,983,702.
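To illustrate what "random bytes that could be interpreted as block heights" means, here is a sketch of the byte prefix a coinbase script would need to start with to read as a given height. It assumes the BIP 34 encoding: a length byte followed by the height as a minimal little-endian script number. The function name `bip34_height_prefix` is hypothetical, and the sketch only handles heights above 16.

```python
def bip34_height_prefix(height: int) -> bytes:
    """Push of `height` as a minimal little-endian script number."""
    if height <= 16:
        raise ValueError("sketch only covers heights above 16")
    raw = height.to_bytes((height.bit_length() + 7) // 8, "little")
    if raw[-1] & 0x80:
        # The top bit of the last byte is the sign bit, so pad to stay positive.
        raw += b"\x00"
    return bytes([len(raw)]) + raw


for h in (209_921, 490_897, 1_983_702):
    print(h, bip34_height_prefix(h).hex())
```

A pre-BIP 34 coinbase whose script happens to begin with one of these four-byte sequences would look like it commits to the corresponding height.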
that will probably never actually happen, however.

Block 1,983,702 is the first block that Utreexo nodes would be in danger of a
consensus failure due to the inability to perform the BIP-0030 checks. However,
- consensus failure due to the inability to perform the BIP-0030 checks. However,
+ consensus failure due to the inability to perform the BIP-0030 checks, if someone were to reuse the coinbase transaction from block 164,384. However,

### Historical BIP-0030 violations

There were two UTXOs that were overwritten due to this consensus rule are:
Not due to this rule, but rather before it was introduced:
- There were two UTXOs that were overwritten due to this consensus rule are:
+ There were two UTXOs that were overwritten by repeated transactions:
accumulator. To be consensus compatible with clients that do have the historical
violations, the leaves representing these two UTXOs in the Utreexo accumulator
are hardcoded as unspendable.
If I’m understanding this right:
- accumulator. To be consensus compatible with clients that do have the historical
- violations, the leaves representing these two UTXOs in the Utreexo accumulator
- are hardcoded as unspendable.
+ accumulator. To be consensus compatible with clients that retain only the second
+ occurrences of these outputs, the leaves representing the corresponding first UTXOs in the Utreexo accumulator
+ are hardcoded as unspendable.
I read the whole P2P BIP, although I went over the new messages section a bit more quickly. There are some sections that felt a bit confusing to me, perhaps you could try to take a look at whether you can clarify those for the less initiated. Overall, this seems close to complete, although I noticed that it is missing a Rationale section.
Utreexo nodes require the inclusion proof to fully validate blocks and transactions.
Each block has a corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
Each transaction also has a corresponding inclusion proof with it and for normal transaction relay, the proof is roughly 3 times the size of the transaction.
It's still reasonable for a single node to download this extra data but little caching goes a long way in reducing the amount of data that one has to download.
little caching ↦ almost no caching
a little caching ↦ some caching
I think you mean the latter:
- It's still reasonable for a single node to download this extra data but little caching goes a long way in reducing the amount of data that one has to download.
+ It's still reasonable for a single node to download this extra data but a little caching goes a long way in reducing the amount of data that one has to download.
CSNs have the goal of minimizing data storage and download while performing block validation.
Archive and bridge nodes store more data and provide this data to CSNs.

Bridge nodes are nodes that can add inclusion proofs to mempool transactions, support the same set of messages as CSNs, and should in fact be indistinguishable from CSNs on the network.
It’s not clear to me how "bridge nodes should in fact be indistinguishable from CSNs on the network". By whom are they indistinguishable? In what regard are they indistinguishable? Shouldn’t they, e.g., frequently be the first peer to announce new mempool transactions and newly found blocks, since they act as the translation layer and are therefore the initial source of data for the Utreexo portion of the node network?
Archive and bridge nodes store more data and provide this data to CSNs.

Bridge nodes are nodes that can add inclusion proofs to mempool transactions, support the same set of messages as CSNs, and should in fact be indistinguishable from CSNs on the network.
Archive nodes are able to serve the blocks and the inclusion proofs. However, they are not able to generate the inclusion proofs as they do not keep the full UTXO set.
Does "Bridge node" refer to the aspect of whether the node has the UTXO set, and does "archive node" refer to having the full set of data? I.e., are these different dimensions? Would you run an "archive bridge node" if you want to offer all services?
Edit: Oh, never mind, you answer that right below.

### Pre-P2P: Bridge Building

When introducing Utreexo into an existing network, there are 2 thing needed before CSNs can operate.
- When introducing Utreexo into an existing network, there are 2 thing needed before CSNs can operate.
+ When introducing Utreexo into an existing network, there are two things needed before CSNs can operate.
### Pre-P2P: Bridge Building

When introducing Utreexo into an existing network, there are 2 thing needed before CSNs can operate.
First, archive nodes need to build proofs for old blocks to serve during the initial-block download (IBD).
- First, archive nodes need to build proofs for old blocks to serve during the initial-block download (IBD).
+ First, archive nodes need to build proofs for old blocks to serve during the initial block download (IBD).
With these merkle tree positions for the UTXOs referenced in the inputs, we can calculate the needed positions of the merkle hashes to them.
These positions are then sent over in the `getdata` message as an another inventory vector.


Scrolling up and down through this document, it’s sometimes difficult to tell whether a paragraph belongs to the image before or after the paragraph. Since Markdown does not allow captions on images, it could for example help if either the images included the caption, or if the text were structured in some way that makes it clearer.
 | ||
|
||
Legacy block propagation without Compact Blocks comprises of three steps: |
Consistency: Previously you were referring to non-Utreexo nodes as "current nodes", now it’s "legacy". Please use one term to refer to the same concept across the entire document.
For some script types (e.g. `ScriptHash`, `PubkeyHash`, `WitnessScriptHash`, `WitnessPubkeyHash`) the actual locking condition is not in the scriptPubkey, but a hash of it.
The script which is evaluated is provided as an element of the scriptSig or witness data.

Therefore, we can safely just omit the locking script hash from the UTXO data and reconstruct it from the witness or scriptSig.
- Therefore, we can safely just omit the locking script hash from the UTXO data and reconstruct it from the witness or scriptSig.
+ Therefore, we can safely omit the locking script hash from the UTXO data and reconstruct it from the witness or scriptSig.
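As an illustration of the reconstruction idea, here is a minimal sketch for the `WitnessScriptHash` case only. It assumes the standard P2WSH encoding, where the scriptPubKey is `OP_0` followed by a 32-byte push of the SHA-256 of the witness script. The function name and the placeholder script are hypothetical and not taken from the BIP.

```python
import hashlib

OP_0 = 0x00     # witness version 0
PUSH_32 = 0x20  # push the next 32 bytes


def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    """Rebuild a P2WSH scriptPubKey from the revealed witness script."""
    program = hashlib.sha256(witness_script).digest()
    return bytes([OP_0, PUSH_32]) + program


witness_script = bytes([0x51])  # placeholder script (OP_TRUE), for illustration
script_pubkey = p2wsh_script_pubkey(witness_script)
print(len(script_pubkey), script_pubkey.hex())
```

Because the spender has to reveal the witness script anyway, the 34-byte scriptPubKey can be recomputed on the receiving side rather than transmitted. The other script types listed above use different hash functions and templates and are not covered by this sketch.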
|--------------|---------------------|-------------|
| block height | uint32 | The time-to-live value of a leaf in the Utreexo merkle forest. The value is determined by the amount of leaves that were added to the accumulator since its creation |
| length | varint | The length of the TTLs |
| TTLs | vector of TTL infos | The TTL Info for the UTXOs that are added to the Utreexo merkle forest in blockchain ordering. See [Utreexo - Validation Layer](./utreexo-validation-bip.md#Excluded UTXOs from the accumulator) for the UTXOs that are not added to the Utreexo merkle forest |
- | TTLs | vector of TTL infos | The TTL Info for the UTXOs that are added to the Utreexo merkle forest in blockchain ordering. See [Utreexo - Validation Layer](./utreexo-validation-bip.md#Excluded UTXOs from the accumulator) for the UTXOs that are not added to the Utreexo merkle forest |
+ | TTLs | vector of TTL infos | The TTL Info for the UTXOs that are added to the Utreexo merkle forest in blockchain ordering. See [Utreexo - Validation Layer](./utreexo-validation-bip.md#excluded-utxos-from-the-accumulator) for the UTXOs that are not added to the Utreexo merkle forest |

Since there's one corresponding leaf data per target location, it's trivial to generate a bitmap for the leafdatas.

Using the [proof_positions](./utreexo-accumulator-bip.md#Utility Functions) function, it's possible to generate the positions of the needed proof hashes for a given set of targets.
- Using the [proof_positions](./utreexo-accumulator-bip.md#Utility Functions) function, it's possible to generate the positions of the needed proof hashes for a given set of targets.
+ Using the [proof_positions](./utreexo-accumulator-bip.md#utility-functions) function, it's possible to generate the positions of the needed proof hashes for a given set of targets.
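The anchor fixes in these suggestions follow the way heading links are usually derived on GitHub: lowercase the heading, drop punctuation, and replace spaces with hyphens. The helper below is a rough approximation for checking links by hand, not GitHub's actual implementation. It ignores cases such as duplicate headings, and the name `heading_anchor` is made up for this sketch.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate the anchor derived from a Markdown heading."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation, keep hyphens
    return slug.replace(" ", "-")


print(heading_anchor("Utility Functions"))
print(heading_anchor("Excluded UTXOs from the accumulator"))
```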
After discussion amongst the editors, we've assigned 181-183 for these 3 BIP drafts. @murchandamus suggested 181 Accumulator / 182 Validation / 183 P2P (I agree) while leaving it up to you.
Whenever you get around to it, please add the numbers to the Preambles, set the "Created" header to 2025-08-29 (it holds the date a BIP got numbered), and add the table entries to the README.mediawiki.
Currently going through all the reviews and writing up the rationale for validation and p2p. Will address these as well.
These are the 3 BIPs that describe Utreexo, a consensus-compatible (non-soft fork) way to send and verify transactions without storing the full UTXO set.
The 3 BIPs are for:
Mailing list post: https://groups.google.com/g/bitcoindev/c/W1lxBraKG_E