
cip: Header Pruning for LNs #279

Open
wants to merge 20 commits into main
Conversation

@Wondertan Wondertan (Member) commented Mar 21, 2025

Overview

CIP for Header Pruning. As Header Pruning breaks users, particularly by preventing historical queries (to be fixed in a subsequent CIP), it was decided to promote this change to a CIP.

This CIP starts a series of DA-targeted CIPs that optimize bandwidth and storage usage for LNs, particularly targeting the overhead brought by headers.

@jcstein jcstein (Member) left a comment

nits mostly

@rootulp rootulp (Collaborator) left a comment

LGTM after resolving the parameter question. Given the DA network parameters aren't already in CIPs (or specs) somewhere, I suggest keeping the source of truth in code and removing the parameter from this CIP.

The rationale is that it's way easier to update the parameter in code, and there is no risk of the parameter in docs becoming stale relative to the implementation in code.

Wondertan and others added 2 commits March 23, 2025 16:50
@Wondertan (Member Author)

Given the DA network parameters aren't already in CIPs (or specs) somewhere, I suggest keeping the source of truth in code and removing the parameter from this CIP.

@rootulp, there is a SamplingWindow param that is already in CIP-4. We can tie HeaderPruningWindow to it, which is 30 days, for simplicity. It should, though, be defined based on TrustingPeriod, which is 14 days (based on @nashqueue's findings), but that's gonna be a different CIP by someone else (like @cmwaters) that will define the TrustingPeriod and resolve the issues between all the params by adjusting them.

So given we have a linkable parameter in the old CIP, I believe it's fine to define the new parameter in this CIP.

@cmwaters (Contributor)

Yeah, there are already node parameters defined in CIP 4 (and we may look to adjust those soon). These are important parameters, so I would also prefer to have them documented here (possibly in the same way we do core/app parameters).

@jcstein jcstein (Member) left a comment

LGTM

@ebuchman ebuchman (Collaborator) left a comment

Nice. Some nits and a few stupid questions

cips/cip-x.md Outdated
The estimation of Tail height is done as follows:

```go
func estimateTail(head, tail Header, blockTime, pruningWindow time.Duration) (height uint64) {
	// ...
}
```
Collaborator:

this is confusing, why is "now" involved? I would have expected it to be:

numBlocksToKeep = pruningWindow / blockTime
newTail = head - numBlocksToKeep
return max(currentTail, newTail) // in case newTail is somehow older than currentTail
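A minimal runnable Go rendering of this suggestion (the function name and the use of plain uint64 heights are illustrative assumptions, not the CIP's actual code):

```go
package sketch

import "time"

// suggestedTail sketches the estimation proposed above: retain roughly
// pruningWindow/blockTime headers behind the head, never moving the tail
// below the current tail.
func suggestedTail(headHeight, currentTail uint64, blockTime, pruningWindow time.Duration) uint64 {
	numBlocksToKeep := uint64(pruningWindow / blockTime)
	if headHeight <= numBlocksToKeep {
		return currentTail // not enough history above the tail yet; keep the existing tail
	}
	newTail := headHeight - numBlocksToKeep
	return max(newTail, currentTail) // in case newTail is somehow older than currentTail
}
```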

@Wondertan (Member Author):

This snippet is actually the wrong one. This one is for finding the range that needs to be trimmed off. Thanks for catching that!

The now is used there as the starting point, as opposed to the time of the last header. This is inherited from pruning for samples, as that's how it works currently. However, thinking about this now in the case of header pruning, I think it's reasonable to just cut off based on the most recent header.


Collaborator:

fixed a typo but otherwise lgtm

@Wondertan (Member Author):

Something I realized coming from the actual implementation:

  • It should take trustingPeriod instead of headerPruningWindow here. As HPW can be arbitrarily increased, it can become bigger than TP and, more importantly, the unbonding window, which then becomes a security issue due to LRA. See fa9243f (a rough sketch of this clamp follows after this list).
  • Using head in tail estimation actually makes tail requesting dependent on head. This is not ideal, because it adds a round trip of delay for the subjective init: you first request the Head and then the Tail. This is kinda unavoidable, because we need some header as a time reference to convert time to heights. We could avoid this by hardcoding the time of the genesis header or some other header, but that might be a bit too much for now.
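A rough sketch of the clamp described in the first point, with hypothetical names (this is not the code from fa9243f):

```go
package sketch

import "time"

// estimateTailClamped caps the estimation window by the trusting period, so
// the resulting Tail always stays within TP even when the pruning window is
// configured to be larger.
func estimateTailClamped(headHeight uint64, blockTime, pruningWindow, trustingPeriod time.Duration) uint64 {
	window := min(pruningWindow, trustingPeriod)
	headersToRetain := uint64(window / blockTime)
	if headHeight <= headersToRetain {
		return 1 // not enough history; keep everything from the first height
	}
	return headHeight - headersToRetain
}
```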

Collaborator:

  1. not sure I follow here. Isn't SamplingWindow (~1month) already bigger than TrustingPeriod (~two weeks)? And we need pruningWindow >= SamplingWindow. I would think the light client syncing should be independent of what's happening here. If the LC hasn't been online in at least TP, it will need to resubjectively initialize, but once it does, it can safely go backwards via hash links to fill in older blocks.

  2. per above, my sense is a light node should first sync, and then decide if it needs to backfill. how is subjective init happening anyways?

@Wondertan Wondertan (Member Author) commented Apr 2, 2025

Isn't SamplingWindow (~1month) already bigger than TrustingPeriod (~two weeks)?

Yes, but that's exactly where the problem is. The fact that SW, and subsequently HPW, are bigger than TP imposes risks on newly initializing nodes. If we start syncing from a Tail beyond TP, we are in trouble due to LRA, and the change here is to prevent that. It ensures the Tail is always within TP, no matter what HPW is set to.

@Wondertan (Member Author):

how is subjective init happening anyways?

So the missing context here might be that we do subj init a bit differently compared to other PoS projects. We automate it in a way that I believe makes the process even less error-prone. Usually, on subj init, users are asked to enter a trusted hash of a header with a validator set they believe to be correct. Instead, we introduce the notion of a trusted peer that we request such a header with valset from. This in turn removes the hassle for the user of finding the correct hash, improving UX and reducing the risk of phishing and other rookie mistakes from users pasting wrong hashes. The trusted peers are, by default, hardcoded infrastructure nodes that can be altered in configs.

So my point there is that now, instead of a single RT, there are gonna be two, for two headers. This is okayish, but not ideal.
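To illustrate the two round trips, here is a rough sketch under assumed types; Header, TrustedPeer, Head, and GetByHeight are hypothetical stand-ins, not the actual celestia-node API:

```go
package sketch

import (
	"context"
	"time"
)

// Header and TrustedPeer are hypothetical stand-ins for the node's header type
// and its trusted-peer client.
type Header struct {
	Height uint64
	Time   time.Time
}

type TrustedPeer interface {
	Head(ctx context.Context) (Header, error)                  // round trip 1
	GetByHeight(ctx context.Context, h uint64) (Header, error) // round trip 2
}

// subjectiveInit fetches the subjective head from a trusted peer, estimates
// the tail height from it, then fetches the tail header: two round trips.
func subjectiveInit(ctx context.Context, tp TrustedPeer, blockTime, window time.Duration) (head, tail Header, err error) {
	head, err = tp.Head(ctx)
	if err != nil {
		return head, tail, err
	}
	headersToRetain := uint64(window / blockTime)
	tailHeight := uint64(1)
	if head.Height > headersToRetain {
		tailHeight = head.Height - headersToRetain
	}
	tail, err = tp.GetByHeight(ctx, tailHeight)
	return head, tail, err
}
```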

Collaborator:

I see. In that case, shouldn't we just get a very recent header from the trusted peer for subjective init, possibly even the latest head, and then calculate what tail we need from that, and sync backwards? Syncing backwards doesn't suffer from LR attacks, only syncing forward from something outside the TP.

@Wondertan (Member Author):

Backwards sync is the next stage and something that has been on our radar for quite a while. It's gonna help us avoid syncing validator sets and commits, which are necessary for forward sync. We're also gonna do it with MMR so we can prove inclusion in the past without needing the whole header chain up to that point.

Wondertan and others added 2 commits March 26, 2025 18:08
Co-authored-by: Ethan Buchman <[email protected]>
Co-authored-by: Ethan Buchman <[email protected]>
Wondertan and others added 2 commits March 26, 2025 18:09
Co-authored-by: Ethan Buchman <[email protected]>
Co-authored-by: Ethan Buchman <[email protected]>
@Wondertan Wondertan (Member Author) commented Mar 26, 2025

@jcstein, when should we assign the number?

Co-authored-by: Ethan Buchman <[email protected]>
@Wondertan (Member Author)

GH applied the same suggestion from Ethan 5 times as separate commits lol

@jcstein jcstein (Member) commented Mar 27, 2025

@Wondertan, after comments are resolved. Can you please close the ones above that you've resolved, with a link to the commit, before resolving? Thank you!

Then we'll assign a number and get this merged. Just want to make sure @ebuchman's comments are addressed!

@Wondertan (Member Author)

can you please close the ones above that you've resolved with a link to the commit before resolving?

You mean like I did with your review? Ofc

numBlocksToKeep -> headersToRetain
@jcstein jcstein self-requested a review March 31, 2025 14:30
@jcstein jcstein (Member) commented Mar 31, 2025

this should be good after comments are resolved @Wondertan

@jcstein jcstein (Member) commented Apr 1, 2025

I just realized maybe @Wondertan doesn't have permissions to resolve the comments. Can you confirm?

@jcstein jcstein self-requested a review April 1, 2025 18:46
Comment on lines +14 to +16
Currently, every data availability (DA) node type synchronizes all historical headers starting from genesis (or other statically configured
historical header) until the subjectively initialized head of the chain. We change that by adding a way to sync a
constant size range of headers instead of the whole history.
Member:

Suggested change
- constant size range of headers instead of the whole history.
+ constant time range of headers instead of the whole history.

Was it meant to be a fixed time range instead of size?

@Wondertan (Member Author):

It's more length/size than time. Constant time usually refers to O(1) algorithms, which is not the case here.

Comment on lines +39 to +47
The estimation of Tail is done as follows:

```go
func estimateTail(head Header, blockTime, window time.Duration) (height uint64) {
	headersToRetain := uint64(window / blockTime)
	tail := head.Height() - headersToRetain
	return tail
}
```
Member:

Re-posting a thought I shared with you in a 1-1:
The current estimation approach relies on a constant blockTime parameter, but in reality, block times can vary. This means the estimated Tail could drift from the intended SamplingWindow timeframe.
Given that HeaderPruningWindow must be ≥ SamplingWindow for proper sampling, this can lead to sampling failures if too few headers are retained.
If the estimation is too conservative (retains more headers than needed), it reduces the storage efficiency benefits.

Alternative approach:
Instead of local estimation, consider having nodes request the exact Tail header from trusted peers. The trusted peer could find the header closest to time.Now() - SamplingWindow, providing precise alignment with sampling requirements and eliminating estimation errors.

This approach seems more reliable while still achieving the storage optimization goals. Thoughts?
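A rough sketch of this alternative, assuming a hypothetical peer API that can look up a header by time (Header, TailProvider, and GetByTime are assumptions, not an existing endpoint):

```go
package sketch

import (
	"context"
	"time"
)

// Header is a hypothetical stand-in for the node's header type.
type Header struct {
	Height uint64
	Time   time.Time
}

// TailProvider is a hypothetical trusted-peer API that can resolve the header
// whose timestamp is closest to a given time.
type TailProvider interface {
	GetByTime(ctx context.Context, t time.Time) (Header, error)
}

// requestTail asks a trusted peer for the exact Tail instead of estimating it
// locally from a constant block time, aligning the Tail with the sampling window.
func requestTail(ctx context.Context, p TailProvider, samplingWindow time.Duration) (Header, error) {
	target := time.Now().Add(-samplingWindow)
	return p.GetByTime(ctx, target)
}
```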
