cip: Header Pruning for LNs #279
base: main
Conversation
Force-pushed from 9ed346d to fe1e5cb
nits mostly
LGTM after resolving the parameter question. Given the DA network parameters aren't already in CIPs (or specs) somewhere, I suggest keeping the source of truth in code and removing the parameter from this CIP.
The rationale is that it's way easier to update the parameter in code and there is no risk of the parameter in docs becoming stale with the implementation in code.
Co-authored-by: Josh Stein <[email protected]> Co-authored-by: Rootul P <[email protected]>
@rootulp, there is a SamplingWindow param that is already in CIP-4. We can tie HeaderPruningWindow to it, which is 30 days, for simplicity. It should, though, be defined based on TrustingPeriod, which is 14 days (based on @nashqueue's findings), but that's gonna be a different CIP by someone else (like @cmwaters) that will define the TrustingPeriod and resolve issues between all the params by adjusting them. So given we have the linkable parameter in the old CIP, I believe it's fine to define the new parameter in this CIP.
Yeah, there are already node parameters defined in CIP-4 (and we may look to adjust those soon). These are important parameters, so I would also prefer to have them documented here (possibly in the same way we do core/app parameters).
LGTM
Nice. Some nits and a few stupid questions
cips/cip-x.md (Outdated)

The estimation of Tail height is done as follows:

```go
func estimateTail(head, tail Header, blockTime, pruningWindow time.Duration) (height uint64) {
```
this is confusing, why is "now" involved? I would have expected it to be:

```go
numBlocksToKeep = pruningWindow / blockTime
newTail = head - numBlocksToKeep
return max(currentTail, newTail) // in case newTail is somehow older than currentTail
```
This snippet is actually a wrong one. This one is for finding the range that needs to be trimmed off. Thanks for catching that!
The now is used there as the starting point, as opposed to the time of the last header. This is inherited from pruning for samples, as that's how it works currently. However, thinking about this now in the case of header pruning, I think it's reasonable to just cut off based on the most recent header.
fixed a typo but otherwise lgtm
Something I realized coming from the actual implementation:
- It should take `trustingPeriod` instead of `headerPruningWindow` in here. As HPW can be arbitrarily increased, it can become bigger than TP and, more importantly, the unbonding window, which then becomes a security issue due to LRA. See fa9243f
- Using `head` in tail estimation actually makes `tail` requesting dependent on `head`. This is not ideal, because it adds a round trip of delay for the subjective init. You first request Head and then the Tail. This is kinda unavoidable, because we need some header to be a time reference to convert time to heights. We could avoid this by hardcoding the time of the genesis header or some other header, but that might be a bit too much for now.
- not sure I follow here. Isn't SamplingWindow (~1 month) already bigger than TrustingPeriod (~two weeks)? And we need pruningWindow >= SamplingWindow. I would think the light client syncing should be independent of what's happening here. If the LC hasn't been online in at least TP, it will need to resubjectively initialize, but once it does it can safely go backwards via hash links to fill in older blocks
- per above, my sense is a light node should first sync, and then decide if it needs to backfill. how is subjective init happening anyways?
Isn't SamplingWindow (~1month) already bigger than TrustingPeriod (~two weeks)?
Yes, but that's exactly where the problem is. The fact that SW, and subsequently HPW, are bigger than TP imposes risks on newly initializing nodes. If we start syncing from a Tail beyond TP, we are in trouble due to LRA, and the change here is to prevent that. It ensures the Tail is always within TP, no matter which HPW is set.
how is subjective init happening anyways?
So the missing context here might be that we do subj init a bit differently compared to other PoS projects. We automate it in a way that, I believe, makes the process even less error-prone. Usually, users are asked to enter on subj init a trusted hash of a header with a validator set they believe to be correct. Instead, we introduce the notion of a trusted peer that we request such a header with valset from. This in turn removes the hassle for users of finding the correct hash, improving UX, and reduces the risk of phishing and other rookie mistakes by users pasting wrong hashes. The trusted peers are by default hardcoded infrastructure nodes that can be altered in configs.
So my point there is that now, instead of a single RT, there are gonna be two for two headers. This is okayish, but not ideal.
I see. In that case, shouldn't we just get a very recent header from the trusted peer for subjective init, possibly even the latest head, and then calculate what tail we need from that, and sync backwards? Syncing backwards doesn't suffer from LR attacks, only syncing forward from something outside the TP.
Backwards sync is the next stage and something that has been on our radar for quite a while. It's gonna help us avoid syncing validator sets and commits, which are necessary for forward sync. We're also gonna do it with MMR, so we can prove inclusion in the past without needing the whole header chain up to that point.
Co-authored-by: Ethan Buchman <[email protected]>
Co-authored-by: Ethan Buchman <[email protected]>
Co-authored-by: Ethan Buchman <[email protected]>
Co-authored-by: Ethan Buchman <[email protected]>
@jcstein, when should we assign the number?
Co-authored-by: Ethan Buchman <[email protected]>
GH applied the same suggestion from Ethan 5 times as separate commits lol
@Wondertan after comments are resolved. Can you please close the ones above that you've resolved with a link to the commit before resolving? Thank you! Then we'll assign a number and get this merged. Just want to make sure @ebuchman's comments are addressed!
You mean like I did with your review? Ofc
numBlocksToKeep -> headersToRetain
this should be good after comments are resolved @Wondertan
I just realized maybe @Wondertan doesn't have permissions to resolve the comments. Can you confirm?
Currently, every data availability (DA) node type synchronizes all historical headers starting from genesis (or other statically configured historical header) until the subjectively initialized head of the chain. We change that by adding a way to sync a constant size range of headers instead of the whole history.
Suggested change:

```diff
 Currently, every data availability (DA) node type synchronizes all historical headers starting from genesis (or other statically configured
 historical header) until the subjectively initialized head of the chain. We change that by adding a way to sync a
-constant size range of headers instead of the whole history.
+constant time range of headers instead of the whole history.
```
Was it meant to be fixed time range instead of size?
It's more length/size than time. Constant time usually refers to O(1) algorithms, which is not the case here.
The estimation of Tail is done as follows:

```go
func estimateTail(head Header, blockTime, window time.Duration) (height uint64) {
	headersToRetain := uint64(window / blockTime)
	tail := head.Height() - headersToRetain
	return tail
}
```
Re-posting a thought I've shared with you on a 1-1:

The current estimation approach relies on a constant blockTime parameter, but in reality block times can vary. It means that the estimated Tail could drift from the intended SamplingWindow timeframe.

- Given that `HeaderPruningWindow` must be ≥ `SamplingWindow` for proper sampling, it can lead to potential sampling failures if too few headers are retained.
- If the estimation is too conservative (retains more headers than needed), it reduces the storage efficiency benefits.

Alternative approach: instead of local estimation, consider having nodes request the exact Tail header from trusted peers. The trusted peer could find the header closest to time.now() - SamplingWindow, providing precise alignment with sampling requirements and eliminating estimation errors.

This approach seems more reliable while still achieving the storage optimization goals. Thoughts?
Overview
CIP for Header Pruning. As Header Pruning breaks users, particularly by preventing historical queries (to be fixed in a subsequent CIP), it was decided to promote this change to a CIP.
This CIP starts a series of DA-targeted CIPs to optimize bandwidth and storage usage for LNs, particularly targeting the overhead brought by headers.