Askrene: prune and cap #8332

Open: Lagrang3 wants to merge 7 commits into master

Lagrang3 (Collaborator) commented Jun 9, 2025

This PR depends on #8299 and supersedes #8314.

With this PR I want to make a small improvement to the MCF solver in askrene.

First: I would like to constrain the number of flow units to 1000 by setting the accuracy of the solver
to the total payment amount divided by 1000. Some MCF algorithms, like "successive shortest path" (SSP),
have theoretical complexity bounds that depend on that number.
Note: the number 1000 is arbitrary; a smaller value may reduce the solver's runtime, but at the cost of
accuracy.
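
A minimal sketch of the granularity rule above, in plain C with `uint64_t` millisatoshi values instead of askrene's `struct amount_msat`. The helper names (`div_ceil_u64`, `flow_unit_msat`, `flow_units`) are hypothetical and are not part of the PR:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not askrene's API: ceiling division. */
static uint64_t div_ceil_u64(uint64_t a, uint64_t b)
{
	assert(b != 0);
	return a / b + (a % b != 0);
}

/* Size of one flow unit (in msat) such that the payment spans at most
 * max_units units.  The unit is never smaller than 1 msat. */
static uint64_t flow_unit_msat(uint64_t amount_msat, uint64_t max_units)
{
	uint64_t unit = div_ceil_u64(amount_msat, max_units);
	return unit > 0 ? unit : 1;
}

/* Number of units needed to cover the payment at that granularity. */
static uint64_t flow_units(uint64_t amount_msat, uint64_t max_units)
{
	return div_ceil_u64(amount_msat,
			    flow_unit_msat(amount_msat, max_units));
}
```

With `max_units = 1000`, a payment of 1,000,000,000 msat gets a unit of 1,000,000 msat, while a 500 msat payment falls back to 1 msat units and therefore uses only 500 of them.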

Second: I would like to prune the set of arcs in the network. I can achieve this by capping the sum
of the capacities of the arcs that correspond to the same channel at U, the maximum number of flow units
in the payment. Because of the piecewise linearization of the channel
cost function, one channel becomes several arcs in the MCF network, so we can discard the higher-cost
arcs of a channel linearization if the lower-cost arcs already sum up to U in flow capacity.
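
The pruning rule can be sketched as follows, assuming the arcs of one channel are stored in order of non-decreasing cost (which is what a convex piecewise-linear cost produces). The `struct arc` layout and `prune_channel_arcs` are illustrative names, not askrene's actual data structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arc of a linearized channel; capacity in flow units. */
struct arc {
	uint64_t capacity;
	int64_t cost;
};

/* Keep the cheapest arcs of one channel until their combined capacity
 * reaches max_units, clamping the last kept arc so that the sum never
 * exceeds max_units.  Arcs are compacted in place.
 * Assumes arcs[] is sorted by non-decreasing cost.
 * Returns the number of arcs kept. */
static size_t prune_channel_arcs(struct arc *arcs, size_t n_arcs,
				 uint64_t max_units)
{
	uint64_t remaining = max_units;
	size_t kept = 0;

	for (size_t i = 0; i < n_arcs && remaining > 0; i++) {
		assert(i == 0 || arcs[i - 1].cost <= arcs[i].cost);

		uint64_t cap = arcs[i].capacity < remaining
				   ? arcs[i].capacity
				   : remaining;
		if (cap == 0)
			continue; /* an empty arc carries no flow */

		arcs[kept].capacity = cap;
		arcs[kept].cost = arcs[i].cost;
		kept++;
		remaining -= cap;
	}
	return kept;
}
```

For a channel linearized into arcs of 500, 300, 150 and 50 units, a payment of U = 600 units keeps only the first two arcs (500 and 100 units). The other two are never needed by an optimal solution, because a cheaper parallel arc always has room.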

@Lagrang3 Lagrang3 requested a review from cdecker as a code owner June 9, 2025 15:19
@Lagrang3 Lagrang3 mentioned this pull request Jun 17, 2025
@Lagrang3 Lagrang3 force-pushed the askrene-prune-and-cap branch from f866e48 to b7f1ae2 Compare July 8, 2025 16:09
Lagrang3 (Collaborator, Author) commented Jul 8, 2025

Rebased on top of changes to #8299

@Lagrang3 Lagrang3 force-pushed the askrene-prune-and-cap branch from b7f1ae2 to 18f0bec Compare July 9, 2025 07:05
* */
params->accuracy = amount_msat_max(
AMOUNT_MSAT(1), amount_msat_div_ceil(amount, 1000000));
Lagrang3 (Collaborator, Author) commented on this change:

The 1M precision here is arbitrary; it could be any number.
And I think 1M is way too much. In fact, I will show that we get a
negligible runtime improvement with 1M compared with other values (e.g. 1000, 100, 10).

Lagrang3 (Collaborator, Author) commented Jul 9, 2025

I made some test runs on the gossmap we keep for reference in the tests, gossip-store-2024-09-22.compressed.
I selected a population of "big nodes" that corresponds to the 65% to 90% quantile range of nodes sorted
by their total channel capacity. From those nodes I randomly selected 1000 ordered pairs and called getroutes.
I only counted as valid those pairs of nodes for which there is enough capacity to send and receive the payment
amount.
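
A sketch of how such a population could be selected, assuming nodes are sorted by ascending total capacity and the quantile bounds are given as integer percentages. The `struct node_cap` type and `quantile_range` are hypothetical; the actual benchmark script is not shown in this PR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: a node and the sum of its channel capacities. */
struct node_cap {
	uint32_t node_idx;
	uint64_t capacity_sat;
};

/* qsort comparator: ascending by total capacity. */
static int cmp_capacity(const void *a, const void *b)
{
	const struct node_cap *x = a, *y = b;
	return (x->capacity_sat > y->capacity_sat) -
	       (x->capacity_sat < y->capacity_sat);
}

/* Sort nodes by capacity and return the half-open index range
 * [*lo, *hi) that covers the pct_lo..pct_hi quantile range. */
static void quantile_range(struct node_cap *nodes, size_t n,
			   unsigned pct_lo, unsigned pct_hi,
			   size_t *lo, size_t *hi)
{
	assert(pct_lo <= pct_hi && pct_hi <= 100);
	qsort(nodes, n, sizeof(*nodes), cmp_capacity);
	*lo = n * pct_lo / 100;
	*hi = n * pct_hi / 100;
}
```

Calling `quantile_range(nodes, n, 65, 90, &lo, &hi)` leaves the "big nodes" in `nodes[lo]` through `nodes[hi - 1]`; ordered pairs would then be drawn at random from that slice.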

[Figure: failrate]
[Figure: feerate]

I notice that the fail rate is very high even for not so big payment amounts. Does this mean that most of the "big nodes"
are actually badly connected to the rest of the network?

Lagrang3 (Collaborator, Author) commented Jul 9, 2025

Probability cost for precision=1: [figure: probability-big]
Probability cost for precision=10: [figure: probability-big]
Probability cost for precision=100: [figure: probability-big]
Probability cost for precision=1000: [figure: probability-big]
Probability cost for precision=1M: [figure: probability-big]

Lagrang3 (Collaborator, Author) commented Jul 9, 2025

Runtime distribution for precision=1: [figure: runtime-big]
Runtime distribution for precision=10: [figure: runtime-big]
Runtime distribution for precision=100: [figure: runtime-big]
Runtime distribution for precision=1000: [figure: runtime-big]
Runtime distribution for precision=1M: [figure: runtime-big]

For 1M the improvement in runtime is exclusively due to the pruning.

Lagrang3 (Collaborator, Author) commented Jul 9, 2025

[Figure: worsttime]

Lagrang3 (Collaborator, Author) commented Jul 9, 2025

[Figure: overtime]

@Lagrang3 Lagrang3 force-pushed the askrene-prune-and-cap branch 3 times, most recently from 5d72855 to 1b5e87a Compare July 9, 2025 13:27
rustyrussell (Contributor) left a comment

Minor nit:

Can you split 1b5e87a into the "prune small channels" change, and the "dynamic granularity based on < 1000 flows assumption" (which, FWIW, is a LOT of flows!).

They're logically separate improvements...

rustyrussell (Contributor) commented

Hmm, so this is a lot of data. A summary of this in the commit message itself would be invaluable: Remember, GitHub is transitory, and will one day stop hosting us for free: git commits are forever.

You graph precision amounts, but then if I understand correctly, your precision varies with the total payment amount, so these graphs don't exactly apply to our overall performance?

rustyrussell (Contributor) commented

> I notice that the fail rate is very high even for not so big payment amounts. Does this mean that most of the "big nodes" are actually badly connected to the rest of the network?

That, or our canned gossip store is missing data? That is quite possible, and it is worth investigating whether a more modern node, where there have been propagation fixes, gives a different result. However, it's still useful to have a standard corpus, of course.

Lagrang3 (Collaborator, Author) commented

> Minor nit:
>
> Can you split 1b5e87a into the "prune small channels" change, and the "dynamic granularity based on < 1000 flows assumption" (which, FWIW, is a LOT of flows!).
>
> They're logically separate improvements...

It isn't about 1000 flows; it is the number of units into which we divide the payment amount.
For example, the final solution could be two flows carrying 1/1000 and 999/1000 of the amount.
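
To make the distinction concrete, here is a small sketch in which each flow carries an integer number of units. The `struct flow` type and both helpers are hypothetical, and the 1000 msat unit used in the note below is an assumed value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow: a route carrying an integer number of flow units. */
struct flow {
	uint64_t units;
};

/* Total number of units delivered by all flows together. */
static uint64_t total_units(const struct flow *flows, size_t n_flows)
{
	uint64_t sum = 0;
	for (size_t i = 0; i < n_flows; i++)
		sum += flows[i].units;
	return sum;
}

/* Amount carried by one flow, given the size of one unit in msat. */
static uint64_t flow_amount_msat(const struct flow *f, uint64_t unit_msat)
{
	assert(unit_msat > 0);
	return f->units * unit_msat;
}
```

With a 1000 msat unit and a 1,000,000 msat payment, the two flows `{1}` and `{999}` use all 1000 units and carry 1000 msat and 999,000 msat respectively. The granularity bounds the number of units, not the number of flows.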

Lagrang3 (Collaborator, Author) commented

Minor nit from me as well: I have realized that CI doesn't test for SLOW_MACHINE=1, therefore some of the
checks in test_real_data are a bit off.
I'm fixing that soon.

Lagrang3 (Collaborator, Author) commented

We also need a better commit message explaining what we are actually pruning.
We are not pruning small channels, but arcs that we can prove will never be needed
in the optimal MCF solution. Why? Because there are other arcs with lower cost between the same
pair of nodes whose combined capacity is greater than or equal to the total flow amount.

rustyrussell (Contributor) commented Jul 11, 2025

> We also need a better commit message explaining what we are actually pruning. Not that we are pruning small channels but arcs that we can prove are never going to be needed in the optimal MCF solution, why? because there are other arcs with lower cost between the same pair of nodes that combined have a capacity greater or equal the total flow amount.

Yes, that is subtle. I would love to see you paste exactly that sentence into the source code!

Commit messages are about changes, comments are about existing code. Sometimes these are similar:

Commit message: ... speeds up these cases by N% [table].

Comment: /* If there are other arcs with lower cost between the same pair of nodes with combined capacity to carry the entire flow, we can remove the higher-cost arcs from consideration. For small payments, this can be around X% of arcs! */

Lagrang3 added 6 commits July 11, 2025 10:31
Refactor MCF solver: remove structs linear_network and residual_network.
Prefer passing raw data to the helper functions.

Changelog-None

Signed-off-by: Lagrang3 <[email protected]>
The single path solver uses the same probability cost and fee cost
estimation as minflow. Single path routes computed this way are
suboptimal with respect to the MCF solution but are still optimal among
all single paths. Computationally it is much faster than MCF, so
it should be preferred for some trivial payments.

Changelog-None.

Signed-off-by: Lagrang3 <[email protected]>
Changelog-Added: askrene: an optimal single-path solver has been added, it can be called using the developer option --dev_algorithm=single-path or by adding the layer "auto.no_mpp_support"

Signed-off-by: Lagrang3 <[email protected]>
From the multiple arcs that derive from the same channel, we consider
only those with the smallest cost such that the payment amount and HTLC
max can fit in their combined capacity, i.e. we prune high-cost arcs that
will surely never be used by the optimal solution.

This reduces the number of arcs in the graph from approximately 8 arcs
per channel to approximately 2 arcs per channel.

No pruning.
amount:		100 	1000 	10000 	100000 	1000000
channels:	104741	106163	106607	106654	106666
arcs:		837928	849304	852856	853232	853328

Prune, limit the channel capacity by its HTLC max
amount:		100 	1000 	10000 	100000 	1000000
channels:	104741	106163	106607	106654	106666
arcs:		255502	259314	260538	260676	260704

Prune, limit the channel capacity to the payment amount
amount:		100 	1000 	10000 	100000 	1000000
channels:	104741	106163	106607	106654	106666
arcs:		209482	216270	228618	295450	432468

Prune, limit the channel capacity to the payment amount and its HTLC max
amount:		100 	1000 	10000 	100000 	1000000
channels:	104741	106163	106607	106654	106666
arcs:		209480	212324	213242	215726	228018

This produces a slight speedup for MCF computations:

Amount (sats) | speedup
-----------------------
          100 | 1.89
         1000 | 1.77
        10000 | 1.25
       100000 | 1.25
      1000000 | 1.18

Changelog-None

Signed-off-by: Lagrang3 <[email protected]>
@Lagrang3 Lagrang3 force-pushed the askrene-prune-and-cap branch from 1b5e87a to 060e870 Compare July 15, 2025 11:57
Lagrang3 (Collaborator, Author) commented Jul 15, 2025

  • split into three commits,
  • added some benchmark results to the commit description,
  • added a comment in the code that better explains what is pruned,
  • rebased onto Askrene single path solver #8299,
  • and fixed the pyln test instances for every commit.

Speedup in getroutes from setting the granularity to 1000

Amount (sats) | speedup
-----------------------
          100 | 1.00
         1000 | 1.00
        10000 | 1.06
       100000 | 1.31
      1000000 | 2.64

Worst runtime of getroutes

Amount (sats) | before (ms) | after (ms)
--------------------------------------
          100 | 1507        | 761
         1000 | 2129        | 1214
        10000 | 1632        | 1043
       100000 | 2004        | 1150
      1000000 | 27170       | 3289

Changelog-None

Signed-off-by: Lagrang3 <[email protected]>
@Lagrang3 Lagrang3 force-pushed the askrene-prune-and-cap branch from 060e870 to 7c49092 Compare July 15, 2025 13:22