Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It crashes with medium/large/big repo #83

Open
perettigiuliano opened this issue May 8, 2024 · 9 comments
Open

It crashes with medium/large/big repo #83

perettigiuliano opened this issue May 8, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@perettigiuliano
Copy link

Hi, I have just tried with this repo: Switft from apple
git-graph crashes:
image

Could you fix that?

Thanks

@perettigiuliano
Copy link
Author

@mlange-42
Do you think that you can do something to prevent git-graph to crash on such a projects?

@mlange-42
Copy link
Owner

Hi @perettigiuliano! Thank you for reporting! Unfortunately, I did not find the time to look into this yet. As I have no idea so far what exactly causes the crash, it is hard to judge. But surely this can be fixed somehow.

@mlange-42 mlange-42 added the bug Something isn't working label May 27, 2024
@mlange-42
Copy link
Owner

@perettigiuliano As a temporary workaround, use the -n option to limit the number of commits to show.

It looks like this is not really a bug. The number of commits is just too big to fit their internal representation into the memory. Not sure how to continue here. Maybe a default limit to the number of commits would be an option.

@perettigiuliano
Copy link
Author

perettigiuliano commented Jun 2, 2024

@mlange-42
Thank you for the answer, here it is the last wmanil that I have received from support of "Fork" application, after mine asking them to have in their application a graph like the one from yours.

Giuliano,

Thank you for trying!

That repo is insane, I have never seen a repo like that!!!

Swift is just a normal repo, it's not *that* big. There are many repos of a similar size, VSCode (https://github.com/microsoft/vscode.git), for example.

Just as a reference, the Chromium repo IS insane :).


Graph drawing is a very difficult and complex problem (especially when we talk about production quality).

What we have in Fork now is a compromise between readability and performance.

It's great though, that there is interest in improving the graph visualization 👍. I myself spent a lot of time trying to improve it and honestly failed :(.

Has "git-graph" some memory allocation problem? Is it related to a (or more) bug or to the logic implied in a graphic like that?

It's not related to the memory allocation, I think. It's more O(n) problem.

Some hot points I noticed:
- git graph is a DAG, which means commits only have pointers to *parents*. Traversing commit *children* contradicts git design. And as result, calculating children for each commit requires full graph traversing, which will likely be bottleneck on large repos.
- many repos have a lot of tags (> 10K in some repos) and their number *always* grow. Taking them into account is very-very expensive.
- full traversing makes impossible lazy loading. For example, it's impossible to show first 100 commits of 'swift' in a reasonable time

Generally speaking, I'm not sure if git references (both branches and tags) should be part of the graph visualization algorithm. Branches in git are not graph 'edges', instead they are graph 'tips'. So, again, handling them as 'edges' contradicts git design and will likely be a bottleneck.

Dan

Maybe there something that could serve as suggestion...
Surely a limit, as I saw the same request is coming also from other issues. could be a work around, something like taking only from the moment of actual branch creation on,

@Nahor
Copy link

Nahor commented Sep 3, 2024

First, the screenshot in the OP shows Rust trying to allocate 10GB of RAM, which is unlikely to be normal.
I tried myself, and I noticed that git-graph's memory usage jump to 12G in a few seconds, followed by a steady/linear growth to 33G over the following 60s. So I wouldn't be surprise if that jump to a total 12G is caused by a 10G allocation as well.

The other thing to notice is the total memory usage. The peak was 33G while parsing the repo, then dropped to "only" 22G while rendering the graph.
I tried on a smaller repo (git itself). There the memory usage peaked at 20G during parsing and steadied at 13G during rendering.

Overall though, that memory usage feels quite excessive.

@Nahor
Copy link

Nahor commented Sep 3, 2024

The issue is with print_unicode.

Here are some heap profiling done over the git repo:

  • Memory usage:
    image
  • Flamegraph:
    image
  • For comparison, memory usage when using the SVG output:
    image
    =>Also note the duration of the graph (1min12s for the default output vs 3.5s for SVG)

@mlange-42
Copy link
Owner

@Nahor Many thanks for investigating and sharing your insights!

First, it is definitely good to see that the problem seems not related to parsing or layouting the graph.

Regarding the differnences between unicode and SVG output, I can hardly imagine that the final string memory required is (that much) bigger for unicode, even including color escape codes. SVG should definitely be longer. So I guess this is a kind of bug. If not, rendering only the required range could be an option for paged output, or writing directly to a stream for unpaged output.

@Nahor
Copy link

Nahor commented Sep 4, 2024

Disclaimer: All tests today were run on a git+git_for_windows repo. Yesterday, I was using a pure git repo, so don't compare the numbers


I can hardly imagine that the final string memory required is (that much) bigger for unicode, even including color escape codes. SVG should definitely be longer.

Unicode is significantly bigger than SVG for that repo: 530M for Unicode (no-wrap output) vs 33M for SVG. I assume the reason for being bigger because the lines must be drawn character-by-characters + commit text&alignment, vs just paths elements without commit text (bug?) in SVG.


One reason for the slow down is the number of branches and auto-wrapping. That number is so big that auto-wrapping uses a width of 1, which results is 21G worth of Unicode output (vs 530M without wrapping). I think there is no point of using 1 as the minimum width (at the very least not for auto), this is unusable.

  • Using --wrap 120 0 8 --color never, it takes 2.6s to see the first page. It also uses only a peak of 950M of RAM.
  • Using --wrap auto 0 8 --color never, it takes 24s, and peaked at 24G
  • Using --wrap 1 0 8 --color never, it takes 24s, and peaked at 24G (same as auto)
  • Using --wrap 40 0 8 --color never, it takes 3.3s, and peaked at 1.4G (more reasonable minimum width)

Coloring is also significant, if not more:

  • Using --wrap 120 0 8 --color always, it takes 4.3s and peaked at 1.5G
  • Using --wrap 40 0 8 --color always, it takes 6.3s, and peaked at 2.4G
  • Using --wrap auto 0 8 --color always, it takes 1min30s, and peaked at 46G
  • with --wrap none, the output is 530M (color) vs 170M (no color)

@Nahor
Copy link

Nahor commented Sep 5, 2024

One last detail then I'll stop:

  • ~90% of print_unicode is actually print_graph
  • ~33% of print_graph is this and ~66% is that (and some peanuts)

Since those are dealing only with strings and chars, a partial solution might be to avoid using std::fmt and append directly to g_out (it depends how well Rust can optimize away the std::fmt)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants