
Update libfuse3 #252

Open

VHSgunzo opened this issue Mar 17, 2025 · 15 comments

Labels: enhancement (New feature or request), fixready
VHSgunzo commented Mar 17, 2025

Hi, thanks for the new release!)
Is it possible to update the libfuse3 version to get rid of this message?


Ignoring invalid max threads value 4294967295 > max (100000).

It’s probably a libfuse3 issue fixed in 3.14.1

VHSgunzo changed the title from "Update libfuse" to "Update libfuse3" Mar 17, 2025
mhx (Owner) commented Mar 17, 2025

Is it possible to update the libfuse3 version to get rid of this message?

Most likely yes :)

It’s probably a libfuse3 issue fixed in 3.14.1

And as soon as Ubuntu moves past 3.14.0, it'll be fixed automatically.
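
For reference, a quick way to check which libfuse3 version a build environment provides (a minimal sketch, assuming pkg-config and the fuse3 development package are installed; not specific to the DwarFS build itself):

$ pkg-config --modversion fuse3

Once that reports 3.14.1 or newer, the "max threads" warning should no longer appear.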

eageag commented Mar 20, 2025

Ubuntu is in the wrong here. The smarter move would be to commit changes to Linux/Debian, its base, instead of creating their own patches for the sake of sales. They think they are moving like CrossOver/Wine, but in the end that greed will be their downfall. IMO.

mhx (Owner) commented Mar 22, 2025

Is it possible to update the libfuse3 version to get rid of this message?

Most likely yes :)

It’s probably a libfuse3 issue fixed in 3.14.1

Turns out I can build DwarFS on Alpine now. Even static builds seem to work just fine. Alpine has fuse 3.16.2, so this would be an easy fix. (Plus, the binaries are going to be slightly smaller.)

mhx self-assigned this Mar 22, 2025
mhx added the enhancement (New feature or request) and fixready labels Mar 22, 2025
mhx added this to the v0.12.0 milestone Mar 22, 2025
VHSgunzo (Author) commented Mar 22, 2025

@mhx Hi! I would like to point out that musl's malloc has poor performance, which is especially noticeable in FUSE filesystem projects (e.g. SquashFS) and other memory-intensive projects. I have been building the squashfs utilities and my other projects on Alpine (musl) with mimalloc to get almost the same performance as a regular static glibc build (glibc is still slightly faster). Linking with mimalloc has some pitfalls, though, such as poor compatibility of newer mimalloc versions with older x86_64 processors (psABI v1) and other architectures (aarch64); for now I have settled on v2.1.7 for my projects.
Note that in the Alpine Linux repository mimalloc is compiled with the secure flag, which also hurts performance.

Also, if you want to reduce the size of the dwarfs-universal static build, you can build all dependencies and dwarfs-universal with these flags:

CFLAGS='-Os -g0 -ffunction-sections -fdata-sections -fvisibility=hidden -fmerge-all-constants'
CXXFLAGS='-Os -g0 -ffunction-sections -fdata-sections -fvisibility=hidden -fmerge-all-constants'
LDFLAGS='-Wl,--gc-sections -Wl,--strip-all'

as I do here.
Launch-speed measurements of a large application (an AppImage using uruntime and DwarFS) showed no performance degradation.
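
For anyone reproducing this, a minimal sketch of how such flags are typically fed into a CMake build (the paths and options here are illustrative, not DwarFS's actual CI configuration):

# CMake picks up CFLAGS/CXXFLAGS/LDFLAGS from the environment at configure time.
$ export CFLAGS='-Os -g0 -ffunction-sections -fdata-sections -fvisibility=hidden -fmerge-all-constants'
$ export CXXFLAGS="$CFLAGS"
$ export LDFLAGS='-Wl,--gc-sections -Wl,--strip-all'
# Leaving CMAKE_BUILD_TYPE unset avoids the Release config appending -O3 after -Os.
$ cmake -S . -B build-static
$ cmake --build build-static -j"$(nproc)"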

mhx (Owner) commented Mar 22, 2025

@mhx Hi! I would like to point out that musl's malloc has poor performance, which is especially noticeable in FUSE filesystem projects (e.g. SquashFS) and other memory-intensive projects. I have been building the squashfs utilities and my other projects on Alpine (musl) with mimalloc to get almost the same performance as a regular static glibc build (glibc is still slightly faster). Linking with mimalloc has some pitfalls, though, such as poor compatibility of newer mimalloc versions with older processors and other architectures; for now I have settled on v2.1.7 for my projects. Note that in the Alpine Linux repository mimalloc is compiled with the secure flag, which also hurts performance.

The DwarFS static binaries all use jemalloc, so my hope would be that they perform about the same as before. I'll make sure to double-check, though, so thanks for the heads-up!

Also, if you want to reduce the size of the dwarfs-universal static build, you can build all dependencies and dwarfs-universal with these flags:

CFLAGS='-Os -g0 -ffunction-sections -fdata-sections -fvisibility=hidden -fmerge-all-constants'
CXXFLAGS='-Os -g0 -ffunction-sections -fdata-sections -fvisibility=hidden -fmerge-all-constants'
LDFLAGS='-Wl,--gc-sections -Wl,--strip-all'

I'm probably going to skip -Os, but the rest seem like good additions for the static build:

$ ls -l build-clang-static*/universal/dwarfs-universal-upx
-rwxr-xr-x 1 mhx users 4826720 Mar 22 10:40 build-clang-static-vhsgunzo/dwarfs-universal-upx
-rwxr-xr-x 1 mhx users 5183808 Mar 22 10:40 build-clang-static/dwarfs-universal-upx

Actually, I just did a quick benchmark of the FUSE driver part of the universal static binary. Using a DwarFS image with 6.2 GiB of astrophotography images, I ran:

$ time find mnt -type f -print0 | xargs -0 sha512sum

The total run times were pretty much identical, with a slight edge for the Alpine/musl binaries (the vhs one is using your suggested flags, but not -Os):

0.11.2 (ubuntu, glibc)       25.070s
0.11.2+ (alpine, musl)       24.774s
0.11.2+ (alpine, musl, vhs)  24.837s

A second test used an image with 700 MiB spread across ~75,000 source files, running:

$ time find mnt -type f -print0 | xargs -P16 -n64 -0 sha512sum

Again, very similar results:

0.11.2 (ubuntu, glibc)       2.709s
0.11.2+ (alpine, musl)       2.687s
0.11.2+ (alpine, musl, vhs)  2.780s
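
If anyone wants tighter numbers for this kind of test, the same pipeline could be run under hyperfine instead of time (a sketch, assuming the image is already mounted at mnt; note that repeated runs will hit the driver's caches, so absolute numbers differ from a single cold run):

$ hyperfine --warmup 1 'find mnt -type f -print0 | xargs -P16 -n64 -0 sha512sum'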

VHSgunzo (Author) commented Mar 22, 2025

I'm probably going to skip the -Os

Does the -Os flag have a bad effect on speed in your test?

You can get a clang -Os glibc+jemalloc build from here: https://github.com/VHSgunzo/dwarfs/releases/tag/v0.11.2

Although a comparison with an -Os build from Alpine would also be interesting.

mhx (Owner) commented Mar 22, 2025

I'll give -Os a try later :)

mhx (Owner) commented Mar 22, 2025

This was definitely an interesting exercise!

$ hyperfine -L ver build-clang-static,build-clang-static-native,build-clang-static-native-Os,build-clang-static-skylake,build-clang-static-skylake-Os,build-clang-static-vhsgunzo-Os,build-gcc-static,build-gcc-static-native,build-gcc-static-native-Os,build-gcc-static-skylake,build-gcc-static-skylake-Os,build-gcc-static-vhsgunzo-Os '{ver}/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force'
Benchmark 1: build-clang-static/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.472 s ±  0.018 s    [User: 3.917 s, System: 0.629 s]
  Range (min … max):    1.454 s …  1.514 s    10 runs
 
Benchmark 2: build-clang-static-native/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.476 s ±  0.013 s    [User: 3.881 s, System: 0.617 s]
  Range (min … max):    1.462 s …  1.500 s    10 runs
 
Benchmark 3: build-clang-static-native-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.505 s ±  0.014 s    [User: 4.137 s, System: 0.672 s]
  Range (min … max):    1.492 s …  1.539 s    10 runs
 
Benchmark 4: build-clang-static-skylake/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.487 s ±  0.011 s    [User: 3.907 s, System: 0.658 s]
  Range (min … max):    1.472 s …  1.509 s    10 runs
 
Benchmark 5: build-clang-static-skylake-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.495 s ±  0.012 s    [User: 4.012 s, System: 0.633 s]
  Range (min … max):    1.480 s …  1.524 s    10 runs
 
Benchmark 6: build-clang-static-vhsgunzo-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.503 s ±  0.021 s    [User: 4.050 s, System: 0.644 s]
  Range (min … max):    1.473 s …  1.541 s    10 runs
 
Benchmark 7: build-gcc-static/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.505 s ±  0.011 s    [User: 4.165 s, System: 0.690 s]
  Range (min … max):    1.488 s …  1.520 s    10 runs
 
Benchmark 8: build-gcc-static-native/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.643 s ±  0.014 s    [User: 4.259 s, System: 0.672 s]
  Range (min … max):    1.630 s …  1.668 s    10 runs
 
Benchmark 9: build-gcc-static-native-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      3.916 s ±  0.015 s    [User: 9.398 s, System: 0.703 s]
  Range (min … max):    3.898 s …  3.954 s    10 runs
 
Benchmark 10: build-gcc-static-skylake/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.654 s ±  0.013 s    [User: 4.397 s, System: 0.664 s]
  Range (min … max):    1.635 s …  1.681 s    10 runs
 
Benchmark 11: build-gcc-static-skylake-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      3.922 s ±  0.018 s    [User: 8.883 s, System: 0.692 s]
  Range (min … max):    3.883 s …  3.952 s    10 runs
 
Benchmark 12: build-gcc-static-vhsgunzo-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      3.915 s ±  0.020 s    [User: 8.871 s, System: 0.683 s]
  Range (min … max):    3.890 s …  3.950 s    10 runs
 
Summary
  build-clang-static/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force ran
    1.00 ± 0.01 times faster than build-clang-static-native/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.01 ± 0.01 times faster than build-clang-static-skylake/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.02 ± 0.01 times faster than build-clang-static-skylake-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.02 ± 0.02 times faster than build-clang-static-vhsgunzo-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.02 ± 0.02 times faster than build-clang-static-native-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.02 ± 0.01 times faster than build-gcc-static/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.12 ± 0.02 times faster than build-gcc-static-native/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.12 ± 0.02 times faster than build-gcc-static-skylake/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    2.66 ± 0.03 times faster than build-gcc-static-vhsgunzo-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    2.66 ± 0.03 times faster than build-gcc-static-native-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    2.66 ± 0.03 times faster than build-gcc-static-skylake-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force

What does this do?

The input data is 20 Perl installations, a total of 26,401 files or 678 MiB of data. The mkdwarfs configuration is such that it really tests the core components: similarity hashing and ordering as well as the segmenter. Compression is turned off and the output goes straight to /dev/null. This is one of my go-to benchmarks.

What do we learn?

I think the most important lesson is: never use -Os with gcc, at least not for mkdwarfs. :) I had no idea performance degradation with -Os was so significant.

Also, there really isn't much point in using -march=. The default configuration with both clang and gcc is consistently the fastest, compared to -march=native and -march=skylake (I chose that one just because I've been using it for years on one of my older machines). I'm quite happy with that result; it means I (likely) won't have to add more CPU-specific code.

Finally, optimizing for size with clang only results in a 2% performance hit, so that's definitely worth considering.

Admittedly, gcc's -Os is far better at bringing down binary size, but the cost is unacceptable, at least for mkdwarfs:

$ ll --sort=size build-*/universal/dwarfs-universal
.rwxr-xr-x 12,001,472 mhx users 22 Mar 19:25 build-gcc-static-native-Os/universal/dwarfs-universal
.rwxr-xr-x 12,001,472 mhx users 22 Mar 19:25 build-gcc-static-skylake-Os/universal/dwarfs-universal
.rwxr-xr-x 12,120,256 mhx users 22 Mar 19:25 build-gcc-static-vhsgunzo-Os/universal/dwarfs-universal
.rwxr-xr-x 12,986,832 mhx users 22 Mar 19:25 build-clang-static-native-Os/universal/dwarfs-universal
.rwxr-xr-x 12,990,736 mhx users 22 Mar 19:25 build-clang-static-skylake-Os/universal/dwarfs-universal
.rwxr-xr-x 13,106,512 mhx users 22 Mar 19:25 build-clang-static-vhsgunzo-Os/universal/dwarfs-universal
.rwxr-xr-x 15,145,784 mhx users 22 Mar 19:25 build-clang-static-skylake/universal/dwarfs-universal
.rwxr-xr-x 15,166,584 mhx users 22 Mar 19:25 build-clang-static-native/universal/dwarfs-universal
.rwxr-xr-x 15,200,824 mhx users 22 Mar 19:25 build-clang-static/universal/dwarfs-universal
.rwxr-xr-x 15,695,808 mhx users 22 Mar 19:25 build-gcc-static/universal/dwarfs-universal
.rwxr-xr-x 15,769,536 mhx users 22 Mar 19:25 build-gcc-static-skylake/universal/dwarfs-universal
.rwxr-xr-x 15,773,632 mhx users 22 Mar 19:25 build-gcc-static-native/universal/dwarfs-universal

And the UPX compressed universal binaries:

$ ll --sort=size build-*/universal/dwarfs-universal-upx
.rwxr-xr-x 4,336,752 mhx users 22 Mar 19:25 build-gcc-static-skylake-Os/universal/dwarfs-universal-upx
.rwxr-xr-x 4,339,792 mhx users 22 Mar 19:25 build-gcc-static-native-Os/universal/dwarfs-universal-upx
.rwxr-xr-x 4,353,572 mhx users 22 Mar 19:25 build-gcc-static-vhsgunzo-Os/universal/dwarfs-universal-upx
.rwxr-xr-x 4,579,588 mhx users 22 Mar 19:25 build-clang-static-skylake-Os/universal/dwarfs-universal-upx
.rwxr-xr-x 4,580,440 mhx users 22 Mar 19:25 build-clang-static-native-Os/universal/dwarfs-universal-upx
.rwxr-xr-x 4,595,732 mhx users 22 Mar 19:25 build-clang-static-vhsgunzo-Os/universal/dwarfs-universal-upx
.rwxr-xr-x 5,183,808 mhx users 22 Mar 19:25 build-clang-static/universal/dwarfs-universal-upx
.rwxr-xr-x 5,190,088 mhx users 22 Mar 19:25 build-clang-static-skylake/universal/dwarfs-universal-upx
.rwxr-xr-x 5,196,188 mhx users 22 Mar 19:25 build-clang-static-native/universal/dwarfs-universal-upx
.rwxr-xr-x 5,438,116 mhx users 22 Mar 19:25 build-gcc-static/universal/dwarfs-universal-upx
.rwxr-xr-x 5,488,788 mhx users 22 Mar 19:25 build-gcc-static-skylake/universal/dwarfs-universal-upx
.rwxr-xr-x 5,496,316 mhx users 22 Mar 19:25 build-gcc-static-native/universal/dwarfs-universal-upx

The v0.11.2 release binary (ubuntu, clang) had a size of 5,312,592 bytes. Bringing that down to 4,595,732 seems tempting.

Now the bad news:

$ hyperfine -L ver build-clang-static,build-clang-static-vhsgunzo-Os,dwarfs-0.11.2-Linux-x86_64-clang/bin '{ver}/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force'
Benchmark 1: build-clang-static/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.478 s ±  0.010 s    [User: 3.948 s, System: 0.665 s]
  Range (min … max):    1.460 s …  1.495 s    10 runs
 
Benchmark 2: build-clang-static-vhsgunzo-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.502 s ±  0.016 s    [User: 4.047 s, System: 0.633 s]
  Range (min … max):    1.480 s …  1.533 s    10 runs
 
Benchmark 3: dwarfs-0.11.2-Linux-x86_64-clang/bin/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.363 s ±  0.021 s    [User: 3.992 s, System: 0.934 s]
  Range (min … max):    1.335 s …  1.402 s    10 runs
 
Summary
  dwarfs-0.11.2-Linux-x86_64-clang/bin/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force ran
    1.08 ± 0.02 times faster than build-clang-static/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.10 ± 0.02 times faster than build-clang-static-vhsgunzo-Os/mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force

The new binaries are around 10% slower than the release binary built on Ubuntu. I'll dig into where that might be coming from. Interestingly, in terms of raw CPU time, the new binaries actually consume less, but they're still slower.

mhx (Owner) commented Mar 23, 2025

The new binaries are around 10% slower than the release binary built on Ubuntu. I'll dig into where that might be coming from.

It's memcpy.

VHSgunzo (Author) commented Mar 23, 2025

It's memcpy.

It's interesting! Btw, what about using mimalloc instead of jemalloc?
According to the perf tests Microsoft did, they claim that mimalloc is the fastest, especially when it comes to small, short-lived allocations.

mhx (Owner) commented Mar 23, 2025

It's memcpy.

It's interesting!

Well, it's a bit unsurprising, unfortunately. glibc with its ifunc feature just switches to __memmove_avx_unaligned_erms whereas musl sticks to a relatively plain memcpy implementation.
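
(For context, a minimal sketch of how this kind of hotspot can be confirmed, assuming perf is available and the binaries have symbols; the exact symbol names differ between the glibc and musl builds:)

$ perf record -g -- ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
$ perf report --sort symbol   # look for memcpy / __memmove_*_erms near the top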

Btw, what about using mimalloc instead of jemalloc? According to the perf tests Microsoft did, they claim that mimalloc is the fastest, especially when it comes to small, short-lived allocations.

It sucks :)

$ hyperfine -L env ,/usr/lib64/libmimalloc.so,/usr/lib64/libjemalloc.so 'LD_PRELOAD={env} ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force'
Benchmark 1: LD_PRELOAD= ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.385 s ±  0.014 s    [User: 3.994 s, System: 0.794 s]
  Range (min … max):    1.354 s …  1.404 s    10 runs
 
Benchmark 2: LD_PRELOAD=/usr/lib64/libmimalloc.so ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.966 s ±  0.035 s    [User: 4.171 s, System: 0.889 s]
  Range (min … max):    1.901 s …  2.018 s    10 runs
 
Benchmark 3: LD_PRELOAD=/usr/lib64/libjemalloc.so ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
  Time (mean ± σ):      1.354 s ±  0.015 s    [User: 3.895 s, System: 0.857 s]
  Range (min … max):    1.333 s …  1.375 s    10 runs
 
Summary
  LD_PRELOAD=/usr/lib64/libjemalloc.so ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force ran
    1.02 ± 0.02 times faster than LD_PRELOAD= ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force
    1.45 ± 0.03 times faster than LD_PRELOAD=/usr/lib64/libmimalloc.so ./mkdwarfs -i /home/mhx/perl-install-small -o /dev/null -C null -l9 -L4g --no-progress --log-level=error --force

I've built that binary with -DUSE_JEMALLOC=OFF so I can safely LD_PRELOAD the different allocators. Turns out the glibc malloc has caught up over the years.

I'm actually surprised how badly mimalloc performs here. Maybe I'm doing it wrong, this is just the dev-libs/mimalloc-2.1.9:0/2::gentoo that I already had installed. Maybe it performs better on Windows ;)
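
As a sanity check for LD_PRELOAD experiments like this, the dynamic loader can confirm which allocator actually gets loaded (a sketch using standard glibc loader diagnostics, nothing DwarFS-specific):

$ LD_DEBUG=libs LD_PRELOAD=/usr/lib64/libmimalloc.so ./mkdwarfs --help 2>&1 | grep -i mimalloc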

VHSgunzo (Author) commented Mar 23, 2025

Maybe I'm doing it wrong, this is just the dev-libs/mimalloc-2.1.9:0/2::gentoo that I already had installed.

If it is compiled with the secure flag (guard pages, encrypted free lists, etc.), then it is clear why there is such a difference) For example, mimalloc from the Alpine Linux repositories comes in this form by default, and the insecure version is packaged separately.
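
For a fair comparison, mimalloc can also be built from source with secure mode explicitly disabled, using its documented MI_SECURE CMake option (a sketch; the version tag and paths are illustrative):

$ git clone --branch v2.1.7 https://github.com/microsoft/mimalloc.git
$ cmake -S mimalloc -B mimalloc/build -DMI_SECURE=OFF -DCMAKE_BUILD_TYPE=Release
$ cmake --build mimalloc/build -j"$(nproc)"
# The resulting libmimalloc.so can then be used in the LD_PRELOAD comparison above.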

mhx (Owner) commented Mar 23, 2025

Maybe I'm doing it wrong, this is just the dev-libs/mimalloc-2.1.9:0/2::gentoo that I already had installed.

If it is compiled with the secure flag, which encrypts memory pages, then it is clear why there is such a difference) For example, mimalloc from the Alpine Linux repositories comes in this form by default, and the insecure version is packaged separately.

I guess it's not:

# equery uses mimalloc
[ Legend : U - final flag setting for installation]
[        : I - package is installed with flag     ]
[ Colors : set, unset                             ]
 * Found these USE flags for dev-libs/mimalloc-2.1.9:
 U I
 + + abi_x86_32 : 32-bit (x86) libraries
 - - debug      : Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see
                  https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
 - - hardened   : Enable exploit mitigations
 - - test       : Enable dependencies and/or preparations necessary to run tests (usually controlled by FEATURES=test but can be
                  toggled independently)
 - - valgrind   : Enable annotations for accuracy. May slow down runtime slightly. Safe to use even if not currently using
                  dev-debug/valgrind
src_configure() {
        local mycmakeargs=(
                -DMI_DEBUG_FULL=$(usex debug)
                -DMI_SECURE=$(usex hardened)
                -DMI_INSTALL_TOPLEVEL=ON
                -DMI_BUILD_TESTS=$(usex test)
                -DMI_BUILD_OBJECT=OFF
                -DMI_BUILD_STATIC=OFF
                -DMI_TRACK_VALGRIND=$(usex valgrind)
                -DMI_LIBC_MUSL=$(usex elibc_musl)
                # Don't inject -march=XXX
                -DMI_OPT_ARCH=OFF
        )

        cmake-multilib_src_configure
}

VHSgunzo (Author) commented

Then it's very strange)

VHSgunzo (Author) commented Mar 23, 2025

I compared builds of the Rust project (wrappe), packaging a directory containing bash and all the libraries it needs, including terminfo files (a lot of small files). Here are the results I managed to achieve:

~/Git_Project/VHSgunzo/uruntime/dist $ time WRAPPE=./wrappe-jemalloc lib4bin -z bash-dir    
[ INFO ][2025.03.23 17:59:22]: [ GEN LIB PATH ]: [/home/user/Git_Project/VHSgunzo/uruntime/dist/bash-dir/shared/lib/lib.path]
[ INFO ][2025.03.23 17:59:22]: [ PACKING WITH WRAPPE ]: [/home/user/Git_Project/VHSgunzo/uruntime/dist/bash-dir]
wrappe 1.0.4 (68ae83e)
note: setting console mode is only supported for Windows runners
[1/4] 🔍 counting contents of bash-dir…
[2/4] 📃 writing runner bash for target x86_64-unknown-linux-musl…
[3/4] 🚚 compressing 2940 files and directories…
      💾 7.57MB read, 3.72MB written, 49.20% of original size
      📍 took 1.15s
      ✨ successfully compressed 2940 files and directories
[4/4] 📃 writing startup configuration…
      ✨ done!
WRAPPE=./wrappe-jemalloc lib4bin -z bash-dir  0,69s user 6,46s system 608% cpu 1,174 total

~/Gi/VHSgunzo/uruntime/dist $ time WRAPPE=./wrappe-muslmalloc lib4bin -z bash-dir            
[ INFO ][2025.03.23 17:59:41]: [ GEN LIB PATH ]: [/home/user/Git_Project/VHSgunzo/uruntime/dist/bash-dir/shared/lib/lib.path]
[ INFO ][2025.03.23 17:59:41]: [ PACKING WITH WRAPPE ]: [/home/user/Git_Project/VHSgunzo/uruntime/dist/bash-dir]
wrappe 1.0.4 (68ae83e)
note: setting console mode is only supported for Windows runners
[1/4] 🔍 counting contents of bash-dir…
[2/4] 📃 writing runner bash for target x86_64-unknown-linux-musl…
[3/4] 🚚 compressing 2940 files and directories…
      💾 7.57MB read, 3.72MB written, 49.20% of original size
      📍 took 1.22s
      ✨ successfully compressed 2940 files and directories
[4/4] 📃 writing startup configuration…
      ✨ done!
WRAPPE=./wrappe-muslmalloc lib4bin -z bash-dir  0,76s user 6,54s system 583% cpu 1,251 total

~/Gi/VHSgunzo/uruntime/dist $ time WRAPPE=./wrappe-mimalloc lib4bin -z bash-dir              
[ INFO ][2025.03.23 17:59:52]: [ GEN LIB PATH ]: [/home/user/Git_Project/VHSgunzo/uruntime/dist/bash-dir/shared/lib/lib.path]
[ INFO ][2025.03.23 17:59:52]: [ PACKING WITH WRAPPE ]: [/home/user/Git_Project/VHSgunzo/uruntime/dist/bash-dir]
wrappe 1.0.4 (68ae83e)
note: setting console mode is only supported for Windows runners
[1/4] 🔍 counting contents of bash-dir…
[2/4] 📃 writing runner bash for target x86_64-unknown-linux-musl…
[3/4] 🚚 compressing 2940 files and directories…
      💾 7.57MB read, 3.72MB written, 49.20% of original size
      📍 took 0.23s
      ✨ successfully compressed 2940 files and directories
[4/4] 📃 writing startup configuration…
      ✨ done!
WRAPPE=./wrappe-mimalloc lib4bin -z bash-dir  1,97s user 0,16s system 821% cpu 0,259 total

~/Gi/VHSgunzo/uruntime/dist $ du -sk wrappe-*                                             
1632    wrappe-jemalloc
1460    wrappe-mimalloc
1316    wrappe-muslmalloc

I.e., the mimalloc build is several times faster than the jemalloc (jemallocator) build and the musl malloc build (the default for the x86_64-unknown-linux-musl target).

Later, I'll try to rebuild squashfuse with jemalloc to compare it against the mimalloc build too.
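
A more controlled comparison of the three wrappe builds could also be done with hyperfine, as used earlier in this thread (a sketch, assuming the same wrappe-* binaries and lib4bin/bash-dir setup as above; a --prepare step may be needed if lib4bin refuses to overwrite its previous output):

$ hyperfine -L alloc jemalloc,muslmalloc,mimalloc 'WRAPPE=./wrappe-{alloc} lib4bin -z bash-dir'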
