Poor performance for large allocations in native-lib mode #4357

nia-e · 2025-05-29T18:51:27Z

As of #4343, the benchmark results for huge_allocs are significantly worse (~20x) when the isolated allocator is enabled, i.e. in native-lib mode. This could be alleviated by using mmap internally instead, which did not have this issue, but that may require overallocating if the requested alignment is greater than the system page size.

Todos / open questions:

Why is calling alloc::alloc() so much slower when asking for page-aligned memory?
If it's fixable, bug the relevant people / open a PR to fix this
If it's not fixable, consider switching over to mmapping its memory instead
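
To make the overallocation concern above concrete, here is a minimal sketch of the arithmetic involved, not the code Miri actually uses. The helper names `map_len_for` and `aligned_offset` are hypothetical, and the sketch assumes that mmap returns page-aligned addresses and that both the alignment and the page size are powers of two.

```rust
/// Hypothetical helper: number of bytes to request from mmap so that a block
/// of `size` bytes aligned to `align` is guaranteed to fit in the mapping.
/// Returns `None` on arithmetic overflow.
pub fn map_len_for(size: usize, align: usize, page_size: usize) -> Option<usize> {
    assert!(align.is_power_of_two() && page_size.is_power_of_two());
    // Round the requested size up to a whole number of pages.
    let size = size.checked_add(page_size - 1)? & !(page_size - 1);
    if align <= page_size {
        // Assumption: mmap hands back page-aligned memory, so no slack is needed.
        Some(size)
    } else {
        // The base is page-aligned, so in the worst case it sits
        // `align - page_size` bytes before the next `align` boundary.
        size.checked_add(align - page_size)
    }
}

/// Hypothetical helper: bytes to skip from `base` to reach the next address
/// that is a multiple of `align` (a power of two).
pub fn aligned_offset(base: usize, align: usize) -> usize {
    base.wrapping_neg() & (align - 1)
}
```

With 4096-byte pages, a page-aligned request needs no extra space, while a 16384-byte alignment costs up to three extra pages per mapping. The unused head and tail of such a mapping could be unmapped again, at the price of extra syscalls.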


RalfJung · 2025-05-29T19:26:25Z

@bjorn3 @lqd @nnethercote do you have any idea why page-aligned multiple-of-page-size allocations are slowing down jemalloc so much?

nnethercote · 2025-05-30T03:59:39Z

> @bjorn3 @lqd @nnethercote do you have any idea why page-aligned multiple-of-page-size allocations are slowing down jemalloc so much?

Nope, but I will try summoning @glandium, who has forgotten more about jemalloc than I will ever know, in case he feels like answering a random question... (Hi Mike!)

glandium · 2025-05-30T04:18:33Z

I'm not sure what might be going on here, especially regarding the scale of the mentioned difference. I'm also not that familiar with very recent versions of jemalloc. I would advise looking at profiles (and maybe also at how the difference varies across platforms).

If I was to venture a guess, it could be the kernel zeroing fresh pages in the process of those allocations.

(Hey Nick!)

nia-e · 2025-05-31T08:59:30Z

I doubt it's the kernel. mmapping fresh pages (which are definitely zeroed) was almost exactly tied with jemalloc allocating 16-byte-aligned memory, when both were requested as zeroed. The perf hit only appeared when the size was left unchanged but the alignment passed to jemalloc was raised to the system page size; even hardcoding 4096 in the align field caused the same perf hit. It also seemed to vary a lot between runs: I saw a ~8.5x slowdown on some and almost 30x on others, whereas both mmap and low-alignment jemalloc had very consistent times (+/- 5% or so on any given run).
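
For anyone who wants to poke at this outside of Miri, the comparison described above can be roughly sketched as follows. This is not the huge_allocs benchmark itself: the helper name `time_allocs` is hypothetical, and the code goes through whatever global allocator the binary is built with (the system allocator by default, not necessarily jemalloc), so the numbers are only indicative.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::time::{Duration, Instant};

/// Hypothetical helper: allocate and free `iters` zeroed blocks of `size`
/// bytes at alignment `align`, returning the total elapsed time.
pub fn time_allocs(size: usize, align: usize, iters: usize) -> Duration {
    let layout = Layout::from_size_align(size, align).expect("invalid size/align");
    assert!(layout.size() > 0, "zero-sized allocations are not allowed");
    let start = Instant::now();
    for _ in 0..iters {
        // SAFETY: the layout has a non-zero size (checked above).
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: `ptr` is non-null and points to at least one zeroed byte.
        // The volatile read keeps the allocation from being optimized away.
        let first = unsafe { std::ptr::read_volatile(ptr) };
        assert_eq!(first, 0);
        // SAFETY: `ptr` was returned by `alloc_zeroed` with this same layout.
        unsafe { dealloc(ptr, layout) };
    }
    start.elapsed()
}
```

Calling `time_allocs(size, 16, n)` and `time_allocs(size, 4096, n)` with the same `size` (a multiple of 4096) and comparing the two durations over several runs should show whether the alignment alone makes a difference on a given allocator and platform.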

RalfJung added the C-bug (Category: This is a bug.), I-slow (Impact: Makes Miri even slower than it already is), and A-native (Area: calling native functions via FFI) labels May 29, 2025

nia-e mentioned this issue May 31, 2025

isolated_alloc: directly use mmap for allocations #4362

Open
