scx_rustland_core: Forbid mmap() syscall #1812

Merged: 2 commits, May 6, 2025
arighi
Contributor

@arighi arighi commented May 4, 2025

User-space schedulers should never perform blocking memory allocations; otherwise the entire scheduling pipeline may get stuck.

To prevent this from happening, scx_rustland_core implements GlobalAlloc with a custom memory allocator that operates on a pre-allocated, locked memory arena, and all of the process's memory is automatically locked.
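For readers unfamiliar with the pattern, here is a minimal sketch of a GlobalAlloc implementation backed by a fixed arena. It is not the scx_rustland_core allocator: `BumpAllocator`, `ARENA_SIZE`, and the bump strategy are illustrative assumptions, and the sketch leaves out both memory locking and registration via `#[global_allocator]`.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical arena size; a real scheduler would size this for its workload.
const ARENA_SIZE: usize = 64 * 1024;

/// Page-aligned backing storage reserved up front.
#[repr(align(4096))]
struct Arena(UnsafeCell<[u8; ARENA_SIZE]>);

/// Illustrative bump allocator: hands out slices of the arena and never frees.
pub struct BumpAllocator {
    arena: Arena,
    next: AtomicUsize, // offset of the first unused byte
}

// SAFETY: each successful `alloc` reserves a disjoint range via the atomic offset.
unsafe impl Sync for BumpAllocator {}

impl BumpAllocator {
    pub const fn new() -> Self {
        Self {
            arena: Arena(UnsafeCell::new([0; ARENA_SIZE])),
            next: AtomicUsize::new(0),
        }
    }

    /// Bytes consumed so far, including alignment padding.
    pub fn used(&self) -> usize {
        self.next.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.0.get() as *mut u8;
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round the candidate address up to the requested alignment.
            let addr = base as usize + cur;
            let start = match addr.checked_add(layout.align() - 1) {
                Some(a) => (a & !(layout.align() - 1)) - base as usize,
                None => return ptr::null_mut(),
            };
            let end = match start.checked_add(layout.size()) {
                Some(e) if e <= ARENA_SIZE => e,
                _ => return ptr::null_mut(), // arena exhausted: fail, never block
            };
            match self.next.compare_exchange_weak(cur, end, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return unsafe { base.add(start) },
                Err(seen) => cur = seen, // another thread won the race; retry
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never reuses memory; a production allocator would.
    }
}

static ARENA: BumpAllocator = BumpAllocator::new();

fn main() {
    let layout = Layout::from_size_align(256, 64).unwrap();
    let p = unsafe { ARENA.alloc(layout) };
    assert!(!p.is_null() && p as usize % 64 == 0);

    // A request larger than the arena fails immediately instead of calling mmap().
    let too_big = Layout::from_size_align(ARENA_SIZE + 1, 1).unwrap();
    assert!(unsafe { ARENA.alloc(too_big) }.is_null());
    println!("arena bytes used: {}", ARENA.used());
}
```

The point of the design is that an allocation either succeeds from memory that already exists or fails right away; it never enters the kernel to grow the heap.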

However, external libraries/crates can still execute mmap() syscalls directly (e.g., via libc), potentially stalling the scheduler. For example:

  R scx_rustland[159] -5016ms
      scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=2/1
      sticky/holding_cpu=-1/-1 dsq_id=(n/a)
      dsq_vtime=0 slice=0 weight=100
      cpus=ff

    asm_sysvec_apic_timer_interrupt+0x1a/0x20
    mmap_region+0x65/0x140
    do_mmap+0x47d/0x620
    vm_mmap_pgoff+0xbc/0x1c0
    do_syscall_64+0xbb/0x1e0
    entry_SYSCALL_64_after_hwframe+0x77/0x7f

To catch these calls, introduce a seccomp filter that returns EPERM when the mmap() syscall is invoked.

This doesn't solve the problem, but it lets us catch the code that invokes mmap() and exit early instead of waiting for the watchdog timeout.
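To illustrate the mechanism, the following is a dependency-free sketch of a seccomp filter that makes mmap() fail with EPERM. It is not the code from this PR, which appears to link against libseccomp (see the `-lseccomp` note in the conversation); the sketch instead builds the classic BPF program by hand and installs it with prctl(). The names `deny_mmap_program`, `install_deny_mmap`, and `try_mmap` are made up for illustration, the numeric constants are copied from the Linux UAPI headers, and only x86_64 and aarch64 are covered.

```rust
use std::ffi::{c_int, c_ulong, c_void};
use std::io;
use std::ptr;

/// Mirrors `struct sock_filter` from <linux/filter.h>.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SockFilter { pub code: u16, pub jt: u8, pub jf: u8, pub k: u32 }

/// Mirrors `struct sock_fprog` from <linux/filter.h>.
#[repr(C)]
struct SockFprog { len: u16, filter: *const SockFilter }

const LD_W_ABS: u16 = 0x20; // BPF_LD | BPF_W | BPF_ABS
const JEQ_K: u16 = 0x15;    // BPF_JMP | BPF_JEQ | BPF_K
const RET_K: u16 = 0x06;    // BPF_RET | BPF_K
pub const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;
pub const SECCOMP_RET_ERRNO: u32 = 0x0005_0000;
pub const EPERM: u32 = 1;

#[cfg(target_arch = "x86_64")]
const ARCH_AND_MMAP: (u32, u32) = (0xC000_003E, 9); // AUDIT_ARCH_X86_64, __NR_mmap
#[cfg(target_arch = "aarch64")]
const ARCH_AND_MMAP: (u32, u32) = (0xC000_00B7, 222); // AUDIT_ARCH_AARCH64, __NR_mmap

/// Builds a filter: mmap() returns EPERM, every other syscall is allowed.
pub fn deny_mmap_program() -> [SockFilter; 6] {
    let (arch, nr_mmap) = ARCH_AND_MMAP;
    let ins = |code, jt, jf, k| SockFilter { code, jt, jf, k };
    [
        ins(LD_W_ABS, 0, 0, 4),                      // A = seccomp_data.arch
        ins(JEQ_K, 0, 2, arch),                      // other arch: skip to ALLOW
        ins(LD_W_ABS, 0, 0, 0),                      // A = seccomp_data.nr
        ins(JEQ_K, 1, 0, nr_mmap),                   // mmap: skip to ERRNO
        ins(RET_K, 0, 0, SECCOMP_RET_ALLOW),
        ins(RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM),
    ]
}

// `unsafe extern` needs Rust 1.82 or newer.
unsafe extern "C" {
    fn prctl(option: c_int, ...) -> c_int;
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, off: i64) -> *mut c_void;
}

/// Installs the filter on the calling thread only (no thread synchronization).
pub fn install_deny_mmap() -> io::Result<()> {
    let prog = deny_mmap_program();
    let fprog = SockFprog { len: prog.len() as u16, filter: prog.as_ptr() };
    // PR_SET_NO_NEW_PRIVS (38) lets an unprivileged process install a filter.
    if unsafe { prctl(38, 1 as c_ulong, 0 as c_ulong, 0 as c_ulong, 0 as c_ulong) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // PR_SET_SECCOMP (22) with SECCOMP_MODE_FILTER (2).
    if unsafe { prctl(22, 2 as c_ulong, &fprog as *const SockFprog) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

/// Maps one anonymous page; the mapping is deliberately leaked in this demo.
pub fn try_mmap() -> io::Result<()> {
    // PROT_READ | PROT_WRITE = 3, MAP_PRIVATE | MAP_ANONYMOUS = 0x22
    let p = unsafe { mmap(ptr::null_mut(), 4096, 3, 0x22, -1, 0) };
    if p as isize == -1 { Err(io::Error::last_os_error()) } else { Ok(()) }
}

fn main() -> io::Result<()> {
    println!("before filter: {:?}", try_mmap());
    install_deny_mmap()?;
    println!("after filter:  {:?}", try_mmap());
    Ok(())
}
```

Returning an errno rather than killing the process means the offending call site sees an ordinary failure, so it can be reported and the scheduler can exit cleanly.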

@arighi arighi requested review from htejun, multics69 and hodgesds May 4, 2025 05:53
@arighi
Contributor Author

arighi commented May 4, 2025

Looks like nix doesn't like -lseccomp, @JakeHillion any idea how to fix this?

@JakeHillion
Contributor

Cherry pick aba0b3d and you should be good

@arighi arighi force-pushed the rustland-forbid-mmap branch from 15fec88 to 82a1c5d Compare May 4, 2025 17:49
Signed-off-by: Andrea Righi <[email protected]>
@arighi arighi force-pushed the rustland-forbid-mmap branch from 82a1c5d to d5237cc Compare May 5, 2025 15:00
Contributor

@hodgesds hodgesds left a comment

LGTM. I was wondering whether there is a way to enforce the seccomp config on the main thread only. That way, other libraries could call mmap() in separate threads and, in theory, not interfere with scheduling.

@arighi
Contributor Author

arighi commented May 6, 2025

> LGTM. I was wondering whether there is a way to enforce the seccomp config on the main thread only. That way, other libraries could call mmap() in separate threads and, in theory, not interfere with scheduling.

Yeah, that was the idea: the main thread never blocks in mmap(), and separate threads handle mmap() and other blocking operations.

Since the seccomp filter is applied in BpfScheduler::init(), this should in theory already work: as long as you create the threads before calling BpfScheduler::init(), those threads should still be able to call mmap(). I'll run a test to double-check.

@arighi
Contributor Author

arighi commented May 6, 2025

> Since the seccomp filter is applied in BpfScheduler::init(), this should in theory already work: as long as you create the threads before calling BpfScheduler::init(), those threads should still be able to call mmap(). I'll run a test to double-check.

Right, I can confirm that. I just added an explicit call to mmap() in the stats server and everything works fine. If I put the same mmap() call in the main user-space scheduler thread, it fails with EPERM.

Basically, you need to create all the threads before initializing the scheduler; then they can use mmap().
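The ordering can be sketched as follows. This is only an illustration of the pattern: `init_scheduler_stub` is a hypothetical empty stand-in for BpfScheduler::init() (the point where the filter gets installed on the calling thread), and `run` and the channel-based hand-off are assumptions, not scx_rustland_core APIs. Because the stub installs no filter, the sketch shows the structure rather than enforcing it.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for `BpfScheduler::init()`. In the real crate this is
/// where the seccomp filter is applied to the calling (main) thread.
fn init_scheduler_stub() {}

/// Spawns a helper thread first, then "initializes" the scheduler, then hands
/// allocation-heavy work to the helper instead of doing it on the main thread.
pub fn run(requests: &[usize]) -> Vec<usize> {
    let (req_tx, req_rx) = mpsc::channel::<usize>();
    let (resp_tx, resp_rx) = mpsc::channel::<usize>();
    // Reserve the result buffer up front, before initialization.
    let mut out = Vec::with_capacity(requests.len());

    // 1. Create every helper thread BEFORE init: it is not subject to a
    //    filter installed later on the main thread.
    let helper = thread::spawn(move || {
        for size in req_rx {
            // Large buffers like this may be served by mmap() in the C library.
            let buf = vec![0u8; size];
            let _ = resp_tx.send(buf.len());
        }
    });

    // 2. Only now initialize the scheduler on the main thread.
    init_scheduler_stub();

    // 3. From here on, the main thread delegates work that might block.
    for &size in requests {
        req_tx.send(size).unwrap();
        out.push(resp_rx.recv().unwrap());
    }

    // Closing the request channel ends the helper's loop.
    drop(req_tx);
    helper.join().unwrap();
    out
}

fn main() {
    println!("{:?}", run(&[4096, 1 << 20]));
}
```

Spawning a thread after initialization would be a different matter, since creating a thread stack typically requires mmap() itself.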

@arighi arighi added this pull request to the merge queue May 6, 2025
Merged via the queue into main with commit 559cdb5 May 6, 2025
32 checks passed
@arighi arighi deleted the rustland-forbid-mmap branch May 6, 2025 20:33