Hugepages support for Firecracker #4360
Codecov Report

@@            Coverage Diff            @@
##             main    #4360    +/-   ##
=========================================
  Coverage   81.37%   81.37%
=========================================
  Files         243      243
  Lines       29431    29518    +87
=========================================
+ Hits        23949    24021    +72
- Misses       5482     5497    +15

Flags with carried forward coverage won't be shown.
@avirtuos tagging you in here for visibility. GitHub is not letting me explicitly add you as a reviewer for some reason. This PR is where the initial huge pages work will take place :)
I would mark most of them as |
Yes, as you mention, we do not enforce it, so we also should not insist on it in reviews, as it's up to the author. I prefer not to add them outside of very unambiguous cases (e.g. doc commits).
This field allows specifying whether guest memory for this microVM should be backed by regular, 4K, pages or 2M hugetlbfs pages. Configuration fails if guest memory size is not a multiple of selected page size. Signed-off-by: Patrick Roy <[email protected]>
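The size check described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the `HugePageConfig` enum, its variant names, and `validate_mem_size` are hypothetical, and the guest memory size is assumed to be given in MiB.

```rust
/// Hypothetical mirror of the `huge_pages` setting; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HugePageConfig {
    /// Guest memory backed by regular 4K pages.
    None,
    /// Guest memory backed by 2M hugetlbfs pages.
    Hugetlbfs2M,
}

impl HugePageConfig {
    /// Page size in bytes for this kind of backing memory.
    pub fn page_size_bytes(self) -> u64 {
        match self {
            HugePageConfig::None => 4 * 1024,
            HugePageConfig::Hugetlbfs2M => 2 * 1024 * 1024,
        }
    }
}

/// Rejects a configuration whose guest memory size (assumed to be in MiB)
/// is not a multiple of the selected page size.
pub fn validate_mem_size(mem_size_mib: u64, huge_pages: HugePageConfig) -> Result<(), String> {
    let mem_bytes = mem_size_mib * 1024 * 1024;
    let page_size = huge_pages.page_size_bytes();
    if mem_bytes % page_size != 0 {
        return Err(format!(
            "guest memory size ({} MiB) is not a multiple of the page size ({} bytes)",
            mem_size_mib, page_size
        ));
    }
    Ok(())
}
```

Under these assumptions, 128 MiB is accepted with 2M pages, while 129 MiB is accepted only with 4K pages.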
Mark it as developer preview in case not everything gets worked out before 1.7. Signed-off-by: Patrick Roy <[email protected]>
Update the memory allocation code to be able to utilize hugetlbfs, and pass the corresponding arguments through from the api. Currently snapshots always restore on 4K pages, as huge page configuration is not yet saved in the vmstate file. Signed-off-by: Patrick Roy <[email protected]>
We store the huge pages configuration in the snapshot's vmstate file, and enforce that a snapshot gets restored with the same hugepages configuration with which it was taken (for simplicity reasons). Signed-off-by: Patrick Roy <[email protected]>
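A rough sketch of the restore rule, assuming the saved configuration is read back from the vmstate file and that restoring from a plain memory file is not possible for hugetlbfs-backed snapshots (as the UFFD test commit and the PR description indicate). The type and function names below are hypothetical and do not come from the PR.

```rust
/// Hypothetical representation of the huge pages setting saved in the vmstate file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HugePageConfig {
    None,
    Hugetlbfs2M,
}

/// Hypothetical way the guest memory is provided on restore.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemBackend {
    /// The memory file is mmapped directly.
    File,
    /// Pages are faulted in through userfaultfd.
    Uffd,
}

/// Returns the huge pages configuration the restored VM must use, which is
/// always the saved one, or an error if that configuration cannot be honoured.
pub fn restore_huge_pages(
    saved: HugePageConfig,
    backend: MemBackend,
) -> Result<HugePageConfig, String> {
    match (saved, backend) {
        // A file mapping cannot be backed by hugetlbfs pages.
        (HugePageConfig::Hugetlbfs2M, MemBackend::File) => Err(
            "cannot restore a hugetlbfs-backed snapshot by mmapping the memory file".to_string(),
        ),
        (config, _) => Ok(config),
    }
}
```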
Support for memfd_create with `MFD_HUGETLB | MFD_ALLOW_SEALING` was only added in 4.16, so trying to use hugetlbfs-backed guest memory on a 4.14 host will fail [1]. Make the error message shown to the customer a bit nicer. [1]: https://man7.org/linux/man-pages/man2/memfd_create.2.html Signed-off-by: Patrick Roy <[email protected]>
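One way such a friendlier message could be produced is to compare the host kernel release against 4.16. The sketch below is only an assumption about the approach: the PR may simply translate the error returned by memfd_create, and the helper names and message text here are hypothetical.

```rust
/// Extracts (major, minor) from a kernel release string such as "5.10.0-1234".
pub fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    // Keep only the leading digits of the minor component.
    let minor_digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor: u32 = minor_digits.parse().ok()?;
    Some((major, minor))
}

/// True if the kernel is at least 4.16, the first release that accepts
/// `MFD_HUGETLB | MFD_ALLOW_SEALING` in memfd_create.
pub fn supports_sealable_hugetlb_memfd(release: &str) -> bool {
    matches!(parse_kernel_version(release), Some(version) if version >= (4, 16))
}

/// Builds a hypothetical user-facing explanation for a failed allocation.
pub fn describe_memfd_failure(release: &str) -> String {
    if supports_sealable_hugetlb_memfd(release) {
        "failed to create hugetlbfs-backed guest memory".to_string()
    } else {
        format!(
            "hugetlbfs-backed guest memory requires host kernel 4.16 or newer (found {})",
            release
        )
    }
}
```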
Tests that explicitly check API responses now need to deal with the new huge_pages parameter. Signed-off-by: Patrick Roy <[email protected]>
Attempts to boot a microvm with guest memory backed by 2MB hugetlbfs pages. Adjusts the test infrastructure to allocate 2MB pages prior to test run (failing if it cannot do so). For now, we rely on no other process on the host trying to use hugetlbfs. The test is skipped on 4.14 because hugetlbfs support for sealable memfds was only added in 4.16. We put the test as a performance test to ensure it runs on ag=1 agents, to avoid problems with different agents on the same metal concurrently modifying the hugetlbfs pool. Signed-off-by: Patrick Roy <[email protected]>
Make the `valid_handler.rs` code sample page-size agnostic, in preparation for hugepages tests. Signed-off-by: Patrick Roy <[email protected]>
The test has to be UFFD based, as we cannot mmap the file with hugetlbfs enabled (as `MAP_HUGETLB` is a modifier to `MAP_ANONYMOUS`, which precludes file mappings). Signed-off-by: Patrick Roy <[email protected]>
The balloon device does not work with huge pages, so for now disallow using them together. Signed-off-by: Patrick Roy <[email protected]>
Booting our initrd artifact inside a huge-pages enabled VM causes it to get stuck, so for now this is seemingly not supported. Signed-off-by: Patrick Roy <[email protected]>
An EPT_VIOLATION kvm_exit happens when the MMU's extended page tables are missing an entry for some guest physical address (e.g. some page is present in the guest page tables, but KVM has not yet set up a mapping of guest-physical->host-physical address for it). This happens after snapshot restore even if a page is faulted in via UFFD, as faulting in via UFFD only maps the page into host userspace (e.g. Firecracker), but does not set up the EPT entries. We track the number of EPT_VIOLATIONS post restore when using UFFD for both 4K and 2M pages, as we expect their number to be significantly lower when using huge pages. We use a special UFFD handler that faults in the entire guest memory ahead of time, as otherwise we would just be tracking normal page faults. Signed-off-by: Patrick Roy <[email protected]>
Differential snapshots work with hugetlbfs pages out of the box. This is because despite guest memory being backed by 2M pages, KVM still keeps a dirty log at 4K granularity. This means we do not need to adjust our differential snapshot logic to handle 2M chunks, as the existing logic for 4K chunks stays valid. Signed-off-by: Patrick Roy <[email protected]>
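To illustrate why the page size of the backing memory does not matter here, the sketch below turns a dirty bitmap with one bit per 4K page into byte ranges to be written to a snapshot. It assumes the usual layout of such bitmaps (bit `i` of word `w` describes page `w * 64 + i`); the function name and the coalescing of adjacent pages are illustrative and not taken from the PR.

```rust
/// Granularity of the dirty log, independent of the backing page size.
const DIRTY_LOG_PAGE_SIZE: u64 = 4096;

/// Converts a dirty bitmap (one bit per 4K page) into `(offset, length)` byte
/// ranges, merging adjacent dirty pages into a single range.
pub fn dirty_ranges(bitmap: &[u64]) -> Vec<(u64, u64)> {
    let mut ranges: Vec<(u64, u64)> = Vec::new();
    for (word_idx, word) in bitmap.iter().enumerate() {
        for bit in 0..64u64 {
            if (*word & (1u64 << bit)) == 0 {
                continue;
            }
            let offset = (word_idx as u64 * 64 + bit) * DIRTY_LOG_PAGE_SIZE;
            // Extend the previous range if this page directly follows it.
            if let Some(last) = ranges.last_mut() {
                if last.0 + last.1 == offset {
                    last.1 += DIRTY_LOG_PAGE_SIZE;
                    continue;
                }
            }
            ranges.push((offset, DIRTY_LOG_PAGE_SIZE));
        }
    }
    ranges
}
```

Nothing in this loop refers to 2M chunks: a single dirty 4K page inside a huge page produces a 4K range, exactly as it would with regular pages.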
Document how to use hugetlbfs with Firecracker. Signed-off-by: Patrick Roy <[email protected]>
Add the huge_pages field to the /machine-config documentation. Signed-off-by: Patrick Roy <[email protected]>
Add an entry about huge page support. Signed-off-by: Patrick Roy <[email protected]>
When regenerating artifacts for firecracker-microvm#4360, the guest kernels got updated to a new patch version. Fixes firecracker-microvm#4454 Signed-off-by: Patrick Roy <[email protected]>
Changes
Adds initial support for backing guest memory with huge pages. They can be configured via the `/machine-config` API, with Firecracker supporting 2M hugetlbfs pages. Huge page configuration is stored in the snapshot state, and a snapshot is restored into a VM whose huge pages configuration matches that of the VM from which the snapshot was taken. The operation fails if this is not possible, e.g. when trying to restore a hugetlbfs-backed snapshot by mmapping the memory file. For the initial developer preview release, features that would require explicit integration with huge pages (such as memory ballooning) are mutually exclusive with huge page support.

This is linked to #2139.
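As a rough illustration of the configuration step, the helper below builds a JSON body for a `/machine-config` request. The `huge_pages` field name comes from this PR; the string values `"2M"` and `"None"`, the accompanying `vcpu_count` and `mem_size_mib` fields, and the helper itself are assumptions made for this sketch and should be checked against the API documentation added by the PR.

```rust
/// Builds a request body for `/machine-config`.
/// The accepted `huge_pages` values ("2M", "None") are assumed, not confirmed here.
pub fn machine_config_body(vcpu_count: u8, mem_size_mib: u64, use_2m_huge_pages: bool) -> String {
    let huge_pages = if use_2m_huge_pages { "2M" } else { "None" };
    format!(
        "{{\"vcpu_count\": {}, \"mem_size_mib\": {}, \"huge_pages\": \"{}\"}}",
        vcpu_count, mem_size_mib, huge_pages
    )
}
```

With 2M pages selected, `mem_size_mib` has to be even, since configuration fails when the guest memory size is not a multiple of the page size.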
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.