Mlx5 offload enable #2676

Merged
merged 7 commits into from
Apr 10, 2025

Conversation

mrbojangles3
Contributor

@mrbojangles3 mrbojangles3 commented Feb 19, 2025

[Enable Hardware Offloads in Mellanox driver]

The configuration changes in this PR allow hardware offloading of traffic control (tc) and connection tracking operations. A Netdev conference paper describes these features.

How to use

These kernel configuration changes make the hardware offload features of Mellanox NICs available; the features themselves are driven from user space. For containerized workloads, one option is to configure them through Open vSwitch, similar to how Red Hat OpenShift does. Outside of OVS, the offloads can be configured with the tc or nft commands.
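
Before trying tc or nft offloads, it can help to confirm that a kernel was actually built with the relevant options. The following is a minimal sketch, not part of this PR: it parses kernel config text and reports the state of a few options. The option names in `ASSUMED_OPTIONS` are assumptions chosen for illustration (the PR description does not list the options it enables), and `parse_kernel_config` and `report` are hypothetical helper names.

```python
import re

# Assumed option names, for illustration only; the PR text does not list them.
ASSUMED_OPTIONS = ["CONFIG_MLX5_ESWITCH", "CONFIG_MLX5_CLS_ACT", "CONFIG_MLX5_TC_CT"]

_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kernel_config(text):
    """Map each option to its value ('y', 'm', ...) or 'n' if marked not set."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            state[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            state[match.group(1)] = "n"
    return state


def report(text, options=ASSUMED_OPTIONS):
    """Return the state of each requested option, or 'missing' if absent."""
    state = parse_kernel_config(text)
    return {opt: state.get(opt, "missing") for opt in options}


# Made-up sample input, not taken from a real Flatcar build.
SAMPLE = """\
CONFIG_MLX5_ESWITCH=y
CONFIG_MLX5_CLS_ACT=y
# CONFIG_MLX5_TC_CT is not set
"""

if __name__ == "__main__":
    for opt, value in report(SAMPLE).items():
        print(f"{opt}: {value}")
```

The input text could come from the `flatcar_production_image_kernel_config.txt` build artifact named in the CI log, or from `/proc/config.gz` on a running kernel built with `CONFIG_IKCONFIG_PROC`.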

Testing done

I have compiled these changes and booted a physical node equipped with a Mellanox ConnectX-7 (CX-7) NIC.

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

@mrbojangles3
Contributor Author

A link to the thread that started this PR.

@ader1990
Contributor

The pipelines failed with:

2025-02-19T18:47:35.6812600Z INFO    grub_install.sh: Installing GRUB x86_64-xen in flatcar_production_image.bin
2025-02-19T18:47:35.7002936Z INFO    grub_install.sh: Compressing modules in flatcar/grub/x86_64-xen
2025-02-19T18:47:37.5624582Z INFO    grub_install.sh: Generating flatcar/grub/x86_64-xen/load.cfg
2025-02-19T18:47:37.5794400Z INFO    grub_install.sh: Generating xen/pvboot-x86_64.elf
2025-02-19T18:47:37.5900771Z INFO    grub_install.sh: Installing default x86_64 Xen bootloader.
2025-02-19T18:47:37.6644329Z INFO    grub_install.sh: Elapsed time (grub_install.sh): 0m3s
2025-02-19T18:47:37.7125747Z INFO    build_image: Generating flatcar_production_image_pcr_policy.zip
2025-02-19T18:47:38.0202747Z INFO    build_image: Writing flatcar_production_image_contents.txt
2025-02-19T18:47:38.8309417Z INFO    build_image: Writing flatcar_production_image_contents_wtd.txt
2025-02-19T18:47:39.0791751Z cpio: premature end of file
2025-02-19T18:47:39.0798053Z rmdir: failed to remove '/home/sdk/trunk/src/scripts/artifacts/amd64-usr/developer-4249.0.0+nightly-20250217-2100-5-gad11c4677c-a1/tmp_initrd_contents/rootfs-0': Directory not empty
2025-02-19T18:47:39.0943167Z ERROR   build_image: script called: build_image '--board=amd64-usr' '--group=developer' '--output_root=/home/sdk/trunk/src/scripts/artifacts' 'prodtar' 'container' 'sysext'
2025-02-19T18:47:39.0948625Z ERROR   build_image: Backtrace:  (most recent call is last)
2025-02-19T18:47:39.0962922Z ERROR   build_image:   file build_image, line 176, called: create_prod_image 'flatcar_production_image.bin' 'base' 'developer' 'coreos-base/coreos' 'containerd-flatcar:app-containers/containerd,docker-flatcar:app-containers/docker&app-containers/docker-cli&app-containers/docker-buildx'
2025-02-19T18:47:39.0977229Z ERROR   build_image:   file prod_image_util.sh, line 169, called: finish_image 'flatcar_production_image.bin' 'base' '/home/sdk/trunk/src/scripts/artifacts/amd64-usr/developer-4249.0.0+nightly-20250217-2100-5-gad11c4677c-a1/rootfs' 'flatcar_production_image_contents.txt' 'flatcar_production_image_contents_wtd.txt' 'flatcar_production_image.vmlinuz' 'flatcar_production_image_pcr_policy.zip' 'flatcar_production_image.grub' 'flatcar_production_image.shim' 'flatcar_production_image_kernel_config.txt' 'flatcar_production_image_initrd_contents.txt' 'flatcar_production_image_initrd_contents_wtd.txt' 'flatcar_production_image_disk_usage.txt'
2025-02-19T18:47:39.0988460Z ERROR   build_image:   file build_image_util.sh, line 869, called: die_err_trap '"${BUILD_LIBRARY_DIR}/extract-initramfs-from-vmlinuz.sh" "${root_fs_dir}/boot/flatcar/vmlinuz-a" "${BUILD_DIR}/tmp_initrd_contents"' '1'
2025-02-19T18:47:39.0993394Z ERROR   build_image: 
2025-02-19T18:47:39.0999869Z ERROR   build_image: Command failed:
2025-02-19T18:47:39.1006572Z ERROR   build_image:   Command '"${BUILD_LIBRARY_DIR}/extract-initramfs-from-vmlinuz.sh" "${root_fs_dir}/boot/flatcar/vmlinuz-a" "${BUILD_DIR}/tmp_initrd_contents"' exited with nonzero code: 1

This is because the initrd now contains a phantom cpio piece. I solved this in my kernel upgrade PR, and the same commit will probably fix this issue: 2be94c2.
You can rebase and cherry-pick this commit, and I can rerun the pipeline.

github-actions bot commented Feb 20, 2025

Build action triggered: https://github.com/flatcar/scripts/actions/runs/14363396624

@mrbojangles3
Contributor Author

On my local machine, both the build packages and build image steps passed. I am hopeful that this will also be the case in CI.

@jepio
Member

jepio commented Feb 24, 2025

The arm64 build failed due to CONFIG_SWITCHDEV being in the amd64-only config, and CONFIG_VFIO_PCI_{VGA,IGD} in commonconfig actually being x86 only.
I've pushed a couple of commits to fix this up (hope you don't mind @mrbojangles3) and to build as many of the options as possible as modules. With that I hope we can review the size impact of all these options so that we can judge whether we can afford to enable all of them.
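
One way to review the size impact is to list which options changed between two kernel configs and whether each ended up built in (`=y`, which grows the kernel image) or as a module (`=m`, which ships as a separate file). The sketch below is an illustration under stated assumptions, not the tooling used in this PR: the helper names are hypothetical, and the sample option names and values are made up.

```python
def parse(text):
    """Map each option to its value, using 'n' for '# CONFIG_X is not set'."""
    suffix = " is not set"
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            state[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            state[line[2:-len(suffix)]] = "n"
    return state


def changed_options(before, after):
    """Return {option: (old, new)} for options whose value differs."""
    old, new = parse(before), parse(after)
    changes = {}
    for name in sorted(set(old) | set(new)):
        # An option absent from a config is treated like 'not set'.
        old_value, new_value = old.get(name, "n"), new.get(name, "n")
        if old_value != new_value:
            changes[name] = (old_value, new_value)
    return changes


def split_by_kind(changes):
    """Separate changed options into built-in (=y) and module (=m) lists."""
    builtin = [name for name, (_, value) in changes.items() if value == "y"]
    modules = [name for name, (_, value) in changes.items() if value == "m"]
    return builtin, modules


# Made-up sample configs; the option names are assumptions for illustration.
BEFORE = "# CONFIG_MLX5_ESWITCH is not set\nCONFIG_MLX5_CORE=m\n"
AFTER = "CONFIG_MLX5_ESWITCH=y\nCONFIG_MLX5_CORE=m\nCONFIG_NET_ACT_CT=m\n"

if __name__ == "__main__":
    builtin, modules = split_by_kind(changed_options(BEFORE, AFTER))
    print("built-in:", builtin)
    print("modules:", modules)
```

Options that land in the built-in list are the ones most worth checking against the kernel image size.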

@jepio
Member

jepio commented Feb 24, 2025

Also executed the sort_config script to sort the new options.

@mrbojangles3
Contributor Author

(hope you don't mind @mrbojangles3)

I do not. Thank you for the help.

Right now the tests are failing. Does that mean things have gotten too big? Did I miss documentation somewhere on a size limitation?

@mrbojangles3
Contributor Author

I would like to help here, but I am not sure what to do.

@chewi
Contributor

chewi commented Mar 20, 2025

@jepio Did it fail because of this?
flatcar/flatcar-dev-util@flatcar-master...jepio/fetch-head

@t-lo
Member

t-lo commented Apr 9, 2025

The tests actually run fine; devcontainer fails (that's expected), and the PR tests can't post messages to this issue (also expected).

If the code review of this PR is good, we can merge. Build and tests can be considered good.

@t-lo t-lo requested a review from a team April 9, 2025 15:13
@chewi chewi force-pushed the mlx5_offload_enable branch from 300ead6 to f39ee44 Compare April 9, 2025 17:18
@chewi
Contributor

chewi commented Apr 9, 2025

I've rebased to address the conflict, and I'm now running this under our internal CI, including on Azure.

@chewi
Contributor

chewi commented Apr 9, 2025

I remembered what Thilo said in the meeting, so I've now cancelled that and started it again after bumping the Azure machine sizes to v6.

Contributor

@chewi chewi left a comment

That passed successfully. I'm slightly concerned about the 331KB kernel image increase, but we have sufficient space until I resolve that issue.

@chewi chewi merged commit b8dd649 into flatcar:main Apr 10, 2025
1 of 2 checks passed
@github-project-automation github-project-automation bot moved this from ✅ Testing / in Review to Implemented in Flatcar tactical, release planning, and roadmap Apr 10, 2025
6 participants