api: supply id for minor meaningless device #2250
Conversation
Codecov Report
Attention: Patch coverage is …

@@            Coverage Diff             @@
##             main    #2250      +/-   ##
==========================================
- Coverage   66.05%   66.05%   -0.01%
==========================================
  Files         454      454
  Lines       53406    53438      +32
==========================================
+ Hits        35280    35298      +18
- Misses      15588    15597       +9
- Partials     2538     2543       +5

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
/hold
/hold cancel
Signed-off-by: wangjianyu.wjy <[email protected]>
/lgtm
/approve
Ⅰ. Describe what this PR does
RDMA devices are mounted inside the container based on the scheduler's allocation result.
Ⅱ. Does this pull request fix one issue?
The scheduling framework already supports joint allocation of GPUs and RDMA devices, but several semantics, including preference and samePCIE, still need to be validated in practice. To allow the container to access its RDMA devices, this PR fixes the missing BDF addresses in the RDMA device allocation results produced by the scheduling algorithm.
Ⅲ. Describe how to verify it
In a k8s cluster, prepare one or more servers with RDMA-capable network adapters as cluster nodes. Install the new versions of the koordlet and koord-manager components and the revamped multus-cni on each node, then check the node status: the reported resource count should match the actual number of RDMA NICs on the node.
Write a pod YAML that requests RDMA NIC resources and apply it with `kubectl apply -f pod.yaml`. In the pod annotation (device-allocated), check whether the RDMA allocation result includes the RDMA device's BDF address (the busID field) and verify that it is correct.
When the pod is Running, exec into the container and run `ifconfig` to check the number of network interfaces. If everything is correct, multiple interfaces are listed, such as net1, net2, net3, and so on.
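The verification steps above can be sketched with a minimal pod spec. The extended resource name `koordinator.sh/rdma` below is an assumption for illustration; substitute whatever resource name the deployed koordlet actually advertises.

```yaml
# pod.yaml: minimal sketch of a pod requesting one RDMA NIC.
# The resource name koordinator.sh/rdma is assumed, not confirmed by this PR.
apiVersion: v1
kind: Pod
metadata:
  name: rdma-test
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
    resources:
      limits:
        koordinator.sh/rdma: "1"
```

After `kubectl apply -f pod.yaml`, the allocation result can be inspected with `kubectl get pod rdma-test -o jsonpath='{.metadata.annotations}'` to confirm the busID field is present.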
Ⅳ. Special notes for reviews
Deploy components that support RDMA, including koordlet, koord-manager, koord-scheduler, and multus-cni.
For complete end-to-end passthrough, a multus-style CNI plug-in is also required that complies with the CNI specification and supports multi-NIC allocation. Assigning the RDMA NIC PF/VF to the pod, as mentioned here, requires the device ID to be injected into that component; otherwise it will not work. This change will be maintained separately in the multus-cni project or in another PR.
Ⅴ. Checklist
make test