Add nvidia-cdi-refresh service #1076

Open: wants to merge 1 commit into base: main

23 changes: 23 additions & 0 deletions deployments/systemd/nvidia-cdi-refresh.path
@@ -0,0 +1,23 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=Trigger CDI refresh on NVIDIA driver install / uninstall events

[Path]
PathChanged=/lib/modules/%v/modules.dep
PathChanged=/lib/modules/%v/modules.dep.bin

[Install]
WantedBy=multi-user.target
27 changes: 27 additions & 0 deletions deployments/systemd/nvidia-cdi-refresh.service
@@ -0,0 +1,27 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=Refresh NVIDIA CDI specification file
ConditionPathExists=/usr/bin/nvidia-smi

Member

Should we also check for /usr/bin/nvidia-ctk?

ConditionPathExists=/usr/bin/nvidia-ctk
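
The suggestion above adds a second `ConditionPathExists=` line. When several condition lines are listed without the `|` trigger prefix, systemd requires all of them to hold, so the unit would run only when both binaries are present. A minimal shell sketch of that conjunction, assuming a hypothetical helper named `all_paths_exist` (not part of this PR):

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): mimics a list of
# ConditionPathExists= lines, which must all hold for the unit to start.
all_paths_exist() {
    for path in "$@"; do
        # -e is true for any existing file type
        [ -e "$path" ] || return 1
    done
    return 0
}

if all_paths_exist /usr/bin/nvidia-smi /usr/bin/nvidia-ctk; then
    echo "conditions met: the refresh service would run"
else
    echo "conditions not met: the refresh service would be skipped"
fi
```

On a host without the driver or the toolkit, the second branch is taken, which matches the unit being skipped rather than failing.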

[Service]
Type=oneshot
ExecCondition=/usr/bin/grep -qE '/nvidia.ko' /lib/modules/%v/modules.dep
ExecStart=/usr/bin/nvidia-ctk cdi generate --output=/var/run/cdi/nvidia.yaml

Member

Question: Should this path depend on the installation locations?

Collaborator Author

Good point, yes. Let me think about how to modify this.

Collaborator Author

After checking, /usr/bin is set as the default for both deb and rpm, and we don't provide macros to change the install path. So having this hardcoded here looks OK. WDYT?

Member

The install command installs to:

install -m 755 -t %{buildroot}%{_bindir} nvidia-ctk

If this is always /usr/bin, then we don't need to update this.

Collaborator Author

According to the documentation, we can leave it as is: https://docs.fedoraproject.org/en-US/packaging-guidelines/RPMMacros/#macros_installation
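
The thread above settles on `/usr/bin` because `%{_bindir}` expands to it by default. One way to check that on a given build host is `rpm --eval`, which prints a macro's expansion. A small sketch, assuming a hypothetical `resolve_bindir` helper that falls back to `/usr/bin` when `rpm` is not installed:

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the directory that
# %{_bindir} expands to, or /usr/bin when rpm is unavailable (assumed default).
resolve_bindir() {
    if command -v rpm >/dev/null 2>&1; then
        rpm --eval '%{_bindir}'
    else
        echo /usr/bin
    fi
}

echo "nvidia-ctk is expected at $(resolve_bindir)/nvidia-ctk"
```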

CapabilityBoundingSet=CAP_SYS_MODULE CAP_SYS_ADMIN CAP_MKNOD

[Install]
WantedBy=multi-user.target
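
`ExecCondition=` in the service above gates the refresh on the running kernel's `modules.dep` mentioning an `nvidia.ko` module. systemd treats exit codes 1 through 254 from an `ExecCondition=` command as "skip the unit without marking it failed", which covers `grep -q` finding no match. A sketch of the same check as a shell function, where `has_nvidia_module` is a hypothetical name and the sample file contents are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): same test as the unit's
# ExecCondition=, but with the modules.dep path passed as an argument.
has_nvidia_module() {
    # -q: no output, exit status only; the pattern is copied from the unit
    grep -qE '/nvidia.ko' "$1"
}

# Illustrative input only; real modules.dep content differs per system.
depfile=$(mktemp)
printf '%s\n' 'kernel/drivers/video/nvidia.ko: kernel/drivers/foo.ko' > "$depfile"

if has_nvidia_module "$depfile"; then
    echo "nvidia module listed: CDI spec would be regenerated"
else
    echo "nvidia module not listed: refresh skipped"
fi
rm -f "$depfile"
```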
1 change: 1 addition & 0 deletions docker/Dockerfile.debian
@@ -55,6 +55,7 @@ RUN make PREFIX=${DIST_DIR} cmds

WORKDIR $DIST_DIR
COPY packaging/debian ./debian
COPY deployments/systemd/ .

ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
1 change: 1 addition & 0 deletions docker/Dockerfile.opensuse-leap
@@ -46,6 +46,7 @@ RUN make PREFIX=${DIST_DIR} cmds

WORKDIR $DIST_DIR/..
COPY packaging/rpm .
COPY deployments/systemd/ .

ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
1 change: 1 addition & 0 deletions docker/Dockerfile.rpm-yum
@@ -71,6 +71,7 @@ RUN make PREFIX=${DIST_DIR} cmds

WORKDIR $DIST_DIR/..
COPY packaging/rpm .
COPY deployments/systemd/* ${DIST_DIR}/

ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
1 change: 1 addition & 0 deletions docker/Dockerfile.ubuntu
@@ -53,6 +53,7 @@ RUN make PREFIX=${DIST_DIR} cmds

WORKDIR $DIST_DIR
COPY packaging/debian ./debian
COPY deployments/systemd/ .

ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
2 changes: 2 additions & 0 deletions packaging/debian/nvidia-container-toolkit-base.install
@@ -1,3 +1,5 @@
nvidia-container-runtime /usr/bin
nvidia-ctk /usr/bin
nvidia-cdi-hook /usr/bin
nvidia-cdi-refresh.service /etc/systemd/system/
nvidia-cdi-refresh.path /etc/systemd/system/
10 changes: 10 additions & 0 deletions packaging/debian/nvidia-container-toolkit-base.postinst
@@ -5,6 +5,16 @@ set -e
case "$1" in
configure)
/usr/bin/nvidia-ctk --quiet config --config-file=/etc/nvidia-container-runtime/config.toml --in-place

if command -v systemctl >/dev/null 2>&1 \
&& systemctl --quiet is-system-running 2>/dev/null; then

systemctl daemon-reload || true

if [ -z "$2" ]; then # $2 empty → first install
systemctl enable --now nvidia-cdi-refresh.path || true
fi
fi
;;

abort-upgrade|abort-remove|abort-deconfigure)
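
The `postinst` hunk above relies on how dpkg invokes the script: during `configure`, `$2` carries the most recently configured version and is empty when there is none, which the PR treats as a first install. A minimal sketch of that decision, with `is_first_install` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): takes the postinst arguments
# and succeeds only for "configure" with no previously configured version.
is_first_install() {
    [ "$1" = "configure" ] && [ -z "$2" ]
}

# Version strings below are made up for illustration.
for args in "configure" "configure 1.17.0-1" "abort-upgrade 1.17.0-1"; do
    # Deliberately unquoted so each sample splits into $1 and $2.
    if is_first_install $args; then
        echo "$args -> enable nvidia-cdi-refresh.path"
    else
        echo "$args -> leave unit enablement untouched"
    fi
done
```

Only the first sample reaches the enable branch, so an upgrade does not re-enable a unit that an administrator has disabled.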
19 changes: 18 additions & 1 deletion packaging/rpm/SPECS/nvidia-container-toolkit.spec

Member

Why do we not have an additional package on rpm-based systems?

Collaborator Author

You mean adding the service install as part of the regular RPM install script?

Member

In the Debian packages you added an nvidia-container-toolkit-cdi-refresh package that includes the systemd unit and udev rules. This seems to be missing from the RPM packages.

Collaborator Author

I see your question. On the DEB side, I noticed we have split specific components into individual packages.
On the RPM side we only have one file (the RPM spec file), so I followed that structure and added the install steps for the three new files to the same spec file.

packaging
├── debian
│   ├── changelog.old
│   ├── compat
│   ├── control
│   ├── copyright
│   ├── nvidia-container-toolkit-base.install
│   ├── nvidia-container-toolkit-base.postinst
│   ├── nvidia-container-toolkit-cdi-refresh.install
│   ├── nvidia-container-toolkit-cdi-refresh.postinst
│   ├── nvidia-container-toolkit-operator-extensions.install
│   ├── nvidia-container-toolkit.install
│   ├── nvidia-container-toolkit.lintian-overrides
│   ├── nvidia-container-toolkit.postinst
│   ├── nvidia-container-toolkit.postrm
│   ├── prepare
│   └── rules
└── rpm
    ├── SOURCES
    │   └── LICENSE
    └── SPECS
        └── nvidia-container-toolkit.spec

@@ -17,6 +17,8 @@ Source3: nvidia-container-runtime
Source4: nvidia-container-runtime.cdi
Source5: nvidia-container-runtime.legacy
Source6: nvidia-cdi-hook
Source7: nvidia-cdi-refresh.service
Source8: nvidia-cdi-refresh.path

Obsoletes: nvidia-container-runtime <= 3.5.0-1, nvidia-container-runtime-hook <= 1.4.0-2
Provides: nvidia-container-runtime
@@ -28,23 +30,35 @@ Requires: nvidia-container-toolkit-base == %{version}-%{release}
Provides tools and utilities to enable GPU support in containers.

%prep
-cp %{SOURCE0} %{SOURCE1} %{SOURCE2} %{SOURCE3} %{SOURCE4} %{SOURCE5} %{SOURCE6} .
+cp %{SOURCE0} %{SOURCE1} %{SOURCE2} %{SOURCE3} %{SOURCE4} %{SOURCE5} %{SOURCE6} %{SOURCE7} %{SOURCE8} .

%install
mkdir -p %{buildroot}%{_bindir}
mkdir -p %{buildroot}/etc/systemd/system/

install -m 755 -t %{buildroot}%{_bindir} nvidia-container-runtime-hook
install -m 755 -t %{buildroot}%{_bindir} nvidia-container-runtime
install -m 755 -t %{buildroot}%{_bindir} nvidia-container-runtime.cdi
install -m 755 -t %{buildroot}%{_bindir} nvidia-container-runtime.legacy
install -m 755 -t %{buildroot}%{_bindir} nvidia-ctk
install -m 755 -t %{buildroot}%{_bindir} nvidia-cdi-hook
install -m 644 -t %{buildroot}/etc/systemd/system %{SOURCE7}
install -m 644 -t %{buildroot}/etc/systemd/system %{SOURCE8}

%post
if [ $1 -gt 1 ]; then # only on package upgrade
mkdir -p %{_localstatedir}/lib/rpm-state/nvidia-container-toolkit
cp -af %{_bindir}/nvidia-container-runtime-hook %{_localstatedir}/lib/rpm-state/nvidia-container-toolkit
fi

# Reload systemd unit cache
/bin/systemctl daemon-reload || :

# On fresh install ($1 == 1) enable the path unit so it starts at boot
if [ "$1" -eq 1 ]; then
/bin/systemctl enable --now nvidia-cdi-refresh.path || :
fi

%posttrans
if [ ! -e %{_bindir}/nvidia-container-runtime-hook ]; then
# repairing lost file nvidia-container-runtime-hook
@@ -89,6 +103,9 @@ Provides tools such as the NVIDIA Container Runtime and NVIDIA Container Toolkit
%{_bindir}/nvidia-container-runtime
%{_bindir}/nvidia-ctk
%{_bindir}/nvidia-cdi-hook
%dir /etc/systemd/system
/etc/systemd/system/nvidia-cdi-refresh.service
/etc/systemd/system/nvidia-cdi-refresh.path

# The OPERATOR EXTENSIONS package consists of components that are required to enable GPU support in Kubernetes.
# This package is not distributed as part of the NVIDIA Container Toolkit RPMs.
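
The `%post` scriptlet earlier in this spec branches on `$1`, which RPM sets to the number of instances of the package that will be installed once the transaction completes: 1 on a fresh install, 2 or more on an upgrade. A sketch of that branching, with `post_action` as a hypothetical name used only for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): classify the $1 argument that
# RPM passes to %post.
post_action() {
    if [ "$1" -eq 1 ]; then
        echo install    # fresh install: the spec enables the path unit
    elif [ "$1" -gt 1 ]; then
        echo upgrade    # upgrade: the spec backs up the runtime hook
    else
        echo unknown
    fi
}

post_action 1
post_action 2
```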