Incorporate FAQ info into AMDGPU install and troubleshooting guides #349

Merged
2 changes: 2 additions & 0 deletions .wordlist.txt
@@ -3,6 +3,7 @@ BARs
br
Broadcom
CTest
denylist
DHCP
dkms
dmesg
@@ -34,5 +35,6 @@ vCPU
vfio
VFIO
vhost
whl
udev
Udev
25 changes: 25 additions & 0 deletions docs/install/amdgpu-install.rst
@@ -186,6 +186,19 @@ access, use the ``dkms`` use case:

amdgpu-install --usecase=dkms

To verify the kernel module installation, use this command:

.. code-block:: shell

sudo dkms status

If the kernel module was installed successfully, the command displays output
in the following format:

.. code-block:: shell

amdgpu, 4.3-52.el7, 3.10.0-1160.11.1.el7.x86_64, x86_64: installed (original_module exists)
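
The check above can also be scripted. The following is a minimal sketch, not part of ROCm or ``dkms``: the function name ``amdgpu_dkms_installed`` is hypothetical, and it assumes the status line starts with ``amdgpu`` and contains ``: installed``, as in the sample output shown here. Other ``dkms`` versions might format the line differently.

```shell
# Hypothetical helper (not shipped with ROCm or dkms).
# Reads `dkms status` output on stdin and succeeds if an amdgpu
# module is reported as installed.
amdgpu_dkms_installed() {
    # Assumption: the line looks like "amdgpu, <version>, ...: installed"
    # or "amdgpu/<version>, ...: installed".
    grep -Eq '^amdgpu[,/].*: installed'
}
```

For example, ``sudo dkms status | amdgpu_dkms_installed && echo "amdgpu module installed"`` prints the message only when a matching line is present.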

Upgrading ROCm
=================================================

@@ -228,6 +241,18 @@ To install use cases specific to your requirements, use the installer (``amdgpu-

sudo amdgpu-install --usecase=rocm,asan

* To list all possible use cases, use the ``--list-usecase`` option:

.. code-block:: bash

sudo amdgpu-install --list-usecase

* The ``--help`` option displays all available options for the ``amdgpu-install`` script:

.. code-block:: bash

sudo amdgpu-install --help

Uninstalling ROCm
=================================================

5 changes: 5 additions & 0 deletions docs/install/native-install/multi-version-install.rst
@@ -32,6 +32,11 @@ A single-version ROCm installation involves the following.
See :doc:`../../install/quick-start` or :doc:`../../install/detailed-install` for
a standard single-version installation.

.. note::

You cannot install single-version and multi-version ROCm packages together on the same machine.
The conflicting package versions might result in unpredictable behavior.

The following diagram illustrates the difference between single-version and
multi-version ROCm installations.

31 changes: 30 additions & 1 deletion docs/reference/install-faq.rst
@@ -136,7 +136,7 @@ After installing these packages and :ref:`registering using your license for Ent

.. _troubleshooting-symlinks:

Issue #7: Installations using Python wheels (``.whl`` files) do not support soft links
Issue #7: Installations using Python wheels (.whl files) do not support soft links
======================================================================================

If you have installed ROCm or any ROCm component using a Python wheel (``.whl`` file), running
@@ -158,3 +158,32 @@ or
python3 /opt/rocm-6.2.0/libexec/rocm_smi/rocm_smi.py

See `Symbolic links in wheels <https://discuss.python.org/t/symbolic-links-in-wheels/1945>`_ for more information.

.. _troubleshooting-denylist:

Issue #8: The AMDGPU driver is not loaded after installation
======================================================================================

When you verify the ROCm installation according to the :doc:`post-install instructions <../install/post-install>`,
the ``rocm-smi`` and ``rocminfo`` commands might fail with the error message
``Driver not initialized`` or display no output at all. This could indicate that
the AMDGPU driver is not loaded.

**Solution:** Ensure the AMDGPU driver is not on a denylist such as ``/etc/modprobe.d/blacklist-amdgpu.conf``.
The location of this file might vary depending on the system distribution and version.
To verify whether the driver is on a denylist, use the following command:

.. code-block:: shell

grep amdgpu /etc/modprobe.d/*
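
The plain ``grep`` above also matches commented-out lines and unrelated options. A narrower check could look like the sketch below. The function name ``find_amdgpu_denylist`` is hypothetical, and the sketch assumes denylist entries use the standard ``blacklist <module>`` syntax of ``modprobe.d`` configuration files.

```shell
# Hypothetical helper (not part of ROCm).
# Prints uncommented "blacklist amdgpu" lines found in a modprobe.d
# directory, with file name and line number. Succeeds if at least one
# such line exists.
find_amdgpu_denylist() {
    # Assumption: /etc/modprobe.d is the default location; pass another
    # directory as the first argument if your distribution differs.
    dir="${1:-/etc/modprobe.d}"
    grep -EHns '^[[:space:]]*blacklist[[:space:]]+amdgpu([[:space:]]|$)' "$dir"/* 2>/dev/null
}
```

Running ``find_amdgpu_denylist`` with no arguments scans ``/etc/modprobe.d``. Any line it prints is a candidate to remove or comment out before reloading the driver.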

.. _troubleshooting-group-membership:

Issue #9: Cannot access the AMD GPU or accelerator after installation
======================================================================================

If the group permissions are not set properly during ROCm installation,
you might get an error similar to ``Permission denied`` when attempting to access the AMD GPU.

**Solution:** You must be part of the ``video`` and ``render`` groups to access the AMD GPU or accelerator.
To learn how to add an account to these groups, see :ref:`group_permissions`.
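
To see which of the two groups an account is missing, a small check such as the following might help. This is a sketch only: the function name ``missing_gpu_groups`` is hypothetical, and it takes the space-separated group list as an argument so that it can be used with the output of ``id -nG``.

```shell
# Hypothetical helper (not part of ROCm).
# Takes a space-separated list of group names and prints whichever of
# "video" and "render" are absent from it (empty line if none are missing).
missing_gpu_groups() {
    missing=""
    for group in video render; do
        case " $1 " in
            *" $group "*) ;;                                  # group is present
            *) missing="${missing:+$missing }$group" ;;       # record missing group
        esac
    done
    printf '%s\n' "$missing"
}
```

For example, ``missing_gpu_groups "$(id -nG)"`` prints the groups the current account still needs. Group changes usually take effect only after logging out and back in.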