Commit

Incorporate FAQ info into AMDGPU install and troubleshooting guides (#…
…349) (#362)

* Incorporate FAQ info into AMDGPU install and troubleshooting guides

* Fix linting errors

* Apply suggestions from the external review

(cherry picked from commit fd77a7e)
amd-jnovotny authored Dec 10, 2024
1 parent 9bd0f9a commit c10b0e3
Showing 4 changed files with 62 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .wordlist.txt
@@ -3,6 +3,7 @@ BARs
br
Broadcom
CTest
denylist
DHCP
dkms
dmesg
@@ -34,5 +35,6 @@ vCPU
vfio
VFIO
vhost
whl
udev
Udev
25 changes: 25 additions & 0 deletions docs/install/amdgpu-install.rst
@@ -186,6 +186,19 @@ access, use the ``dkms`` use case:
    amdgpu-install --usecase=dkms

To verify the kernel module installation, use this command:

.. code-block:: shell

    sudo dkms status

If the installation of the kernel module was successful, the command displays output
in the following format:

.. code-block:: shell

    amdgpu, 4.3-52.el7, 3.10.0-1160.11.1.el7.x86_64, x86_64: installed (original_module exists)

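The check above can also be scripted. The following is a minimal sketch, not part of ROCm or DKMS: ``amdgpu_dkms_installed`` is a hypothetical helper name, and the pattern assumes the status line starts with ``amdgpu,`` (as in the sample output) or ``amdgpu/`` (an assumption about the format printed by newer DKMS releases).

```shell
#!/bin/sh
# Hypothetical helper (not part of ROCm or DKMS): reads `dkms status` output
# on stdin and succeeds if an amdgpu module is reported as installed.
# Assumption: the line starts with "amdgpu," or "amdgpu/".
amdgpu_dkms_installed() {
    grep -Eq '^amdgpu[,/].*: installed'
}

# Sample line copied from the expected output format shown in this guide.
sample='amdgpu, 4.3-52.el7, 3.10.0-1160.11.1.el7.x86_64, x86_64: installed (original_module exists)'

if printf '%s\n' "$sample" | amdgpu_dkms_installed; then
    echo "amdgpu kernel module is installed"
else
    echo "amdgpu kernel module is NOT installed"
fi
```

On a real system, pipe the live output into the helper instead of the sample line, for example ``sudo dkms status | amdgpu_dkms_installed``.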
Upgrading ROCm
=================================================

@@ -228,6 +241,18 @@ To install use cases specific to your requirements, use the installer (``amdgpu-
      sudo amdgpu-install --usecase=rocm,asan

* To list all possible use cases, use the ``--list-usecase`` option:

  .. code-block:: bash

      sudo amdgpu-install --list-usecase

* The ``--help`` option displays all available options for the ``amdgpu-install`` script:

  .. code-block:: bash

      sudo amdgpu-install --help

Uninstalling ROCm
=================================================

5 changes: 5 additions & 0 deletions docs/install/native-install/multi-version-install.rst
@@ -32,6 +32,11 @@ A single-version ROCm installation involves the following.
See :doc:`../../install/quick-start` or :doc:`../../install/detailed-install` for
a standard single-version installation.

.. note::

   You cannot install single-version and multi-version ROCm packages together on the same machine.
   The conflicting package versions might result in unpredictable behavior.

The following diagram illustrates the difference between single-version and
multi-version ROCm installations.

31 changes: 30 additions & 1 deletion docs/reference/install-faq.rst
@@ -136,7 +136,7 @@ After installing these packages and :ref:`registering using your license for Ent

.. _troubleshooting-symlinks:

-Issue #7: Installations using Python wheels (``.whl`` files) do not support soft links
+Issue #7: Installations using Python wheels (.whl files) do not support soft links
======================================================================================

If you have installed ROCm or any ROCm component using a Python wheel (``.whl`` file), running
@@ -158,3 +158,32 @@ or
    python3 /opt/rocm-6.2.0/libexec/rocm_smi/rocm_smi.py

See `Symbolic links in wheels <https://discuss.python.org/t/symbolic-links-in-wheels/1945>`_ for more information.

.. _troubleshooting-denylist:

Issue #8: The AMDGPU driver is not loaded after installation
======================================================================================

When you verify the ROCm installation according to the :doc:`post-install instructions <../install/post-install>`,
the ``rocm-smi`` and ``rocminfo`` commands might fail with the error message
``Driver not initialized`` or display no output at all. Either symptom can indicate
that the AMDGPU driver is not loaded.

**Solution:** Ensure the AMDGPU driver is not listed in a denylist file such as ``/etc/modprobe.d/blacklist-amdgpu.conf``.
The location of this file might vary depending on the Linux distribution and version.
To verify whether the driver is on a denylist, use the following command:

.. code-block:: shell

    grep amdgpu /etc/modprobe.d/*

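The plain ``grep`` above also matches commented-out lines and unrelated options that mention ``amdgpu``. The following is a minimal sketch of a narrower search that reports only active ``blacklist amdgpu`` entries. ``find_amdgpu_denylist`` is a hypothetical helper name, and the sketch assumes the configuration files use the ``.conf`` suffix that ``modprobe`` reads.

```shell
#!/bin/sh
# Hypothetical helper (not part of ROCm): prints file, line number, and text
# of every active "blacklist amdgpu" entry in a modprobe configuration
# directory. Lines that start with "#" are ignored.
# Assumption: denylist files end in ".conf".
find_amdgpu_denylist() {
    dir="${1:-/etc/modprobe.d}"
    grep -HnE '^[[:space:]]*blacklist[[:space:]]+amdgpu([[:space:]]|$)' "$dir"/*.conf 2>/dev/null
}

# Search the default location; report when nothing is found.
find_amdgpu_denylist || echo "No active amdgpu denylist entry found"
```

If the helper prints a match, comment out or remove that line and reboot, or reload the driver, so that the AMDGPU module can load.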
.. _troubleshooting-group-membership:

Issue #9: Cannot access the AMD GPU or accelerator after installation
======================================================================================

If the group permissions are not set properly during ROCm installation,
you might get an error similar to ``Permission denied`` when attempting to access the AMD GPU.

**Solution:** You must be part of the ``video`` and ``render`` groups to access the AMD GPU or accelerator.
To learn how to add an account to these groups, see :ref:`group_permissions`.
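To see which of the two groups the current account lacks, a check along the following lines can help. This is a minimal sketch: ``missing_gpu_groups`` is a hypothetical helper name, and it only inspects the space-separated group list printed by the standard ``id -nG`` command.

```shell
#!/bin/sh
# Hypothetical helper (not part of ROCm): takes a space-separated list of
# group names and prints each of "video" and "render" that is absent.
missing_gpu_groups() {
    gpu_current=" $1 "
    for gpu_group in video render; do
        case "$gpu_current" in
            *" $gpu_group "*) ;;                  # already a member
            *) printf '%s\n' "$gpu_group" ;;      # membership missing
        esac
    done
}

# `id -nG` prints the names of all groups the current user belongs to.
missing="$(missing_gpu_groups "$(id -nG)")"
if [ -z "$missing" ]; then
    echo "Current user is in both the video and render groups"
else
    echo "Current user is missing group(s):" $missing
fi
```

Group changes typically take effect only after you log out and log back in, so rerun the check in a new session after updating the membership.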
