
[REQUEST] Support for RTX 5090 #285

Open · rubentorresbonet opened this issue Feb 14, 2025 · 6 comments
Labels: tracking (Tracking for future changes)

@rubentorresbonet

rubentorresbonet commented Feb 14, 2025

Problem

Only CUDA 12.8 supports the RTX 5090.

When trying a vanilla Tabby setup with CUDA 12.x, these blockers pop up:

The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
RuntimeError: CUDA error: no kernel image is available for execution on the device
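
For reference, you can print the compute capabilities your installed torch build actually targets; the RTX 5090 needs sm_120 (Blackwell), which the stable cu12.x wheels above don't include (a quick check, assuming torch itself imports):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_arch_list())"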

I tried reinstalling PyTorch after completing the setup:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

But then Tabby is pretty upset about it:

It looks like you're in a conda environment. Skipping venv check.
pip 25.0 from /home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/pip (python 3.11)
Loaded your saved preferences from `start_options.json`
"undefined symbol" error here usually means you are attempting to load a prebuilt extension wheel that was compiled against a different version of PyTorch than the one you are you using. Please verify that the versions match.
Traceback (most recent call last):
  File "/home/ruben/tabbyAPI/start.py", line 275, in <module>
    from main import entrypoint
  File "/home/ruben/tabbyAPI/main.py", line 12, in <module>
    from common import gen_logging, sampling, model
  File "/home/ruben/tabbyAPI/common/model.py", line 19, in <module>
    from backends.exllamav2.model import ExllamaV2Container
  File "/home/ruben/tabbyAPI/backends/exllamav2/model.py", line 9, in <module>
    from backends.exllamav2.vision import clear_image_embedding_cache
  File "/home/ruben/tabbyAPI/backends/exllamav2/vision.py", line 21, in <module>
    from exllamav2.generator import ExLlamaV2MMEmbedding
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/model.py", line 33, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/config.py", line 5, in <module>
    from exllamav2.stloader import STFile, cleanup_stfiles
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/stloader.py", line 5, in <module>
    from exllamav2.ext import none_tensor, exllamav2_ext as ext_c
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/ext.py", line 115, in <module>
    raise e
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/ext.py", line 107, in <module>
    import exllamav2_ext
ImportError: /home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2_ext.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

This error was raised because a package was not found.
Update your dependencies by running update_scripts/update_deps.sh
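
For what it's worth, demangling the missing symbol shows it is a torch-internal (c10) assertion helper, which matches the version-mismatch hint in the output above (a sketch, assuming binutils' c++filt is available):

c++filt _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
# -> c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)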

Solution

Support the RTX 5090 so that we can use it with Tabby.

Alternatives

No response

Explanation

We cannot use Tabby with the RTX 5090 otherwise.

Examples

No response

Additional context

No response

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@kingbri1
Member

kingbri1 commented Feb 15, 2025

TabbyAPI can technically support any torch version. However, each wheel-based dependency needs to be built against that version. In addition, CUDA 12.8 on Windows is not yet supported by PyTorch, and the issues tracking it are still open.

Wheel dependencies:

  • exllamav2
  • flash-attn

Since torch 2.7 is currently nightly-only, Tabby won't support it out of the box due to the unstable nature of nightly builds. However, you can build wheels for the above packages against your configuration and install them in a new venv.

To skip wheel installs in a venv (a rough sketch follows the list):

  1. Run pip install . to install all non-wheel dependencies
  2. Build and install the wheels manually
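
Roughly, the two steps might look like this (a sketch only; it assumes a fresh venv with the cu128 nightly torch already installed, and the ./exllamav2 path is a local clone, not a fixed layout):

pip install .                                   # 1. non-wheel dependencies, from the Tabby repo root
pip install ./exllamav2 --no-build-isolation    # 2a. exllamav2 built from a local source clone
pip install flash-attn --no-build-isolation     # 2b. flash-attn built from the PyPI sdist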

Once Torch 2.7 is stable, the wheels can be updated in Tabby's pyproject and the CUDA version can be bumped to 12.8. I'll keep this issue open for tracking.

@rubentorresbonet
Author

Thanks, I'm happy to try it out and report back.
Do you have any instructions for building the wheel combination that Tabby expects?

@DocShotgun
Member

Assuming you have Blackwell-compatible torch already installed in your tabby venv:

For exllamav2, you should be able to git clone https://github.com/turboderp-org/exllamav2 into a folder, activate your tabby venv, then navigate into the exllamav2 folder and run pip install . --no-build-isolation.

For flash-attn, I'd say activate your tabby venv and then follow the instructions to install from source here: https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features. This package is pretty heavy to compile from source, though; it will take a while to run.
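
One caveat from that README worth repeating: the source build is memory-hungry, and capping parallel compile jobs helps on machines with limited RAM (hedged example; the job count is illustrative):

pip install ninja                                        # the README strongly recommends ninja for the build
MAX_JOBS=4 pip install flash-attn --no-build-isolation   # cap parallel jobs to bound RAM usage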

@rubentorresbonet
Author

rubentorresbonet commented Feb 17, 2025

Thank you. The server is running; I haven't tried anything yet, but here are a few commands I used to get exllamav2 to work with Tabby:

git clone https://github.com/turboderp-org/exllamav2
cd exllamav2
conda activate tabby
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 --force-reinstall
EXLLAMA_NOCOMPILE= pip install .
conda install -c conda-forge gcc
conda install -c conda-forge libstdcxx-ng
conda install -c conda-forge gxx=11.4
conda install -c conda-forge ninja
cd ..
python main.py

I had to use the EXLLAMA_NOCOMPILE= trick to avoid an import issue with the prebuilt extension (_ext).
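
If I read exllamav2's loader right, EXLLAMA_NOCOMPILE= skips building the extension at install time, so exllamav2_ext is JIT-compiled on first import instead; that is why the gcc/gxx/ninja packages above are needed. A quick sanity check that the JIT build succeeds (assuming the tabby env is active):

python -c "import exllamav2.ext as ext; print(ext.exllamav2_ext)"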

And flash-attn took about 2-3 hours to build. It heated the room pretty well.

I'll update the ticket with the next findings.

@rubentorresbonet
Author

I got DeepSeek-R1-Distill-Qwen-32B-4bpw-h6-exl2 to work with the commands above, thank you :-)

Just to double-check that I compiled and installed flash-attention 2 correctly: is there any way to test whether Tabby is using it for a model?

@kingbri1
Member

Check Tabby's logs to see whether you're falling back to "compatibility mode". If those messages don't show up, you're using FA2.
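
Beyond the logs, the wheel itself can be import-checked (a minimal sketch, assuming the tabby venv is active):

python -c "import flash_attn; print(flash_attn.__version__)"
python -c "from flash_attn import flash_attn_func; print('FA2 kernels import OK')"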

@kingbri1 kingbri1 added the tracking Tracking for future changes label Feb 17, 2025