
[REQUEST] Support for RTX 5090 #285

Open · rubentorresbonet opened this issue Feb 14, 2025 · 6 comments
Labels: tracking (Tracking for future changes)

@rubentorresbonet

rubentorresbonet commented Feb 14, 2025

Problem

Only CUDA 12.8 supports the RTX 5090.

When trying a vanilla Tabby setup with CUDA 12.x, these blockers pop up:

The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
RuntimeError: CUDA error: no kernel image is available for execution on the device
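
For reference, you can print the compute capabilities your installed torch build actually targets; the RTX 5090 needs sm_120 (Blackwell), which the stable cu12.x wheels above don't include (a quick check, assuming torch itself imports):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_arch_list())"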

I tried reinstalling PyTorch after completing the setup:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

But then Tabby is pretty upset about it:

It looks like you're in a conda environment. Skipping venv check.
pip 25.0 from /home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/pip (python 3.11)
Loaded your saved preferences from `start_options.json`
"undefined symbol" error here usually means you are attempting to load a prebuilt extension wheel that was compiled against a different version of PyTorch than the one you are you using. Please verify that the versions match.
Traceback (most recent call last):
  File "/home/ruben/tabbyAPI/start.py", line 275, in <module>
    from main import entrypoint
  File "/home/ruben/tabbyAPI/main.py", line 12, in <module>
    from common import gen_logging, sampling, model
  File "/home/ruben/tabbyAPI/common/model.py", line 19, in <module>
    from backends.exllamav2.model import ExllamaV2Container
  File "/home/ruben/tabbyAPI/backends/exllamav2/model.py", line 9, in <module>
    from backends.exllamav2.vision import clear_image_embedding_cache
  File "/home/ruben/tabbyAPI/backends/exllamav2/vision.py", line 21, in <module>
    from exllamav2.generator import ExLlamaV2MMEmbedding
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/model.py", line 33, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/config.py", line 5, in <module>
    from exllamav2.stloader import STFile, cleanup_stfiles
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/stloader.py", line 5, in <module>
    from exllamav2.ext import none_tensor, exllamav2_ext as ext_c
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/ext.py", line 115, in <module>
    raise e
  File "/home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2/ext.py", line 107, in <module>
    import exllamav2_ext
ImportError: /home/ruben/miniconda3/envs/tabby/lib/python3.11/site-packages/exllamav2_ext.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

This error was raised because a package was not found.
Update your dependencies by running update_scripts/update_deps.sh
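
For what it's worth, demangling the missing symbol shows it is a torch-internal (c10) assertion helper, which matches the version-mismatch hint in the output above (a sketch, assuming binutils' c++filt is available):

c++filt _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
# -> c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)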

Solution

Support the RTX 5090 so that we can use it with Tabby.

Alternatives

No response

Explanation

We cannot use Tabby with the RTX 5090 otherwise.

Examples

No response

Additional context

No response

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@kingbri1
Member

kingbri1 commented Feb 15, 2025

TabbyAPI can technically support any torch version. However, each wheel-based dependency needs to be built against that version. In addition, CUDA 12.8 on Windows is not yet supported by PyTorch, and the issues tracking it are still open.

Wheel dependencies:

  • exllamav2
  • flash-attn

Since torch 2.7 is currently nightly-only, Tabby won't support it out of the box due to the unstable nature of nightly builds. However, you can build wheels for the above packages against your configuration and install them in a new venv.

To skip wheel installs in a venv (a rough sketch follows the list):

  1. Run pip install . to install all non-wheel dependencies
  2. Build and install the wheels manually
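
Roughly, the two steps might look like this (a sketch only; it assumes a fresh venv with the cu128 nightly torch already installed, and the ./exllamav2 path is a local clone, not a fixed layout):

pip install .                                   # 1. non-wheel dependencies, from the Tabby repo root
pip install ./exllamav2 --no-build-isolation    # 2a. exllamav2 built from a local source clone
pip install flash-attn --no-build-isolation     # 2b. flash-attn built from the PyPI sdist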

Once Torch 2.7 is stable, the wheels can be updated in Tabby's pyproject and the CUDA version can be bumped to 12.8. I'll keep this issue open for tracking.

@rubentorresbonet
Author

Thanks, I'm happy to try it out and report back.
Do you have any instructions for building the wheel combination that Tabby expects?

@DocShotgun
Member

Assuming you have Blackwell-compatible torch already installed in your tabby venv:

For exllamav2, you should be able to git clone https://github.com/turboderp-org/exllamav2 into a folder, activate your tabby venv, then navigate into the exllamav2 folder and run pip install . --no-build-isolation.

For flash-attn, I'd say activate your tabby venv and then follow the instructions to install from source here: https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features. This package is pretty heavy to compile from source, though; it will take a while to run.
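
One caveat from that README worth repeating: the source build is memory-hungry, and capping parallel compile jobs helps on machines with limited RAM (hedged example; the job count is illustrative):

pip install ninja                                        # the README strongly recommends ninja for the build
MAX_JOBS=4 pip install flash-attn --no-build-isolation   # cap parallel jobs to bound RAM usage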

@rubentorresbonet
Author

rubentorresbonet commented Feb 17, 2025

Thank you. The server is running; I haven't tried anything yet, but here are a few commands I used to get exllamav2 to work with Tabby:

git clone https://github.com/turboderp-org/exllamav2
cd exllamav2
conda activate tabby
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 --force-reinstall
EXLLAMA_NOCOMPILE= pip install .
conda install -c conda-forge gcc
conda install -c conda-forge libstdcxx-ng
conda install -c conda-forge gxx=11.4
conda install -c conda-forge ninja
cd ..
python main.py

I had to use the EXLLAMA_NOCOMPILE= trick to avoid an import issue with the prebuilt extension (_ext).
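
If I read exllamav2's loader right, EXLLAMA_NOCOMPILE= skips building the extension at install time, so exllamav2_ext is JIT-compiled on first import instead; that is why the gcc/gxx/ninja packages above are needed. A quick sanity check that the JIT build succeeds (assuming the tabby env is active):

python -c "import exllamav2.ext as ext; print(ext.exllamav2_ext)"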

And flash-attn took about 2-3 hours to build. It heated the room pretty well.

I'll update the ticket with the next findings.

@rubentorresbonet
Author

I got DeepSeek-R1-Distill-Qwen-32B-4bpw-h6-exl2 to work with the commands above, thank you :-)

Just to double-check that I compiled and installed flash-attention 2 correctly: is there any way to test whether Tabby is using it for a model?

@kingbri1
Member

Check Tabby's logs to see whether you're falling back to "compatibility mode". If those messages don't show up, you're using FA2.
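
Beyond the logs, the wheel itself can be import-checked (a minimal sketch, assuming the tabby venv is active):

python -c "import flash_attn; print(flash_attn.__version__)"
python -c "from flash_attn import flash_attn_func; print('FA2 kernels import OK')"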

@kingbri1 kingbri1 added the tracking Tracking for future changes label Feb 17, 2025