
running on a Jetson Orin NX #714

Open
sgoudelis opened this issue Feb 19, 2025 · 0 comments

Comments

sgoudelis commented Feb 19, 2025

Good morning,

I have been trying to make the exo project work on my Jetson Orin NX without success. Here is the error I get when running exo:

(exo) sgoudelis@jetson:~/projects/exo$ exo 
Selected inference engine: None

  _____  _____  
 / _ \ \/ / _ \ 
|  __/>  < (_) |
 \___/_/\_\___/ 
    
Detected system: Linux
Inference engine name after selection: tinygrad
Traceback (most recent call last):
  File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 25, in importlib_load_entry_point
    return next(matches).load()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/importlib/metadata/__init__.py", line 205, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/sgoudelis/projects/exo/exo/main.py", line 106, in <module>
    inference_engine = get_inference_engine(inference_engine_name, shard_downloader)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/projects/exo/exo/inference/inference_engine.py", line 69, in get_inference_engine
    from exo.inference.tinygrad.inference import TinygradDynamicShardInferenceEngine
  File "/home/sgoudelis/projects/exo/exo/inference/tinygrad/inference.py", line 4, in <module>
    from exo.inference.tinygrad.models.llama import Transformer, TransformerShard, convert_from_huggingface, fix_bf16, sample_logits
  File "/home/sgoudelis/projects/exo/exo/inference/tinygrad/models/llama.py", line 2, in <module>
    from tinygrad import Tensor, Variable, TinyJit, dtypes, nn, Device
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/__init__.py", line 5, in <module>
    from tinygrad.tensor import Tensor                                    # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/tensor.py", line 12, in <module>
    from tinygrad.device import Device, BufferSpec
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/device.py", line 226, in <module>
    class CPUProgram:
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/device.py", line 227, in CPUProgram
    helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'kernel32' if sys.platform == "win32" else 'gcc_s'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so: invalid ELF header

Looking into the .so file, I get this:

(exo) sgoudelis@jetson:~/projects/exo$ file /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so
/home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so: ASCII text
(exo) sgoudelis@jetson:~/projects/exo$ more /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library.  */
GROUP ( libgcc_s.so.1 -lgcc )
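The error makes sense: `ctypes.CDLL` calls `dlopen()`, which expects an ELF binary, but the conda environment ships `libgcc_s.so` as a plain-text GNU ld script. A minimal way to confirm which files are real shared objects is to check for the ELF magic number (the helper name here is my own illustration, not exo code):

```python
def is_elf(path):
    """Return True if the file begins with the ELF magic number.

    dlopen() can only load real ELF shared objects; GNU ld scripts like
    the conda-provided libgcc_s.so are ASCII text and fail with
    "invalid ELF header".
    """
    with open(path, "rb") as f:
        return f.read(4) == b"\x7fELF"
```

Here, `is_elf(".../envs/exo/lib/libgcc_s.so")` would return False, while the `libgcc_s.so.1` the script points at should return True.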

Does anyone have an idea how to make exo work on the Jetson Orin?

UPDATE:

Moving the mentioned linker-script file out of the way actually lets exo get further. It then fails in another way:

Traceback (most recent call last):
  File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/projects/exo/exo/main.py", line 385, in run
    loop.run_until_complete(main())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/sgoudelis/projects/exo/exo/main.py", line 349, in main
    await node.start(wait_for_peers=args.wait_for_peers)
  File "/home/sgoudelis/projects/exo/exo/orchestration/node.py", line 59, in start
    self.device_capabilities = await device_capabilities()
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/projects/exo/exo/topology/device_capabilities.py", line 153, in device_capabilities
    return await linux_device_capabilities()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/projects/exo/exo/topology/device_capabilities.py", line 188, in linux_device_capabilities
    gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/pynvml.py", line 2934, in nvmlDeviceGetMemoryInfo
    _nvmlCheckReturn(ret)
  File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported

I am a complete noob when it comes to NVIDIA CUDA stuff, by the way. I am guessing this happens because the Orin uses shared (unified) memory, so there is no dedicated VRAM amount for NVML to report.
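If that guess is right, one workaround would be to catch `pynvml.NVMLError` in `linux_device_capabilities` and fall back to total system RAM from `/proc/meminfo`, since that is the pool the GPU shares. A sketch of the fallback parsing, assuming nothing about exo's internals (the function name is mine):

```python
def meminfo_total_mb(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo and return MiB.

    On unified-memory boards like the Orin NX, total system RAM is the
    closest available stand-in for "GPU memory".
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024  # /proc/meminfo reports kB
    return 0
```

This could be called with `open("/proc/meminfo").read()` inside an `except pynvml.NVMLError:` branch instead of hard-coding a bogus number.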

ANOTHER UPDATE:

Exo does work on the Orin NX 16GB: bypassing the part of the code that queries the VRAM amount and giving it a bogus number makes exo boot up just fine, with GPU-accelerated inference.
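For reference, a proper patch would probably want to take that bypass path only on Jetson hardware. A detection heuristic could key off the release file that NVIDIA's L4T (Jetson Linux) images install; the path is standard on L4T, but the function itself is my own illustration, not exo code:

```python
import os

def is_jetson(release_file="/etc/nv_tegra_release"):
    """Heuristic check for NVIDIA Jetson boards: L4T installs this
    release file, so its presence is a reasonable signal that NVML
    memory queries should be skipped in favor of unified-memory
    reporting."""
    return os.path.exists(release_file)
```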

I would love some feedback from one of the Exo project's developers about this. Please feel free to comment.
