
cudnn_home not valid during build #33

Closed
mfruhner opened this issue Mar 18, 2021 · 20 comments


@mfruhner

Description
I am not able to build the ONNX Runtime backend. I am following the build instructions in the README, but the build fails at Step 17.

Triton Information
Main branch for Triton version 21.02

To Reproduce

I am running DGX OS 5 (Ubuntu 20.04).

cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_BUILD_ONNXRUNTIME_VERSION=1.6.0 -DTRITON_BUILD_CONTAINER_VERSION=21.02 ..
make install

Output:

Step 17/24 : RUN ./build.sh ${COMMON_BUILD_ARGS} --update --build --use_cuda --cuda_home "/usr/local/cuda"
 ---> Running in 3360f12bb769
2021-03-18 11:01:00,463 build [ERROR] - cuda_home and cudnn_home paths must be specified and valid.
cuda_home='/usr/local/cuda' valid=True. cudnn_home='None' valid=False
The command '/bin/sh -c ./build.sh ${COMMON_BUILD_ARGS} --update --build --use_cuda --cuda_home "/usr/local/cuda"' returned a non-zero code: 1
make[2]: *** [CMakeFiles/ort_target.dir/build.make:81: onnxruntime/lib/libonnxruntime.so.1.6.0] Error 1
make[1]: *** [CMakeFiles/Makefile2:158: CMakeFiles/ort_target.dir/all] Error 2
make: *** [Makefile:149: all] Error 2

Expected behavior
I expect the build to succeed.

@CoderHam
Contributor

https://github.com/triton-inference-server/onnxruntime_backend/blob/main/tools/gen_ort_dockerfile.py#L93
The build relies on getting CUDNN_VERSION from the base containers here.
I checked that the variable is indeed present in the nvcr.io/nvidia/tritonserver:21.02-py3-min container. Can you share the dockerfile generated by gen_ort_dockerfile.py in your build?
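
For reference, a quick way to confirm that the variable is actually set in the base image is to echo it from inside the container (a minimal check, assuming Docker is installed locally and you can pull the 21.02 min image):

docker run --rm nvcr.io/nvidia/tritonserver:21.02-py3-min bash -c 'echo "CUDNN_VERSION=${CUDNN_VERSION}"'

If that prints an empty value, the generated Dockerfile.ort ends up without a usable cudnn_home, which would match the error above.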

@askhade
Contributor

askhade commented May 28, 2021

I have encountered this issue too... perhaps it would be a good idea to allow users to specify the path for cuDNN, similar to CUDA:
TRITON_BUILD_CUDNN_HOME (similar to TRITON_BUILD_CUDA_HOME)
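
In the backend's cmake invocation this could look roughly like the following (a sketch of the proposed option with illustrative paths; TRITON_BUILD_CUDNN_HOME did not exist at this point):

cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
      -DTRITON_BUILD_ONNXRUNTIME_VERSION=1.6.0 \
      -DTRITON_BUILD_CONTAINER_VERSION=21.02 \
      -DTRITON_BUILD_CUDA_HOME=/usr/local/cuda \
      -DTRITON_BUILD_CUDNN_HOME=/usr/lib/x86_64-linux-gnu ..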

@askhade
Contributor

askhade commented May 28, 2021

@mfruhner: Were you able to find a workaround? I simply updated the gen_ort_dockerfile.py script to explicitly include the cudnn path.

@GowthamKudupudi

GowthamKudupudi commented Jun 8, 2021

@mfruhner, did you resolve it? If the build were failing in general, many more people would be reporting it, but only a few are in this thread; are we doing something wrong? I'm using the --no-container-build flag while building the Triton server, so why is the build trying to build ONNX Runtime inside a container?

@GowthamKudupudi

@CoderHam https://paste.ubuntu.com/p/nF3HCcYycR/ is the Dockerfile.ort found in the build folder.

@GowthamKudupudi

@askhade what should be the value of --cudnn-home?

@askhade
Contributor

askhade commented Jul 22, 2021

--cudnn_home should be set to the path of the directory containing the cuDNN libraries.
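
On Ubuntu, for example, cuDNN installed from the NVIDIA packages typically puts the libraries under /usr/lib/x86_64-linux-gnu; a generic way to locate them on your system (paths vary by install method) is:

ldconfig -p | grep libcudnn

so a typical value would be --cudnn_home /usr/lib/x86_64-linux-gnu.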

@chandrameenamohan

chandrameenamohan commented Aug 6, 2021

@askhade or @GowthamKudupudi Were you able to resolve it? I am trying this on CentOS 7. I can build the tensorflow1, tensorflow2, python, and pytorch backends, but I get an error when I try to build the onnxruntime backend.
I use this command to build:
./build.py --cmake-dir=/server/build --build-dir=/tmp/citritonbuild --no-container-build --endpoint=http --endpoint=grpc --repo-tag=common:r21.04 --repo-tag=core:r21.04 --repo-tag=backend:r21.04 --repo-tag=thirdparty:r21.04 --backend=onnxruntime:r21.04 --enable-logging --enable-stats --enable-tracing

This is the error:

Step 26/37 : RUN ./build.sh ${COMMON_BUILD_ARGS} --update --build --use_cuda --cuda_home "/usr/local/cuda" --use_tensorrt --tensorrt_home "/usr/src/tensorrt" --use_openvino CPU_FP32
 ---> Running in dc590d08a4e7
2021-08-06 06:22:20,248 tools_python_utils [INFO] - flatbuffers module is not installed. parse_config will not be available
2021-08-06 06:22:20,257 build [ERROR] - cuda_home and cudnn_home paths must be specified and valid.
cuda_home='/usr/local/cuda' valid=True. cudnn_home='None' valid=False
The command '/bin/sh -c ./build.sh ${COMMON_BUILD_ARGS} --update --build --use_cuda --cuda_home "/usr/local/cuda" --use_tensorrt --tensorrt_home "/usr/src/tensorrt" --use_openvino CPU_FP32' returned a non-zero code: 1
make[2]: *** [CMakeFiles/ort_target.dir/build.make:81: onnxruntime/lib/libonnxruntime.so.1.7.1] Error 1
make[2]: Leaving directory '/tmp/citritonbuild/onnxruntime/build'
make[1]: *** [CMakeFiles/Makefile2:158: CMakeFiles/ort_target.dir/all] Error 2
make[1]: Leaving directory '/tmp/citritonbuild/onnxruntime/build'
make: *** [Makefile:149: all] Error 2
error: make install failed

@chandrameenamohan

@mfruhner: Were you able to find a workaround? I simply updated the gen_ort_dockerfile.py script to explicitly include the cudnn path.

Can you please mention exactly what you edited and how you ran it?
When I run the build script, it automatically downloads onnxruntime from git. What did you do to use your locally downloaded onnxruntime backend codebase?

@mfruhner
Author

mfruhner commented Aug 9, 2021

I didn't try to solve this any further and went on to use something else, sorry.

@askhade
Contributor

askhade commented Sep 13, 2021

@chandrameenamohan: I suspect you are hitting this issue because of "--no-container-build"; can you remove it and test again?
@CoderHam : Can you pick this up?

@GowthamKudupudi

The comment by @CoderHam that was liked above is the key.

@aravindhank11

Are there any known fixes for passing cudnn_home with the --no-container-build option?

@aravindhank11

The following changes to build.py did the trick (in case somebody else has come across a similar issue and is looking for an easy fix):

diff --git a/build.py b/build.py
index 82754fa9..a06b42e9 100755
--- a/build.py
+++ b/build.py
@@ -640,7 +640,11 @@ def pytorch_cmake_args(images):
 def onnxruntime_cmake_args(images, library_paths):
     cargs = [
         cmake_backend_arg('onnxruntime', 'TRITON_BUILD_ONNXRUNTIME_VERSION',
-                          None, TRITON_VERSION_MAP[FLAGS.version][2])
+                          None, TRITON_VERSION_MAP[FLAGS.version][2]),
+        cmake_backend_arg('onnxruntime', 'TRITON_BUILD_CUDA_HOME',
+                          None, '/usr/local/cuda-11.7/'),
+        cmake_backend_arg('onnxruntime', 'TRITON_BUILD_CUDNN_HOME',
+                          None, '/usr/lib/x86_64-linux-gnu/')
     ]
 
     # TRITON_ENABLE_GPU is already set for all backends in backend_cmake_args()

I am not sure whether build.py exposes this as a generic run-time parameter. I would be more than happy to add support for it if needed.
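
As a possible alternative to patching, build.py also accepts per-backend CMake arguments via --extra-backend-cmake-arg, so the same values can be passed on the command line (the paths here are illustrative and should match your install):

./build.py -v --no-container-build --build-dir=`pwd`/build --enable-all \
    --extra-backend-cmake-arg=onnxruntime:TRITON_BUILD_CUDA_HOME=/usr/local/cuda-11.7/ \
    --extra-backend-cmake-arg=onnxruntime:TRITON_BUILD_CUDNN_HOME=/usr/lib/x86_64-linux-gnu/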

@changchengx

@askhade @GowthamKudupudi
I hit the same failure when building the Triton server r23.04 branch with the command below on a host running Ubuntu 20.04.5:

$ ./build.py -v --no-container-build --build-dir=`pwd`/build --enable-all

@changchengx

@aravindhank11
I'll apply your patch on the Triton server r23.04 branch and run the build command again to check:

$ ./build.py -v --no-container-build --build-dir=`pwd`/build --enable-all

@changchengx

@aravindhank11 It also works when building Triton/server/r23.04

@changchengx

@aravindhank11 Without the patch, it also works to build Triton server r23.04 without Docker using the command below:

./build.py -v --no-container-build --build-dir=`pwd`/build --enable-all --extra-backend-cmake-arg=onnxruntime:TRITON_BUILD_CUDA_HOME=/usr/local/cuda-12.1/ --extra-backend-cmake-arg=onnxruntime:TRITON_BUILD_CUDNN_HOME=/usr/lib/x86_64-linux-gnu/

@pultarmi

pultarmi commented Jun 24, 2024

I just want to report that this bug still exists in v23.11 and I solved it by changing
"RUN ./build.sh ${{COMMON_BUILD_ARGS}} --update --build {}"
to
"RUN ./build.sh ${{COMMON_BUILD_ARGS}} --cudnn_home=/usr/local/cudnn-8.9 --update --build {}"
in gen_ort_dockerfile.py. The build process runs inside a Docker image, so you should first check which cuDNN version the image actually contains.
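
A quick way to see which cuDNN the build image actually ships before hardcoding the path (a minimal check; adjust the image tag to your Triton release, and note that the install location differs between releases):

docker run --rm nvcr.io/nvidia/tritonserver:23.11-py3-min bash -c 'ls -d /usr/local/cudnn* 2>/dev/null; dpkg -l | grep -i cudnn'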

There is another issue where build.sh refuses to run as root, so you will want to use
"RUN ./build.sh ${{COMMON_BUILD_ARGS}} --allow_running_as_root --cudnn_home=/usr/local/cudnn-8.9 --update --build {}"

Then it compiles with no further problems.

@631068264

v24.09

#!/bin/bash

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
      -DTRITON_BUILD_ONNXRUNTIME_VERSION=1.19.2 \
      -DTRITON_BACKEND_REPO_TAG=r24.09 \
      -DTRITON_CORE_REPO_TAG=r24.09 \
      -DTRITON_COMMON_REPO_TAG=r24.09 \
      -DTRITON_ENABLE_ONNXRUNTIME_TENSORRT=ON \
      -DTRITON_BUILD_CONTAINER_VERSION=24.09 ..

make install
Step 17/27 : RUN ./build.sh ${COMMON_BUILD_ARGS} --update --build --use_cuda --cuda_home "/usr/local/cuda" --use_tensorrt --use_tensorrt_builtin_parser --tensorrt_home "/usr/src/tensorrt" --allow_running_as_root
 ---> Running in 6d9e291eee48
2024-11-11 08:33:16,650 tools_python_utils [INFO] - flatbuffers module is not installed. parse_config will not be available
2024-11-11 08:33:16,655 build [DEBUG] - Command line arguments:
  --build_dir /workspace/onnxruntime/build/Linux --config Release --skip_submodule_sync --parallel --build_shared_lib --build_dir /workspace/build --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES='"'"'60;61;70;75;80;86;90'"'"'' --update --build --use_cuda --cuda_home /usr/local/cuda --use_tensorrt --use_tensorrt_builtin_parser --tensorrt_home /usr/src/tensorrt --allow_running_as_root
2024-11-11 08:33:16,659 build [ERROR] - cuda_home and cudnn_home paths must be specified and valid.
cuda_home='/usr/local/cuda' valid=True. cudnn_home='None' valid=False
Namespace(build_dir='/workspace/build', config=['Release'], update=True, build=True, clean=False, parallel=0, nvcc_threads=-1, test=False, skip_tests=False, compile_no_warning_as_error=False, enable_nvtx_profile=False, enable_memory_profile=False, enable_training=False, enable_training_apis=False, enable_training_ops=False, enable_nccl=False, mpi_home=None, nccl_home=None, use_mpi=False, enable_onnx_tests=False, path_to_protoc_exe=None, fuzz_testing=False, enable_symbolic_shape_infer_tests=False, gen_doc=None, gen_api_doc=False, use_cuda=True, cuda_version=None, cuda_home='/usr/local/cuda', cudnn_home=None, enable_cuda_line_info=False, enable_cuda_nhwc_ops=False, enable_pybind=False, build_wheel=False, wheel_name_suffix=None, skip_keras_test=False, build_csharp=False, build_nuget=False, msbuild_extra_options=None, build_java=False, build_nodejs=False, build_objc=False, build_shared_lib=True, build_apple_framework=False, cmake_extra_defines=[["CMAKE_CUDA_ARCHITECTURES='60;61;70;75;80;86;90'"]], target=None, x86=False, rv64=False, arm=False, arm64=False, arm64ec=False, buildasx=False, riscv_toolchain_root='', riscv_qemu_path='', msvc_toolset=None, windows_sdk_version=None, android=False, android_abi='arm64-v8a', android_api=27, android_sdk_path='', android_ndk_path='', android_cpp_shared=False, android_run_emulator=False, use_gdk=False, gdk_edition='.', gdk_platform='Scarlett', ios=False, visionos=False, macos=None, apple_sysroot='', ios_toolchain_file='', visionos_toolchain_file='', xcode_code_signing_team_id='', xcode_code_signing_identity='', cmake_generator=None, osx_arch='x86_64', apple_deploy_target=None, enable_address_sanitizer=False, use_binskim_compliant_compile_flags=False, disable_memleak_checker=False, build_wasm=False, build_wasm_static_lib=False, emsdk_version='3.1.59', enable_wasm_simd=False, enable_wasm_threads=False, disable_wasm_exception_catching=False, enable_wasm_api_exception_catching=False, enable_wasm_exception_throwing_override=True, wasm_run_tests_in_browser=False, enable_wasm_profiling=False, enable_wasm_debug_info=False, wasm_malloc=None, emscripten_settings=None, use_extensions=False, extensions_overridden_path=None, cmake_path='cmake', ctest_path='ctest', skip_submodule_sync=True, use_mimalloc=False, use_dnnl=False, dnnl_gpu_runtime='', dnnl_opencl_root='', use_openvino=None, dnnl_aarch64_runtime='', dnnl_acl_root='', use_coreml=False, use_webnn=False, use_snpe=False, snpe_root=None, use_nnapi=False, use_vsinpu=False, nnapi_min_api=None, use_jsep=False, use_qnn=False, qnn_home=None, use_rknpu=False, use_preinstalled_eigen=False, eigen_path=None, enable_msinternal=False, llvm_path=None, use_vitisai=False, use_tvm=False, tvm_cuda_runtime=False, use_tvm_hash=False, use_tensorrt=True, use_tensorrt_builtin_parser=True, use_tensorrt_oss_parser=False, tensorrt_home='/usr/src/tensorrt', test_all_timeout='10800', use_migraphx=False, migraphx_home=None, use_full_protobuf=False, llvm_config='', skip_onnx_tests=False, skip_winml_tests=False, skip_nodejs_tests=False, enable_msvc_static_runtime=False, use_dml=False, dml_path='', use_winml=False, winml_root_namespace_override=None, dml_external_project=False, use_telemetry=False, enable_wcos=False, enable_lto=False, enable_transformers_tool_test=False, use_acl=None, acl_home=None, acl_libs=None, use_armnn=False, armnn_relu=False, armnn_bn=False, armnn_home=None, armnn_libs=None, build_micro_benchmarks=False, minimal_build=None, include_ops_by_config=None, enable_reduced_operator_type_support=False, disable_contrib_ops=False, 
disable_ml_ops=False, disable_rtti=False, disable_types=[], disable_exceptions=False, rocm_version=None, use_rocm=False, rocm_home=None, code_coverage=False, enable_lazy_tensor=False, ms_experimental=False, enable_external_custom_op_schemas=False, external_graph_transformer_path=None, enable_cuda_profiling=False, use_cann=False, cann_home=None, enable_rocm_profiling=False, use_xnnpack=False, use_azure=False, use_cache=False, use_triton_kernel=False, use_lock_free_queue=False, allow_running_as_root=True)
The command '/bin/sh -c ./build.sh ${COMMON_BUILD_ARGS} --update --build --use_cuda --cuda_home "/usr/local/cuda" --use_tensorrt --use_tensorrt_builtin_parser --tensorrt_home "/usr/src/tensorrt" --allow_running_as_root' returned a non-zero code: 1
make[2]: *** [CMakeFiles/ort_target.dir/build.make:74: onnxruntime/lib/libonnxruntime.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:278: CMakeFiles/ort_target.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
