HugeCTR Backend is a GPU-accelerated recommender model deployment framework that is designed to use GPU memory effectively to accelerate inference by decoupling the parameter server, the embedding cache, and the model weights. HugeCTR Backend supports concurrent model inference across multiple GPUs and embedding cache sharing between multiple model instances. For additional information, see the HugeCTR Inference User Guide.
You can either install the HugeCTR backend easily using the HugeCTR backend Docker image in NGC, or, if you're an advanced user, build the HugeCTR backend from scratch based on your own specific requirements. We support the following compute capabilities for inference deployment:
Compute Capability | GPU |
---|---|
70 | NVIDIA V100 (Volta) |
75 | NVIDIA T4 (Turing) |
80 | NVIDIA A100 (Ampere) |
The following prerequisites must be met before installing or building HugeCTR from scratch (a quick way to check the installed versions follows the list):
- Docker version 19 or later
- cuBLAS version 10.1
- CMake version 3.17.0
- cuDNN version 7.5
- RMM version 0.16
- GCC version 7.4.0
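As referenced above, you can quickly check several of these versions from a shell. This is only a minimal sketch; the cuDNN header location assumes a default system-wide installation and may differ on your system:

$ docker --version   # expect 19 or later
$ cmake --version    # expect 3.17.0
$ gcc --version      # expect 7.4.0
$ grep -A 2 "#define CUDNN_MAJOR" /usr/include/cudnn.h   # expect major version 7, minor version 5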
All NVIDIA Merlin components are available as open source projects. However, a more convenient way to use these components is through Merlin NGC containers. Containers allow you to package your software application, libraries, dependencies, and runtime compilers in a self-contained environment. When the HugeCTR backend is installed from NGC containers, the application environment remains portable, consistent, reproducible, and agnostic to the underlying host system's software configuration.
HugeCTR backend Docker images are available in the NVIDIA container repository at https://ngc.nvidia.com/catalog/containers/nvidia:hugectr.
You can pull and launch the container by running the following command:
docker run --runtime=nvidia --rm -it nvcr.io/nvidia/hugectr:v3.0-inference # Start in interactive mode
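When you later want to serve models from the host, the same container can be started with a model repository mounted into it. This is only a sketch: /path/to/models is a placeholder for your own directory.

docker run --runtime=nvidia --rm -it \
    -v /path/to/models:/models \
    nvcr.io/nvidia/hugectr:v3.0-inference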
Since building the HugeCTR backend depends on a HugeCTR installation, the first step is to compile HugeCTR, generate the shared library (libhugectr_inference.so), and install it in the correct folder. By default, all HugeCTR libraries and header files are installed in the /usr/local/hugectr folder. Before building HugeCTR from scratch, you should download the HugeCTR repository and the third-party modules that it relies on by running the following commands:
git clone https://github.com/NVIDIA/HugeCTR.git
cd HugeCTR
git submodule update --init --recursive
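As an optional sanity check, you can confirm that the third-party modules were fetched correctly by listing their status:

$ git submodule status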
You can build HugeCTR from scratch using the following options:
- CMAKE_BUILD_TYPE: You can use this option to build HugeCTR in Debug or Release mode. When built in Debug mode, HugeCTR prints more verbose logs and executes GPU tasks synchronously.
- ENABLE_INFERENCE: You can use this option to build HugeCTR in inference mode, which is designed for the inference framework. In this mode, the inference shared library is built for the HugeCTR backend. Only inference-related interfaces can be used, which means users can't train models in this mode. This option is set to OFF by default.
Here is an example of how you can build HugeCTR using these build options:
$ mkdir -p build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_INFERENCE=ON ..
$ make -j
$ make install
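After make install completes, it can be worth confirming that the inference shared library landed under the default prefix. A minimal check, assuming the library is placed in a lib subdirectory of /usr/local/hugectr:

$ ls /usr/local/hugectr/lib/libhugectr_inference.so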
Before building the HugeCTR backend from scratch, you should download the HugeCTR backend repository by running the following commands:
git clone https://github.com/triton-inference-server/hugectr_backend.git
cd hugectr_backend
Use cmake to build and install the backend in a specified folder. Remember to specify the absolute path of the local directory that installs the HugeCTR backend for the "--backend-directory" argument when launching Triton Server, as shown in the launch sketch after the build steps below.
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make install
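As mentioned above, Triton must be pointed at the directory where the backend was just installed. This is a hedged launch sketch, assuming the install layout produced by the steps above; both paths are placeholders to replace with your own absolute paths:

$ tritonserver --model-repository=/path/to/model_repository \
      --backend-directory=/path/to/hugectr_backend/build/install/backends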
The following required Triton repositories will be pulled and used in the build. By default, the "main" branch/tag will be used for each repository, but the CMake arguments listed below can be used to override them (see the example after the list):
- triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=[tag]
- triton-inference-server/core: -DTRITON_CORE_REPO_TAG=[tag]
- triton-inference-server/common: -DTRITON_COMMON_REPO_TAG=[tag]
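For example, to pin all three repositories to the same release branch rather than "main" (the r21.02 tag below is only an illustration; substitute the tag that matches your Triton release):

$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
        -DTRITON_BACKEND_REPO_TAG=r21.02 \
        -DTRITON_CORE_REPO_TAG=r21.02 \
        -DTRITON_COMMON_REPO_TAG=r21.02 ..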