v0.19.0 #4192
kaiyux announced in Announcements
TensorRT-LLM Release 0.19.0
Key Features and Enhancements
- [...] `examples/deepseek_v3/README.md`, also to the blog `docs/source/blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md`.
- [...] `PyExecutor`.
- [...] `PeftCacheManager` support.
- [...] `AutoTuner` to both Fused MoE and NVFP4 Linear operators.
- [...] `UserBuffers` allocator.
- [...] `examples/deepseek_v3/README.md`.
- [...] `tensorrt_llm._torch.auto_deploy`.
- [...] `examples/auto_deploy/README.md` for more details.
- [...] `get_stats` support.
- [...] `examples/llm-api/llm_mgmn_*.sh`.
- [...] `examples/multimodal/README.md`.
- [...] `examples/mixtral/README.md`.
- [...] `examples/qwen2audio/README.md`.
- [...] `examples/language_adapter/README.md`.
- [...] `examples/stdit/README.md`.
- [...] `examples/vit/README.md`.
- [...] `examples/exaone/README.md`.
- [...] `examples/gemma/README.md`.
- [...] `examples/mmlu_llmapi.py`.
- [...] `--quantize_lm_head` option `examples/quantization/quantize.py` to support `lm_head` quantization.
- [...] `/metrics` endpoint for `trtllm-serve` to log iteration statistics.
- [...] `trtllm-serve`.
- [...] `disaggServerBenchmark`.
- [...] `trtllm-bench`.
- [...] `fp8_blockscale_gemm` is now open-sourced.
- [...] `ENABLE_MULTI_DEVICE` and `ENABLE_UCX` as CMake options.
- [...] `PyExecutor` inference flow to estimate `max_num_tokens` for `kv_cache_manager`.
- [...] `TLLM_OVERRIDE_LAYER_NUM` and `TLLM_TRACE_MODEL_FORWARD` environment variables for debugging.
- [...] init.py.
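As an illustration of the `/metrics` endpoint for `trtllm-serve` listed above, the following is a minimal sketch of a client that polls it using only the Python standard library. The helper names (`metrics_url`, `parse_metrics`, `fetch_metrics`) are hypothetical, the server address is whatever your `trtllm-serve` instance listens on, and the response format is an assumption: the sketch tries JSON first and falls back to the raw text.

```python
import json
import urllib.request


def metrics_url(base_url: str) -> str:
    """Build the metrics URL from a server base URL (hypothetical helper)."""
    return base_url.rstrip("/") + "/metrics"


def parse_metrics(body: str):
    """Return parsed JSON if the body is JSON, otherwise the raw text.

    The payload format is an assumption; this release note only states
    that the endpoint logs iteration statistics.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


def fetch_metrics(base_url: str, timeout: float = 5.0):
    """GET <base_url>/metrics from a running server and parse the reply."""
    with urllib.request.urlopen(metrics_url(base_url), timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With a server running, a call such as `fetch_metrics("http://localhost:8000")` would return the current statistics; the host and port here are placeholders, not values taken from the release notes.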
API Changes
- [...] `kv_cache_retention_config` from C++ `executor` API to the LLM API.
- [...] `BuildConfig` arguments to `LlmArgs`.
- [...] `DecoderState` via bindings and integrated it in decoder.
- [...] `LlmArgs` with `Pydantic` and migrated remaining pybinding configurations to Python.
- [...] `numNodes` to `ParallelConfig`.
Fixed Issues
- [...] `addCumLogProbs` kernel. Thanks to the contribution from @aotman in Fix Incorrect Batch Slot Usage in addCumLogProbs Kernel #2787.
- [...] `--extra-index-url https://pypi.nvidia.com` when running `pip install tensorrt-llm`.
Infrastructure Changes
Known Issues
This discussion was created from the release v0.19.0.