Issues: triton-inference-server/server
#8056 · Versioning for ensemble models and/or config.pbtxt files · opened Mar 5, 2025 by ghicks-novaprime
#8045 · Significant performance degradation when using OpenAI Frontend + streaming · opened Feb 28, 2025 by jolyons123
#8044 · How to Send FP16 Input Tensors Using gRPC in C# for NVIDIA Triton Inference Server? · opened Feb 28, 2025 by Madihaa-Shaikh
#8039 · Multibyte UTF-8 Characters Broken in Streaming Mode (� Substitution) · opened Feb 27, 2025 by Nurgl
#8034 · Segment fault crash due to race condition of request cancellation (with fix proposal) · label: bug (Something isn't working) · opened Feb 25, 2025 by lunwang-ttd
#8026 · Memory leak · label: memory (Related to memory usage, memory growth, and memory leaks) · opened Feb 21, 2025 by aTunass
#8021 · Streaming support on Infer endpoint when DECOUPLED mode is true · labels: module: frontends (Issues related to the triton frontends), question (Further information is requested) · opened Feb 19, 2025 by adityarap
#8020 · Inconsistent HF token requirements for cached gated models: Triton vs vLLM deployments · opened Feb 19, 2025 by haka-qylis
"output tensor shape does not match size of output" when using python backend and providing a custom environment
#8019
opened Feb 19, 2025 by
Isuxiz
#8016 · Performance Discrepancy Between NVIDIA Triton and Direct Faster-Whisper Inference · opened Feb 18, 2025 by YuBeomGon
#8006 · Python Backend support implicit state management for Sequence Inference · opened Feb 12, 2025 by zhuichao001