I used 8x Nvidia H20 GPUs to run inference testing based on the sample code, and found that throughput was only about 1 token/s, with the GPUs not fully utilized. Is this in line with expectations?
Currently, vLLM and SGLang have not been adapted to MiniMax. Are there any other inference engines you would recommend?
Thank you for your feedback. We are currently planning to support our models on open-source inference frameworks. At the same time, we also welcome community developers to join us in advancing the support for our models in open-source inference engines.
I have the same issue. On an 8x H100 server, with a fixed input of 2048 tokens and an output of 240 tokens, generation takes 620 seconds at batch_size 1 and 765 seconds at batch_size 2. That is too slow. Which open-source inference frameworks will be supported?
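For reference, 240 output tokens in 620 seconds works out to roughly 0.4 tokens/s, which is consistent with the ~1 token/s report above. A minimal sketch of how such a tokens/s number can be measured from the Hugging Face sample path is shown below; the model id, dtype, and generation settings are assumptions, not the exact benchmark configuration used here.

```python
# Minimal throughput-measurement sketch (assumed model id and settings).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MiniMaxAI/MiniMax-Text-01"  # assumption, adjust to your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # shard the model across all available GPUs
    trust_remote_code=True,
)

prompt = "Hello " * 2048        # stand-in for a ~2048-token input
inputs = tokenizer(prompt, return_tensors="pt",
                   truncation=True, max_length=2048).to(model.device)

start = time.time()
outputs = model.generate(**inputs, max_new_tokens=240, do_sample=False)
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.2f} tokens/s")
```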
We have submitted PR #13454 to vLLM, and the performance improvement compared to the Hugging Face implementation is very significant. You might want to give it a try.
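A rough sketch of what trying the model through vLLM might look like once that PR is available in your build is below; the model id and tensor_parallel_size are assumptions, so adjust them to your checkpoint and GPU count.

```python
# Sketch of running inference through vLLM (assumed model id and parallelism).
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-Text-01",  # assumption, adjust to your checkpoint
    tensor_parallel_size=8,             # e.g. 8x H100 or H20
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.0, max_tokens=240)

outputs = llm.generate(["Explain the benefits of paged attention."], params)
print(outputs[0].outputs[0].text)
```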
Very cool model!