
The inference performance of the sample code is very poor #11

Open
kebe7jun opened this issue Jan 16, 2025 · 4 comments

Comments

kebe7jun commented Jan 16, 2025

Very cool model!

I ran inference tests based on the sample code on 8×NVIDIA H20 GPUs and found that throughput was only about 1 token/s, with the GPUs far from fully utilized. Is this in line with expectations?
Currently, vLLM and SGLang do not support MiniMax. Are there any other inference engines you would recommend?

MiniMax-AI-Dev (Contributor) commented

Thank you for your feedback. We are currently planning to support our models on open-source inference frameworks. At the same time, we also welcome community developers to join us in advancing the support for our models in open-source inference engines.

@terryaic

The same thing happens on 8×A800 as well: less than 1 token/s...


kanebay commented Jan 23, 2025

Same issue here. On an 8×H100 server, with a fixed input of 2048 tokens and an output of 240 tokens, batch_size 1 takes 620 seconds and batch_size 2 takes 765 seconds. It's too slow! Which open-source inference frameworks will be supported?
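For reference, the figures reported above work out to well under one output token per second. A minimal sketch of the throughput arithmetic, using only the numbers from this comment (not re-measured):

```python
# Throughput arithmetic for the reported 8xH100 runs: 2048-token prompt,
# 240-token output per sequence. Wall times are taken from the comment above.
output_tokens = 240
runs = {1: 620.0, 2: 765.0}  # batch_size -> wall time in seconds

for batch_size, seconds in runs.items():
    total_output = output_tokens * batch_size
    tok_per_s = total_output / seconds
    print(f"batch_size={batch_size}: {tok_per_s:.2f} output tokens/s")
# batch_size=1 comes out to about 0.39 tokens/s, batch_size=2 to about 0.63.
```

So even doubling the batch only raises aggregate throughput to roughly 0.63 tokens/s, which is far below what hardware of this class should deliver.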

ZZBoom (Collaborator) commented Feb 19, 2025

We have submitted PR #13454 to vLLM, and the performance improvement over the Hugging Face implementation is very significant. You might want to give it a try.
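For anyone who wants to try this before the PR is merged, one possible way is to build vLLM from the PR branch using GitHub's `pull/<n>/head` refs. This is only a sketch (the local branch name here is arbitrary); check the PR itself for the exact install instructions:

```shell
# Sketch: build vLLM from source with the changes in PR #13454.
# GitHub exposes unmerged PRs as pull/<number>/head refs;
# "minimax-support" is just a local branch name chosen for illustration.
git clone https://github.com/vllm-project/vllm.git
cd vllm
git fetch origin pull/13454/head:minimax-support
git checkout minimax-support
pip install -e .
```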

5 participants