This is a CodeLlama-34B-GPTQ starter template from Banana.dev for on-demand, serverless GPU inference.
You can fork this repository and deploy it on Banana as is, or customize it to your own needs.
- Fork this repository to your own GitHub account.
- Connect your GitHub account on Banana.
- Create a new model on Banana from the forked GitHub repository.
- Wait for the model build to complete.
- Make an API request to it using one of the provided snippets in your Banana dashboard.
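The exact request snippet comes from your Banana dashboard, but the general shape of an inference call can be sketched as below. This is a minimal illustration, not the authoritative client code: the endpoint URL, field names (`apiKey`, `modelKey`, `modelInputs`, `prompt`, `max_new_tokens`) and their schema are assumptions here, and the placeholder keys must be replaced with the values from your dashboard.

```python
import json

# Placeholder endpoint -- use the URL shown in your Banana dashboard.
API_URL = "https://api.banana.dev/"


def build_request(api_key, model_key, prompt):
    """Assemble a hypothetical JSON body for an inference call.

    The field names below are illustrative assumptions; the real schema
    is defined by the snippets Banana generates for your deployed model.
    """
    return {
        "apiKey": api_key,
        "modelKey": model_key,
        "modelInputs": {
            "prompt": prompt,
            "max_new_tokens": 128,
        },
    }


body = build_request("YOUR_API_KEY", "YOUR_MODEL_KEY", "def fibonacci(n):")
print(json.dumps(body, indent=2))
# Sending the request would then be something like:
#   import requests
#   response = requests.post(API_URL, json=body)
```

The actual transport (HTTP POST, SDK client, authentication headers) depends on the snippet language you pick in the dashboard; the sketch above only shows how the prompt and keys fit together in the request body.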
For more info, check out the Banana.dev docs.