Llama 2 Resources

Free playgrounds

70B-chat by Yuvraj at Hugging Face: https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI
13B-chat by Hugging Face: https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat
7B-chat by Hugging Face: https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat
7B-chat, 13B-chat and 70B-chat by a16z: https://llama2.ai
13B-chat by Pietro: https://llama-2.replit.app

Running it yourself on a cloud GPU

The 70B GPTQ version requires 35-40 GB of VRAM. Follow this guide
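The 35-40 GB figure is roughly what you get from the size of the weights alone. A back-of-the-envelope sketch, assuming 4-bit quantization (an assumption here, not something stated above) and ignoring the KV cache, activations and framework overhead, which account for the rest:

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # 70e9 parameters is the nominal size of the 70B model.
    for bits in (16, 8, 4):
        print(f"70B at {bits}-bit: ~{estimate_weight_gb(70e9, bits):.0f} GB")
```

Treat the result as a lower bound: real usage grows with context length and batch size.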

Hosted APIs

70B chat: https://replicate.com/replicate/llama70b-v2-chat
13B chat: https://replicate.com/a16z-infra/llama13b-v2-chat

Best model versions to use

For minimal VRAM:
- 7B GGML versions:
  - Chat-tuned model: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
  - Base model: https://huggingface.co/TheBloke/Llama-2-7B-GGML
- 13B GGML versions:
  - Chat-tuned model: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
  - Base model: https://huggingface.co/TheBloke/Llama-2-13B-GGML
For large VRAM:
- 70B GPTQ versions:
  - Chat-tuned version: https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ
  - Base model: https://huggingface.co/TheBloke/Llama-2-70B-GPTQ

See more on GPTQ vs GGML versions of models here.

What about 34B?

It's coming soon. Meta wanted more time to red-team it.

Chat vs Base

The base models are uncensored, and are not instruct-tuned or chat-tuned.

The chat models are censored, and have been chat-tuned.

Are the base models instruct-tuned?

No.¹ "These models are not finetuned for chat or Q&A. They should be prompted so that the expected answer is the natural continuation of the prompt."

How can I prompt the base models?

Create a fake document so that if the model naturally continues what would be expected next in the document, you get the result you want. For example, instead of "What is the capital of France?" you'd write "The capital of France is "

See also: https://youtu.be/bZQun8Y4L2A?t=555
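One common way to build such a "fake document" is a short Q&A transcript that stops right where the answer should begin. The helper below is a minimal sketch; the function name and the `Q:`/`A:` layout are illustrative choices, not part of any Llama 2 API.

```python
def build_document_prompt(examples, question):
    """Build a Q&A-style document that ends where the answer should start.

    examples: list of (question, answer) pairs shown to the model as context.
    question: the question we actually want answered.
    """
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
        lines.append("")  # blank line between entries
    lines.append(f"Q: {question}")
    lines.append("A:")  # the model's natural continuation is the answer
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_document_prompt(
        [("What is the capital of Italy?", "Rome"),
         ("What is the capital of Spain?", "Madrid")],
        "What is the capital of France?",
    )
    print(prompt)
```

Send the resulting string to a base model as a plain completion prompt and stop generation at the first newline to keep only the answer.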

Are they censored?

Base is uncensored, chat is censored.

Prompt template

<s>[INST] <<SYS>>
{your_system_message}
<</SYS>>

{user_message_1} [/INST]

and

<s>[INST] <<SYS>>
{your_system_message}
<</SYS>>

{user_message_1} [/INST] {model_reply_1}</s><s>[INST] {user_message_2} [/INST]

See here.
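The two templates above can be produced by one small helper. This is a sketch that follows the templates literally; the function name and the `(user_message, model_reply)` pair format are made up for illustration. Some tokenizers add the leading `<s>` themselves, so check your library before sending the string as-is.

```python
def build_llama2_chat_prompt(system_message, turns):
    """Format a conversation using the Llama 2 chat template shown above.

    turns: list of (user_message, model_reply) pairs in order. Use None as
    the reply of the final turn, which is the one the model should answer.
    """
    prompt = ""
    for i, (user, reply) in enumerate(turns):
        if i == 0:
            # The system message is folded into the first user turn.
            user = f"<<SYS>>\n{system_message}\n<</SYS>>\n\n{user}"
        prompt += f"<s>[INST] {user} [/INST]"
        if reply is not None:
            # Completed turns are closed with </s>.
            prompt += f" {reply}</s>"
    return prompt


if __name__ == "__main__":
    print(build_llama2_chat_prompt(
        "You are a helpful assistant.",
        [("Hi!", "Hello, how can I help?"), ("Tell me a joke.", None)],
    ))
```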

Setting up an API endpoint

Hugging Face
Docker/Runpod - see here but use this runpod template instead of the one linked in that post

What will some popular uses of Llama 2 be?

Devs playing around with it
Uses that are legal but that GPT doesn't allow (for example, NSFW content)
Enterprises using it as an alternative to GPT-3.5 if they can get it to be cheaper overall
Enterprises using it as an alternative to GPT-4 if they can fine-tune it for a specific use case and get comparable performance

Fine-tuning

Fine-tuning as a service for enterprises

Databricks (Patrick, co-founder, and Xiangrui, ML Lead), including MosaicML
Lamini.ai (Lamini is a llama-inspired name. Llamas, alpacas, vicuñas and guanacos are all members of the Lamini tribe.)
Radiant AI
Scale AI
Snowflake

Running locally

Footnotes

¹ https://github.com/facebookresearch/llama#pretrained-models
