[Profiling][Model][Doc] Support Llama3-8B and 70B on A100s (#22)
* Merged PR 1873: Support Llama3 8B and 70B for 32k context length on a100_pairwise_nvlink

# Changelog
* Support Llama3 8B and 70B: https://llama.meta.com/llama3/
* Max supported context length is 32k, only on 4xA100.
* Pipeline parallel is not yet profiled beyond a 4k context length.
* Attention profiling enhancements:
  * Reduce the number of input combinations by removing batches that require more KV cache blocks than fit in available GPU memory (see the sketch after this list).
* Fix Llama3-8B and 70B profiling data.
* Move documentation files to the top-level docs/ folder.
* Add Llama3-70B attention profiling data.
* Formatting and minor cleanups.
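The batch-pruning enhancement in the changelog can be illustrated with a short sketch. This is not the repository's actual code: the function names, the 16-token block size, and the block budget are all illustrative assumptions.

```python
# Illustrative sketch of the batch-pruning idea from the changelog:
# before profiling attention, drop any (batch_size, context_length)
# combination whose KV cache cannot fit in the GPU blocks available.
# All names and numbers here are assumptions, not the repository's API.

def kv_blocks_required(batch_size: int, context_length: int, block_size: int = 16) -> int:
    """Blocks needed to hold the KV cache for one batch (ceil division per request)."""
    blocks_per_request = -(-context_length // block_size)  # ceiling division
    return batch_size * blocks_per_request

def prune_infeasible(combinations, available_blocks: int, block_size: int = 16):
    """Keep only the input combinations whose KV cache fits in GPU memory."""
    return [
        (batch, ctx)
        for batch, ctx in combinations
        if kv_blocks_required(batch, ctx, block_size) <= available_blocks
    ]

# With a budget of 10,000 blocks of 16 tokens each, a batch of 64 requests
# at 32k context needs 131,072 blocks and is dropped; a batch of 4 at 4k
# context needs 1,024 blocks and is kept.
print(prune_infeasible([(64, 32768), (4, 4096)], available_blocks=10_000))
```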
1 parent 2e72d7a · commit 2bb7a08
Showing 22 changed files with 150,223 additions and 21,140 deletions.
New file (Llama-3-70B model config):
@@ -0,0 +1,16 @@
num_layers: 80
num_q_heads: 64
num_kv_heads: 8
embedding_dim: 8192
mlp_hidden_dim: 28672
max_position_embeddings: 8192
use_gated_mlp: true
use_bias: false
use_qkv_bias: false
activation: silu
norm: rms_norm
post_attn_norm: true
rope_theta: 500000.0
rope_scaling: null
vocab_size: 128256
is_neox_style: true
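This config's grouped-query attention layout (64 query heads sharing 8 KV heads) is what the batch-pruning step above reasons about. A rough footprint estimate, assuming fp16 KV entries and head_dim = embedding_dim / num_q_heads; neither assumption is stated in the file:

```python
# Back-of-the-envelope KV cache footprint for the Llama-3-70B config above.
# Assumes fp16 entries (2 bytes) and head_dim = embedding_dim / num_q_heads;
# neither assumption is stated in the config file itself.

num_layers, num_q_heads, num_kv_heads, embedding_dim = 80, 64, 8, 8192

head_dim = embedding_dim // num_q_heads                         # 128
# 2 tensors (K and V) * kv heads * head dim * 2 bytes * layers
bytes_per_token = 2 * num_kv_heads * head_dim * 2 * num_layers  # 327,680

ctx = 32_768  # the 32k max context length profiled in this commit
print(f"{bytes_per_token} bytes/token, "
      f"{bytes_per_token * ctx / 2**30:.1f} GiB per 32k-context request")
# -> 327680 bytes/token, 10.0 GiB per 32k-context request
```

At roughly 10 GiB of KV cache per full-length request, on top of the sharded weights, only a handful of 32k-context requests fit across 4xA100, which is why infeasible batch sizes are removed before profiling.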
New file (Llama-3-8B model config):
@@ -0,0 +1,16 @@
num_layers: 32
num_q_heads: 32
num_kv_heads: 8
embedding_dim: 4096
mlp_hidden_dim: 14336
max_position_embeddings: 4096
use_gated_mlp: true
use_bias: false
use_qkv_bias: false
activation: silu
norm: rms_norm
post_attn_norm: true
rope_theta: 500000.0
rope_scaling: null
vocab_size: 128256
is_neox_style: true
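Both files share the same flat YAML schema, so a single typed loader covers them. A minimal sketch using PyYAML and a dataclass; ModelConfig, load_model_config, and the example path are hypothetical names, not the repository's actual API:

```python
# Minimal sketch of loading one of the model config files above into a
# typed object. ModelConfig, load_model_config, and the example filename
# are hypothetical; the repository's actual loader may differ.
from dataclasses import dataclass
from typing import Optional

import yaml  # PyYAML


@dataclass
class ModelConfig:
    num_layers: int
    num_q_heads: int
    num_kv_heads: int
    embedding_dim: int
    mlp_hidden_dim: int
    max_position_embeddings: int
    use_gated_mlp: bool
    use_bias: bool
    use_qkv_bias: bool
    activation: str
    norm: str
    post_attn_norm: bool
    rope_theta: float
    rope_scaling: Optional[dict]
    vocab_size: int
    is_neox_style: bool


def load_model_config(path: str) -> ModelConfig:
    with open(path) as f:
        raw = yaml.safe_load(f)
    return ModelConfig(**raw)  # fails loudly on missing or extra keys


# e.g. cfg = load_model_config("llama3_8b.yml"); cfg.num_kv_heads == 8
```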