
Add Llama3.1-8B benchmark with disabled collective matmul #1317


bvandermoon

Description

Add a new Trillium benchmark that runs Llama3.1 with collective matmuls disabled. I ran this on v6e-8 and saw an improvement from ~350 TFLOP/s/device to ~410-420 TFLOP/s/device after this change.

This change is needed to support adding a reproducible recipe for v6e-8.
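For context, the usual XLA knob for suppressing collective (windowed-einsum) matmuls is `xla_jf_spmd_threshold_for_windowed_einsum_mib`: raising the threshold above any realistic einsum size prevents the sharded-matmul rewrite. A minimal sketch of how a benchmark variant along these lines could be defined is below; the function and config-dict structure are illustrative, not MaxText's actual benchmark API.

```python
# Hypothetical sketch of a "no collective matmul" benchmark variant.
# Setting xla_jf_spmd_threshold_for_windowed_einsum_mib to a very large
# value (in MiB) effectively disables XLA's collective-matmul rewrite.
# The helper and dict layout are illustrative only.

def make_no_collective_matmul_config(base_config: dict) -> dict:
    """Derive a variant of base_config with collective matmul disabled."""
    config = dict(base_config)
    xla_flags = dict(config.get("xla_flags", {}))
    # Threshold far above any real einsum size, so the rewrite never fires.
    xla_flags["xla_jf_spmd_threshold_for_windowed_einsum_mib"] = 1_000_000
    config["xla_flags"] = xla_flags
    config["name"] = base_config["name"] + "_no_collective_matmul"
    return config

base = {"name": "llama3_1_8b_8192", "per_device_batch_size": 4, "xla_flags": {}}
variant = make_no_collective_matmul_config(base)
print(variant["name"])  # llama3_1_8b_8192_no_collective_matmul
```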

Tests

Ran this benchmark on v6e-8 using the following command:

python3 benchmarks/benchmark_runner.py xpk \
    --project=$PROJECT \
    --zone=$ZONE \
    --device_type=v6e-8 \
    --num_slices=1  \
    --cluster_name=${CLUSTER_NAME} \
    --base_output_directory=${OUTPUT_DIR} \
    --model_name="llama3_1_8b_8192_no_collective_matmul" \
    --libtpu_version=20241209 \
    --base_docker_image=maxtext_base_image

The run reproduced the performance described above. I also confirmed in the profile that the collective matmuls previously present in the MLP layer are now gone:

[Profile screenshot: collective matmuls no longer present in the MLP layer]

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.
