
[FA] Unify Base + Opt FWD kernels #233


Open

wants to merge 3 commits into main

Conversation

@codingwithsurya commented May 27, 2025

Summary:

I merged the "base" (`_attn_fwd`) and "opt" (`_attn_fwd_opt`) variants of the attention forward-pass kernel into a single `_attn_fwd_unified` kernel to streamline the codebase and avoid redundancy. The base and opt kernels were good candidates for merging because their code paths were nearly identical.
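
One common way to express this kind of merge in Triton is a `tl.constexpr` flag that keeps the shared body in one place while still letting the compiler specialize each variant. Below is a minimal, toy sketch of that pattern only (not the actual `_attn_fwd_unified` code; `USE_OPT_PATH`, `_fwd_unified_demo`, and the element-wise body are illustrative):

```python
# Illustrative only: a toy Triton kernel showing the constexpr-flag merge pattern.
# The flag name and the element-wise body are hypothetical stand-ins, not the
# real flash-attention kernel.
import torch
import triton
import triton.language as tl


@triton.jit
def _fwd_unified_demo(X, Y, n_elements,
                      BLOCK: tl.constexpr,
                      USE_OPT_PATH: tl.constexpr):  # hypothetical base-vs-opt switch
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(X + offs, mask=mask)
    if USE_OPT_PATH:
        y = x * 2.0  # stand-in for the "opt" code path
    else:
        y = x + 1.0  # stand-in for the "base" code path
    tl.store(Y + offs, y, mask=mask)


def fwd(x: torch.Tensor, use_opt: bool) -> torch.Tensor:
    # Because USE_OPT_PATH is a constexpr, Triton compiles a separate specialized
    # binary per value, so unifying the kernels adds no runtime branching.
    y = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)
    _fwd_unified_demo[grid](x, y, x.numel(), BLOCK=1024, USE_OPT_PATH=use_opt)
    return y
```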

This PR also includes diffs from #232 because of the GitHub export from Phabricator; #232 should be merged before this one.

Test Plan:
Unit Tests and Benchmarking

python -m unittest test/test_gpu/main.py -k test_gpu_tritonbench_flash_attention

python run.py --op flash_attention --only triton_tutorial_flash_v2 --batch 4 --seq-len 16384 --n-heads 32 --d-head 64 --precision fp16 --causal --metrics flops

python run.py --op flash_attention --only triton_tutorial_flash_v2_opt --batch 4 --seq-len 16384 --n-heads 32 --d-head 64 --precision fp16 --causal --metrics flops

Differential Revision: D75388323

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D75388323

Summary:

This PR consolidates redundant TMA attention kernels into a unified implementation. Previously, `_attn_fwd_tma` and `_attn_fwd_tma_ws` contained duplicate code (mainly the TMA descriptors) and didn't leverage the existing `ENABLE_WS` flag. 

I've merged the redundant kernels into a single `_attn_fwd_tma_unified` kernel. We now use the `ENABLE_WS` flag to toggle between regular and warp-specialized execution.

Changes:

*   Merged both kernels into a single `_attn_fwd_tma_unified` kernel that handles both the regular and warp-specialized paths
*   Used the existing `ENABLE_WS` parameter to control warp specialization
*   Unified TMA descriptor creation logic
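
As a rough sketch of the resulting launch shape (the descriptor helper and argument list below are hypothetical, not the actual tritonbench code):

```python
# Hypothetical launch-side sketch: the TMA descriptors are built once and the
# single unified kernel serves both variants, with ENABLE_WS selecting the
# warp-specialized path at compile time. make_tma_descriptors and the argument
# list are illustrative names.
def attn_fwd_tma(q, k, v, o, grid, variant: str):
    # shared setup, previously duplicated in _attn_fwd_tma and _attn_fwd_tma_ws
    desc_q, desc_k, desc_v, desc_o = make_tma_descriptors(q, k, v, o)
    _attn_fwd_tma_unified[grid](
        desc_q, desc_k, desc_v, desc_o,
        ENABLE_WS=(variant == "tma_ws"),  # constexpr toggle: regular vs. warp-specialized
    )
```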

Differential Revision: D75307125

Summary:


Separated the TMA kernel variant handling into distinct code paths rather than using a conditional parameter. 

Changed from a unified approach with a dynamic `is_warp_specialized` flag to explicit, separate conditions for the `tma` and `tma_ws` variants. This improves code clarity by making the execution path more explicit and makes it easier for the compiler to optimize.
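
A hypothetical sketch of the before/after dispatch shape (variant strings and argument lists are illustrative, not the actual tritonbench code):

```python
# Before: one launch site driven by a dynamic flag.
#
#     is_warp_specialized = (variant == "tma_ws")
#     _attn_fwd_tma_unified[grid](..., ENABLE_WS=is_warp_specialized)
#
# After: explicit, separate conditions per variant, so each launch site is fully static.
def launch(variant: str, grid, *args):
    if variant == "tma":
        return _attn_fwd_tma_unified[grid](*args, ENABLE_WS=False)
    elif variant == "tma_ws":
        return _attn_fwd_tma_unified[grid](*args, ENABLE_WS=True)
    raise ValueError(f"unexpected TMA variant: {variant!r}")
```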

Differential Revision: D75308966

Summary:

I merged the "base" (`_attn_fwd`) and "opt" (`_attn_fwd_opt`) variants of the attention forward-pass kernel into a single `_attn_fwd_unified` kernel to streamline the codebase and avoid redundancy. The base and opt kernels were good candidates for merging because their code paths were nearly identical.

I still need to integrate the warp-specialization path (`attn_fwd_ws`) after some bug fixing. For now it is still available as its own autotuned kernel.

Differential Revision: D75388323
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D75388323

@codingwithsurya changed the title from "Unify Base and Opt Attention Kernels" to "[FA] Unify TMA attention kernels with Warp Spec flag and consolidate base/opt kernels" on May 27, 2025
@codingwithsurya changed the title from "[FA] Unify TMA attention kernels with Warp Spec flag and consolidate base/opt kernels" to "[FA] Unify TMA attention kernels with Warp Spec flag and Consolidate Base/Opt kernels" on May 27, 2025
@codingwithsurya changed the title from "[FA] Unify TMA attention kernels with Warp Spec flag and Consolidate Base/Opt kernels" to "[FA] Unify Base + Opt FWD kernels" on May 27, 2025
@codingwithsurya self-assigned this on May 28, 2025