
Add subgroups feature support #1217


Open
beaufortfrancois opened this issue Mar 3, 2025 · 5 comments
Labels
enhancement New feature or request

Comments

@beaufortfrancois

Feature request

As reshared by @FL33TW00D at https://x.com/fleetwood___/status/1894754562210165029, having subgroups support in Transformers.js would be huge for performance.

I'm filing this feature request to start a conversation about how this can be achieved now that WebGPU subgroups have shipped in Chrome 134: https://developer.chrome.com/blog/new-in-webgpu-134#improve_machine-learning_workloads_with_subgroups

Note that some work has been started in Apache TVM as well in apache/tvm#17699
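For context, here is a minimal sketch of how a WebGPU app might opt into subgroups, assuming the `"subgroups"` feature name that shipped in Chrome 134. The helpers `subgroupFeatureRequest` and `requestDeviceWithSubgroups` are hypothetical, and `AdapterLike` is a minimal stand-in for `GPUAdapter` so the snippet compiles without `@webgpu/types`. WGSL shaders that call subgroup built-ins would also need an `enable subgroups;` directive.

```typescript
// Minimal structural stand-in for GPUAdapter (assumption: avoids a dependency on @webgpu/types).
interface AdapterLike {
  features: ReadonlySet<string>;
  requestDevice(descriptor?: { requiredFeatures?: string[] }): Promise<unknown>;
}

// Hypothetical helper: list the features to request, adding "subgroups" only if supported.
function subgroupFeatureRequest(adapterFeatures: ReadonlySet<string>): string[] {
  return adapterFeatures.has("subgroups") ? ["subgroups"] : [];
}

// Hypothetical helper: request a device, enabling subgroups when the adapter exposes them.
// Callers can use `hasSubgroups` to choose a subgroup-optimized shader or a fallback path.
async function requestDeviceWithSubgroups(
  adapter: AdapterLike,
): Promise<{ device: unknown; hasSubgroups: boolean }> {
  const requiredFeatures = subgroupFeatureRequest(adapter.features);
  const device = await adapter.requestDevice({ requiredFeatures });
  return { device, hasSubgroups: requiredFeatures.length > 0 };
}
```

In a browser, the adapter would come from `await navigator.gpu.requestAdapter()`. Checking `adapter.features` first matters because `requestDevice` rejects if `requiredFeatures` lists a feature the adapter does not support.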

Motivation

Performance, performance, and performance.

Your contribution

I'd be happy to help answer questions about how subgroups are implemented in Chromium.

@beaufortfrancois beaufortfrancois added the enhancement New feature or request label Mar 3, 2025
@xenova
Collaborator

xenova commented Mar 3, 2025

Exciting! 🚀 Let me loop in @guschmue to the discussion to see where we can add support for this 💪

@beaufortfrancois
Author

@guschmue @xenova FWIW Apache TVM is currently adding support for subgroupShuffle(), subgroupShuffleUp(), and subgroupShuffleDown().
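To make the three shuffle built-ins concrete, here is a CPU-side simulation (a sketch, not GPU code) that models a subgroup as an array holding one value per invocation. It follows the WGSL semantics: `subgroupShuffle(v, id)` reads lane `id`, `subgroupShuffleDown(v, delta)` reads lane `i + delta`, and `subgroupShuffleUp(v, delta)` reads lane `i - delta`. WGSL leaves out-of-range reads indeterminate; this simulation arbitrarily keeps the lane's own value. The `reduceSum` helper is a hypothetical example of the shuffle-down tree reduction a kernel such as LayerNorm can use to sum values across a subgroup.

```typescript
// CPU simulation of WGSL subgroup shuffles: values[i] is the value held by lane i.
// Out-of-range source lanes are indeterminate in WGSL; here the lane keeps its own value.

// subgroupShuffle(v, id): lane i reads the value held by lane srcLane[i].
function subgroupShuffle(values: number[], srcLane: number[]): number[] {
  return values.map((v, i) => {
    const src = srcLane[i];
    return src >= 0 && src < values.length ? values[src] : v;
  });
}

// subgroupShuffleDown(v, delta): lane i reads from lane i + delta.
function subgroupShuffleDown(values: number[], delta: number): number[] {
  return subgroupShuffle(values, values.map((_, i) => i + delta));
}

// subgroupShuffleUp(v, delta): lane i reads from lane i - delta.
function subgroupShuffleUp(values: number[], delta: number): number[] {
  return subgroupShuffle(values, values.map((_, i) => i - delta));
}

// Hypothetical tree reduction: after log2(size) shuffle-down steps, lane 0 holds the total.
// Assumes the subgroup size is a power of two.
function reduceSum(values: number[]): number {
  let lanes = values.slice();
  for (let offset = lanes.length >> 1; offset > 0; offset >>= 1) {
    const shifted = subgroupShuffleDown(lanes, offset);
    lanes = lanes.map((v, i) => v + shifted[i]);
  }
  return lanes[0];
}
```

For example, `reduceSum([1, 2, 3, 4, 5, 6, 7, 8])` returns `36` after three shuffle-down steps (offsets 4, 2, 1), without any shared-memory round trips, which is the main appeal of subgroups for reductions.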

@beaufortfrancois
Author

@guschmue Did you have a chance to figure out where it makes sense to add WebGPU subgroups support to Transformers.js?

@xenova
Collaborator

xenova commented Mar 24, 2025

@guschmue Another consideration is the eventual switch to the native WebGPU execution provider (EP); perhaps we can align efforts on that front?

Also, cc @FL33TW00D: it could be great to integrate your work on optimizing LayerNorm with subgroups (https://fleetwood.dev/posts/layernorm-as-fast-as-possible) here. 👀 What do you think?

@beaufortfrancois
Author

FYI: according to microsoft/onnxruntime@8eb5513, ONNX Runtime sees a 3x performance increase on Metal with subgroup matrices.
