Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vulkan: subgroup size tuning #12087

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

daniandtheweb
Copy link
Contributor

This PR is a continuation of the tests done in #11826 about the subgroup size in vulkan and its effects in performance especially on RDNA cards. For now it includes a specific configuration that improves RDNA1 performance in almost all operations.

@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Feb 26, 2025
@0cc4m
Copy link
Collaborator

0cc4m commented Mar 5, 2025

I haven't forgotten about this, I was just working on the int8 matmul. I'll get back to you here, soon.

@0cc4m
Copy link
Collaborator

0cc4m commented Mar 8, 2025

I tried doing device architecture detection in a better way in daniandtheweb#1, let me know what you think.

@daniandtheweb daniandtheweb force-pushed the rdna-subgroup-size branch 3 times, most recently from c22e0b4 to 7176d04 Compare March 8, 2025 21:01
@0cc4m
Copy link
Collaborator

0cc4m commented Mar 11, 2025

I'll give this another try on RDNA2 soon, I expect that you can add RDNA2 to the RDNA1 logic. Once that is done I think we can merge this.

Can you resolve the merge conflict?

@daniandtheweb
Copy link
Contributor Author

Sure, I'll do it later once I'm back home.

@daniandtheweb daniandtheweb marked this pull request as ready for review March 11, 2025 13:52
@daniandtheweb daniandtheweb changed the title vulkan: subgroup size test vulkan: subgroup size tuning Mar 11, 2025
@daniandtheweb
Copy link
Contributor Author

daniandtheweb commented Mar 11, 2025

I was testing the change on RDNA3 on stable-diffusion.cpp and it seems like using subgroup size 32 on this architecture completely breaks image generation resulting in black images. Do you have any clue what could be causing this? Text generation works properly using wave 32 and there's even a small performance uplift in prompt processing (1164 master / 1238 subgroup 32).

@0cc4m
Copy link
Collaborator

0cc4m commented Mar 11, 2025

I don't know what it is, but to figure out which op is failing I implemented the GGML_VULKAN_CHECK_RESULTS thing. Try to compile with that, it should tell you which result is wrong.

@daniandtheweb
Copy link
Contributor Author

daniandtheweb commented Mar 11, 2025

For some reason all mul_mat and mul_mat_id fail on RDNA3 when using a subgroup size of 32 (I was in a rush before so I wasn't able to run the test-backend-ops) so I'll just avoid doing changes on this arch for now until I find a proper tuning for this card.

@0cc4m
Copy link
Collaborator

0cc4m commented Mar 11, 2025

For some reason all mul_mat and mul_mat_id fail on RDNA3 when using a subgroup size of 32 (I was in a rush before so I wasn't able to run the test-backend-ops) so I'll just avoid doing changes on this arch for now until I find a proper tuning for this card.

That's a coopmat thing, I think. If you ignore coopmat shaders it's probably fine.

Edit: Maybe disable setting the required subgroup size if require_full_subgroups is set, that would cover the coopmat stuff.

@daniandtheweb
Copy link
Contributor Author

daniandtheweb commented Mar 11, 2025

Thanks for the advise, I still haven't looked at coopmat shaders so I didn't know how they would interact with these changes. With the new changes the tests pass and performance on RDNA3 is slightly better in prompt processing (1164 master / 1220 pr). Stable diffusion seems to get a minor improvement.

For now I think it's safe to keep the same values as the ones on RDNA1 as the card seems to behave similarly when it comes to subgroup size.

@0cc4m
Copy link
Collaborator

0cc4m commented Mar 12, 2025

I checked and you can add RDNA2 to the list. It seems to prefer the same shader sizes. Afterwards we can merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ggml changes relating to the ggml tensor library for machine learning Vulkan Issues specific to the Vulkan backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants