vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader #12312

Open
wants to merge 4 commits into
base: master

jeffbolznv
Collaborator

This is stacked on #12309. The goal is to address cases where the relatively large tile size used in the coopmat2 shaders leads to a lot of wasted loads and calculations. For example, when N is 129 and the tile size is 128, the second row of tiles covers only one element of N, so using a full 128-wide tile there is wasteful. With this PR those workgroups use 32 instead.
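As a rough illustration of the idea (not the shader code from this PR), the sketch below picks the narrowest of BN, BN/2, and BN/4 that still covers the columns left in a tile, and counts how many columns end up being processed along N. The names `pick_tile_n` and `padded_columns` are hypothetical, and the exact selection thresholds are an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not taken from the shader: choose the narrowest of
// BN, BN/2, BN/4 that still covers the columns remaining for this tile.
// Precondition: tile_idx * bn < n.
uint32_t pick_tile_n(uint32_t n, uint32_t bn, uint32_t tile_idx) {
    const uint32_t start     = tile_idx * bn;
    const uint32_t remaining = std::min(bn, n - start);
    if (remaining <= bn / 4) return bn / 4;   // assumed N/4 path
    if (remaining <= bn / 2) return bn / 2;   // assumed N/2 path
    return bn;                                // full-width tile
}

// Total columns processed along N, including padding in the last tile.
uint32_t padded_columns(uint32_t n, uint32_t bn, bool use_small_tiles) {
    const uint32_t tiles = (n + bn - 1) / bn; // ceil(n / bn)
    uint32_t total = 0;
    for (uint32_t t = 0; t < tiles; ++t) {
        total += use_small_tiles ? pick_tile_n(n, bn, t) : bn;
    }
    return total;
}
```

Under these assumptions, `pick_tile_n(129, 128, 1)` returns 32, matching the example above, and the columns processed for N = 129 drop from 256 to 160.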

Results for Phi-3-mini-4k-instruct-q4.gguf on RTX 4070:

  test  PR12309 (t/s)  this PR (t/s)   speedup

   pp5         463.51         461.03    -0.54%
  pp10         561.56            544    -3.13%
  pp20        1105.62        1108.14     0.23%
  pp31        1664.51        1692.87     1.70%
  pp32        1837.95        1789.53    -2.63%
  pp33        1472.52         1460.1    -0.84%
  pp48        2136.36        2195.93     2.79%
  pp54        2398.81         2323.3    -3.15%
  pp63        2733.02        2781.24     1.76%
  pp64        2725.61        2790.49     2.38%
  pp65        2326.46        2308.42    -0.78%
  pp80        2836.59        2747.47    -3.14%
  pp96        3229.14        3279.18     1.55%
 pp112        3722.96        3753.71     0.83%
 pp113        3736.39        3724.57    -0.32%
 pp127        4207.86         4251.1     1.03%
 pp128        4269.83        4272.33     0.06%
 pp129         2848.4        3471.71    21.88%
 pp140        3102.99        3748.07    20.79%
 pp160        3497.35        4114.33    17.64%
 pp180        3913.92        4299.83     9.86%
 pp192        4178.85         4637.4    10.97%
 pp200        4301.45        4176.58    -2.90%
 pp210        4393.56        4361.11    -0.74%
 pp230        4693.42         4851.5     3.37%
 pp248        5352.42         5103.4    -4.65%
 pp255        5221.56           5207    -0.28%
 pp256        5210.66        5202.13    -0.16%
 pp257        3894.24        4406.52    13.15%
 pp280        4192.77        4782.45    14.06%
 pp300        4462.49        4820.17     8.02%
 pp320        4779.86        5082.74     6.34%
 pp350        5035.38        5040.96     0.11%
 pp384        5423.55        5611.65     3.47%
 pp410        4722.13         5182.3     9.74%
 pp448         5059.5        5471.17     8.14%
 pp480        5248.98        5280.59     0.60%
 pp490         5370.3        5412.99     0.79%
 pp511        5572.95        5555.42    -0.31%
 pp512        5547.35        5571.47     0.43%
 pp513        5081.44        5104.33     0.45%
 pp767        5422.29        5359.22    -1.16%
 pp768        5299.44        5411.12     2.11%
 pp769        4780.83        5189.31     8.54%
pp1023        5482.69        5537.44     1.00%
pp1024        5537.79        5542.94     0.09%
pp1025         5230.2         5257.3     0.52%
pp2047        5308.24        5304.31    -0.07%
pp2048         5272.6        5284.38     0.22%
pp2049        5123.08         5103.8    -0.38%

jeffbolznv requested a review from 0cc4m on March 10, 2025 at 16:49
github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Mar 10, 2025