vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking #12273

Open. jeffbolznv wants to merge 2 commits into master.

jeffbolznv (Collaborator) commented Mar 8, 2025

This is stacked on #12258 and reserves padding space for the B matrix in the N dimension, so the coopmat2 shader can avoid bounds checking and take the fast path.
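As a rough illustration of the padding idea, the sketch below rounds N up to a full tile so that every tile load along N stays inside the reserved space. The tile width of 32 and the helper names `round_up`, `padded_b_elements`, and `can_skip_bounds_check` are assumptions made for this example; they are not taken from the PR or from the ggml Vulkan backend.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tile width along N. The real shader's tile sizes are not
// stated in this PR description.
constexpr std::size_t kTileN = 32;

// Round n up to the next multiple of tile (tile must be non-zero).
constexpr std::size_t round_up(std::size_t n, std::size_t tile) {
    return (n + tile - 1) / tile * tile;
}

// Elements to reserve for a K x N matrix B when N is padded to a full tile.
inline std::size_t padded_b_elements(std::size_t k, std::size_t n,
                                     std::size_t tile = kTileN) {
    assert(tile > 0);
    return k * round_up(n, tile);
}

// True when the reserved extent along N covers every full tile, so a tile
// load could skip per-element bounds checks in that dimension.
inline bool can_skip_bounds_check(std::size_t reserved_n, std::size_t n,
                                  std::size_t tile = kTileN) {
    assert(tile > 0);
    return reserved_n >= round_up(n, tile);
}
```

Under these assumed values, N = 33 would reserve 64 columns, while N = 32 is already tile-aligned and needs no padding.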

Results for Phi-3-mini-4k-instruct-q4.gguf on RTX 4070:

       test	master		PR12258		PR12273		speedup vs PR12258
        pp5	447.15		462.92		466.92		0.86%
       pp10	274.87		494.28		517.04		4.60%
       pp20	564.73		991.1		1045.43		5.48%
       pp31	863.29		1418.82		1730.83		21.99%
       pp32	894.75		1680.77		1745.39		3.84%
       pp33	909.48		1293.95		1512.23		16.87%
       pp48	1347.26		1766.27		2249.4		27.35%
       pp54	1465.89		2030.28		2318.44		14.19%
       pp63	1689.89		2405.95		2694.4		11.99%
       pp64	1702.93		2910.49		2764.43		-5.02%
       pp65	1741.64		1903.68		2260.9		18.76%
       pp80	2039.35		2288.33		2835.27		23.90%
       pp96	2397.05		2722.58		3189.51		17.15%
      pp112	2732.28		3245.6		3632.44		11.92%
      pp113	2787.73		3057.98		3656.96		19.59%
      pp127	3064.93		3449.08		4087.5		18.51%
      pp128	3817.62		4258.3		4174.96		-1.96%
      pp129	2246.45		2516.47		2734.41		8.66%
      pp140	2357.01		2710.37		2907.37		7.27%
      pp160	2606.54		2998.89		3356.15		11.91%
      pp180	2891.38		3292.31		3639.65		10.55%
      pp192	3036.42		3622.36		3857.68		6.50%
      pp200	3106.62		3610.68		3979.56		10.22%
      pp210	3245.74		3771.82		4258.49		12.90%
      pp230	3493.62		4152.18		4589.22		10.53%
      pp248	3678.74		4357.96		4761.21		9.25%
      pp255	3719.19		4338.61		5024.18		15.80%
      pp256	4818.48		5035.21		4919.83		-2.29%
      pp257	2898.17		3605.43		3788.88		5.09%
      pp280	3143.84		3830.24		4150.83		8.37%
      pp300	3300.25		4080.59		4432.56		8.63%
      pp320	3498.11		4217.64		4545.71		7.78%
      pp350	3672.25		4548.51		4877.21		7.23%
      pp384	4251.46		5316.02		5189.39		-2.38%
      pp410	3478.13		4287.22		4580.83		6.85%
      pp448	3741.12		4645.57		4945.34		6.45%
      pp480	3886.56		4830.78		5208.15		7.81%
      pp490	3980.64		4909.99		5185.52		5.61%
      pp511	4051.34		5030.26		5365		6.65%
      pp512	5434.86		5435.16		5398.73		-0.67%
      pp513	4935.44		5077.35		4958.86		-2.33%
      pp767	4633.69		4985.33		5141.08		3.12%
      pp768	5171.77		5251.81		5175.3		-1.46%
      pp769	4143.96		4575.35		4685.68		2.41%
     pp1023	4575.58		5227.51		5300.75		1.40%
     pp1024	5324.02		5318.13		5294.66		-0.44%
     pp1025	5119.71		5118.21		5023.31		-1.85%
     pp2047	4748.43		5093.44		5117.1		0.46%
     pp2048	5152.48		5185.36		5116.32		-1.33%
     pp2049	5045.99		5021.26		5016.38		-0.10%
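
The last column appears to be the relative change of PR12273 over PR12258. The PR does not state the formula, so this is inferred from the numbers. A minimal sketch of that calculation, with the hypothetical helper names `speedup_percent` and `round2`:

```cpp
#include <cassert>
#include <cmath>

// Relative change of `after` over `before`, in percent.
// Positive means `after` is faster; negative means a regression.
inline double speedup_percent(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

// Round to two decimals, matching the precision used in the table.
inline double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, the pp31 row gives `round2(speedup_percent(1418.82, 1730.83))`, which matches the 21.99% in the table, and the pp64 row (2910.49 to 2764.43) gives the -5.02% regression.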

jeffbolznv requested a review from 0cc4m on Mar 8, 2025, 17:41
github-actions bot added the Vulkan and ggml labels on Mar 8, 2025