vulkan: use fp32 in coopmat2 q4_k dequant function #12309

jeffbolznv · 2025-03-10T13:45:08Z

This is stacked on #12273 and uses fp32 in dequantFuncQ4_K. It's kind of a toss-up whether fp16 or fp32 will be faster, and this was the only function I found where fp32 is currently faster.
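For readers unfamiliar with Q4_K, here is a rough sketch of what a per-element Q4_K dequant computes, with every multiply and subtract done in fp32. It is written in C rather than GLSL, which is what the actual shader function is written in, and it is not the code from this PR. `BlockQ4K` and `dequant_q4_k_elem_fp32` are hypothetical names. The block is simplified: ggml's real `block_q4_K` stores `d` and `dmin` as fp16, and here they are plain floats. The 6-bit scale/min unpacking follows ggml's CPU reference helper `get_scale_min_k4`.

```c
#include <stdint.h>

#define QK_K 256  /* values per Q4_K super-block */

/* Hypothetical, simplified Q4_K block. ggml stores d and dmin as fp16;
 * they are floats here so no half-precision conversion is needed. */
typedef struct {
    float   d;             /* super-block scale applied to the 6-bit sub-block scales */
    float   dmin;          /* super-block scale applied to the 6-bit sub-block mins   */
    uint8_t scales[12];    /* 8 packed (scale, min) pairs, 6 bits each                */
    uint8_t qs[QK_K / 2];  /* 256 4-bit quants, two per byte                          */
} BlockQ4K;

/* Unpack the 6-bit scale and min of sub-block j (0..7), using the same
 * bit layout as ggml's get_scale_min_k4. */
static void get_scale_min_k4(int j, const uint8_t *q, uint8_t *sc, uint8_t *m) {
    if (j < 4) {
        *sc = q[j] & 63;
        *m  = q[j + 4] & 63;
    } else {
        *sc = (uint8_t)((q[j + 4] & 0x0F) | ((q[j - 4] >> 6) << 4));
        *m  = (uint8_t)((q[j + 4] >> 4) | ((q[j] >> 6) << 4));
    }
}

/* Dequantize element idx (0..255) of one block; all arithmetic is fp32. */
static float dequant_q4_k_elem_fp32(const BlockQ4K *b, int idx) {
    int chunk  = idx / 64;     /* each 64-value chunk reads 32 bytes of qs   */
    int within = idx % 64;
    int hi     = within / 32;  /* first 32 values: low nibbles, next 32: high */
    int l      = within % 32;
    uint8_t byte = b->qs[chunk * 32 + l];
    int q = hi ? (byte >> 4) : (byte & 0x0F);

    uint8_t sc, m;
    get_scale_min_k4(2 * chunk + hi, b->scales, &sc, &m);
    /* value = (d * scale) * q - dmin * min, evaluated in float */
    return b->d * (float)sc * (float)q - b->dmin * (float)m;
}
```

An fp16 variant would do the same multiply-subtract in half precision. As noted above, which variant is faster depends on the function, so this only illustrates the arithmetic and says nothing about performance.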

Results for Phi-3-mini-4k-instruct-q4.gguf on RTX 4070:

       test	PR #12273	this PR		speedup
        pp5	466.92		463.51		-0.73%
       pp10	517.04		561.56		8.61%
       pp20	1045.43		1105.62		5.76%
       pp31	1730.83		1664.51		-3.83%
       pp32	1745.39		1837.95		5.30%
       pp33	1512.23		1472.52		-2.63%
       pp48	2249.4		2136.36		-5.03%
       pp54	2318.44		2398.81		3.47%
       pp63	2694.4		2733.02		1.43%
       pp64	2764.43		2725.61		-1.40%
       pp65	2260.9		2326.46		2.90%
       pp80	2835.27		2836.59		0.05%
       pp96	3189.51		3229.14		1.24%
      pp112	3632.44		3722.96		2.49%
      pp113	3656.96		3736.39		2.17%
      pp127	4087.5		4207.86		2.94%
      pp128	4174.96		4269.83		2.27%
      pp129	2734.41		2848.4		4.17%
      pp140	2907.37		3102.99		6.73%
      pp160	3356.15		3497.35		4.21%
      pp180	3639.65		3913.92		7.54%
      pp192	3857.68		4178.85		8.33%
      pp200	3979.56		4301.45		8.09%
      pp210	4258.49		4393.56		3.17%
      pp230	4589.22		4693.42		2.27%
      pp248	4761.21		5352.42		12.42%
      pp255	5024.18		5221.56		3.93%
      pp256	4919.83		5210.66		5.91%
      pp257	3788.88		3894.24		2.78%
      pp280	4150.83		4192.77		1.01%
      pp300	4432.56		4462.49		0.68%
      pp320	4545.71		4779.86		5.15%
      pp350	4877.21		5035.38		3.24%
      pp384	5189.39		5423.55		4.51%
      pp410	4580.83		4722.13		3.08%
      pp448	4945.34		5059.5		2.31%
      pp480	5208.15		5248.98		0.78%
      pp490	5185.52		5370.3		3.56%
      pp511	5365		5572.95		3.88%
      pp512	5398.73		5547.35		2.75%
      pp513	4958.86		5081.44		2.47%
      pp767	5141.08		5422.29		5.47%
      pp768	5175.3		5299.44		2.40%
      pp769	4685.68		4780.83		2.03%
     pp1023	5300.75		5482.69		3.43%
     pp1024	5294.66		5537.79		4.59%
     pp1025	5023.31		5230.2		4.12%
     pp2047	5117.1		5308.24		3.74%
     pp2048	5116.32		5272.6		3.05%
     pp2049	5016.38		5123.08		2.13%
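The speedup column matches the relative change in throughput, (this PR / PR #12273 - 1) * 100. For example, pp10 gives 561.56 / 517.04 - 1 ≈ 8.61%. Here is a minimal C helper for recomputing it, assuming that definition. `speedup_pct` is a hypothetical name:

```c
/* Percent speedup of `candidate` over `baseline`:
 * positive means candidate is faster, negative means slower. */
static double speedup_pct(double baseline, double candidate) {
    return (candidate / baseline - 1.0) * 100.0;
}
```

For instance, `speedup_pct(466.92, 463.51)` is about -0.73, matching the pp5 row.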


jeffbolznv requested a review from 0cc4m March 10, 2025 13:45

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 10, 2025

jeffbolznv mentioned this pull request Mar 10, 2025

vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader #12312 (Open)

jeffbolznv added 3 commits March 11, 2025 09:36

vulkan: Adjust coopmat2 tile sizes and selection heuristic

1577cfd

vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking

c5f2920

vulkan: use fp32 in coopmat2 q4_k dequant function

717fd25

jeffbolznv force-pushed the cm2_q4_k_fp32 branch from 458c70a to 717fd25 March 11, 2025 14:52
