vulkan: use fp32 in coopmat2 q4_k dequant function #12309

Open
wants to merge 3 commits into
base: master

jeffbolznv
Collaborator

This is stacked on #12273 and uses fp32 in dequantFuncQ4_K. It's kind of a toss-up whether fp16 or fp32 will be faster, and this was the only function I found where fp32 is currently faster.
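
To make "uses fp32" concrete, here is a rough Python sketch of the per-value arithmetic carried out at the two precisions. It is not the GLSL shader from this PR. It assumes the usual ggml Q4_K reconstruction `d * sc * q - dmin * m` (fp16 super-block scale `d` and min `dmin`, 6-bit sub-block scale `sc` and min `m`, 4-bit quant `q`). The helper names and sample values are hypothetical. The PR's motivation is throughput, which this sketch does not measure; it only shows where intermediate rounding happens.

```python
import struct


def round_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dequant_q4_k_value(d, dmin, sc, m, q, rnd):
    """Reconstruct one value as d*sc*q - dmin*m, rounding every
    intermediate with `rnd` to emulate a given working precision.
    Assumed formula; hypothetical helper, not code from the PR."""
    scale = rnd(d * sc)       # per-sub-block scale
    offset = rnd(dmin * m)    # per-sub-block minimum
    return rnd(rnd(scale * q) - offset)


# Hypothetical sample inputs; d and dmin are stored as fp16 in the block.
d = round_fp16(0.0123)
dmin = round_fp16(0.0457)
sc, m, q = 37, 21, 11

exact = d * sc * q - dmin * m  # double precision, used as the reference
as_fp16 = dequant_q4_k_value(d, dmin, sc, m, q, round_fp16)
as_fp32 = dequant_q4_k_value(d, dmin, sc, m, q, round_fp32)

print(f"fp16 intermediates: {as_fp16!r} (abs error {abs(as_fp16 - exact):.3e})")
print(f"fp32 intermediates: {as_fp32!r} (abs error {abs(as_fp32 - exact):.3e})")
```

In the real shader the choice also affects register pressure and conversion instructions, which is presumably why the faster precision has to be found by measurement, as the results below do.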

Results for Phi-3-mini-4k-instruct-q4.gguf on RTX 4070:

       test	PR12273		this PR		speedup
        pp5	466.92		463.51		-0.73%
       pp10	517.04		561.56		8.61%
       pp20	1045.43		1105.62		5.76%
       pp31	1730.83		1664.51		-3.83%
       pp32	1745.39		1837.95		5.30%
       pp33	1512.23		1472.52		-2.63%
       pp48	2249.4		2136.36		-5.03%
       pp54	2318.44		2398.81		3.47%
       pp63	2694.4		2733.02		1.43%
       pp64	2764.43		2725.61		-1.40%
       pp65	2260.9		2326.46		2.90%
       pp80	2835.27		2836.59		0.05%
       pp96	3189.51		3229.14		1.24%
      pp112	3632.44		3722.96		2.49%
      pp113	3656.96		3736.39		2.17%
      pp127	4087.5		4207.86		2.94%
      pp128	4174.96		4269.83		2.27%
      pp129	2734.41		2848.4		4.17%
      pp140	2907.37		3102.99		6.73%
      pp160	3356.15		3497.35		4.21%
      pp180	3639.65		3913.92		7.54%
      pp192	3857.68		4178.85		8.33%
      pp200	3979.56		4301.45		8.09%
      pp210	4258.49		4393.56		3.17%
      pp230	4589.22		4693.42		2.27%
      pp248	4761.21		5352.42		12.42%
      pp255	5024.18		5221.56		3.93%
      pp256	4919.83		5210.66		5.91%
      pp257	3788.88		3894.24		2.78%
      pp280	4150.83		4192.77		1.01%
      pp300	4432.56		4462.49		0.68%
      pp320	4545.71		4779.86		5.15%
      pp350	4877.21		5035.38		3.24%
      pp384	5189.39		5423.55		4.51%
      pp410	4580.83		4722.13		3.08%
      pp448	4945.34		5059.5		2.31%
      pp480	5208.15		5248.98		0.78%
      pp490	5185.52		5370.3		3.56%
      pp511	5365		5572.95		3.88%
      pp512	5398.73		5547.35		2.75%
      pp513	4958.86		5081.44		2.47%
      pp767	5141.08		5422.29		5.47%
      pp768	5175.3		5299.44		2.40%
      pp769	4685.68		4780.83		2.03%
     pp1023	5300.75		5482.69		3.43%
     pp1024	5294.66		5537.79		4.59%
     pp1025	5023.31		5230.2		4.12%
     pp2047	5117.1		5308.24		3.74%
     pp2048	5116.32		5272.6		3.05%
     pp2049	5016.38		5123.08		2.13%
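
The speedup column appears to be the relative change of this PR over the PR12273 baseline, `(this PR / PR12273 - 1) * 100`. The sketch below recomputes it for three rows copied from the table; the function name `speedup_percent` is hypothetical and only for illustration.

```python
def speedup_percent(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline, in percent.
    Positive means the candidate (this PR) is faster."""
    return (candidate / baseline - 1.0) * 100.0


# (test, PR12273, this PR) rows copied from the table above.
rows = [
    ("pp5", 466.92, 463.51),
    ("pp10", 517.04, 561.56),
    ("pp248", 4761.21, 5352.42),
]

for name, baseline, candidate in rows:
    print(f"{name:>6}  {speedup_percent(baseline, candidate):+.2f}%")
```

Rounded to two decimals, these three rows reproduce the table's -0.73%, 8.61%, and 12.42%.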

jeffbolznv requested a review from 0cc4m on March 10, 2025, 13:45
github-actions bot added the labels Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Mar 10, 2025