vulkan: Adjust coopmat2 tile sizes and selection heuristic #12258

Open
wants to merge 1 commit into
base: master

jeffbolznv
Collaborator

This change selects different tile sizes (M/N/K) for the coopmat2 shaders, with the goal of better optimizing for smaller prompt lengths. It turns out the largest tile size didn't need to be so large, and there were better tile sizes for smaller prompts like pp128.

I ran a variety of prompt lengths with each of the small, medium, and large tile sizes and found that the previous heuristic, which tried to use the largest size that evenly divides the prompt length, isn't optimal. It's better to just round up to the next larger tile size.
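
The difference between the two heuristics can be sketched roughly as follows. This is only an illustration of the description above, not the code in this PR: the tile widths in `kTileN`, the function names, and the fallback to the smallest tile when nothing divides evenly are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical N widths for the small/medium/large tiles (assumed values,
// not taken from the PR diff).
constexpr std::array<uint32_t, 3> kTileN = {32, 64, 128};

// Previous heuristic as described: the largest tile width that evenly
// divides n. Falling back to the smallest tile is an assumption.
inline uint32_t pick_tile_largest_divisor(uint32_t n) {
    assert(n > 0);
    for (auto it = kTileN.rbegin(); it != kTileN.rend(); ++it) {
        if (n % *it == 0) {
            return *it;
        }
    }
    return kTileN.front();
}

// New heuristic as described: round up to the next larger tile width,
// using the largest tile once n exceeds every width.
inline uint32_t pick_tile_round_up(uint32_t n) {
    assert(n > 0);
    for (uint32_t t : kTileN) {
        if (n <= t) {
            return t;
        }
    }
    return kTileN.back();
}
```

With these assumed widths, a prompt length of 96 gets the 32-wide tile under the old rule (neither 128 nor 64 divides it) but the 128-wide tile under the new one.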

I think there's still room to improve some prompt lengths by using a mixture of sizes (e.g. see the falloff from pp128 to pp129, which could do better with a mixture of N=128 and N=32 tiles), but I haven't tried that yet.
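
To make the pp128 to pp129 case concrete, here is a small sketch that counts how many columns get covered (including padding) with a single tile width versus a mixture. The helper names are hypothetical and padding is only a rough proxy for performance; nothing here is measured or taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Columns covered when every tile has width tile_n (n rounded up to a
// multiple of tile_n).
inline uint32_t padded_uniform(uint32_t n, uint32_t tile_n) {
    assert(tile_n > 0);
    return (n + tile_n - 1) / tile_n * tile_n;
}

// Columns covered when full large tiles handle the bulk and small tiles
// cover only the remainder.
inline uint32_t padded_mixed(uint32_t n, uint32_t large_n, uint32_t small_n) {
    assert(large_n > 0 && small_n > 0);
    const uint32_t bulk = n / large_n * large_n;  // columns in full large tiles
    const uint32_t rem  = n - bulk;               // leftover columns
    return bulk + padded_uniform(rem, small_n);
}
```

For n = 129, `padded_uniform(129, 128)` covers 256 columns (127 of them padding), while `padded_mixed(129, 128, 32)` covers 160 (31 of them padding).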

	Phi-3-mini-4k-instruct-q4				Llama-3.2-3B-Instruct-Q4_0				DeepSeek-Coder-V2-Lite-Instruct-Q2_K
prompt	master	PR	delta		master	PR	delta		master	PR	delta
        pp5	447.15	462.92	3.53%		531.84	541.87	1.89%		109.61	144.33	31.68%
       pp10	274.87	494.28	79.82%		320.15	666.86	108.30%		214.57	241.45	12.53%
       pp20	564.73	991.1	75.50%		636.88	1365.68	114.43%		339.73	379.24	11.63%
       pp31	863.29	1418.82	64.35%		1009.82	2012.46	99.29%		482.04	538.28	11.67%
       pp32	894.75	1680.77	87.85%		1001.53	2822.73	181.84%		514.22	584.16	13.60%
       pp33	909.48	1293.95	42.27%		1087.26	1572.45	44.63%		494.02	534.39	8.17%
       pp48	1347.26	1766.27	31.10%		1559.74	2225.58	42.69%		669.58	725.74	8.39%
       pp54	1465.89	2030.28	38.50%		1758.62	2714.58	54.36%		767.75	810.42	5.56%
       pp63	1689.89	2405.95	42.37%		1920.05	2766.84	44.10%		894.12	947.79	6.00%
       pp64	1702.93	2910.49	70.91%		1951.4	3495.35	79.12%		909.94	963.73	5.91%
       pp65	1741.64	1903.68	9.30%		2012.49	2440.7	21.28%		846.71	889.55	5.06%
       pp80	2039.35	2288.33	12.21%		2654.92	3194.68	20.33%		972.21	1032.53	6.20%
       pp96	2397.05	2722.58	13.58%		2997.82	3444.56	14.90%		1127.34	1202.88	6.70%
      pp112	2732.28	3245.6	18.79%		3317.34	3727.07	12.35%		1280.77	1358.74	6.09%
      pp113	2787.73	3057.98	9.69%		3273.65	3841.65	17.35%		1286.89	1354.41	5.25%
      pp127	3064.93	3449.08	12.53%		3983.52	4308.19	8.15%		1387.91	1462.76	5.39%
      pp128	3817.62	4258.3	11.54%		4988.4	5836.28	17.00%		1285.63	1534.76	19.38%
      pp129	2246.45	2516.47	12.02%		3515.47	3659.03	4.08%		1277.89	1414.45	10.69%
      pp140	2357.01	2710.37	14.99%		3560.25	3791.22	6.49%		1349.21	1497.69	11.00%
      pp160	2606.54	2998.89	15.05%		4175.21	4229.96	1.31%		1510.36	1653.83	9.50%
      pp180	2891.38	3292.31	13.87%		4431.38	4698.62	6.03%		1594.75	1823.14	14.32%
      pp192	3036.42	3622.36	19.30%		4845.22	4999.49	3.18%		1882.77	1843.78	-2.07%
      pp200	3106.62	3610.68	16.23%		5037.88	5320.26	5.61%		1609.13	1904.71	18.37%
      pp210	3245.74	3771.82	16.21%		5317.9	5313.79	-0.08%		1657.07	1942.78	17.24%
      pp230	3493.62	4152.18	18.85%		5512.73	5486.64	-0.47%		1726.6	2047.33	18.58%
      pp248	3678.74	4357.96	18.46%		5981.39	6198.01	3.62%		1830.46	2142.02	17.02%
      pp255	3719.19	4338.61	16.65%		6385.1	6439.46	0.85%		1850.39	2173.92	17.48%
      pp256	4818.48	5035.21	4.50%		6876.82	6844.45	-0.47%		1922.72	2281.41	18.66%
      pp257	2898.17	3605.43	24.40%		5340.83	5004.51	-6.30%		1652.4	2016.53	22.04%
      pp280	3143.84	3830.24	21.83%		5523.79	5364.38	-2.89%		1740.22	2143.03	23.15%
      pp300	3300.25	4080.59	23.64%		5917.37	5847.97	-1.17%		1800.03	2199.72	22.20%
      pp320	3498.11	4217.64	20.57%		6204.19	6174.56	-0.48%		2291.67	2310.74	0.83%
      pp350	3672.25	4548.51	23.86%		6630.85	6418.19	-3.21%		1787.9	2326.1	30.10%
      pp384	4251.46	5316.02	25.04%		7845.15	7781.41	-0.81%		2301.83	2466.57	7.16%
      pp410	3478.13	4287.22	23.26%		6224.08	6335.52	1.79%		1741.14	2356.38	35.34%
      pp448	3741.12	4645.57	24.18%		6700.98	6655.14	-0.68%		2399.2	2439.5	1.68%
      pp480	3886.56	4830.78	24.29%		6970.5	7216.27	3.53%		1692.51	2405.76	42.14%
      pp490	3980.64	4909.99	23.35%		7010.11	7031.98	0.31%		1691.38	2420.9	43.13%
      pp511	4051.34	5030.26	24.16%		7622.17	7214.6	-5.35%		1711.46	2461.29	43.81%
      pp512	5434.86	5435.16	0.01%		7948.65	7810.69	-1.74%		2469.96	2498.51	1.16%
      pp513	4935.44	5077.35	2.88%		7085.09	6986.99	-1.38%		2382.88	2423.44	1.70%
      pp767	4633.69	4985.33	7.59%		7023.5	7201.18	2.53%		2192.93	2371.51	8.14%
      pp768	5171.77	5251.81	1.55%		7373.88	7401.12	0.37%		2265.41	2384.82	5.27%
      pp769	4143.96	4575.35	10.41%		6645.42	6699.65	0.82%		2127.2	2295.49	7.91%
     pp1023	4575.58	5227.51	14.25%		7599.04	7661.06	0.82%		2010.02	2436.48	21.22%
     pp1024	5324.02	5318.13	-0.11%		7738.87	7874.87	1.76%		2448.96	2447.92	-0.04%
     pp1025	5119.71	5118.21	-0.03%		7025.86	7397.71	5.29%		2385.33	2384.26	-0.04%
     pp2047	4748.43	5093.44	7.27%		7197.72	7268.61	0.98%		2130.06	2344.45	10.06%
     pp2048	5152.48	5185.36	0.64%		7290.56	7404.3	1.56%		2357.14	2353.71	-0.15%
     pp2049	5045.99	5021.26	-0.49%		7065.23	7198.21	1.88%		2309.27	2332.74	1.02%

jeffbolznv requested a review from 0cc4m March 7, 2025 18:15
github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) Mar 7, 2025
jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Mar 10, 2025
Use VK_KHR_pipeline_executable_properties to query the register count, and
use that to try to better estimate how many workgroups can fit in the SMs.
Particularly with recent tile size changes (ggml-org#12258) the old heuristic is
out of date.

This heuristic benefits both coopmat1 and coopmat2 paths on NVIDIA. Would
be good if somebody can hook up the missing details for other hardware.

Calling getPipelineExecutableStatisticsKHR required more fully initializing
Vulkan-HPP. The steps needed are documented in the Vulkan-HPP readme.
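
As a rough illustration of the kind of estimate this commit message describes, the sketch below bounds workgroups per SM by register usage alone. The function name and the default of 65536 registers per SM are assumptions for the example, and a real estimate would also have to account for shared memory, warp limits, and register allocation granularity. The per-thread register count would come from the pipeline statistics query mentioned above; here it is just a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Register-limited upper bound on concurrent workgroups per SM.
// regs_per_sm defaults to an assumed 65536-entry register file.
inline uint32_t workgroups_per_sm(uint32_t regs_per_thread,
                                  uint32_t threads_per_workgroup,
                                  uint32_t regs_per_sm = 65536) {
    const uint64_t regs_per_workgroup =
        static_cast<uint64_t>(regs_per_thread) * threads_per_workgroup;
    if (regs_per_workgroup == 0) {
        return 0;  // no meaningful estimate without a register count
    }
    return static_cast<uint32_t>(regs_per_sm / regs_per_workgroup);
}
```

For example, a shader using 128 registers per thread with 256 threads per workgroup would fit at most 2 workgroups per SM under this assumption.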
@jeffbolznv
Collaborator Author

Note that the Q4_0 perf in the description is out of date now, see discussion in #12319.
