Over the past couple of days I have tried various instance types on AWS, from the C6 and C7 series to R6 and R7. All of the instances I tried have more than 96 cores and 256 GB of RAM.
I found that the k-means initialization of the clustering for flop consistently takes around 5~11 hours.
However, an HPC6a.48xlarge instance showed a dramatic improvement: the initialization finished in about 12 minutes!!!
The HPC instance I used has around 384 GB of RAM (which was barely touched), 96 cores, and, most importantly, 512 MB of L3 cache and 48 MB of L2 cache. A C6a.32xlarge only has 128 MB of L3 cache, so the cache size seems to be the key factor behind the training time improvement.
My conclusion is that the clustering does a significant amount of repetitive memory fetching and writing, so we could apply a loop tiling technique to significantly boost the clustering efficiency. A rough sketch of what I have in mind is below.
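To make the idea concrete, here is a minimal sketch of what a tiled nearest-centroid assignment step could look like in NumPy. This is not the project's actual code: the function name, tile sizes, and the squared-norm distance formulation are all just illustrative assumptions, and the real initialization would need its own tuning.

```python
import numpy as np

def assign_points_tiled(points, centroids, point_tile=8192, centroid_tile=256):
    """Nearest-centroid assignment with loop tiling.

    Both arrays are processed in fixed-size tiles so the active centroid
    tile (and its partial distance results) stays resident in L2/L3 cache
    while a block of points streams through it, instead of re-reading the
    full centroid matrix from RAM for every point.

    points:    (n, d) float32 array
    centroids: (k, d) float32 array
    returns:   (n,) int64 array of nearest-centroid indices
    """
    n = points.shape[0]
    k = centroids.shape[0]

    best_dist = np.full(n, np.inf, dtype=np.float32)
    best_idx = np.zeros(n, dtype=np.int64)

    # Precompute squared norms once; ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c
    point_sq = np.einsum("ij,ij->i", points, points)
    cent_sq = np.einsum("ij,ij->i", centroids, centroids)

    for p0 in range(0, n, point_tile):
        p1 = min(p0 + point_tile, n)
        pblk = points[p0:p1]                       # tile of points
        for c0 in range(0, k, centroid_tile):
            c1 = min(c0 + centroid_tile, k)
            cblk = centroids[c0:c1]                # tile of centroids

            # (p1-p0, c1-c0) squared distances for this tile pair
            dists = (point_sq[p0:p1, None]
                     + cent_sq[None, c0:c1]
                     - 2.0 * pblk @ cblk.T)

            local_best = dists.argmin(axis=1)
            local_dist = dists[np.arange(p1 - p0), local_best]

            # Merge this tile's result with the running best so far
            improved = local_dist < best_dist[p0:p1]
            best_dist[p0:p1][improved] = local_dist[improved]
            best_idx[p0:p1][improved] = local_best[improved] + c0

    return best_idx
```

The point is just that each centroid tile gets reused against a whole block of points while it is still hot in cache; the tile sizes would have to be tuned against the actual L2/L3 sizes of the target instance.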
What do you guys think!
this sounds quite promising! it is certainly true that we do a ton of repetitive memory access in the training steps, and i've not thought about how to optimize these operations. what do you imagine a loop tiling approach could look like here?