Does simultaneous use of NPU, GPU, and CPU increase memory bandwidth usage on SoCs? #14294
Replies: 1 comment
-
In general, the gap between the peak bandwidth of the interface and the effective bandwidth is explained by the accumulation of small periods of memory underutilization over the course of executing a large model. See this excellent blog for a discussion of some of these effects. At the hardware level, each of the chips is normally capable of saturating the memory interface (with the exception of some poorly designed, older NPUs), but scheduling everything so that the memory is constantly busy, without introducing some other undesirable complexity, is not easy. I haven't read the paper you shared yet (will take a look for sure), but I'm guessing that using two chips does not provide any fundamental bandwidth advantage, and more likely just achieves better bandwidth in practice due to quirks of how the operations and reads get scheduled.
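To make that guess concrete, here is a toy sketch of the argument, not a model of any real SoC or of QNN. It assumes each unit can saturate the interface whenever it is issuing reads, so effective bandwidth is just peak bandwidth times the fraction of time the memory is busy. The peak figure, the schedules, and the names `busy_fraction`, `npu`, and `gpu` are all made up for illustration.

```python
def busy_fraction(intervals, total):
    """Fraction of [0, total) covered by the union of (start, end) intervals."""
    covered = 0.0
    cursor = 0.0  # everything before this point is already counted
    for start, end in sorted(intervals):
        start = max(start, cursor)  # skip time another unit already covers
        end = min(end, total)
        if end > start:
            covered += end - start
            cursor = end
    return covered / total


PEAK_GBPS = 60.0  # hypothetical peak bandwidth of the memory interface
WINDOW = 10.0     # length of the observed window, arbitrary time units

# Hypothetical schedules: when each unit is actively streaming from memory.
npu = [(0, 3), (5, 8)]   # idle in 3..5 and 8..10 (e.g. waiting on setup or sync)
gpu = [(3, 5), (7, 10)]  # happens to fill the NPU's gaps

npu_only = PEAK_GBPS * busy_fraction(npu, WINDOW)
combined = PEAK_GBPS * busy_fraction(npu + gpu, WINDOW)

print(f"NPU alone: {npu_only:.1f} GB/s")
print(f"NPU + GPU: {combined:.1f} GB/s")
```

In this sketch the second unit only helps by filling idle gaps. Where the two schedules overlap (7..8 here), the overlap is counted once, so the combined figure can approach the peak but never exceed it. Real contention effects could also push the overlapped periods below peak, which this simple model ignores.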
-
In the paper “HeteroLLM,” the authors claim that using the NPU together with other components (such as the CPU and GPU) allows the system to utilize memory bandwidth closer to its theoretical maximum, thereby improving performance.
However, the following link states that the CPU alone can already saturate the available memory bandwidth, meaning the NPU’s peak performance is inherently capped by the memory bandwidth limit. This seems to contradict the HeteroLLM paper. Furthermore, if multiple components each process part of the model and share the bandwidth, I would expect them to interfere with one another, potentially lowering the effective bandwidth available to each component.
So my question is: Does using multiple components (NPU, CPU, GPU, etc.) together actually allow higher effective bandwidth utilization, or can contention between them reduce bandwidth?
Both the paper and the link discuss QNN, so it doesn’t seem to be a matter of platform differences.