[Question] Difference between Embedding Training Cache and GPU Embedding Cache #424

hsezhiyan · 2023-10-19T00:05:07Z

What is the difference between the Embedding Training Cache (https://github.com/NVIDIA-Merlin/HugeCTR/tree/main/HugeCTR/src/embedding_training_cache) and the GPU Embedding Cache (https://github.com/NVIDIA-Merlin/HugeCTR/tree/main/gpu_cache)?

It appears as if the Embedding Training Cache is used only during training. Does it use the GPU Embedding Cache under the hood?

minseokl · 2023-10-19T02:34:49Z

Hi @hsezhiyan

Yes, the Embedding Training Cache (ETC) is a training feature that enables the use of embedding tables larger than the GPU memory capacity. It is not built on top of the GPU Embedding Cache. Please also note that this feature is being deprecated.
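To make the ETC idea more concrete, here is a minimal sketch, assuming a pass-based workflow: the full table lives in a larger but slower store, and only the rows referenced by the current dataset chunk are loaded into GPU memory, trained, and written back. `ParameterServer`, `train_pass`, the zero initialization, and the constant toy gradient are hypothetical illustrations, not HugeCTR's API.

```python
from typing import Dict, List, Sequence

Vector = List[float]


class ParameterServer:
    """Hypothetical stand-in for the large, slow store (host memory or SSD)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.rows: Dict[int, Vector] = {}

    def pull(self, keys: Sequence[int]) -> Dict[int, Vector]:
        # Unseen keys start as zeros; a real system would use a proper initializer.
        return {k: list(self.rows.get(k, [0.0] * self.dim)) for k in keys}

    def push(self, rows: Dict[int, Vector]) -> None:
        self.rows.update(rows)


def train_pass(ps: ParameterServer, chunk: List[List[int]],
               gpu_capacity: int, lr: float = 0.1) -> None:
    """Train on one dataset chunk whose key set must fit in GPU memory."""
    keyset = sorted({k for sample in chunk for k in sample})
    if len(keyset) > gpu_capacity:
        raise ValueError("key set exceeds GPU capacity; split the dataset further")
    gpu_table = ps.pull(keyset)              # load: slow store -> "GPU" working set
    for sample in chunk:                     # train: touch only resident rows
        for k in sample:
            # Toy update that pretends every gradient component is 1.0.
            gpu_table[k] = [w - lr * 1.0 for w in gpu_table[k]]
    ps.push(gpu_table)                       # dump: "GPU" working set -> slow store


# Usage: two passes, each over a chunk whose key set fits in a 3-row "GPU".
ps = ParameterServer(dim=2)
for chunk in ([[1, 2], [2, 3]], [[3, 4]]):
    train_pass(ps, chunk, gpu_capacity=3)
```

The capacity check is what lets the total table exceed GPU memory: only each chunk's key set has to fit, not the whole table.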
The GPU Embedding Cache is mainly used in our inference use cases, through the Hierarchical Parameter Server (HPS). If you are interested in HPS, please check out https://nvidia-merlin.github.io/HugeCTR/main/hierarchical_parameter_server/index.html
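Conceptually, a GPU embedding cache sits in front of a larger, slower embedding store and answers lookups for hot keys directly. The sketch below is a hypothetical illustration, not the gpu_cache interface: it uses a Python `OrderedDict` as the cache, a caller-supplied backend function as the slower tier (in HPS that role would be played by the lower storage layers), and plain LRU eviction chosen only for simplicity.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Backend = Callable[[Sequence[int]], Dict[int, Vector]]


class EmbeddingCache:
    """Hypothetical fixed-capacity cache in front of a slower embedding store.

    Plain LRU eviction is used for illustration only; it is not the policy
    implemented by HugeCTR's gpu_cache.
    """

    def __init__(self, capacity: int, backend: Backend) -> None:
        self.capacity = capacity
        self.backend = backend
        self.slots: "OrderedDict[int, Vector]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, keys: Sequence[int]) -> List[Vector]:
        # Fetch each missing key once from the slower tier, preserving order.
        missing = list(dict.fromkeys(k for k in keys if k not in self.slots))
        fetched = self.backend(missing) if missing else {}
        result: List[Vector] = []
        for k in keys:
            if k in self.slots:
                self.hits += 1
                self.slots.move_to_end(k)     # mark as most recently used
                result.append(self.slots[k])
            else:
                self.misses += 1
                result.append(fetched[k])
        # Admit fetched rows, evicting the least recently used when over capacity.
        for k, v in fetched.items():
            self.slots[k] = v
            if len(self.slots) > self.capacity:
                self.slots.popitem(last=False)
        return result


def slow_store(keys: Sequence[int]) -> Dict[int, Vector]:
    # Hypothetical backend: the embedding of key k is simply [k].
    return {k: [float(k)] for k in keys}


cache = EmbeddingCache(capacity=2, backend=slow_store)
cache.lookup([1, 2])      # both miss
cache.lookup([1, 3])      # 1 hits, 3 misses and evicts 2
```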

Thanks,
Minseok

hsezhiyan · 2023-10-19T17:44:11Z

Thank you for the response @minseokl

In that case, will ETC (which is being deprecated) be replaced by the GPU Embedding Cache for training? It looks like the GPU Embedding Cache can be used for both inference and training.

yingcanw · 2023-10-20T01:15:42Z

@hsezhiyan
ETC will be replaced by HierarchicalKV (HKV) for training with hierarchical memory. We have no plans to integrate the GPU Embedding Cache into training. In addition, we have finished implementing a new-generation GPU embedding cache with higher performance and will release it soon.

sezhiyanhari · 2023-10-31T18:08:38Z

Thank you for the answer @yingcanw! I'd like to ask a few follow-up questions:

1. Are there any instructions on how to use HierarchicalKV during training? I can only find HugeCTR training examples that use ETC.
2. Is there an expected time frame for the release of the updated GPU embedding cache?
3. From a design perspective, why are different caching systems (ETC, GPU Embedding Cache) used for training and inference? Was there a reason not to use a single caching system for both?

sezhiyanhari · 2023-11-06T18:07:01Z

@minseokl if you also have any insights, I would appreciate it!

yingcanw · 2023-11-07T02:47:06Z

@sezhiyanhari Sorry for the late reply.
1. Here is the relevant API description for HKV. In addition, we have integrated HKV into SOK, so you can train seamlessly on TensorFlow. @kanghui0204 can provide a more detailed introduction if you have any questions about SOK.
2. We expect to release it soon. If you currently only need the highest-performance GPU embedding cache lookup, you can also use this version of the cache.
3. Training and inference focus on different metrics in industrial use cases. For example, inference has very strict prediction-latency requirements. At the same time, the model must be updated frequently in real time, which requires the cache to deliver high performance under concurrent reads and writes. Synchronous training, by contrast, can separate cache reads from cache writes, and the pipeline can be optimized through techniques such as prefetching. Therefore, separate cache systems are designed to meet the different performance requirements of training and inference.
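The training side of point 3 can be sketched as follows, as a hypothetical illustration rather than how HKV or SOK actually schedule work. A background thread prefetches the rows for the next batch from a host store that is read-only during training, while the main thread is the only writer to the device-side table, so reads and writes never race. Rows already on the device are newer than the prefetched host copy, so they are kept. `train_with_prefetch`, the zero initialization, and the constant toy gradient are assumptions, and write-back to the host store is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Vector = List[float]


def train_with_prefetch(batches: List[Sequence[int]], host: Dict[int, Vector],
                        dim: int, lr: float = 0.1) -> Dict[int, Vector]:
    """Hypothetical synchronous training loop with one batch of lookahead."""
    device: Dict[int, Vector] = {}

    def fetch(keys: Sequence[int]) -> Dict[int, Vector]:
        # Only reads the host store, which is never written during training,
        # so running it on a background thread is safe.
        return {k: list(host.get(k, [0.0] * dim)) for k in keys}

    if not batches:
        return device
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, batches[0])
        for i, batch in enumerate(batches):
            # Read phase: merge the prefetched rows. Rows already on the device
            # are newer than the host copy, so setdefault keeps them.
            for k, v in pending.result().items():
                device.setdefault(k, v)
            if i + 1 < len(batches):
                # Start fetching the next batch while this one "computes".
                pending = pool.submit(fetch, batches[i + 1])
            # Write phase: the main thread is the only writer to `device`.
            # Toy update that pretends every gradient component is 1.0.
            for k in batch:
                device[k] = [w - lr for w in device[k]]
    return device


host_store = {1: [1.0, 1.0]}
trained = train_with_prefetch([[1, 2], [1], [2, 3]], host_store, dim=2, lr=0.5)
```

An inference cache cannot rely on this phase separation, because online model updates can arrive while lookups are being served, which is why point 3 calls for concurrent read/write performance there.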

lausannel · 2023-12-22T06:24:21Z


Hi, could you provide an example script for training with HKV and SOK?

I am a little confused about how HKV could replace ETC, because as far as I know, HKV is a single-GPU key-value store. Could it eliminate the parameter server in ETC?

Any insights are appreciated.

kanghui0204 · 2023-12-25T03:32:54Z

Hi @lausannel,
here is an example of using SOK+HKV.
SOK+HKV example

HKV is a key-value store that uses GPU + CPU memory, where the memory for values can be stored either on the GPU or on the CPU.

HKV repo
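To illustrate the point that values can live in either GPU or CPU memory, here is a minimal, hypothetical two-tier store: keys are indexed in a single table, while value storage is split between a small fast tier (labeled "gpu") and a larger slow tier (labeled "cpu"). The placement rule (fill the fast tier first, then spill) and the method names `put`, `get`, and `tier_of` are assumptions for illustration; they are not HKV's API or its eviction policy.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


class TwoTierKV:
    """Hypothetical sketch: one key index, value storage split across tiers.

    This is not the HierarchicalKV API. The placement rule (fill the fast
    tier first, then spill to the slow tier) is an assumption.
    """

    def __init__(self, gpu_value_slots: int) -> None:
        self.gpu_value_slots = gpu_value_slots
        self.index: Dict[int, Tuple[str, int]] = {}   # key -> (tier, slot)
        self.tiers: Dict[str, List[Vector]] = {"gpu": [], "cpu": []}

    def put(self, key: int, value: Vector) -> None:
        if key in self.index:                         # overwrite in place
            tier, slot = self.index[key]
            self.tiers[tier][slot] = value
            return
        tier = "gpu" if len(self.tiers["gpu"]) < self.gpu_value_slots else "cpu"
        self.tiers[tier].append(value)
        self.index[key] = (tier, len(self.tiers[tier]) - 1)

    def get(self, key: int) -> Optional[Vector]:
        loc = self.index.get(key)
        return None if loc is None else self.tiers[loc[0]][loc[1]]

    def tier_of(self, key: int) -> Optional[str]:
        loc = self.index.get(key)
        return None if loc is None else loc[0]


kv = TwoTierKV(gpu_value_slots=2)
for k in range(4):
    kv.put(k, [float(k)])
kv.put(0, [9.0])   # updating a key keeps its value in the same tier
```

Because the key index is shared, a lookup does not need to know in advance which tier holds a value; only the cost of reading the value differs.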

lausannel · 2023-12-27T10:42:46Z

@kanghui0204 Thanks for your explanation!

hsezhiyan added the question Further information is requested label Oct 19, 2023

yingcanw mentioned this issue Dec 25, 2023

[Question] How to dump incremental model to kafka in Release 23.12? #438

Open

yingcanw mentioned this issue Dec 27, 2023

[Question] Is there pipeline mechanism to help the lookup requests always be handled on device cache in HugeCTR? #437

Open
