DistributedDataParallel #26

Open
LanceGe opened this issue Jul 17, 2020 · 8 comments

LanceGe commented Jul 17, 2020

Is there a way to make SupContrast work with DistributedDataParallel? By default, each worker can only see its own sub-batch, so the inter-sub-batch relationships between samples cannot be utilized.

HobbitLong (Owner) commented

You can use all_gather to gather the features together. The caveat is that you need to manually propagate gradients through the all_gather op, as it doesn't backpropagate automatically.
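
For reference, a minimal sketch of one common way to do that: wrap all_gather in a custom torch.autograd.Function whose backward all-reduces the incoming gradients. The class name GatherLayer and the whole wrapper are illustrative assumptions, not part of SupContrast:

    import torch
    import torch.distributed as dist

    class GatherLayer(torch.autograd.Function):
        """all_gather with a manual backward pass, so gradients reach every rank."""

        @staticmethod
        def forward(ctx, x):
            # Collect a copy of x from every process in the default group.
            gathered = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
            dist.all_gather(gathered, x)
            return tuple(gathered)

        @staticmethod
        def backward(ctx, *grads):
            # Each rank only knows the gradient of its own loss w.r.t. the
            # gathered tensors, so sum the contributions across ranks and
            # keep the slice that corresponds to this rank's local input.
            all_grads = torch.stack(grads)
            dist.all_reduce(all_grads)
            return all_grads[dist.get_rank()]

    # Usage: features = torch.cat(GatherLayer.apply(features), dim=0)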

LanceGe (Author) commented Jul 18, 2020

> You can use all_gather to gather the features together. The caveat is that you need to manually propagate gradients through the all_gather op, as it doesn't backpropagate automatically.

I finally made it work with the help of diffdist, which provides a differentiable all_gather wrapper.

ShijianXu commented

Hi, can you share your code for how to implement this? I am not familiar with all_gather and related operations. Thanks a lot.

LanceGe (Author) commented Jul 21, 2020

> Hi, can you share your code for how to implement this? I am not familiar with all_gather and related operations. Thanks a lot.

First, install diffdist.
Then put the following snippet before calling the criterion:

    import torch
    import torch.distributed as dist
    import diffdist.functional as distops

    world_size = dist.get_world_size()

    # Gather the feature tensors from every rank. diffdist's all_gather is
    # differentiable, so gradients flow back to each rank's local features.
    features = distops.all_gather(
        gather_list=[torch.zeros_like(features) for _ in range(world_size)],
        tensor=features,
        next_backprop=None,
        inplace=True,
    )
    features = torch.cat(features)

    # Gather the labels the same way so they stay aligned with the features.
    labels = distops.all_gather(
        gather_list=[torch.zeros_like(labels) for _ in range(world_size)],
        tensor=labels,
        next_backprop=None,
        inplace=True,
    )
    labels = torch.cat(labels)
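
For context, a short sketch of how the gathered tensors might then be used. This assumes the usual SupContrast setup where criterion is SupConLoss, features already has the [batch_size, n_views, feat_dim] shape it expects, and optimizer is whatever optimizer the training script uses:

    # Compute the loss on the full cross-rank batch and backpropagate as usual;
    # gradients flow back through diffdist's differentiable all_gather to each rank.
    loss = criterion(features, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()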

ShijianXu commented

Thank you for your quick reply.

So then I can simply compute the loss as usual and backpropagate the gradients?

LanceGe (Author) commented Jul 21, 2020 via email

ShijianXu commented

OK. Anyway, thanks a lot.

ShijianXu commented

Just for reference, this seems to be a reliable solution.
