[GraphBolt] fix random generator for shuffle among all workers #6982
Conversation
To trigger regression tests:
Is there a particular reason we use
I see. @frozenbugs was also asking about whether we should consider enabling GPU for the ItemSampler. Now I know what the limitations are. Thanks! There is still a way to use torch's shuffle while relying on np.rng: we store an np.rng object, and whenever we need to shuffle, we draw a random number from np.rng and use it to seed a new torch rng. Then we shuffle with torch's shuffle using that torch rng.
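A minimal sketch of that idea (the helper name, the fixed seed, and the 2 ** 62 bound are illustrative, not from the PR):

```python
import numpy as np
import torch

# One numpy Generator, created once and shared by all shuffles.
np_rng = np.random.default_rng(1234)

def shuffle_indices(num_items, device="cpu"):
    # Draw a seed from the numpy rng, seed a fresh torch rng with it,
    # and let torch.randperm produce the actual permutation.
    seed = int(np_rng.integers(2 ** 62))
    torch_rng = torch.Generator(device=device).manual_seed(seed)
    return torch.randperm(num_items, generator=torch_rng, device=device)
```

This is essentially what torch_way does in the benchmark below.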
I don't know whether moving
There is one more reason the solution I propose could be better than the numpy one: if we use numpy, it will keep its own thread pool separate from the torch thread pool. It is best if we use torch for all operations when we can, even if the shuffle runs on the CPU and we do no refactoring to enable GPU execution.
I benchmarked the two approaches and it seems that numpy is faster on the CPU, at least in the Google Colab environment: https://colab.research.google.com/drive/1_5M2aLPrjLcHSrnTyT6GpB0FMyRhuCbO?usp=sharing

```python
import torch
import numpy
import time

def numpy_way(len, shuffle, device, rng):
    indices = torch.arange(len)
    if shuffle:
        # Shuffle in place through the shared-memory numpy view of the tensor.
        rng.shuffle(indices.numpy())
    return indices.to(device)

def torch_way(len, shuffle, device, rng):
    if shuffle:
        # Draw a seed from the numpy rng and let torch do the permutation.
        seed = rng.integers(2 ** 62).item()
        torch_rng = torch.Generator(device).manual_seed(seed)
        return torch.randperm(len, generator=torch_rng, device=device)
    else:
        return torch.arange(len, device=device)

rng = numpy.random.default_rng()
for log_len in range(15, 25):
    len = 2 ** log_len
    for device in ["cpu", "cuda"]:
        runtimes = []
        for shuffler in [numpy_way, torch_way]:
            for i in range(10):
                if i == 3:  # Warmup for 3 iterations.
                    start = time.time()
                indices = shuffler(len, True, device, rng)
                if device == "cuda":
                    torch.cuda.synchronize()
            runtimes.append(time.time() - start)
        print(f"len: {len}, device: {device}, numpy: {runtimes[0]:.4f}, torch: {runtimes[1]:.4f}")
```
@Rhett-Ying So we might want to keep your existing solution because it is more performant, at least in the Google Colab environment. However, if we want the advantages I listed above, we could go with the torch solution.
Actually, on my machine with an AMD Ryzen 5950X CPU, torch seems to be the much more performant one:
The reason we might want to move the shuffling operation to the GPU is that, as you can see, shuffling starts taking very long as the itemset size grows. That is why it might be a good idea to have a GPU option; otherwise, GPUs will idle between epochs while the shuffle runs on the CPU.
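Purely as a sketch of what a GPU option could look like (the helper name and parameters are hypothetical, not GraphBolt's API): the seeded torch generator from the benchmark's torch_way lets the permutation be produced directly on the training GPU, instead of shuffling on the CPU and copying the result over.

```python
import torch

def make_epoch_indices(num_items, seed, device="cuda"):
    # Build the shuffled index order on the GPU with a seeded generator,
    # so the device is busy with the shuffle rather than idling.
    gen = torch.Generator(device=device).manual_seed(seed)
    return torch.randperm(num_items, generator=gen, device=device)

# Usage sketch: indices = make_epoch_indices(2 ** 24, seed=epoch_number)
```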