[GraphBolt] fix random generator for shuffle among all workers #6982
Conversation
To trigger regression tests:
Is there a particular reason we use
I see. @frozenbugs was also asking about whether we should consider enabling GPU for the ItemSampler. Now I know what the limitations are. Thanks! There is still a way to use torch's shuffle while relying on np.rng: we store an np.rng object, and whenever we need to shuffle, we draw a random number from np.rng and use it to seed a new torch rng. Then we shuffle with torch's shuffle using that torch rng.
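A minimal sketch of that idea (the helper name, the fixed seed, and the 2 ** 62 bound are illustrative, not from the PR):

```python
import numpy as np
import torch

# One numpy Generator, created once and shared by all shuffles.
np_rng = np.random.default_rng(1234)

def shuffle_indices(num_items, device="cpu"):
    # Draw a seed from the numpy rng, seed a fresh torch rng with it,
    # and let torch.randperm produce the actual permutation.
    seed = int(np_rng.integers(2 ** 62))
    torch_rng = torch.Generator(device=device).manual_seed(seed)
    return torch.randperm(num_items, generator=torch_rng, device=device)
```

This is essentially what torch_way does in the benchmark below.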
I don't know whether moving
There is one more reason the solution I propose could be better than the numpy one: if we use numpy, it will keep its own thread pool separate from the torch thread pool. It is best if we use torch for all operations when we can, even if the shuffle runs on the CPU and we do no refactoring to enable GPU execution.
I benchmarked the two approaches and it seems that numpy is faster on the CPU, at least in the Google Colab environment: https://colab.research.google.com/drive/1_5M2aLPrjLcHSrnTyT6GpB0FMyRhuCbO?usp=sharing

```python
import torch
import numpy
import time

def numpy_way(len, shuffle, device, rng):
    indices = torch.arange(len)
    if shuffle:
        # Shuffle in place through the shared-memory numpy view of the tensor.
        rng.shuffle(indices.numpy())
    return indices.to(device)

def torch_way(len, shuffle, device, rng):
    if shuffle:
        # Draw a seed from the numpy rng and let torch do the permutation.
        seed = rng.integers(2 ** 62).item()
        torch_rng = torch.Generator(device).manual_seed(seed)
        return torch.randperm(len, generator=torch_rng, device=device)
    else:
        return torch.arange(len, device=device)

rng = numpy.random.default_rng()
for log_len in range(15, 25):
    len = 2 ** log_len
    for device in ["cpu", "cuda"]:
        runtimes = []
        for shuffler in [numpy_way, torch_way]:
            for i in range(10):
                if i == 3:  # Warmup for 3 iterations.
                    start = time.time()
                indices = shuffler(len, True, device, rng)
                if device == "cuda":
                    torch.cuda.synchronize()
            runtimes.append(time.time() - start)
        print(f"len: {len}, device: {device}, numpy: {runtimes[0]:.4f}, torch: {runtimes[1]:.4f}")
```
@Rhett-Ying So we might want to keep your existing solution because it is more performant, at least in the Google Colab environment. However, if we want the advantages I listed above, we could go with the torch solution.
Actually, on my machine with an AMD Ryzen 5950X CPU, torch seems to be the much more performant one:
The reason we might want to move the shuffling operation to the GPU is that, as you can see, shuffling starts taking very long as the itemset size grows. That is why it might be a good idea to have a GPU option; otherwise, GPUs will idle between epochs while the shuffle runs on the CPU.
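Purely as a sketch of what a GPU option could look like (the helper name and parameters are hypothetical, not GraphBolt's API): the seeded torch generator from the benchmark's torch_way lets the permutation be produced directly on the training GPU, instead of shuffling on the CPU and copying the result over.

```python
import torch

def make_epoch_indices(num_items, seed, device="cuda"):
    # Build the shuffled index order on the GPU with a seeded generator,
    # so the device is busy with the shuffle rather than idling.
    gen = torch.Generator(device=device).manual_seed(seed)
    return torch.randperm(num_items, generator=gen, device=device)

# Usage sketch: indices = make_epoch_indices(2 ** 24, seed=epoch_number)
```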