Modified successfully, but not displayed on ALLTOALL? #25
Comments
@yiguCM Hi, I encountered the same issue as you. Have you resolved it?
Hello, sorry for the late reply.
Based on what we observed, 4090 P2P performs poorly in all-to-all, which is most likely related to the BAR1 size being set to 32 GB.
We have paused the project for now because we were not making progress and it currently has no practical value.
We may pick it up again in the future.
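One way to check the BAR1 size mentioned above is to read the GPU's sysfs `resource` file, where each line gives the start address, end address, and flags of one PCI resource in hex (`nvidia-smi -q -d MEMORY` also reports a BAR1 total). The sketch below is a minimal parser under that assumption; the PCI address passed to `read_bar1_size` is a placeholder, and treating line index 1 as BAR1 assumes the usual NVIDIA GPU layout.

```python
# Minimal sketch, assuming the Linux sysfs PCI "resource" file format:
# one line per resource index, "start end flags" in hex, all zeros if unused.
from pathlib import Path


def bar_sizes(resource_text: str) -> list[int]:
    """Return the size in bytes of each resource line (0 for unused entries)."""
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        start, end = int(fields[0], 16), int(fields[1], 16)
        sizes.append(end - start + 1 if end else 0)
    return sizes


def read_bar1_size(bdf: str) -> int:
    """Read the BAR1 size for a PCI address such as "0000:01:00.0" (placeholder).

    Assumes resource index 1 is BAR1, the usual layout for NVIDIA GPUs.
    """
    text = Path(f"/sys/bus/pci/devices/{bdf}/resource").read_text()
    return bar_sizes(text)[1]


if __name__ == "__main__":
    # Illustrative input, not taken from the issue: a 16 MiB BAR0,
    # a 32 GiB BAR1, and an unused slot.
    sample = (
        "0x00000000fb000000 0x00000000fbffffff 0x0000000000040200\n"
        "0x0000006000000000 0x00000067ffffffff 0x000000000014220c\n"
        "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
    )
    print([size // 2**20 for size in bar_sizes(sample)])  # sizes in MiB -> [16, 32768, 0]
```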
This seems to be solved for multiple GPUs on an EPYC CPU with https://github.com/aikitoria/open-gpu-kernel-modules
@yiguCM @mylesgoose Thank you all. We are currently experimenting with different CPU and hardware configurations, hoping to make some discoveries.
I think that if you have two CPUs, you have to bridge the P lanes on EPYC with MCIO cables from ports on one CPU to the other. That gives you 128 lanes, enough for 8 GPUs at PCIe x16. If you use 9 or 10 GPUs (160 lanes at x16), P2P traffic goes via the CPU.
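The lane counts in the comment above follow from giving every GPU a full x16 link: 8 x 16 = 128 and 10 x 16 = 160. A trivial sketch of that budget check, assuming x16 per GPU and taking 128 as the bridged lane count described in the comment; whether traffic then actually crosses the CPU depends on the board topology, which this does not model.

```python
# Minimal sketch of the lane arithmetic, assuming every GPU gets a full x16 link.
LANES_PER_GPU = 16


def lanes_needed(num_gpus: int, lanes_per_gpu: int = LANES_PER_GPU) -> int:
    """Total PCIe lanes needed to give each GPU its own link of the given width."""
    return num_gpus * lanes_per_gpu


def fits_in_budget(num_gpus: int, available_lanes: int) -> bool:
    """True if all GPUs can get x16 links from the given lane budget."""
    return lanes_needed(num_gpus) <= available_lanes


if __name__ == "__main__":
    bridged_lanes = 128  # the lane count described in the comment above
    for n in (8, 9, 10):
        print(n, lanes_needed(n), fits_in_budget(n, bridged_lanes))
```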
Also, I have not tested this, but a dual-root PLX board may have the same issue. The problem seems to come from the GPUs not sitting under a single root. You could try buying some C-Payne PCIe 5.0 to two PCIe 4.0 MCIO splitters and putting all the GPUs on the same CPU, or the same root complex.
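To see whether the GPUs share a root, `nvidia-smi topo -m` prints the GPU interconnect matrix. Another option is to resolve each GPU's sysfs path and group by its top-level `pciDDDD:BB` host bridge component. The sketch below assumes the Linux sysfs path layout; the GPU names and paths in the example are hypothetical. On EPYC, one socket typically exposes several such host bridges, so this grouping is finer than "same CPU".

```python
# Minimal sketch: group GPUs by the top-level PCI host bridge in their sysfs path.
# Assumes Linux paths like /sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0.
import os
import re
from collections import defaultdict

HOST_BRIDGE_RE = re.compile(r"/devices/(pci[0-9a-f]{4}:[0-9a-f]{2})/")


def host_bridge(sysfs_path: str) -> str:
    """Return the 'pciDDDD:BB' component of a resolved sysfs device path."""
    match = HOST_BRIDGE_RE.search(sysfs_path)
    if match is None:
        raise ValueError(f"no PCI host bridge in {sysfs_path!r}")
    return match.group(1)


def group_by_host_bridge(paths: dict[str, str]) -> dict[str, list[str]]:
    """Map each host bridge to the GPU names whose paths sit under it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for gpu, path in paths.items():
        groups[host_bridge(path)].append(gpu)
    return dict(groups)


def resolve_sysfs_path(bdf: str) -> str:
    """Resolve /sys/bus/pci/devices/<bdf> to its full /sys/devices/... path."""
    return os.path.realpath(f"/sys/bus/pci/devices/{bdf}")


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    example = {
        "GPU0": "/sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0",
        "GPU1": "/sys/devices/pci0000:40/0000:40:03.1/0000:43:00.0",
        "GPU2": "/sys/devices/pci0000:c0/0000:c0:01.1/0000:c1:00.0",
    }
    print(group_by_host_bridge(example))
```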
NVIDIA Open GPU Kernel Modules Version
550.90.07-p2p
Operating System and Version
Ubuntu 22.04.5 LTS
Kernel Release
Linux 6.8.0-47-generic #47~22.04.1-Ubuntu
Hardware: GPU
GPU 0 to GPU 7: NVIDIA GeForce RTX 4090 (8 GPUs)
Describe the bug
Hi, I successfully modified BAR1 and enabled P2P, which improved P2PTEST performance. However, when I ran the NCCL tests, I found that all-to-all performance decreased. Why is this?
Is it because the BAR1 size is too large? Can we enable P2P without modifying BAR1?
To Reproduce
/nccl-tests/build/alltoall_perf -b 8 -e 8G -f 2 -g 8
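To compare all-to-all runs before and after the BAR1 change, it helps to know how `alltoall_perf` derives its bandwidth columns. The sketch below assumes, based on the nccl-tests performance notes and source, that time is reported in microseconds, algbw = size / time in GB/s (10^9 bytes/s), and alltoall busbw = algbw x (n - 1)/n; check this against the nccl-tests documentation before relying on it. With `-g 8`, n = 8, so busbw would be 0.875 x algbw.

```python
# Minimal sketch of how alltoall_perf's bandwidth columns relate, assuming
# time in microseconds, bandwidth in GB/s (1e9 bytes/s), and the (n - 1) / n
# bus-bandwidth factor that nccl-tests applies to alltoall.


def alltoall_bandwidth(size_bytes: int, time_us: float, nranks: int) -> tuple[float, float]:
    """Return (algbw, busbw) in GB/s for one alltoall_perf result row."""
    if time_us <= 0 or nranks < 1:
        raise ValueError("time_us must be positive and nranks at least 1")
    algbw = size_bytes / (time_us * 1e-6) / 1e9
    busbw = algbw * (nranks - 1) / nranks
    return algbw, busbw


def percent_change(before: float, after: float) -> float:
    """Relative change from a baseline run, e.g. before vs. after the BAR1 change."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Illustrative numbers only: 1 GB moved in 1 s across 8 GPUs (-g 8).
    algbw, busbw = alltoall_bandwidth(10**9, 1_000_000.0, 8)
    print(f"algbw={algbw:.3f} GB/s busbw={busbw:.3f} GB/s")
```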
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
No response