Add support for Ampere AmpereOne processors #5309
Thank you. I expect that the compiler options will offer some performance benefit over treating them the same as the NeoverseN1 reference design, but it should be possible to simplify the KERNEL files to an |
I compared the HPL test with the library build from the default develop branch and with this patch on the same AmpereOne 192c system: with this patch: Yes, Martin, I agree with you, I should use |
Thanks, the speedup is quite impressive, but then I guess a purely default build would currently treat it as a very basic ARMV8 ( |
Yes, the default build will use "-march=armv8-a", and the kernel file might also be the generic one. As you mentioned, there will be a next-generation Ampere processor; is there any potential complexity in putting AMPERE1/AMPERE1A into the NEOVERSEN1 DYNAMIC_ARCH?
My thinking is simply that if these two AmpereOne processors are basically a faster NeoverseN1-compatible design and there is no compelling reason to write dedicated BLAS kernels for them in the near future (?), we could avoid code duplication and just match the new cpuids to NEOVERSEN1 like we do for Altra. For DYNAMIC_ARCH, this would mean keeping the library relatively small. For dedicated builds, it would avoid the extra sets of GEMM parameters and having to check that we add "ifdef AMPEREONE" wherever there is conditional code for Neoverse already - the only user-visible drawback would be that the library will not be named libopenblas_ampereone.
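To make the "match the new cpuids to NEOVERSEN1" idea concrete, here is a minimal sketch in C. It is not the OpenBLAS detection code: the enum, the function names, and the exact implementer/part values (0x41/0xd0c for Neoverse N1, 0xc0 with parts 0xac3 and 0xac4 for the two AmpereOne variants) are assumptions made for illustration. Only the MIDR_EL1 field layout (implementer in bits 31:24, part number in bits 15:4) is taken from the Arm architecture.

```c
#include <stdint.h>

/* Hypothetical target enum; OpenBLAS uses its own identifiers. */
typedef enum { TARGET_ARMV8, TARGET_NEOVERSEN1 } blas_target_t;

/* MIDR_EL1 layout: implementer in bits [31:24], part number in bits [15:4]. */
static unsigned midr_implementer(uint32_t midr) { return (midr >> 24) & 0xffu; }
static unsigned midr_part(uint32_t midr)        { return (midr >> 4) & 0xfffu; }

/* Forward known-compatible cores to the NEOVERSEN1 kernels instead of
 * introducing a separate target. The ID values below are assumptions. */
static blas_target_t select_target(uint32_t midr)
{
    unsigned impl = midr_implementer(midr);
    unsigned part = midr_part(midr);

    /* Arm Neoverse N1 (assumed IDs: implementer 0x41, part 0xd0c). */
    if (impl == 0x41 && part == 0xd0c)
        return TARGET_NEOVERSEN1;

    /* AmpereOne AC3/AC4 (assumed IDs: implementer 0xc0, parts 0xac3, 0xac4). */
    if (impl == 0xc0 && (part == 0xac3 || part == 0xac4))
        return TARGET_NEOVERSEN1;

    /* Anything unrecognized falls back to the generic ARMV8 target. */
    return TARGET_ARMV8;
}
```

With this kind of forwarding, a DYNAMIC_ARCH library needs no extra kernel set for the new parts, at the cost of not distinguishing them from N1 in the library name or in any conditional code.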
Although we are using the NeoverseN1 kernel file, these processors are quite different from the NeoverseN1 core. Altra is based on the N1 core, which is armv8.2-based, whereas ampere1 and ampere1a are armv8.6-based. So it is not a good idea to simply map the Ampere CPU IDs to NEOVERSEN1. I didn't consider DYNAMIC_ARCH in this commit. If that is necessary, I can make another PR to support it.
right, but 8.6 probably won't do much good when the code doesn't make use of it. That was why I asked how much of the performance gain vs. generic ARMV8 was from treating it like N1 compared to what the correct
DYNAMIC_ARCH is what typically gets used by third-party distributors - EasyBuild/EESSI in the HPC environment or the Python "wheels" for NumPy/SciPy, and anybody who does not want to build OpenBLAS on each target in a heterogeneous (though arm64) environment. This is why I put the simple forwarding to N1 into 0.3.30 as a last-minute patch.
Add support for Ampere processors (AmpereOne AC3, AC4), which are compatible with the armv8.6+ ISA.