Is there any project that uses the macro "PARAMTEST" to search for optimal P, Q, and R values? #5115

Open
Rubiczhang opened this issue Feb 8, 2025 · 4 comments

Comments

@Rubiczhang

I’ve noticed a code snippet in OpenBLAS/driver/level3/gemm.c that looks like this:

```c
#ifdef PARAMTEST
#undef GEMM_P
#undef GEMM_Q
#undef GEMM_R

#define GEMM_P (args -> gemm_p)
#define GEMM_Q (args -> gemm_q)
#define GEMM_R (args -> gemm_r)
#endif
```

However, I’m unable to determine where args->gemm_p, args->gemm_q, and args->gemm_r are set.

I understand that the args structure is initialized in OpenBLAS/interface/gemm.c, but I cannot locate where these specific fields (args->gemm_p, args->gemm_q, args->gemm_r) are assigned values.

My goal is to use the PARAMTEST macro to search for optimal P, Q, and R values on a new RISC-V platform. Does anyone know of any existing code or projects that perform such parameter searches?
Otherwise, I could implement this functionality myself and open a PR.

@martin-frbg
Collaborator

The PARAMTEST macro is one of several undocumented development hooks left over from Kazushige Goto's original GotoBLAS of the early 2000s. I am not aware of any code that uses it for an automated search of the parameter space; for all we know, it may have been a simple kludge to supply individual "handcrafted" values while bypassing param.h and the values derived from it.
What we do have is a Python script that generates GEMM kernels for RISC-V from a few basic constraints (see kernel/riscv64/generate_kernel.py).
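For illustration, here is a stripped-down sketch of how a hook like this works. The same driver code compiles against either constants (normal build) or fields read at run time (built with `-DPARAMTEST`). `sketch_args_t`, the `SKETCH_DEFAULT_*` values and `sketch_row_blocks` are hypothetical stand-ins, not OpenBLAS code.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's argument struct: only the matrix
   sizes and the three blocking fields named in the snippet are modelled. */
typedef struct {
    long m, n, k;
    long gemm_p, gemm_q, gemm_r;
} sketch_args_t;

/* Compile-time defaults, standing in for what param.h would provide.
   The numbers are arbitrary placeholders. */
#define SKETCH_DEFAULT_P 256
#define SKETCH_DEFAULT_Q 256
#define SKETCH_DEFAULT_R 4096

#ifdef PARAMTEST
/* Run-time values: the blocking sizes come from the args struct. */
#define GEMM_P (args->gemm_p)
#define GEMM_Q (args->gemm_q)
#define GEMM_R (args->gemm_r)
#else
/* Normal build: the blocking sizes are constants fixed at compile time. */
#define GEMM_P SKETCH_DEFAULT_P
#define GEMM_Q SKETCH_DEFAULT_Q
#define GEMM_R SKETCH_DEFAULT_R
#endif

/* Number of P-sized row blocks a driver would loop over for m rows. */
long sketch_row_blocks(const sketch_args_t *args) {
    assert(GEMM_P > 0);
    return (args->m + GEMM_P - 1) / GEMM_P;
}
```

With `m = 1000`, a normal build returns 4 blocks (P = 256). Built with `-DPARAMTEST` and `gemm_p = 100`, the same call returns 10. In that mode, whoever constructs the args struct is responsible for filling `gemm_p`, `gemm_q` and `gemm_r`, which is exactly the piece the question could not locate.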

@Rubiczhang
Author

Hi Martin!
Thank you for your response. I'm very glad to learn about generate_kernel.py.

For a new architecture, if we do not "supply individual 'handcrafted' values while bypassing param.h", how are its P, Q, and R values determined: are they calculated, or found through a search process?

  • If they are determined through a search process, I assume that each iteration of the search would require:
    • Modifying param.h (or passing modified parameters via -D compiler flags),
    • Recompiling the affected .o files,
    • Relinking the binaries.
  • Compared to passing parameters via PARAMTEST, this approach would presumably incur significantly higher overhead, unless that overhead turns out to be negligible compared to the time required for the GEMM computation itself.
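The `-D` route in the first bullet usually relies on a default value guarded by `#ifndef`, so that a compiler flag can replace it. The sketch below is hypothetical (`SKETCH_GEMM_P` is an invented name, and OpenBLAS's param.h may not guard its values this way). It only shows why every new value forces a rebuild of the objects that use it.

```c
#include <assert.h>

/* A compile-time parameter whose default can be overridden on the command
   line, e.g.  cc -DSKETCH_GEMM_P=320 -c driver.c
   SKETCH_GEMM_P is a hypothetical name used only for this sketch. */
#ifndef SKETCH_GEMM_P
#define SKETCH_GEMM_P 256
#endif

/* Bytes occupied by a packed P x Q block of A. Because SKETCH_GEMM_P is a
   preprocessor constant, trying a new P means recompiling every object file
   that uses it and relinking the library. */
long sketch_packed_a_bytes(long q, long elem_size) {
    assert(q > 0 && elem_size > 0);
    return (long)SKETCH_GEMM_P * q * elem_size;
}
```

Without any `-D` flag, `sketch_packed_a_bytes(256, 8)` evaluates to 256 × 256 × 8 = 524288 bytes.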

@martin-frbg
Collaborator

As silly as it may sound, the conventional wisdom is to start by copying the values from a "similar" CPU and then adjusting P and Q so that GEMM_P * GEMM_Q (times the element size) is about half the size of the new CPU's level 2 cache. There is no documentation from the early days (not even any form of RCS or CVS history) beyond the papers K. Goto published while a postdoc at TAMU.
Even the simple benchmark codes provided are a fairly recent addition (by an earlier developer of x86_64 kernels who sadly stopped contributing and responding to messages very suddenly in 2017).
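The rule of thumb can be written as a small calculation. This is a minimal sketch, assuming Q is kept from the "similar" CPU and P is derived from it. Rounding P down to a multiple of the kernel's M unroll factor is an extra assumption of this sketch, not something stated above, and `sketch_pq_from_l2` is a hypothetical helper.

```c
#include <assert.h>

typedef struct { long p, q; } pq_t;

/* Choose P so that the packed P x Q block of A fills about half of L2.
   q is taken as given (e.g. copied from a similar CPU); unroll_m is the
   kernel's M unroll factor, which P is rounded down to (an assumption). */
pq_t sketch_pq_from_l2(long l2_bytes, long elem_size, long q, long unroll_m) {
    assert(l2_bytes > 0 && elem_size > 0 && q > 0 && unroll_m > 0);
    long budget_elems = (l2_bytes / 2) / elem_size; /* half of L2, in elements */
    long p = budget_elems / q;                      /* largest P with P * Q <= budget */
    p -= p % unroll_m;                              /* align to the unroll factor */
    if (p < unroll_m)
        p = unroll_m;                               /* never go below one unrolled block */
    pq_t r = { p, q };
    return r;
}
```

For example, a 1 MiB L2 with double precision (8 bytes), Q = 256 and an unroll factor of 8 gives P = 256, since 256 × 256 × 8 bytes = 512 KiB.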

I am not sure whether anybody has ever tried a fully systematic search for optimal parameters (probably not an easy task given the parameters involved, unless one limits oneself to matrices of fairly similar dimensions), though it would certainly be a good idea to try.

One logical place to inject args->gemm_p etc. (if not doing it directly in the level3 driver file) would be interface/gemm.c, but even the version imported from the last release of GotoBLAS2 contains no trace of PARAMTEST or of supplying gemm_p by "unusual" means (such as reading from a pipe or a set of environment variables). It is entirely possible that even Goto himself never used that mechanism, or perhaps only used it in the very first iterations of GotoBLAS...
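One of the "unusual" means mentioned above, reading the values from environment variables, could look roughly like this. It is a hypothetical sketch: nothing in OpenBLAS does this today. The struct, the functions and the `SKETCH_PARAMTEST_*` variable names are invented here. In real code the equivalent would sit in interface/gemm.c under `#ifdef PARAMTEST` and write into the args struct before the level3 driver is called.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three fields the PARAMTEST driver reads. */
typedef struct { long gemm_p, gemm_q, gemm_r; } sketch_blocking_t;

/* Parse a positive decimal integer; fall back to the default when the
   text is missing, empty, has trailing garbage, or is not positive. */
static long sketch_parse_param(const char *text, long fallback) {
    if (text == NULL || *text == '\0')
        return fallback;
    char *end = NULL;
    long v = strtol(text, &end, 10);
    if (*end != '\0' || v <= 0)
        return fallback;
    return v;
}

/* Fill the blocking sizes from (hypothetical) environment variables,
   keeping the compiled-in defaults for any variable that is unset. */
void sketch_fill_blocking(sketch_blocking_t *b, long def_p, long def_q, long def_r) {
    b->gemm_p = sketch_parse_param(getenv("SKETCH_PARAMTEST_P"), def_p);
    b->gemm_q = sketch_parse_param(getenv("SKETCH_PARAMTEST_Q"), def_q);
    b->gemm_r = sketch_parse_param(getenv("SKETCH_PARAMTEST_R"), def_r);
}
```

A search script could then export new values and rerun the same benchmark binary for each candidate, with no recompilation in between.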

@Rubiczhang
Author

Thanks for explaining the historical background on the parameters!

I'll start working on the auto-search implementation.

As you mentioned, we're facing two main challenges:

  1. Picking the right metrics is tricky: I noticed BLAS papers usually test with square matrices (m = k = n), so I'll probably start there.
  2. The parameter space is huge! To keep things manageable, I'm planning to first explore a small range around our calculated P and Q values.
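A minimal sketch of the plan in the two points above: time square (m = k = n) runs, score them in GFLOPS (2n³ floating-point operations divided by the run time), and grid-search a small neighbourhood around starting P and Q values. To stay self-contained, it uses a toy blocked loop nest with run-time block sizes instead of calling OpenBLAS, so its numbers say nothing about real kernels. `toy_gemm`, `sketch_search` and the step/radius scheme are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Toy blocked C += A * B for square n x n row-major matrices. The row-block
   size p and k-block size q are ordinary run-time arguments, the property
   the PARAMTEST hook appears to provide; this is not the OpenBLAS driver. */
static void toy_gemm(long n, long p, long q,
                     const double *a, const double *b, double *c) {
    for (long kk = 0; kk < n; kk += q) {
        long kend = (kk + q < n) ? kk + q : n;
        for (long ii = 0; ii < n; ii += p) {
            long iend = (ii + p < n) ? ii + p : n;
            for (long i = ii; i < iend; i++)
                for (long k = kk; k < kend; k++) {
                    double aik = a[i * n + k];
                    for (long j = 0; j < n; j++)
                        c[i * n + j] += aik * b[k * n + j];
                }
        }
    }
}

/* GFLOPS for one m = n = k run. clock() measures processor time, which is
   adequate for this single-threaded toy; returns 0 if the run was too fast
   for the clock's resolution. */
static double toy_gflops(long n, long p, long q,
                         const double *a, const double *b, double *c) {
    clock_t t0 = clock();
    toy_gemm(n, p, q, a, b, c);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? 2.0 * n * n * n / secs * 1e-9 : 0.0;
}

typedef struct { long p, q; double gflops; } sketch_result_t;

/* Try every (p0 + i*step, q0 + j*step) with |i|, |j| <= radius and keep the
   fastest. Each point is timed once; a real search would repeat runs. */
sketch_result_t sketch_search(long n, long p0, long q0, long step, int radius) {
    sketch_result_t best = { 0, 0, -1.0 };
    size_t count = (size_t)n * (size_t)n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = calloc(count, sizeof *c);
    if (a != NULL && b != NULL && c != NULL) {
        for (size_t i = 0; i < count; i++) {
            a[i] = (double)(i % 7) / 7.0;
            b[i] = (double)(i % 5) / 5.0;
        }
        for (int di = -radius; di <= radius; di++)
            for (int dj = -radius; dj <= radius; dj++) {
                long p = p0 + di * step, q = q0 + dj * step;
                if (p <= 0 || q <= 0)
                    continue; /* skip invalid block sizes */
                /* C is not reset between runs; only the timing matters here. */
                double g = toy_gflops(n, p, q, a, b, c);
                if (g > best.gflops) {
                    best.p = p;
                    best.q = q;
                    best.gflops = g;
                }
            }
    }
    free(a);
    free(b);
    free(c);
    return best;
}
```

For instance, `sketch_search(512, 256, 256, 32, 2)` would try the 25 points from 192 to 320 in steps of 32 on each axis. Keeping the radius small is what keeps the number of timed runs manageable.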
