Memory related issues with OpenMP + OneDAL and OpenMP + Scikit-learn #5077

Open

shivammonaka opened this issue Jan 16, 2025 · 2 comments

@shivammonaka (Contributor)

OneDAL: When built with OpenBLAS using the OpenMP backend and running DBSCAN, it throws a memory-related error:

OpenBLAS warning: precompiled NUM_THREADS exceeded, adding auxiliary array for thread metadata. To avoid this warning, please rebuild your copy of OpenBLAS with a larger NUM_THREADS setting or set the environment variable OPENBLAS_NUM_THREADS to 32 or lower
BLAS : Bad memory unallocation! : 576 0xff4e09000000

OpenBLAS Build Command (OneDAL): `make USE_OPENMP=1`

Scikit-learn: When built with OpenBLAS with NUM_PARALLEL >= 30 (I am modifying OpenBLAS in a way that requires NUM_PARALLEL=30) and running KMeans, it throws a memory-related error similar to the one above.

OpenBLAS Build Command (Scikit-learn): `make USE_OPENMP=1 NUM_PARALLEL=30`

I suspect this is happening because of how locks are used around buffer allocation and deallocation in `memory.c`. I have not been able to find the exact cause and would love help from the community in resolving it.
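
For reference, a minimal standalone sketch of the access pattern described above, assuming an OpenBLAS built with a small compiled-in NUM_THREADS (e.g. 32). This is a hypothetical reproducer, not a confirmed one: each OpenMP thread calls a BLAS routine so that more concurrent callers than the precompiled limit hit the per-thread buffer bookkeeping at once.

```c
/* Hypothetical reproducer sketch: oversubscribe an OpenMP-built OpenBLAS
 * by calling a BLAS routine from more OpenMP threads than its
 * compiled-in NUM_THREADS (assumed here to be 32 or lower).
 * Build (flags/paths may differ): gcc -fopenmp repro.c -lopenblas -o repro
 */
#include <stdio.h>
#include <cblas.h>

#define DIM 64
#define CALLERS 64   /* assumption: larger than the library's NUM_THREADS */

int main(void) {
  #pragma omp parallel for num_threads(CALLERS)
  for (int i = 0; i < CALLERS; i++) {
    double a[DIM * DIM], b[DIM * DIM], c[DIM * DIM];
    for (int j = 0; j < DIM * DIM; j++) { a[j] = 1.0; b[j] = 2.0; c[j] = 0.0; }
    /* Each concurrent caller makes OpenBLAS reserve per-thread buffers;
     * exceeding the precompiled NUM_THREADS should take the
     * auxiliary-array path mentioned in the warning. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                DIM, DIM, DIM, 1.0, a, DIM, b, DIM, 0.0, c, DIM);
  }
  printf("done\n");
  return 0;
}
```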

@martin-frbg (Collaborator)

I addressed something similar in #4233 (0.3.25), but ideally you would not end up in this situation at all, so this code path has seen limited testing (earlier versions of OpenBLAS would simply have given up there).

@martin-frbg (Collaborator)

Does the message go away if you remove the `&& !defined(USE_OPENMP)` from the various `#ifdef`s that govern locking in `blas_memory_free()`, around line 3150 of `driver/others/memory.c`? OpenMP on its own may not guarantee thread safety at that point (or not in all implementations), though it would be nice for performance if it did. (Having a reproducer that does not depend on something as big as Scikit-learn or OneDAL would probably help too.)
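
For illustration, the guard pattern being referred to looks roughly like the sketch below. This is not the exact source of `driver/others/memory.c`, and the precise preprocessor conditions vary between OpenBLAS versions.

```c
/* Rough sketch of the locking guards in blas_memory_free() (not the exact
 * source; conditions differ between versions). The experiment suggested
 * above is to drop "&& !defined(USE_OPENMP)" so the allocation lock is
 * also taken in USE_OPENMP=1 builds. */
#if defined(SMP) && !defined(USE_OPENMP)
  LOCK_COMMAND(&alloc_lock);
#endif
  /* ... find the entry for this buffer in the memory table
     and mark it free ... */
#if defined(SMP) && !defined(USE_OPENMP)
  UNLOCK_COMMAND(&alloc_lock);
#endif
```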
