From fdc26b43f67da0abdea152171d8b7d3c78f5254e Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Mon, 4 Nov 2024 15:43:57 +0100 Subject: [PATCH 01/13] Add sub-RFC for increased availability of NUMA API --- .../increased_availability/README.org | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100755 rfcs/proposed/simplified_numa_support/increased_availability/README.org diff --git a/rfcs/proposed/simplified_numa_support/increased_availability/README.org b/rfcs/proposed/simplified_numa_support/increased_availability/README.org new file mode 100755 index 0000000000..b184296f76 --- /dev/null +++ b/rfcs/proposed/simplified_numa_support/increased_availability/README.org @@ -0,0 +1,101 @@ +# -*- fill-column: 80; -*- + +#+title: Improve predictability of API for NUMA support + +*Note:* This is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535. +Specifically, its section about "Increased availability of NUMA support". + +* Introduction +oneTBB has soft dependency on several versions of ~tbbbind~, which are loaded by +the library as part of its initialization stage. In turn, each ~tbbbind~ has +hard dependency, i.e., relies on load-time linking, on concrete version of the +HWLOC library [1, 2]. The soft dependency of oneTBB on ~tbbbind~ allows the +library to continue its execution even if system loader is unable to resolve +hard dependency on HWLOC for ~tbbbind~. In this case, the NUMA support API does +nothing. An error is also not reported and the code that uses NUMA support +facilities may continue running expecting it to work. Such behavior is not +readily noticable by the users of oneTBB and this represents the main problem +with the current behavior. + +Having a dependency on a shared HWLOC library has a number of advantages: +1. Code reuse with all of the positive consequences out of this. That's the + primary reason shared libraries are for. +2. Sharing HWLOC context between its clients. 
This avoids performing multiple + times same operations issuing identical results. +3. A drop-in replacement. Users are able to use their own version of HWLOC + without recompilation of oneTBB. + +The only disadvantage from depending on HWLOC dynamically is that user needs to +make sure the library is available and can be found by oneTBB. Depending on the +distribution model of a user's code, this is achieved either by: +1. Asking the end user to have necessary version of a dependency pre-installed. +2. Bundling necessary HWLOC version together with other pieces of a product + release. + +However, the requirement to fulfill one of the above steps for the NUMA API to +start paying off may be considered as an incovenience for users of oneTBB and, +what is more important, it is not always obvious. Especially, due to silent +behavior in case HWLOC library cannot be found in the environment. + +This proposal suggests an improvement on these two points by having HWLOC +library linked statically with one of the ~tbbbind~ libraries that are +distributed together with oneTBB. + +[1] [[https://www.open-mpi.org/projects/hwloc/][HWLOC project main page]] + +[2] [[https://github.com/open-mpi/hwloc][HWLOC project repository on GitHub]] + +* Proposal +Introduce: +1. New version of ~tbbbind~ shared library with the name ~tbbbind_static~ that is + statically-linked with HWLOC library and distributed along side with the other + ~tbbbind~ versions. +2. Loading of the new ~tbbbind_static~ as the last attempt, i.e., a fallback + path, to resolve the dependency on functionality provided by ~tbbbind~ layer. +3. Printing what ~tbbbind~ version is used when ~TBB_VERSION=1~ environment + variable is present. + +** Advantages +The proposed behavior allows having a fallback mechanism for resolving a +dependency on HWLOC library in case it cannot be found in the environment, while +still preferring user-provided version of HWLOC. 
+ +As a result, the following use of oneTBB API should work as expected, returning +enumerated list of actual NUMA nodes and core types on the system the code is +running on, provided that the loaded HWLOC library works on that system: + +#+begin_src C++ +std::vector numa_nodes = oneapi::tbb::info::numa_nodes(); +std::vector core_types = oneapi::tbb::info::core_types(); +#+end_src + +** Disadvantages +1. The oneTBB distribution package is now extended with an additional version of + ~tbbbind~ library that is statically linked with certain version of HWLOC. +2. Still silent by default behavior in case user failed to setup environment + with their own version of HWLOC library correctly. Although, specifying + ~TBB_VERSION=1~ envar will help identifying an issue with an environment + setup pretty quickly. +3. Non-shared HWLOC context in case of ~tbbbind_static~ library is used. + +* Alternative Solutions Considered +The other solution for being silent in case HWLOC library is not found is either +to issue a warning or to throw an exception. + +Comparing these alternative solutions to the one proposed. +** Common Advantages +1. Explicitly tells the user that the functionality being used is not going to + work. +2. Does not require additional version of ~tbbbind~ library to be distributed + along with the others. + +** Common Disadvantages +- Requires additional step from the user side to resolve the problem. + +** Disadvantages of Issuing a Warning +- Does not solve the problem completely as a warning may still not be visible to + the user, especially if standard streams are closed. + +** Disadvantages of Throwing an Exception +1. May break existing code as it does not expect an exception to be thrown. +2. 
Requires introduction of an additional exception hierarchy From ce6746d83fbec3d801075eb8088303d6ed534ba7 Mon Sep 17 00:00:00 2001 From: Aleksei Fedotov Date: Fri, 8 Nov 2024 14:19:23 +0100 Subject: [PATCH 02/13] Apply suggestions from Mike Co-authored-by: Mike Voss --- .../increased_availability/README.org | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/rfcs/proposed/simplified_numa_support/increased_availability/README.org b/rfcs/proposed/simplified_numa_support/increased_availability/README.org index b184296f76..c4ea7fa094 100755 --- a/rfcs/proposed/simplified_numa_support/increased_availability/README.org +++ b/rfcs/proposed/simplified_numa_support/increased_availability/README.org @@ -6,12 +6,12 @@ Specifically, its section about "Increased availability of NUMA support". * Introduction -oneTBB has soft dependency on several versions of ~tbbbind~, which are loaded by +oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded by the library as part of its initialization stage. In turn, each ~tbbbind~ has -hard dependency, i.e., relies on load-time linking, on concrete version of the +a hard dependency, i.e., relies on load-time linking, on a concrete version of the HWLOC library [1, 2]. The soft dependency of oneTBB on ~tbbbind~ allows the -library to continue its execution even if system loader is unable to resolve -hard dependency on HWLOC for ~tbbbind~. In this case, the NUMA support API does +library to continue its execution even if the system loader is unable to resolve +the hard dependency on HWLOC for ~tbbbind~. In this case, the NUMA support API does nothing. An error is also not reported and the code that uses NUMA support facilities may continue running expecting it to work. Such behavior is not readily noticable by the users of oneTBB and this represents the main problem @@ -19,9 +19,9 @@ with the current behavior. Having a dependency on a shared HWLOC library has a number of advantages: 1. 
Code reuse with all of the positive consequences out of this. That's the - primary reason shared libraries are for. -2. Sharing HWLOC context between its clients. This avoids performing multiple - times same operations issuing identical results. + primary purpose of shared libraries. +2. Sharing HWLOC context between its clients. This avoids performing + the same operations repeatedly with identical results. 3. A drop-in replacement. Users are able to use their own version of HWLOC without recompilation of oneTBB. From 258b82c396aca0d433a049de9409cb16b25f7019 Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Fri, 8 Nov 2024 14:21:03 +0100 Subject: [PATCH 03/13] Align text to be within 80 characters width --- .../increased_availability/README.org | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/rfcs/proposed/simplified_numa_support/increased_availability/README.org b/rfcs/proposed/simplified_numa_support/increased_availability/README.org index c4ea7fa094..71ce7b7795 100755 --- a/rfcs/proposed/simplified_numa_support/increased_availability/README.org +++ b/rfcs/proposed/simplified_numa_support/increased_availability/README.org @@ -6,13 +6,13 @@ Specifically, its section about "Increased availability of NUMA support". * Introduction -oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded by -the library as part of its initialization stage. In turn, each ~tbbbind~ has -a hard dependency, i.e., relies on load-time linking, on a concrete version of the -HWLOC library [1, 2]. The soft dependency of oneTBB on ~tbbbind~ allows the +oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded +by the library as part of its initialization stage. In turn, each ~tbbbind~ has +a hard dependency, i.e., relies on load-time linking, on a concrete version of +the HWLOC library [1, 2]. 
The soft dependency of oneTBB on ~tbbbind~ allows the library to continue its execution even if the system loader is unable to resolve -the hard dependency on HWLOC for ~tbbbind~. In this case, the NUMA support API does -nothing. An error is also not reported and the code that uses NUMA support +the hard dependency on HWLOC for ~tbbbind~. In this case, the NUMA support API +does nothing. An error is also not reported and the code that uses NUMA support facilities may continue running expecting it to work. Such behavior is not readily noticable by the users of oneTBB and this represents the main problem with the current behavior. @@ -20,8 +20,8 @@ with the current behavior. Having a dependency on a shared HWLOC library has a number of advantages: 1. Code reuse with all of the positive consequences out of this. That's the primary purpose of shared libraries. -2. Sharing HWLOC context between its clients. This avoids performing - the same operations repeatedly with identical results. +2. Sharing HWLOC context between its clients. This avoids performing the same + operations repeatedly with identical results. 3. A drop-in replacement. Users are able to use their own version of HWLOC without recompilation of oneTBB. @@ -47,9 +47,9 @@ distributed together with oneTBB. * Proposal Introduce: -1. New version of ~tbbbind~ shared library with the name ~tbbbind_static~ that is - statically-linked with HWLOC library and distributed along side with the other - ~tbbbind~ versions. +1. New version of ~tbbbind~ shared library with the name ~tbbbind_static~ that + is statically-linked with HWLOC library and distributed along side with the + other ~tbbbind~ versions. 2. Loading of the new ~tbbbind_static~ as the last attempt, i.e., a fallback path, to resolve the dependency on functionality provided by ~tbbbind~ layer. 3. 
Printing what ~tbbbind~ version is used when ~TBB_VERSION=1~ environment From 90bfaba7fbf1a00869262a6d2c6a9be2c4b35f46 Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Mon, 11 Nov 2024 12:00:45 +0100 Subject: [PATCH 04/13] Address Alexey's remarks --- .../increased_availability/README.org | 112 ++++++++++-------- 1 file changed, 63 insertions(+), 49 deletions(-) diff --git a/rfcs/proposed/simplified_numa_support/increased_availability/README.org b/rfcs/proposed/simplified_numa_support/increased_availability/README.org index 71ce7b7795..9f6cf76cd3 100755 --- a/rfcs/proposed/simplified_numa_support/increased_availability/README.org +++ b/rfcs/proposed/simplified_numa_support/increased_availability/README.org @@ -11,11 +11,20 @@ by the library as part of its initialization stage. In turn, each ~tbbbind~ has a hard dependency, i.e., relies on load-time linking, on a concrete version of the HWLOC library [1, 2]. The soft dependency of oneTBB on ~tbbbind~ allows the library to continue its execution even if the system loader is unable to resolve -the hard dependency on HWLOC for ~tbbbind~. In this case, the NUMA support API -does nothing. An error is also not reported and the code that uses NUMA support -facilities may continue running expecting it to work. Such behavior is not -readily noticable by the users of oneTBB and this represents the main problem -with the current behavior. +the hard dependency on HWLOC for ~tbbbind~. In this case, the HW topology is not +discovered and the machine is seen as if all CPU cores were uniform, which is +the default TBB behavior when NUMA constraints are not used. 
Thus, the following +code returns meaningless values as these values are just ignored by oneTBB: + +#+begin_src C++ +std::vector numa_nodes = oneapi::tbb::info::numa_nodes(); +std::vector core_types = oneapi::tbb::info::core_types(); +#+end_src + +An error is also not reported and the client code that uses NUMA support +facilities may continue running expecting it to work as it was intended. Such +behavior is not readily noticable by developers that use oneTBB and this +represents the main problem with the current behavior. Having a dependency on a shared HWLOC library has a number of advantages: 1. Code reuse with all of the positive consequences out of this. That's the @@ -25,77 +34,82 @@ Having a dependency on a shared HWLOC library has a number of advantages: 3. A drop-in replacement. Users are able to use their own version of HWLOC without recompilation of oneTBB. -The only disadvantage from depending on HWLOC dynamically is that user needs to -make sure the library is available and can be found by oneTBB. Depending on the -distribution model of a user's code, this is achieved either by: +The only disadvantage from depending on HWLOC library dynamically is that the +developers that use oneTBB's NUMA support API need to make sure the library is +available and can be found by oneTBB. Depending on the distribution model of a +developer's code, this is achieved either by: 1. Asking the end user to have necessary version of a dependency pre-installed. 2. Bundling necessary HWLOC version together with other pieces of a product release. However, the requirement to fulfill one of the above steps for the NUMA API to -start paying off may be considered as an incovenience for users of oneTBB and, -what is more important, it is not always obvious. Especially, due to silent -behavior in case HWLOC library cannot be found in the environment. 
+start paying off may be considered an inconvenience and, what is more +important, it is not always obvious that one of these steps is needed, +especially given the silent behavior when the HWLOC library cannot be found in +the environment. -This proposal suggests an improvement on these two points by having HWLOC -library linked statically with one of the ~tbbbind~ libraries that are -distributed together with oneTBB. +This proposal suggests an improvement to reduce the effect of the disadvantage +of depending on a dynamic version of the HWLOC library by having it linked +statically with one of the ~tbbbind~ libraries that are distributed together +with oneTBB, yet leaving possibility to specify another version of HWLOC library +if users see the need. [1] [[https://www.open-mpi.org/projects/hwloc/][HWLOC project main page]] [2] [[https://github.com/open-mpi/hwloc][HWLOC project repository on GitHub]] * Proposal -Introduce: -1. New version of ~tbbbind~ shared library with the name ~tbbbind_static~ that - is statically-linked with HWLOC library and distributed along side with the - other ~tbbbind~ versions. -2. Loading of the new ~tbbbind_static~ as the last attempt, i.e., a fallback - path, to resolve the dependency on functionality provided by ~tbbbind~ layer. -3. Printing what ~tbbbind~ version is used when ~TBB_VERSION=1~ environment - variable is present. +1. Introduce new variant of the ~tbbbind~ library with the name ~tbbbind_static~ + which is statically-linked with HWLOC library and distributed along side with + the other ~tbbbind~ variants. +2. Add loading of ~tbbbind_static~ as the last attempt to resolve the dependency + on functionality provided by ~tbbbind~ layer. +3. Update the oneTBB documentation considering [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these documentation pages]] to + include steps determining the variant of ~tbbbind~ being used.
** Advantages -The proposed behavior allows having a fallback mechanism for resolving a -dependency on HWLOC library in case it cannot be found in the environment, while -still preferring user-provided version of HWLOC. +The proposed behavior allows having a mechanism for resolving a dependency on +HWLOC library in case it cannot be found in the environment, while still +preferring user-provided version of HWLOC. -As a result, the following use of oneTBB API should work as expected, returning -enumerated list of actual NUMA nodes and core types on the system the code is -running on, provided that the loaded HWLOC library works on that system: - -#+begin_src C++ -std::vector numa_nodes = oneapi::tbb::info::numa_nodes(); -std::vector core_types = oneapi::tbb::info::core_types(); -#+end_src +As a result, the problematic use of oneTBB API mentioned above should work as +expected, returning enumerated list of actual NUMA nodes and core types on the +system the code is running on, provided that the loaded HWLOC library works on +that system and that an application properly distributes all binaries of oneTBB, +sets the environment so that the necessary variant of ~tbbbind~ library can be +found and loaded. ** Disadvantages -1. The oneTBB distribution package is now extended with an additional version of - ~tbbbind~ library that is statically linked with certain version of HWLOC. +1. There will be one more ~tbbbind~ variation binary to ship in oneTBB + distribution packages. 2. Still silent by default behavior in case user failed to setup environment with their own version of HWLOC library correctly. Although, specifying ~TBB_VERSION=1~ envar will help identifying an issue with an environment setup pretty quickly. -3. Non-shared HWLOC context in case of ~tbbbind_static~ library is used. +3. Statically-linked HWLOC does not share its context with those loaded + dynamically in case of ~tbbbind_static~ library is used. 
-* Alternative Solutions Considered -The other solution for being silent in case HWLOC library is not found is either -to issue a warning or to throw an exception. +* Alternative handling of inability to parse system topology +The other behavior in case HWLOC library cannot be found is to be more explicit +about the problem of a missing component and to either issue a warning or to +refuse to work, requiring one of the ~tbbbind~ variants to be loaded (e.g., by +throwing an exception). -Comparing these alternative solutions to the one proposed. +Comparing these alternative approaches to the one proposed. ** Common Advantages -1. Explicitly tells the user that the functionality being used is not going to - work. -2. Does not require additional version of ~tbbbind~ library to be distributed - along with the others. +- Explicitly reports that the functionality being used is not going to work + instead of staying silent. +- Does not require an additional variant of the ~tbbbind~ library to be + distributed along with the others. ** Common Disadvantages -- Requires additional step from the user side to resolve the problem. +- Requires an additional step from the user to resolve the problem. In other + words, it does not provide a complete solution to the problem. ** Disadvantages of Issuing a Warning -- Does not solve the problem completely as a warning may still not be visible to - the user, especially if standard streams are closed. +- The warning may still not be visible, especially if standard streams are + closed. ** Disadvantages of Throwing an Exception -1. May break existing code as it does not expect an exception to be thrown. -2. Requires introduction of an additional exception hierarchy +- May break existing code as it does not expect an exception to be thrown. +- Requires introduction of an additional exception hierarchy.
From 58a441f59cbc5f4a27f73ec11e2919b7042f2327 Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Wed, 13 Nov 2024 20:58:04 +0100 Subject: [PATCH 05/13] Move and rename in accordance with the main RFC --- .../increase-numa-support-availability.org} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/proposed/{simplified_numa_support/increased_availability/README.org => numa_support/increase-numa-support-availability.org} (100%) diff --git a/rfcs/proposed/simplified_numa_support/increased_availability/README.org b/rfcs/proposed/numa_support/increase-numa-support-availability.org similarity index 100% rename from rfcs/proposed/simplified_numa_support/increased_availability/README.org rename to rfcs/proposed/numa_support/increase-numa-support-availability.org From 5e8b79e6c1364d0517f1088b08f04ab27ffdbd48 Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Wed, 13 Nov 2024 21:04:41 +0100 Subject: [PATCH 06/13] Fix small readability issue --- .../numa_support/increase-numa-support-availability.org | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/proposed/numa_support/increase-numa-support-availability.org b/rfcs/proposed/numa_support/increase-numa-support-availability.org index 9f6cf76cd3..3705d2f78e 100755 --- a/rfcs/proposed/numa_support/increase-numa-support-availability.org +++ b/rfcs/proposed/numa_support/increase-numa-support-availability.org @@ -80,7 +80,7 @@ sets the environment so that the necessary variant of ~tbbbind~ library can be found and loaded. ** Disadvantages -1. There will be one more ~tbbbind~ variation binary to ship in oneTBB +1. There will be one more variation of a ~tbbbind~ binary to ship in oneTBB distribution packages. 2. Still silent by default behavior in case user failed to setup environment with their own version of HWLOC library correctly. 
Although, specifying From d0bf3731b703e06c6f6f389d721302b47280c5ca Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Thu, 21 Nov 2024 19:24:11 +0100 Subject: [PATCH 07/13] Address remarks on review --- ...lity.org => tbbbind-link-static-hwloc.org} | 57 +++++++++++-------- 1 file changed, 33 insertions(+), 24 deletions(-) rename rfcs/proposed/numa_support/{increase-numa-support-availability.org => tbbbind-link-static-hwloc.org} (65%) diff --git a/rfcs/proposed/numa_support/increase-numa-support-availability.org b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org similarity index 65% rename from rfcs/proposed/numa_support/increase-numa-support-availability.org rename to rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org index 3705d2f78e..1405101c15 100755 --- a/rfcs/proposed/numa_support/increase-numa-support-availability.org +++ b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org @@ -8,31 +8,40 @@ Specifically, its section about "Increased availability of NUMA support". * Introduction oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded by the library as part of its initialization stage. In turn, each ~tbbbind~ has -a hard dependency, i.e., relies on load-time linking, on a concrete version of -the HWLOC library [1, 2]. The soft dependency of oneTBB on ~tbbbind~ allows the -library to continue its execution even if the system loader is unable to resolve -the hard dependency on HWLOC for ~tbbbind~. In this case, the HW topology is not -discovered and the machine is seen as if all CPU cores were uniform, which is -the default TBB behavior when NUMA constraints are not used. Thus, the following -code returns meaningless values as these values are just ignored by oneTBB: +a hard dependency on a concrete version of the HWLOC library [1, 2]. The soft +dependency of oneTBB on ~tbbbind~ allows the library to continue its execution +even if the system loader is unable to resolve the hard dependency on HWLOC for +~tbbbind~. 
In this case, the HW topology is not discovered and the machine is +seen as if all CPU cores were uniform, which is the default TBB behavior when +NUMA constraints are not used. Thus, the following code returns values that do +not reflect the real topology and are therefore meaningless: #+begin_src C++ std::vector numa_nodes = oneapi::tbb::info::numa_nodes(); std::vector core_types = oneapi::tbb::info::core_types(); #+end_src -An error is also not reported and the client code that uses NUMA support -facilities may continue running expecting it to work as it was intended. Such -behavior is not readily noticable by developers that use oneTBB and this -represents the main problem with the current behavior. - -Having a dependency on a shared HWLOC library has a number of advantages: -1. Code reuse with all of the positive consequences out of this. That's the - primary purpose of shared libraries. -2. Sharing HWLOC context between its clients. This avoids performing the same - operations repeatedly with identical results. -3. A drop-in replacement. Users are able to use their own version of HWLOC - without recompilation of oneTBB. +This lack of valid HW topology data due to the absence of a third-party library +is the major problem with the current oneTBB behavior. There are no diagnostics +for the issue, which likely makes it unnoticeable by developers, and the code +that uses oneTBB NUMA support facilities continues running but does not use +NUMA as intended. + +Having a dependency on a shared HWLOC library has several advantages: +1. Code reuse with all of its positive consequences. This includes relying on + the same code that has been tested and debugged, and allowing the OS to + share it among different processes, which improves cache locality and + reduces the memory footprint. That is the primary purpose of shared + libraries. +2. A drop-in replacement. Users are able to use their own version of HWLOC + without recompilation of oneTBB.
This specific version of HWLOC could include + a hotfix to support particular or new hardware that a customer has, but + whose support is not yet upstreamed to the HWLOC project. It is also possible + that such support will never be upstreamed if that hardware is not going to + be widely available. It could also be a development version of + HWLOC that someone wants to test on their systems first. Of course, they can + do it with the static version as well, but that is more cumbersome as it + requires recompilation of every dependent component. The only disadvantage from depending on HWLOC library dynamically is that the developers that use oneTBB's NUMA support API need to make sure the library is @@ -60,7 +69,7 @@ if users see the need. * Proposal 1. Introduce new variant of the ~tbbbind~ library with the name ~tbbbind_static~ - which is statically-linked with HWLOC library and distributed along side with + which is linked with a static HWLOC library and distributed along side with the other ~tbbbind~ variants. 2. Add loading of ~tbbbind_static~ as the last attempt to resolve the dependency on functionality provided by ~tbbbind~ layer. @@ -82,10 +91,10 @@ found and loaded. ** Disadvantages 1. There will be one more variation of a ~tbbbind~ binary to ship in oneTBB distribution packages. -2. Still silent by default behavior in case user failed to setup environment - with their own version of HWLOC library correctly. Although, specifying - ~TBB_VERSION=1~ envar will help identifying an issue with an environment - setup pretty quickly. +2. By default still no diagnostics if users failed to setup environment with + their own version of HWLOC library correctly. Although, specifying + ~TBB_VERSION=1~ envar will help identifying an issue with setup of + environment pretty quickly. 3. Statically-linked HWLOC does not share its context with those loaded dynamically in case of ~tbbbind_static~ library is used.
From a10984c47a7f5674bbde9ea88e4de31a692231d0 Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Fri, 22 Nov 2024 12:51:59 +0100 Subject: [PATCH 08/13] Move references to the bottom --- .../numa_support/tbbbind-link-static-hwloc.org | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org index 1405101c15..96fafb0ff7 100755 --- a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org +++ b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org @@ -63,10 +63,6 @@ statically with one of the ~tbbbind~ libraries that are distributed together with oneTBB, yet leaving possibility to specify another version of HWLOC library if users see the need. -[1] [[https://www.open-mpi.org/projects/hwloc/][HWLOC project main page]] - -[2] [[https://github.com/open-mpi/hwloc][HWLOC project repository on GitHub]] - * Proposal 1. Introduce new variant of the ~tbbbind~ library with the name ~tbbbind_static~ which is linked with a static HWLOC library and distributed along side with @@ -95,8 +91,6 @@ found and loaded. their own version of HWLOC library correctly. Although, specifying ~TBB_VERSION=1~ envar will help identifying an issue with setup of environment pretty quickly. -3. Statically-linked HWLOC does not share its context with those loaded - dynamically in case of ~tbbbind_static~ library is used. * Alternative handling of inability to parse system topology The other behavior in case HWLOC library cannot be found is to be more explicit @@ -122,3 +116,7 @@ Comparing these alternative approaches to the one proposed. ** Disadvantages of Throwing an Exception - May break existing code as it does not expect an exception to be thrown. - Requires introduction of an additional exception hierarchy. + +* References +1. [[https://www.open-mpi.org/projects/hwloc/][HWLOC project main page]] +2. 
[[https://github.com/open-mpi/hwloc][HWLOC project repository on GitHub]] From 35d7f553b8d537f489a12a1abeae69f31edabc49 Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Wed, 27 Nov 2024 16:35:48 +0100 Subject: [PATCH 09/13] Update the title to better match the proposal --- rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org index 96fafb0ff7..0a2fcb0834 100755 --- a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org +++ b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org @@ -1,6 +1,6 @@ # -*- fill-column: 80; -*- -#+title: Improve predictability of API for NUMA support +#+title: Link ~tbbbind~ with static HWLOC to improve predictability of NUMA support API *Note:* This is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535. Specifically, its section about "Increased availability of NUMA support". From 81021be345f6bad261f948128c70a28fa84f7127 Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Wed, 27 Nov 2024 16:36:21 +0100 Subject: [PATCH 10/13] Replace tbbbind linked with an old HWLOC --- .../tbbbind-link-static-hwloc.org | 45 ++++++++++--------- 1 file changed, 24 insertions(+), 21 deletions(-) diff --git a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org index 0a2fcb0834..8628da284b 100755 --- a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org +++ b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org @@ -63,34 +63,37 @@ statically with one of the ~tbbbind~ libraries that are distributed together with oneTBB, yet leaving possibility to specify another version of HWLOC library if users see the need. 
+Since HWLOC 1.x is an old version of HWLOC and modern versions of operating +systems install HWLOC 2.x by default, the probability of someone who is +constrained by using only HWLOC 1.x on their system is relatively small. Thus, +the filename of the ~tbbbind~ library that is linked against HWLOC 1.x can be +re-used for the library that is linked against static HWLOC version 2.x. + * Proposal -1. Introduce new variant of the ~tbbbind~ library with the name ~tbbbind_static~ - which is linked with a static HWLOC library and distributed along side with - the other ~tbbbind~ variants. -2. Add loading of ~tbbbind_static~ as the last attempt to resolve the dependency - on functionality provided by ~tbbbind~ layer. +1. Replace the dynamic link of ~tbbbind~ library which is currently linked + against HWLOC 1.x with the link to a static HWLOC library version 2.x. +2. Add loading of that ~tbbbind~ variant as the last attempt to resolve the + dependency on functionality provided by ~tbbbind~ layer. 3. Update the oneTBB documentation considering [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these documentation pages]] to include steps determining the variant of ~tbbbind~ being used. ** Advantages -The proposed behavior allows having a mechanism for resolving a dependency on -HWLOC library in case it cannot be found in the environment, while still -preferring user-provided version of HWLOC. - -As a result, the problematic use of oneTBB API mentioned above should work as -expected, returning enumerated list of actual NUMA nodes and core types on the -system the code is running on, provided that the loaded HWLOC library works on -that system and that an application properly distributes all binaries of oneTBB, -sets the environment so that the necessary variant of ~tbbbind~ library can be -found and loaded. +1. 
The proposed behavior allows having a mechanism for resolving a dependency on + HWLOC library in case it cannot be found in the environment, while still + preferring user-provided version of HWLOC. As a result, the problematic use of + oneTBB API mentioned above should work as expected, returning enumerated list + of actual NUMA nodes and core types on the system the code is running on, + provided that the loaded HWLOC library works on that system and that an + application properly distributes all binaries of oneTBB, sets the environment + so that the necessary variant of ~tbbbind~ library can be found and loaded. +2. The drop of support for HWLOC 1.x allows to not introducing additional + ~tbbbind~ variant of the library, yet maintaining support for popular + versions of HWLOC. ** Disadvantages -1. There will be one more variation of a ~tbbbind~ binary to ship in oneTBB - distribution packages. -2. By default still no diagnostics if users failed to setup environment with - their own version of HWLOC library correctly. Although, specifying - ~TBB_VERSION=1~ envar will help identifying an issue with setup of - environment pretty quickly. +By default still no diagnostics if users failed to setup environment with their +own version of HWLOC library correctly. Although, specifying ~TBB_VERSION=1~ +envar will help identifying an issue with setup of environment pretty quickly. 
* Alternative handling of inability to parse system topology The other behavior in case HWLOC library cannot be found is to be more explicit From fd3661e09476cca30f0a7f802bc18831a3b020b6 Mon Sep 17 00:00:00 2001 From: Aleksei Fedotov Date: Tue, 21 Jan 2025 18:49:06 +0100 Subject: [PATCH 11/13] Apply suggestions from code review Co-authored-by: Alexandra --- .../tbbbind-link-static-hwloc.org | 92 +++++++++---------- 1 file changed, 45 insertions(+), 47 deletions(-) diff --git a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org index 8628da284b..e2851a7e29 100755 --- a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org +++ b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org @@ -2,32 +2,31 @@ #+title: Link ~tbbbind~ with static HWLOC to improve predictability of NUMA support API -*Note:* This is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535. -Specifically, its section about "Increased availability of NUMA support". +*Note:* This document is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535. +Specifically, the "Increased availability of NUMA support" section. * Introduction -oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded -by the library as part of its initialization stage. In turn, each ~tbbbind~ has -a hard dependency on a concrete version of the HWLOC library [1, 2]. The soft -dependency of oneTBB on ~tbbbind~ allows the library to continue its execution -even if the system loader is unable to resolve the hard dependency on HWLOC for -~tbbbind~. In this case, the HW topology is not discovered and the machine is -seen as if all CPU cores were uniform, which is the default TBB behavior when -NUMA constraints are not used. 
Thus, the following code returns the values that -do not reflect the real topology and do not matter: +oneTBB has a soft dependency on several variants of ~tbbbind~, which +the library loads during the initialization stage. Each ~tbbbind~, in turn, has +a hard dependency on a specific version of the HWLOC library [1, 2]. The soft +dependency means that the library continues the execution +even if the system loader fails to resolve the hard dependency on HWLOC for +~tbbbind~. In this case, oneTBB does not discover the hardware topology. +Instead, it defaults to viewing all CPU cores as uniform, consistent with TBB behavior when +NUMA constraints are not used. As a result, the following code returns the irrelevant values that +do not reflect the actual topology: #+begin_src C++ std::vector numa_nodes = oneapi::tbb::info::numa_nodes(); std::vector core_types = oneapi::tbb::info::core_types(); #+end_src -This lack of valid HW topology data due to absence of a third party library is -the major problem with the current oneTBB behavior. There is no diagnostics for -the issue, which likely makes it unnoticeable by developers, and the code that -uses oneTBB NUMA support facilities continues running but does not use NUMA as -intended. +This lack of valid HW topology, caused by the absence of a third-party library, is +the major problem with the current oneTBB behavior. The problem lies in the lack of diagnostics +making it difficult for developers to detect. +As a result, the code continues to run but fails to use NUMA as intended. -Having a dependency on a shared HWLOC library has advantages: +Dependency on a shared HWLOC library has the following benefits: 1. Code reuse with all of the positive consequences out of this, including relying on the same code that has been tested and debugged, allowing the OS to share it among different processes, which consequently improves on cache @@ -57,45 +56,45 @@ important, it is not always obvious that one of these steps is needed. 
Especially, due to silent behavior in case HWLOC library cannot be found in the environment. -This proposal suggests an improvement to reduce the effect of the disadvantage -being dependent on a dynamic version of HWLOC library by having it linked -statically with one of the ~tbbbind~ libraries that are distributed together -with oneTBB, yet leaving possibility to specify another version of HWLOC library -if users see the need. +The proposal is to reduce the effect of the disadvantage +of relying on a dynamic HWLOC library. +The improvements involve statically linking HWLOC with one of the ~tbbbind~ libraries distributed together +with oneTBB. At the same time, you retain the flexibility to specify different version of HWLOC library +if needed. -Since HWLOC 1.x is an old version of HWLOC and modern versions of operating -systems install HWLOC 2.x by default, the probability of someone who is -constrained by using only HWLOC 1.x on their system is relatively small. Thus, -the filename of the ~tbbbind~ library that is linked against HWLOC 1.x can be -re-used for the library that is linked against static HWLOC version 2.x. +Since HWLOC 1.x is an older version and modern operating +systems install HWLOC 2.x by default, the probability of users being +restricted to HWLOC 1.x is relatively small. Thus, +we can reuse the filename of the ~tbbbind~ library linked to HWLOC 1.x +for the library linked against a static HWLOC 2.x. * Proposal -1. Replace the dynamic link of ~tbbbind~ library which is currently linked - against HWLOC 1.x with the link to a static HWLOC library version 2.x. +1. Replace the dynamic link of ~tbbbind~ library currently linked + against HWLOC 1.x with a link to a static HWLOC library version 2.x. 2. Add loading of that ~tbbbind~ variant as the last attempt to resolve the - dependency on functionality provided by ~tbbbind~ layer. -3. 
Update the oneTBB documentation considering [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these documentation pages]] to - include steps determining the variant of ~tbbbind~ being used. + dependency on functionality provided by the ~tbbbind~ layer. +3. Update the oneTBB documentation, including [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these pages]], to + detail the steps for identifying which ~tbbbind~ is being used. ** Advantages -1. The proposed behavior allows having a mechanism for resolving a dependency on - HWLOC library in case it cannot be found in the environment, while still - preferring user-provided version of HWLOC. As a result, the problematic use of - oneTBB API mentioned above should work as expected, returning enumerated list +1. The proposed behavior introduces a fallback mechanism for resolving + the HWLOC library dependency when it is not in the environment, while still + preferring user-provided versions. As a result, the problematic oneTBB API usage + works as expected, returning an enumerated list of actual NUMA nodes and core types on the system the code is running on, provided that the loaded HWLOC library works on that system and that an application properly distributes all binaries of oneTBB, sets the environment so that the necessary variant of ~tbbbind~ library can be found and loaded. 2. The drop of support for HWLOC 1.x allows to not introducing additional - ~tbbbind~ variant of the library, yet maintaining support for popular + ~tbbbind~ variant while maintaining support for widely used versions of HWLOC. ** Disadvantages -By default still no diagnostics if users failed to setup environment with their -own version of HWLOC library correctly. Although, specifying ~TBB_VERSION=1~ -envar will help identifying an issue with setup of environment pretty quickly. +By default, there is still no diagnostics if you fail to correctly setup an environment with your +version of HWLOC. 
Although, specifying the ~TBB_VERSION=1~ +environment variable helps identify configuration issues quickly. -* Alternative handling of inability to parse system topology +* Alternative Handling for Missing System Topology The other behavior in case HWLOC library cannot be found is to be more explicit about the problem of a missing component and to either issue a warning or to refuse working requiring one of the ~tbbbind~ variant to be loaded (e.g., throw @@ -103,21 +102,20 @@ an exception). Comparing these alternative approaches to the one proposed. ** Common Advantages -- Explicitly tells that the functionality being used is not going to work - instead of just being silent. -- Does not require additional variant of ~tbbbind~ library to be distributed - along with the others. +- Explicitly indicates that the functionality being used does not work, + instead of failing silently. +- Avoids the need to distribute an additional variant of ~tbbbind~ library. ** Common Disadvantages - Requires additional step from the user side to resolve the problem. In other words, it does not provide complete solution to the problem. ** Disadvantages of Issuing a Warning -- The warning may still not be visible, especially if standard streams are +- The warning may be unnoticed, especially if standard streams are closed. ** Disadvantages of Throwing an Exception -- May break existing code as it does not expect an exception to be thrown. +- May break existing code that does not expect an exception to be thrown. - Requires introduction of an additional exception hierarchy. 
* References From c9d2572d831e413585365df5520e0cc8363430e6 Mon Sep 17 00:00:00 2001 From: Aleksei Fedotov Date: Tue, 21 Jan 2025 18:53:39 +0100 Subject: [PATCH 12/13] Apply suggestions from code review --- rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org index e2851a7e29..da67be97cc 100755 --- a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org +++ b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org @@ -85,7 +85,7 @@ for the library linked against a static HWLOC 2.x. provided that the loaded HWLOC library works on that system and that an application properly distributes all binaries of oneTBB, sets the environment so that the necessary variant of ~tbbbind~ library can be found and loaded. -2. The drop of support for HWLOC 1.x allows to not introducing additional +2. Dropping support for HWLOC 1.x, does not introduce an additional ~tbbbind~ variant while maintaining support for widely used versions of HWLOC. @@ -110,11 +110,11 @@ Comparing these alternative approaches to the one proposed. - Requires additional step from the user side to resolve the problem. In other words, it does not provide complete solution to the problem. -** Disadvantages of Issuing a Warning +*** Disadvantages of Issuing a Warning - The warning may be unnoticed, especially if standard streams are closed. -** Disadvantages of Throwing an Exception +*** Disadvantages of Throwing an Exception - May break existing code that does not expect an exception to be thrown. - Requires introduction of an additional exception hierarchy. 
From f03f6699e113e1dcf39d927ec1b3e85726b5a4df Mon Sep 17 00:00:00 2001 From: "Fedotov, Aleksei" Date: Thu, 23 Jan 2025 15:04:31 +0100 Subject: [PATCH 13/13] Apply review remarks and align the lines --- .../tbbbind-link-static-hwloc.org | 84 +++++++++---------- 1 file changed, 40 insertions(+), 44 deletions(-) diff --git a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org index da67be97cc..d108ac1283 100755 --- a/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org +++ b/rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org @@ -1,30 +1,30 @@ # -*- fill-column: 80; -*- -#+title: Link ~tbbbind~ with static HWLOC to improve predictability of NUMA support API +#+title: Link ~tbbbind~ with Static HWLOC for NUMA API predictability -*Note:* This document is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535. -Specifically, the "Increased availability of NUMA support" section. +*Note:* This document is a sub-RFC of the [[file:README.md][umbrella RFC about improving NUMA +support]]. It expands on the "Increased availability of NUMA support" section. * Introduction -oneTBB has a soft dependency on several variants of ~tbbbind~, which -the library loads during the initialization stage. Each ~tbbbind~, in turn, has -a hard dependency on a specific version of the HWLOC library [1, 2]. The soft -dependency means that the library continues the execution -even if the system loader fails to resolve the hard dependency on HWLOC for -~tbbbind~. In this case, oneTBB does not discover the hardware topology. -Instead, it defaults to viewing all CPU cores as uniform, consistent with TBB behavior when -NUMA constraints are not used. As a result, the following code returns the irrelevant values that -do not reflect the actual topology: +oneTBB has a soft dependency on several variants of ~tbbbind~, which the library +loads during the initialization stage.
Each ~tbbbind~, in turn, has a hard +dependency on a specific version of the HWLOC library [1, 2]. The soft +dependency means that the library continues execution even if the system +loader fails to resolve the hard dependency on HWLOC for ~tbbbind~. In this +case, oneTBB does not discover the hardware topology. Instead, it defaults to +viewing all CPU cores as uniform, consistent with TBB behavior when NUMA +constraints are not used. As a result, the following code returns +values that do not reflect the actual topology: #+begin_src C++ std::vector numa_nodes = oneapi::tbb::info::numa_nodes(); std::vector core_types = oneapi::tbb::info::core_types(); #+end_src -This lack of valid HW topology, caused by the absence of a third-party library, is -the major problem with the current oneTBB behavior. The problem lies in the lack of diagnostics -making it difficult for developers to detect. -As a result, the code continues to run but fails to use NUMA as intended. +This lack of valid HW topology, caused by the absence of a third-party library, +is the major problem with the current oneTBB behavior. The lack of diagnostics +makes the problem difficult for developers to detect. As a result, the code +continues to run but fails to use NUMA as intended. Dependency on a shared HWLOC library has the following benefits: 1. Code reuse with all of the positive consequences out of this, including @@ -56,17 +56,15 @@ important, it is not always obvious that one of these steps is needed. Especially, due to silent behavior in case HWLOC library cannot be found in the environment. -The proposal is to reduce the effect of the disadvantage -of relying on a dynamic HWLOC library. -The improvements involve statically linking HWLOC with one of the ~tbbbind~ libraries distributed together -with oneTBB. At the same time, you retain the flexibility to specify different version of HWLOC library -if needed.
+The proposal mitigates this disadvantage of relying on a dynamic +HWLOC library. The improvements involve statically linking HWLOC with one of the +~tbbbind~ libraries distributed together with oneTBB. At the same time, users +retain the flexibility to specify a different HWLOC version if needed. -Since HWLOC 1.x is an older version and modern operating -systems install HWLOC 2.x by default, the probability of users being -restricted to HWLOC 1.x is relatively small. Thus, -we can reuse the filename of the ~tbbbind~ library linked to HWLOC 1.x -for the library linked against a static HWLOC 2.x. +Since HWLOC 1.x is an older version and modern operating systems install HWLOC +2.x by default, the probability of users being restricted to HWLOC 1.x is +relatively small. Thus, we can reuse the filename of the ~tbbbind~ library +linked to HWLOC 1.x for the library linked against a static HWLOC 2.x. * Proposal 1. Replace the dynamic link of ~tbbbind~ library currently linked @@ -77,21 +75,20 @@ for the library linked against a static HWLOC 2.x. detail the steps for identifying which ~tbbbind~ is being used. ** Advantages -1. The proposed behavior introduces a fallback mechanism for resolving - the HWLOC library dependency when it is not in the environment, while still - preferring user-provided versions. As a result, the problematic oneTBB API usage - works as expected, returning an enumerated list - of actual NUMA nodes and core types on the system the code is running on, - provided that the loaded HWLOC library works on that system and that an - application properly distributes all binaries of oneTBB, sets the environment - so that the necessary variant of ~tbbbind~ library can be found and loaded. -2. Dropping support for HWLOC 1.x, does not introduce an additional - ~tbbbind~ variant while maintaining support for widely used - versions of HWLOC. +1.
The proposed behavior introduces a fallback mechanism for resolving the HWLOC + library dependency when it is not in the environment, while still preferring + user-provided versions. As a result, the problematic oneTBB API usage works + as expected, returning an enumerated list of actual NUMA nodes and core types + on the system the code is running on, provided that the loaded HWLOC library + works on that system and that an application properly distributes all + binaries of oneTBB and sets the environment so that the necessary variant of + ~tbbbind~ library can be found and loaded. +2. Dropping support for HWLOC 1.x avoids introducing an additional ~tbbbind~ + variant while maintaining support for widely used versions of HWLOC. ** Disadvantages -By default, there is still no diagnostics if you fail to correctly setup an environment with your -version of HWLOC. Although, specifying the ~TBB_VERSION=1~ +By default, there are still no diagnostics if you fail to correctly set up an +environment with your version of HWLOC. However, specifying the ~TBB_VERSION=1~ environment variable helps identify configuration issues quickly. * Alternative Handling for Missing System Topology @@ -102,17 +99,16 @@ an exception). Comparing these alternative approaches to the one proposed. ** Common Advantages -- Explicitly indicates that the functionality being used does not work, - instead of failing silently. -- Avoids the need to distribute an additional variant of ~tbbbind~ library. +- Explicitly indicates that the functionality being used does not work, instead + of failing silently. +- Avoids the need to distribute an additional variant of the ~tbbbind~ library. ** Common Disadvantages - Requires additional step from the user side to resolve the problem. In other words, it does not provide complete solution to the problem. *** Disadvantages of Issuing a Warning -- The warning may be unnoticed, especially if standard streams are - closed.
+- The warning may go unnoticed, especially if standard streams are closed. *** Disadvantages of Throwing an Exception - May break existing code that does not expect an exception to be thrown.