Update documentation for oneTBB 2021.10.0 (#1147)
* Update documentation for oneTBB 2021.10.0

Signed-off-by: Olga Malysheva <[email protected]>
omalyshe authored Jul 24, 2023
1 parent 9cb1b45 commit be2fb93
Showing 5 changed files with 266 additions and 24 deletions.
32 changes: 13 additions & 19 deletions RELEASE_NOTES.md
@@ -21,29 +21,23 @@ This document contains changes of oneTBB compared to the last release.
- [New Features](#new-features)
- [Known Limitations](#known-limitations)
- [Fixed Issues](#fixed-issues)
- [Open-source Contributions Integrated](#open-source-contributions-integrated)

## :tada: New Features
- Hybrid CPU support is now a fully supported feature.
- When compiled as C++17 or later, parallel algorithms and Flow Graph nodes accept pointers to member functions and member objects as user-provided callables.
- Added missing member functions, such as assignment operators and the swap function, to the ``concurrent_queue`` and ``concurrent_bounded_queue`` containers.

## :rotating_light: Known Limitations
- A static assert will cause compilation failures in oneTBB headers when compiling with clang 12.0.0 or newer if using the LLVM standard library with -ffreestanding and C++11/14 compiler options.
- An application using Parallel STL algorithms in libstdc++ versions 9 and 10 may fail to compile due to incompatible interface changes between earlier versions of Threading Building Blocks (TBB) and oneAPI Threading Building Blocks (oneTBB). Disable support for Parallel STL algorithms by defining PSTL_USE_PARALLEL_POLICIES (in libstdc++ 9) or _GLIBCXX_USE_TBB_PAR_BACKEND (in libstdc++ 10) macro to zero before inclusion of the first standard header file in each translation unit.
- On Linux* OS, if oneAPI Threading Building Blocks (oneTBB) or Threading Building Blocks (TBB) are installed in a system folder like /usr/lib64, the application may fail to link due to the order in which the linker searches for libraries. Use the -L linker option to specify the correct location of oneTBB library. This issue does not affect the program execution.
- The oneapi::tbb::info namespace interfaces might unexpectedly change the process affinity mask on Windows* OS systems (see https://github.com/open-mpi/hwloc/issues/366 for details) when using hwloc version lower than 2.5.
- Using a hwloc version other than 1.11, 2.0, or 2.5 may cause an undefined behavior on Windows OS. See https://github.com/open-mpi/hwloc/issues/477 for details.
- The NUMA topology may be detected incorrectly on Windows OS machines where the number of NUMA node threads exceeds the size of 1 processor group.
- On Windows OS on ARM64*, when compiling an application using oneTBB with the Microsoft* Compiler, the compiler issues a warning C4324 that a structure was padded due to the alignment specifier. Consider suppressing the warning by specifying /wd4324 to the compiler command line.
- oneTBB does not support fork(), to work-around the issue, consider using task_scheduler_handle to join oneTBB worker threads before using fork().
- A static assert will cause compilation failures in oneTBB headers when compiling with clang 12.0.0 or newer if using the LLVM standard library with ``-ffreestanding`` and C++11/14 compiler options.
- An application using Parallel STL algorithms in libstdc++ versions 9 and 10 may fail to compile due to incompatible interface changes between earlier versions of Threading Building Blocks (TBB) and oneAPI Threading Building Blocks (oneTBB). Disable support for Parallel STL algorithms by defining ``PSTL_USE_PARALLEL_POLICIES`` (in libstdc++ 9) or ``_GLIBCXX_USE_TBB_PAR_BACKEND`` (in libstdc++ 10) macro to zero before inclusion of the first standard header file in each translation unit.
- On Linux* OS, if oneAPI Threading Building Blocks (oneTBB) or Threading Building Blocks (TBB) are installed in a system folder like ``/usr/lib64``, the application may fail to link due to the order in which the linker searches for libraries. Use the ``-L`` linker option to specify the correct location of oneTBB library. This issue does not affect the program execution.
- The ``oneapi::tbb::info`` namespace interfaces might unexpectedly change the process affinity mask on Windows* OS systems (see https://github.com/open-mpi/hwloc/issues/366 for details) when using hwloc* version lower than 2.5.
- Using a hwloc* version other than 1.11, 2.0, or 2.5 may cause undefined behavior on Windows* OS. See https://github.com/open-mpi/hwloc/issues/477 for details.
- The NUMA* topology may be detected incorrectly on Windows* OS machines where the number of NUMA* node threads exceeds the size of 1 processor group.
- On Windows* OS on ARM64*, when compiling an application using oneTBB with the Microsoft* Compiler, the compiler issues a warning C4324 that a structure was padded due to the alignment specifier. Consider suppressing the warning by specifying ``/wd4324`` to the compiler command line.
- oneTBB does not support ``fork()``. To work around the issue, consider using ``task_scheduler_handle`` to join oneTBB worker threads before calling ``fork()``.
- C++ exception handling mechanism on Windows* OS on ARM64* might corrupt memory if an exception is thrown from any oneTBB parallel algorithm (see Windows* OS on ARM64* compiler issue: https://developercommunity.visualstudio.com/t/ARM64-incorrect-stack-unwinding-for-alig/1544293).
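
The Parallel STL workaround described in the limitations above can be sketched as follows. This is a minimal illustration, assuming a libstdc++ 9 or 10 toolchain. Only the two macro names come from the note; the helper `make_sorted` is hypothetical and exists only to show that serial standard algorithms keep working.

```cpp
// Both macros must be defined before the first standard header is included
// in this translation unit; defining them later has no effect.
#define PSTL_USE_PARALLEL_POLICIES 0    // consulted by libstdc++ 9
#define _GLIBCXX_USE_TBB_PAR_BACKEND 0  // consulted by libstdc++ 10

#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: builds {n, n-1, ..., 1} and sorts it with the serial
// std::sort, which does not depend on the Parallel STL backend.
inline std::vector<int> make_sorted(int n) {
    assert(n >= 0);
    std::vector<int> values;
    for (int i = n; i >= 1; --i) {
        values.push_back(i);
    }
    std::sort(values.begin(), values.end());
    return values;
}
```

In a real project it may be simpler to pass the same definitions on the compiler command line (for example with `-D`), so that every translation unit sees them before any header.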

## :hammer: Fixed Issues
- Improved robustness of thread creation algorithm on Linux* OS.
- Enabled full support of Thread Sanitizer on macOS*
- Fixed the issue with destructor calls for uninitialized objects in oneapi::tbb::parallel_for_each algorithm (GitHub* #691)
- Fixed the issue with tbb::concurrent_lru_cache when items history capacity is zero (GitHub* #265)
- Fixed compilation issues on modern GCC* versions

## :octocat: Open-source Contributions Integrated
- Fixed the issue reported by the Address Sanitizer. Contributed by Rui Ueyama (https://github.com/oneapi-src/oneTBB/pull/959).
- Fixed the input_type alias exposed by flow_graph::join_node. Contributed by Deepan (https://github.com/oneapi-src/oneTBB/pull/868).
- Fixed the hang in the reserve method of concurrent unordered containers ([GitHub* #1056](http://github.com/oneapi-src/oneTBB/issues/1056)).
- Fixed the C++20 three-way comparison feature detection ([GitHub* #1093](http://github.com/oneapi-src/oneTBB/issues/1093)).
- Fixed oneTBB integration with CMake* in the Conda* environment.
31 changes: 31 additions & 0 deletions doc/GSG/next_steps.rst
@@ -118,3 +118,34 @@ Build and Run a Sample
#. If oneTBB is configured correctly, the output displays ``Sum: 5050``.


Hybrid CPU and NUMA Support
****************************

If you need NUMA/Hybrid CPU support in oneTBB, you need to make sure that HWLOC* is installed on your system.

HWLOC* (Hardware Locality) is a library that provides a portable abstraction of the hierarchical topology of modern architectures (NUMA, hybrid CPU systems, etc). oneTBB relies on HWLOC* to identify the underlying topology of the system to optimize thread scheduling and memory allocation.

Without HWLOC*, oneTBB may not take advantage of NUMA/Hybrid CPU support. Therefore, it's important to make sure that HWLOC* is installed before using oneTBB on such systems.

Check HWLOC* on the System
^^^^^^^^^^^^^^^^^^^^^^^^^^^
To check if HWLOC* is already installed on your system, run ``hwloc-ls``:

* For Linux* OS, in the command line.
* For Windows* OS, in the command prompt.

If HWLOC* is installed, the command displays information about the hardware topology of your system. If it is not installed, you receive an error message saying that the command ``hwloc-ls`` could not be found.

.. note:: For Hybrid CPU support, make sure that HWLOC* is version 2.5 or higher. For NUMA support, install HWLOC* version 1.11 or higher.

Install HWLOC*
^^^^^^^^^^^^^^

To install HWLOC*, visit the official Portable Hardware Locality website (https://www-lb.open-mpi.org/projects/hwloc/).

* For Windows* OS, binaries are available for download.
* For Linux* OS, only the source code is provided, so you need to build the binaries yourself.

On Linux* OS, HWLOC* can also be installed with package managers, such as APT* or YUM*. For example, with APT*, run ``sudo apt install hwloc``.

.. note:: For Hybrid CPU support, make sure that HWLOC* is version 2.5 or higher. For NUMA support, install HWLOC* version 1.11 or higher.
9 changes: 4 additions & 5 deletions doc/main/_templates/layout.html
@@ -6,11 +6,10 @@
var wapLocalCode = 'us-en'; // Dynamically set per localized site, see mapping table for values
var wapSection = "oneapi-tbb"; // WAP team will give you a unique section for your site
// Load TMS
(function () {
var url = 'https://www.intel.com/content/dam/www/global/wap/tms-loader.js'; // WAP file URL
var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = url;
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
(function () {
var url = 'https://www.intel.com/content/dam/www/global/wap/tms-loader.js'; // WAP file URL
var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = url;
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
})();
</script>
<link href="{{ pathto("_static/style.css", True) }}" rel="stylesheet" type="text/css">
{% endblock %}
217 changes: 217 additions & 0 deletions doc/main/tbb_userguide/std_invoke.rst
@@ -0,0 +1,217 @@
.. _std_invoke:

Invoke a Callable Object
==========================

When compiled as C++17 or later, oneTBB relaxes the requirements for callable objects passed to algorithms and Flow Graph nodes, which allows additional types of bodies.
Previously, the body of an algorithm or Flow Graph node had to be a Function Object (see `C++ Standard Function Object <https://en.cppreference.com/w/cpp/utility/functional>`_) that provides an
``operator()`` accepting the input parameters.

Now the body needs to meet the more relaxed requirements of being Callable (see `C++ Standard Callable <https://en.cppreference.com/w/cpp/named_req/Callable>`_) that covers three types of objects:

* **Function Objects that provide operator(arg1, arg2, ...)**, which accepts the input parameters
* **Pointers to member functions** that you can use as the body of the algorithm or the Flow Graph node
* **Pointers to member objects** work as the body of the algorithm or parallel construct

Callable bodies work with parallel algorithms as well as with Flow Graph nodes. See the example below:

.. code::

    #include <cstddef>
    #include <oneapi/tbb/parallel_for.h>

    // The class models oneTBB Range
    class StrideRange {
    public:
        StrideRange(int* s, std::size_t sz, std::size_t str)
            : start(s), size(sz), stride(str) {}

        // A copy constructor
        StrideRange(const StrideRange&) = default;

        // A splitting constructor; the new range keeps the stride of the original
        StrideRange(StrideRange& other, oneapi::tbb::split)
            : start(other.start), size(other.size / 2), stride(other.stride)
        {
            other.size -= size;
            other.start += size;
        }

        ~StrideRange() = default;

        // Indicate if the range is empty
        bool empty() const {
            return size == 0;
        }

        // Indicate if the range can be divided
        bool is_divisible() const {
            return size >= stride;
        }

        void iterate() const {
            for (std::size_t i = 0; i < size; i += stride) {
                // Perform an action for each element of the range,
                // implement the code based on your requirements
            }
        }

    private:
        int* start;
        std::size_t size;
        std::size_t stride;
    };

Where:

* The ``StrideRange`` class models oneTBB range that should be iterated with a specified stride during its initial construction.
* The ``stride`` value is stored in a private field within the range. Therefore, the class provides the member function ``iterate() const`` that implements a loop with the specified stride.
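
The splitting constructor above cuts the range at ``size / 2``, which is not always a multiple of the stride, so the second part may start on an element that the unsplit loop would have skipped. The sketch below shows one possible way to keep the parts aligned. It is an assumption-laden variant, not the documented class: ``split_tag`` is a hypothetical stand-in for ``oneapi::tbb::split`` so that the code compiles without oneTBB, and ``AlignedStrideRange`` tracks offsets instead of a pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag standing in for oneapi::tbb::split.
struct split_tag {};

// Simplified, hypothetical variant of the range that tracks offsets only.
class AlignedStrideRange {
public:
    AlignedStrideRange(std::size_t first, std::size_t count, std::size_t step)
        : begin_(first), size_(count), stride_(step) {
        assert(stride_ > 0);
    }

    // Splitting constructor: the new object takes the first part of `other`
    // and `other` keeps the rest. The cut is rounded down to a multiple of
    // the stride so the second part still starts on a visited element.
    AlignedStrideRange(AlignedStrideRange& other, split_tag)
        : begin_(other.begin_),
          size_((other.size_ / 2) / other.stride_ * other.stride_),
          stride_(other.stride_) {
        other.begin_ += size_;
        other.size_ -= size_;
    }

    bool empty() const { return size_ == 0; }

    // Divisible only when both parts stay non-empty after rounding.
    bool is_divisible() const { return size_ >= 2 * stride_; }

    std::size_t begin() const { return begin_; }
    std::size_t size() const { return size_; }

    // Number of elements a strided loop over this range visits.
    std::size_t visited() const { return (size_ + stride_ - 1) / stride_; }

private:
    std::size_t begin_;
    std::size_t size_;
    std::size_t stride_;
};
```

With this rounding, the number of visited elements in the two parts adds up to the number visited by the unsplit range, at the cost of slightly uneven parts.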

``range.iterate()``
*******************

Before C++17, to utilize a range in a parallel algorithm, such as ``parallel_for``, it was required to provide a ``Function Object`` as the algorithm's body. This Function Object defined the operations to be executed on each iteration of the range:

.. code::

    int main() {
        std::size_t array_size = 1000;
        int* array_to_iterate = new int[array_size];

        StrideRange range(array_to_iterate, array_size, /* stride = */ 2);

        // Define a lambda function as the body of the parallel_for loop
        auto pfor_body = [] (const StrideRange& range) {
            range.iterate();
        };

        // Perform parallel iteration
        oneapi::tbb::parallel_for(range, pfor_body);

        delete[] array_to_iterate;
    }

An additional lambda function ``pfor_body`` was also required. This lambda function invoked the ``range.iterate()`` function.

Now with C++17, you can pass a pointer to the member function ``StrideRange::iterate`` directly as the body of the algorithm:

.. code::

    int main() {
        std::size_t array_size = 1000;
        int* array_to_iterate = new int[array_size];

        // Range that iterates over the array elements with the specified stride
        StrideRange range(array_to_iterate, array_size, /* stride = */ 2);

        // Parallelize the iteration over the range object
        oneapi::tbb::parallel_for(range, &StrideRange::iterate);

        delete[] array_to_iterate;
    }

``std::invoke``
****************

``std::invoke`` is a function template that provides a uniform syntax for invoking different types of callable objects with a set of arguments.

The oneTBB implementation uses the C++ standard function ``std::invoke`` to execute the body. For the example above, the call ``std::invoke(&StrideRange::iterate, range)`` is equivalent to ``range.iterate()``.
The same mechanism invokes any other callable object, such as a function object, with the provided arguments.

.. tip:: Refer to `C++ Standard <https://en.cppreference.com/w/cpp/utility/functional/invoke>`_ to learn more about ``std::invoke``.
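
To make the role of ``std::invoke`` more concrete, here is a serial sketch of how a generic algorithm could split a range and then run the body on each piece. It is not the oneTBB implementation: ``split_tag``, ``Chunk`` and ``serial_for`` are hypothetical names, and the recursion runs on a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::invoke (C++17)
#include <vector>

// Hypothetical tag standing in for oneapi::tbb::split.
struct split_tag {};

// Hypothetical range over a contiguous block of ints.
struct Chunk {
    int* data;
    std::size_t size;

    Chunk(int* d, std::size_t n) : data(d), size(n) {
        assert(d != nullptr || n == 0);
    }

    // Splitting constructor: the new chunk takes the first half of `other`.
    Chunk(Chunk& other, split_tag) : data(other.data), size(other.size / 2) {
        other.data += size;
        other.size -= size;
    }

    bool is_divisible() const { return size > 1; }

    // Member function used as the body: increments every element.
    void increment() const {
        for (std::size_t i = 0; i < size; ++i) {
            ++data[i];
        }
    }
};

// Serial stand-in for a parallel algorithm: split until the range is no
// longer divisible, then run the body on each leaf through std::invoke.
template <typename Range, typename Body>
void serial_for(Range range, const Body& body) {
    if (range.is_divisible()) {
        Range left(range, split_tag{});
        serial_for(left, body);
        serial_for(range, body);
    } else {
        std::invoke(body, range);
    }
}
```

Because the leaf call goes through ``std::invoke``, ``serial_for`` accepts both ``&Chunk::increment`` and a lambda such as ``[](const Chunk& c) { c.increment(); }`` as the body.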

Example
^^^^^^^^

Consider a specific scenario with ``function_node`` within a Flow Graph.

In the example below, a ``function_node`` takes an object as input, reads a member object of that input, and passes it to the next node in the graph:

.. code::

    #include <oneapi/tbb/flow_graph.h>

    struct Object {
        int number;
    };

    int main() {
        using namespace oneapi::tbb::flow;

        // Lambda function to read the member object of the input Object
        auto number_reader = [] (const Object& obj) {
            return obj.number;
        };

        // Lambda function to process the received integer;
        // the body of function_node<int, int> must return an int
        auto number_processor = [] (int i) { /* processing integer */ return i; };

        graph g;

        // Function node that takes an Object as input and produces an integer
        function_node<Object, int> func1(g, unlimited, number_reader);

        // Function node that takes an integer as input and processes it
        function_node<int, int> func2(g, unlimited, number_processor);

        // Connect the function nodes
        make_edge(func1, func2);

        // Provide input to the graph
        func1.try_put(Object{1});

        // Wait for the graph to complete
        g.wait_for_all();
    }

Before C++17, the ``function_node`` in the Flow Graph required the body to be a Function Object. A lambda function was required to extract the number from the Object.

With C++17, you can pass a pointer to the member object ``Object::number`` directly as the body; oneTBB invokes it through ``std::invoke``.

You can update the previous example as follows:

.. code::

    #include <oneapi/tbb/flow_graph.h>

    struct Object {
        int number;
    };

    int main() {
        using namespace oneapi::tbb::flow;

        // The processing logic for the received integer;
        // the body of function_node<int, int> must return an int
        auto number_processor = [] (int i) { /* processing integer */ return i; };

        // Create a graph object g to hold the flow graph
        graph g;

        // Use a pointer to the member object number of the Object struct as the body
        function_node<Object, int> func1(g, unlimited, &Object::number);

        // Use the number_processor lambda function as the body
        function_node<int, int> func2(g, unlimited, number_processor);

        // Connect the function nodes
        make_edge(func1, func2);

        // Provide input to the graph
        func1.try_put(Object{1});

        // Wait for the graph to complete
        g.wait_for_all();
    }

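
Whether a given body fits a node from one type to another can also be checked at compile time. The sketch below uses the C++17 trait ``std::is_invocable_r_v`` for that purpose. The helper ``is_node_body_v`` is hypothetical and only approximates the requirement (callable with the input, result convertible to the output); it is not part of oneTBB.

```cpp
#include <string>
#include <type_traits>  // std::is_invocable_r_v (C++17)

struct Object {
    int number;
};

// Hypothetical helper: true when Body can be invoked with a const Input&
// and its result converts to Output.
template <typename Input, typename Output, typename Body>
inline constexpr bool is_node_body_v =
    std::is_invocable_r_v<Output, Body, const Input&>;

// A pointer to the member object is a valid body from Object to int.
static_assert(is_node_body_v<Object, int, decltype(&Object::number)>,
              "pointer to member object should be accepted");

// The same pointer does not produce a std::string, so it is rejected.
static_assert(!is_node_body_v<Object, std::string, decltype(&Object::number)>,
              "an int member does not convert to std::string");
```

A check like this can give a clearer diagnostic than the template errors that appear when an unsuitable body is passed to a node constructor.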
Find More
*********

The following APIs support Callable objects as bodies:

* `parallel_for <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/algorithms/functions/parallel_for_func.html>`_
* `parallel_reduce <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/algorithms/functions/parallel_reduce_func.html>`_
* `parallel_deterministic_reduce <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/algorithms/functions/parallel_deterministic_reduce_func.html>`_
* `parallel_for_each <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/algorithms/functions/parallel_for_each_func.html>`_
* `parallel_scan <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/algorithms/functions/parallel_scan_func.html>`_
* `parallel_pipeline <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/algorithms/functions/parallel_pipeline_func.html>`_
* `function_node <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/flow_graph/func_node_cls.html>`_
* `multifunction_node <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/flow_graph/multifunc_node_cls.html>`_
* `async_node <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/flow_graph/async_node_cls.html>`_
* `sequencer_node <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/flow_graph/sequencer_node_cls.html>`_
* `join_node with key_matching policy <https://oneapi-src.github.io/oneAPI-spec/spec/elements/oneTBB/source/flow_graph/join_node_cls.html>`_
1 change: 1 addition & 0 deletions doc/main/tbb_userguide/title.rst
@@ -23,6 +23,7 @@
../tbb_userguide/design_patterns/Design_Patterns
../tbb_userguide/Migration_Guide
../tbb_userguide/Constraints
../tbb_userguide/std_invoke
../tbb_userguide/appendix_A
../tbb_userguide/appendix_B
../tbb_userguide/References
