v25.01.00

@marcinz released this 08 Feb 06:20
6658905

This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/25.01/eula.pdf.

Linux x86 and ARM conda packages with multi-node support (based on UCX or GASNet) are available at https://anaconda.org/legate/legate (GASNet-based packages are under the gex label).

Documentation for this release can be found at https://docs.nvidia.com/legate/25.01/.

New features

Memory management

  • The memory pools used for ahead-of-task-execution ("deferred") allocations and task-execution-time ("eager") allocations are no longer separate, so the --eager-alloc-percentage flag is obsolete. Instead, a task that creates temporary or output buffers during execution must be registered with has_allocations=true, and a new allocation_pool_size() mapper callback must provide an upper bound on the total size of the task's allocations. See https://docs.nvidia.com/legate/25.01/api/cpp/mapping.html for more detailed instructions.
  • Add the offload_to() API, which lets a user offload a store or array to a particular memory kind, discarding any copies in other memories. This is useful, for example, for evicting an array from GPU memory to system memory, freeing up space for subsequent GPU tasks.
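
The upper bound expected from allocation_pool_size() can be reasoned about as a sum over every temporary or output buffer a task might create. The following is a minimal, Legate-independent sketch in plain Python. BufferSpec, round_up, pool_size_upper_bound, and the 16-byte alignment are hypothetical names and assumptions chosen for illustration; none of them are part of the Legate API, and the mapping documentation linked above remains the authority on what the callback must return.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class BufferSpec:
    """Hypothetical description of one buffer a task may allocate while running."""
    max_shape: tuple[int, ...]  # largest extents the buffer can ever have
    itemsize: int               # bytes per element


def round_up(nbytes: int, alignment: int) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return -(-nbytes // alignment) * alignment


def pool_size_upper_bound(buffers: list[BufferSpec], alignment: int = 16) -> int:
    """Sum the worst-case, alignment-padded sizes of all buffers.

    The alignment default is an assumption of this sketch, not a Legate constant.
    """
    return sum(round_up(prod(b.max_shape) * b.itemsize, alignment) for b in buffers)


# A task that creates one float64 scratch buffer of up to 1000 elements
# and one int32 output buffer of up to 10 x 3 elements.
bound = pool_size_upper_bound([BufferSpec((1000,), 8), BufferSpec((10, 3), 4)])
```

Padding each buffer to the alignment keeps the estimate conservative: an overestimate only reserves extra space, whereas an underestimate would break the upper-bound requirement stated above.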

I/O

  • Move the HDF5 interface out of the experimental namespace.
  • Use cuFile to accelerate HDF5 reads on the GPU.
  • Add support for reading "binary" HDF5 datasets.

Deprecations

  • Remove the task_target() callback from the Legate mapper. Users who need to restrict where tasks run should use the resource scoping mechanism instead.
  • Drop support for the Maxwell GPU architecture. Legate now requires at least Pascal (sm_60).

Miscellaneous

  • Increase the maximum array dimension from 4 to 6.
  • Record stacktraces on Legate exceptions and error messages.
  • Consider NUMA node topology when allocating CPU cores and memory during automatic machine configuration.
  • Add the LEGATE_LIMIT_STDOUT environment variable, which limits printed output to a single copy of the top-level program in a multi-process execution.
  • Add legate::LogicalStore::reinterpret_as() to reinterpret the underlying storage of a LogicalStore as another data type.
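
Reinterpreting storage differs from converting values: the bytes stay the same and only the type used to read them changes. The snippet below illustrates that general idea with Python's standard struct module. It is an analogy only and does not call reinterpret_as(); the exact signature and constraints of that method are not described in these notes and should be taken from the Legate API reference.

```python
import struct

value = 1.0

# Value conversion: the number is preserved, the bytes change.
converted = int(value)

# Reinterpretation: the 4 bytes of the float32 are kept as-is
# and read back as an unsigned 32-bit integer.
raw = struct.pack("<f", value)
reinterpreted = struct.unpack("<I", raw)[0]

print(converted, hex(reinterpreted))  # prints: 1 0x3f800000
```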

Full changelog: https://docs.nvidia.com/legate/25.01/changes/2501.html