Add general block sparse tensor support #135

Open: wants to merge 27 commits into main from block_sparse

Commits (27)
9d77571
[WIP] lambda function is_non_zero_check
erdalmutlu Nov 8, 2024
8b4603f
Committing clang-format changes
exachem23 Nov 8, 2024
f9961c1
add simple test case
erdalmutlu Nov 8, 2024
fd99b54
[WIP] adds struct based BlockSparse Tensor
erdalmutlu Nov 11, 2024
31c0c49
Merge branch 'block_sparse' of github.com:NWChemEx/TAMM into block_sp…
erdalmutlu Nov 11, 2024
c586dd7
[WIP] minor
erdalmutlu Nov 11, 2024
d8accf3
[WIP] changes is non zero logic to more general use of reference indi…
erdalmutlu Nov 12, 2024
04c9aff
[WIP] minor format fix
erdalmutlu Nov 12, 2024
8f6ce76
adds support for Block Sparse tensors with tests
erdalmutlu Dec 3, 2024
ea0efa9
adds LocalTensor support with tests and documentation
erdalmutlu Nov 12, 2024
b8e859f
[PG] add support for subgroup creation
ajaypanyala Nov 14, 2024
55e7e4c
[PG] allow parent group specification internally
ajaypanyala Nov 15, 2024
06edda2
adds documentation for BlockSparse Tensor and moves tests for upcxx b…
erdalmutlu Dec 4, 2024
cf280ef
adds copying capabilities from/to distributed tensors
erdalmutlu Dec 11, 2024
20faf96
updates the block sparse tensor implementation
erdalmutlu Dec 19, 2024
6c461ad
profiling update
ajaypanyala Nov 9, 2024
3dca616
fix warnings
ajaypanyala Nov 10, 2024
dd4c35d
fix some clang warnings
ajaypanyala Nov 10, 2024
799b963
[CCSD Test] remove unnecessary memcpy
ajaypanyala Nov 10, 2024
ada987e
minor update to print_vector
ajaypanyala Dec 19, 2024
21ab651
cleans up debugging info
erdalmutlu Dec 20, 2024
c291104
fixes bug with exact copy
erdalmutlu Jan 10, 2025
0916274
adds new constructors for block sparse tensors
erdalmutlu Jan 21, 2025
c499c94
renames the BlockSparseInfo to TensorInfo
erdalmutlu Jan 23, 2025
da20abb
adds error handling for block sparse tensors by adding new checks on …
erdalmutlu Jan 30, 2025
3435391
fix for missing update on exact copy on block sparse tensors tests
erdalmutlu Jan 30, 2025
07f134d
Merge branch 'main' into block_sparse
erdalmutlu Jan 31, 2025
166 changes: 164 additions & 2 deletions docs/user_guide/tensor_construction.rst
@@ -388,6 +388,7 @@

Different from the default tensor constructors, users can choose to use size values
to construct corresponding tensors.

.. code:: cpp

// Tensor<T> B{tis1, tis1};
// Local tensor construction using TiledIndexSpaces
LocalTensor<T> local_A{tis1, tis1, tis1};
@@ -400,11 +401,12 @@
LocalTensor<T> local_E{10, 10, 10};

Similar to general tensor objects in TAMM, ``LocalTensor`` objects have to be allocated.
While allocation/deallcoation calls are the same with general Tensor constructs, users
While allocation/deallocation calls are the same with general Tensor constructs, users
have to use an ``ExecutionContext`` object with ``LocalMemoryManager``. Below is an
example of what the allocation for these tensors looks like:

.. code:: cpp

// Execution context with LocalMemoryManager
ExecutionContext local_ec{sch.ec().pg(), DistributionKind::nw, MemoryManagerKind::local};
// Scheduler constructed with the new local_ec
@@ -469,10 +471,170 @@ enabling element access via index location.
}
}
}

The examples above illustrate element-wise operations. Users can perform scheduler-based
operations with the local scheduler or define element-wise updates using loops.
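
As a brief illustration, below is a minimal sketch of both styles for the ``local_E``
tensor declared earlier (assuming it has been allocated through the local scheduler
and supports direct index access as shown above):

.. code-block:: cpp

    // Scheduler-based update using the local scheduler
    sch_local
      (local_E() = 1.0)
      .execute();

    // Loop-based element-wise update via index access
    for(size_t i = 0; i < 10; i++)
      for(size_t j = 0; j < 10; j++)
        for(size_t k = 0; k < 10; k++)
          local_E(i, j, k) = static_cast<T>(i + j + k);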

The `LocalTensor` object also allows copying from or to a distributed tensor object. This is
particularly useful when users need a local copy of a distributed
tensor to apply element-wise updates. Below is an example of this scenario:

.. code-block:: cpp

// Distributed tensor constructor
Tensor<T> dist_A{tN, tN, tN};
// ...

// Local tensor construction
LocalTensor<T> local_A{dist_A.tiled_index_spaces()};

sch_local.allocate(local_A)
.execute();

// Copy from distributed tensor
local_A.from_distributed_tensor(dist_A);

// Apply updates
sch_local
(local_A() = 21.0)
.execute();

// Copy back to distributed tensor
local_A.to_distributed_tensor(dist_A);


Block Sparse Tensor Construction
--------------------------------

TAMM supports the construction of general block sparse tensors using underlying
`TiledIndexSpace` constructs. Users can specify non-zero blocks by providing a
lambda function that replaces the block-wise `is_non_zero` check, which is internally
called for each block operation (e.g., allocation, element-wise operations,
tensor operations). This approach allows for efficient allocation of only non-zero
blocks and optimized tensor operations on these portions.

The following code demonstrates how to define a custom lambda function to check
for block sparsity and construct a block sparse tensor:

.. code-block:: cpp

// List of index spaces for the tensor construction
TiledIndexSpaceVec t_spaces{SpinTIS, SpinTIS};
// Spin mask for the dimensions
std::vector<SpinPosition> spin_mask_2D{SpinPosition::lower, SpinPosition::upper};

// Custom lambda function for the is_non_zero check
auto is_non_zero_2D = [t_spaces, spin_mask_2D](const IndexVector& blockid) -> bool {
Spin upper_total = 0, lower_total = 0, other_total = 0;
for (size_t i = 0; i < 2; i++) {
const auto& tis = t_spaces[i];
if (spin_mask_2D[i] == SpinPosition::upper) {
upper_total += tis.spin(blockid[i]);
} else if (spin_mask_2D[i] == SpinPosition::lower) {
lower_total += tis.spin(blockid[i]);
} else {
other_total += tis.spin(blockid[i]);
}
}

return (upper_total == lower_total);
};

// TensorInfo construction
TensorInfo tensor_info{t_spaces, is_non_zero_2D};

// Tensor constructor
Tensor<T> tensor{t_spaces, tensor_info};

TAMM offers a more convenient way to describe non-zero blocks through the `TensorInfo`
struct, using string-based sub-space names in `TiledIndexSpace`s. This simplifies the
process of constructing block sparse tensors.

Here's an example of using `TensorInfo`:

.. code-block:: cpp

// Map labels to corresponding sub-space strings
Char2TISMap char2MOstr = {{'i', "occ"}, {'j', "occ"}, {'k', "occ"}, {'l', "occ"},
{'a', "virt"}, {'b', "virt"}, {'c', "virt"}, {'d', "virt"}};

// Construct TensorInfo
TensorInfo tensor_info{
{MO, MO, MO, MO}, // Tensor dimensions
{"ijab", "iajb", "ijka", "ijkl", "iabc", "abcd"}, // Allowed blocks
char2MOstr // Character to sub-space string mapping
// ,{"abij", "aibj"} // Disallowed blocks (optional)
};

// Block Sparse Tensor construction
Tensor<T> tensor{{MO, MO, MO, MO}, tensor_info};

TAMM also provides a simplified constructor that only requires a list of allowed
blocks and the character-to-sub-space string map:

.. code-block:: cpp

// Block Sparse Tensor construction using allowed blocks
Tensor<T> tensor{{MO, MO, MO, MO}, {"ijab", "ijka", "iajb"}, char2MOstr};

Block Sparse `Tensor` inherits from general TAMM tensor constructs, enabling the application
of standard tensor operations to block sparse tensors. Users can employ labels over the entire
`TiledIndexSpace` for general computations or use sub-space labels to access specific blocks.

The following code illustrates how to allocate, set values, and perform operations on different
blocks of block sparse tensors:

.. code-block:: cpp

// Construct Block Sparse Tensors with different allowed blocks
Tensor<T> tensorA{{MO, MO, MO, MO}, {"ijab", "ijkl"}, char2MOstr};
Tensor<T> tensorB{{MO, MO, MO, MO}, {"ijka", "iajb"}, char2MOstr};
Tensor<T> tensorC{{MO, MO, MO, MO}, {"iabc", "abcd"}, char2MOstr};

// Allocate and set values
sch.allocate(tensorA, tensorB, tensorC)
(tensorA() = 2.0)
(tensorB() = 4.0)
(tensorC() = 0.0)
.execute();

// Use different blocks to update output tensor
// a, b, c, d: MO virtual space labels
// i, j, k, l: MO occupied space labels
sch
(tensorC(a, b, c, d) += tensorA(i, j, a, b) * tensorB(j, c, i, d))
(tensorC(i, a, b, c) += 0.5 * tensorA(j, k, a, b) * tensorB(i, j, k, c))
.execute();

// De-allocate tensors
tensorA.deallocate();
tensorB.deallocate();
tensorC.deallocate();

TAMM also provides block sparse tensor constructors similar to general tensor construction,
allowing the use of TiledIndexLabels, TiledIndexSpaces, or strings corresponding to the
sub-space names in TiledIndexSpaces to represent the allowed blocks. With these
constructors, users don't have to provide a mapping from characters to the corresponding
sub-space names, as the sub-spaces are given explicitly. The code below shows the use of
these constructors; as in the previous case, block sparse tensors constructed with these
methods can be used directly in any tensor operation available to general tensors:

.. code-block:: cpp

// Construct Block Sparse Tensors with different allowed blocks
// Using TiledIndexLabels for allowed blocks
Tensor<T> tensorA{{MO, MO, MO, MO}, {{i, j, a, b}, {i, j, k, l}}};
// Using TiledIndexSpaces for allowed blocks
TiledIndexSpace Occ = MO("occ");
TiledIndexSpace Virt = MO("virt");
Tensor<T> tensorB{{MO, MO, MO, MO},
{TiledIndexSpaceVec{Occ, Occ, Occ, Occ},
TiledIndexSpaceVec{Occ, Virt, Occ, Virt}}};
// Using a list of comma-separated strings representing sub-space names
Tensor<T> tensorC{{MO, MO, MO, MO}, {{"occ, virt, virt, virt"},
{"virt, virt, virt, virt"}}};

// ...
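
For reference, here is a hypothetical sketch of how the ``i``, ``j``, ``k``, ``l``, ``a``, and
``b`` labels used for ``tensorA`` above could be declared, assuming the usual TAMM label API
over the ``occ``/``virt`` sub-spaces:

.. code-block:: cpp

    // Labels over the occupied and virtual sub-spaces of the MO space
    auto [i, j, k, l] = MO.labels<4>("occ");
    auto [a, b]       = MO.labels<2>("virt");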

Example Tensor Constructions
----------------------------
1 change: 1 addition & 0 deletions src/tamm/CMakeLists.txt
@@ -49,6 +49,7 @@ set(TAMM_INCLUDES
tensor.hpp
tensor_impl.hpp
tensor_base.hpp
tensor_info.hpp
distribution.hpp
labeled_tensor.hpp
execution_context.hpp
20 changes: 20 additions & 0 deletions src/tamm/labeled_tensor.hpp
@@ -208,6 +208,26 @@ class LabeledTensor {
for(const auto& lbl: ilv_) {
for(const auto& dlbl: lbl.secondary_labels()) { EXPECTS(lbl.primary_label() != dlbl); }
}

auto tensor_base = tensor_.base_ptr();
EXPECTS(tensor_base != nullptr);

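// For block sparse tensors, the provided labels must address at least one allowed
// (non-zero) block; otherwise the labeled tensor would refer only to zero blocks.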
if(tensor_base->kind() == TensorBase::TensorKind::block_sparse) {
bool has_non_zero = false;
LabelLoopNest loop_nest{ilv_};

for(const auto& blockid: loop_nest) {
auto translated_blockid =
internal::translate_blockid_with_labels(blockid, ilv_, tensor_.tiled_index_spaces());

if(tensor_base->is_non_zero(translated_blockid)) {
has_non_zero = true;
break;
}
}
EXPECTS_STR(has_non_zero, "Labeled tensor should be constructed over an allowed block!");
}

} // validate

void unpack(size_t index) {
92 changes: 65 additions & 27 deletions src/tamm/local_tensor.hpp
@@ -133,33 +133,71 @@ class LocalTensor: public Tensor<T> { // move to another hpp
old_tensor.deallocate();
}

// /// @brief
// /// @param sbuf
// /// @param block_dims
// /// @param block_offset
// /// @param copy_to_local
// void patch_copy_local(std::vector<T>& sbuf, const std::vector<size_t>& block_dims,
// const std::vector<size_t>& block_offset, bool copy_to_local) {
// auto num_dims = local_tensor_.num_modes();
// // Compute the total number of elements to copy
// size_t total_elements = 1;
// for(size_t dim: block_dims) { total_elements *= dim; }

// // Initialize indices to the starting offset
// std::vector<size_t> indices(block_offset);

// for(size_t c = 0; c < total_elements; ++c) {
// // Access the tensor element at the current indices
// if(copy_to_local) (*this)(indices) = sbuf[c];
// else sbuf[c] = (*this)(indices);

// // Increment indices
// for(int dim = num_dims - 1; dim >= 0; --dim) {
// if(++indices[dim] < block_offset[dim] + block_dims[dim]) { break; }
// indices[dim] = block_offset[dim];
// }
// }
// }
/**
* @brief Method for filling the local tensor data with the original distributed tensor. We first
* construct a loop nest and do a get on all blocks, which are then written to the
* corresponding place in the new local tensor.
*
* @param dist_tensor Distributed source tensor to copy from
*/
void from_distributed_tensor(const Tensor<T>& dist_tensor) {
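// Note: this assumes the local tensor was constructed over the same TiledIndexSpaces
// as dist_tensor, so every distributed block maps onto a patch of the local buffer.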
for(const auto& blockid: dist_tensor.loop_nest()) {
const tamm::TAMM_SIZE size = dist_tensor.block_size(blockid);
std::vector<T> buf(size);
dist_tensor.get(blockid, buf);
auto block_dims = dist_tensor.block_dims(blockid);
auto block_offset = dist_tensor.block_offsets(blockid);
patch_copy_local(buf, block_dims, block_offset, true);
}
}

/**
* @brief Method for filling the original distributed tensor data with the local tensor. We first
* construct a loop nest and do a get on all blocks, which are then overwritten with the
* corresponding local data and put back into the distributed tensor.
*
* @param dist_tensor Distributed destination tensor to copy to
*/
void to_distributed_tensor(Tensor<T>& dist_tensor) {
for(const auto& blockid: dist_tensor.loop_nest()) {
const tamm::TAMM_SIZE size = dist_tensor.block_size(blockid);
std::vector<T> buf(size);
dist_tensor.get(blockid, buf);
auto block_dims = dist_tensor.block_dims(blockid);
auto block_offset = dist_tensor.block_offsets(blockid);
patch_copy_local(buf, block_dims, block_offset, false);
dist_tensor.put(blockid, buf);
}
}

/// @brief A helper method that copies a block of data to or from the corresponding patch of
/// the local copy
/// @param sbuf Block data to be copied
/// @param block_dims Block dimensions used to locate the patch in the linearized local
/// tensor
/// @param block_offset The offsets of the input block in the original multidimensional tensor
/// @param copy_to_local If true, copies sbuf into the local tensor; otherwise copies the
/// local patch out into sbuf
void patch_copy_local(std::vector<T>& sbuf, const std::vector<size_t>& block_dims,
const std::vector<size_t>& block_offset, bool copy_to_local) {
auto num_dims = this->num_modes();
// Compute the total number of elements to copy
size_t total_elements = 1;
for(size_t dim: block_dims) { total_elements *= dim; }
// Initialize indices to the starting offset
std::vector<size_t> indices(block_offset);

for(size_t c = 0; c < total_elements; ++c) {
size_t linearIndex = compute_linear_index(indices);

// Access the tensor element at the current indices
if(copy_to_local) this->access_local_buf()[linearIndex] = sbuf[c];
else sbuf[c] = this->access_local_buf()[linearIndex];

// Increment indices
for(int dim = num_dims - 1; dim >= 0; --dim) {
if(++indices[dim] < block_offset[dim] + block_dims[dim]) { break; }
indices[dim] = block_offset[dim];
}
}
}

/// @brief Method for applying the copy operation from a smaller LocalTensor to a bigger
/// LocalTensor used for re-sizing