CSR/CSC: Allow working with borrowed data + generalize index type #878
I'm fairly interested in this functionality. At present there's no way to slice sparse matrix types, and borrowed storage is a natural prerequisite for such functionality. I'm not sure that a generalized index type is needed for that (it seems like a distinct issue). Do you have recommendations for where to start if I wanted to add a custom storage generic parameter and start implementing the ability to slice CSC/CSR matrices?
As for generalizing across index types, is the
To clarify: by slicing, do you mean, for example, taking a "view" of only a select set of rows in a CSR matrix, or a select set of columns in a CSC matrix? Or do you mean something even more general? If we want to support this, we also need to relax/change our assumptions on CSR/CSC data, so we'll first have to think about whether this imposes any problems.
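Consider the row-selection case. A *contiguous* range of rows in a CSR matrix (or columns in a CSC matrix) can in principle be borrowed without copying, because its entries are stored contiguously. The catch is that the borrowed offsets array no longer starts at zero, which is one of the CSR/CSC assumptions that would have to be relaxed. The following is a minimal sketch using plain slices. `CsrView`, `row_range` and `row` are hypothetical names, not nalgebra-sparse API. Arbitrary (non-contiguous) row sets cannot be viewed this way.

```rust
/// Hypothetical borrowed CSR view. `offsets` has `nrows + 1` entries, but
/// unlike an owned CSR matrix, `offsets[0]` need not be zero: the offsets are
/// positions in the parent matrix, while `col_indices`/`values` hold only
/// this view's entries.
pub struct CsrView<'a, T> {
    pub offsets: &'a [usize],
    pub col_indices: &'a [usize],
    pub values: &'a [T],
    pub ncols: usize,
}

impl<'a, T> CsrView<'a, T> {
    pub fn nrows(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn nnz(&self) -> usize {
        self.values.len()
    }

    /// Borrows rows `start..end` without copying any data.
    pub fn row_range(&self, start: usize, end: usize) -> CsrView<'a, T> {
        assert!(start <= end && end <= self.nrows(), "row range out of bounds");
        let (offsets, cols, vals) = (self.offsets, self.col_indices, self.values);
        // Positions of the selected entries inside this view's own arrays.
        let base = offsets[0];
        let (lo, hi) = (offsets[start] - base, offsets[end] - base);
        CsrView {
            offsets: &offsets[start..=end],
            col_indices: &cols[lo..hi],
            values: &vals[lo..hi],
            ncols: self.ncols,
        }
    }

    /// Column indices and values of row `i` of this view. The offsets are
    /// rebased by `offsets[0]` because they were never rewritten.
    pub fn row(&self, i: usize) -> (&'a [usize], &'a [T]) {
        let base = self.offsets[0];
        let (begin, end) = (self.offsets[i] - base, self.offsets[i + 1] - base);
        let cols: &'a [usize] = self.col_indices;
        let vals: &'a [T] = self.values;
        (&cols[begin..end], &vals[begin..end])
    }
}
```

Rebasing on access keeps the view zero-copy. The price is that code which assumes `offsets[0] == 0` (or hands the offsets directly to another library) would no longer be correct for such a view.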
It is a distinct issue, but the connection is that we want an overall design that encapsulates both borrowed storage and generalized index types with minimal API complexity. So it would be good to at least consider how possible designs for these features impact each other in terms of the API complexity. Currently, the API of CSR/CSC matrices has been deliberately kept very simple so that it's accessible and understandable in the generated
I think we'll first have to clarify precisely the expectations of "slicing", then consider how this might impact the sparse matrix formats for "non-slicing" workflows, if at all. Once this is resolved it would perhaps make sense to explore creating a
I think
Quite literally, I want to be able to effectively call this function from dense matrices on sparse matrices.
I think at a top level how

```rust
pub(crate) trait CsStorage<T> {
    fn pattern(&self) -> &SparsityPattern;
    fn values(&self) -> &[T];
    fn get_entry(&self, major_index: usize, minor_index: usize) -> Option<SparseEntry<'_, T>>;
    fn get_lane(&self, index: usize) -> Option<CsLane<'_, T>>;
    fn lane_iter(&self) -> CsLaneIter<'_, T>;
}
```

Of course, there'd have to be a corresponding

```rust
impl<T, S> CsMatrix<T, S>
where
    S: CsStorage<T>,
{
    fn pattern(&self) -> &SparsityPattern {
        self.storage.pattern()
    }
    // ...
}

impl<T, S> CsMatrix<T, S>
where
    S: CsStorageMut<T>,
{
    fn get_entry_mut(
        &mut self,
        major_index: usize,
        minor_index: usize,
    ) -> Option<SparseEntryMut<'_, T>> {
        // etc
        todo!()
    }
    // ...
}
```

These methods would simply echo the underlying (crate-private) trait. This way the main API doesn't need to change, but immutable and mutable slice types on the storage can exist separately.
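Here is a reduced, self-contained version of this design, as a sketch under simplifying assumptions. The trait exposes raw offset, minor-index and value slices instead of a `SparsityPattern`. `get_entry` returns `Option<&T>` rather than a `SparseEntry`. `OwnedCsStorage` and `BorrowedCsStorage` are hypothetical names. The point is that accessors such as `get_entry` are written once against the trait and then work for both owned and borrowed storage.

```rust
use std::marker::PhantomData;

/// Simplified stand-in for the proposed crate-private storage trait.
pub trait CsStorage<T> {
    fn offsets(&self) -> &[usize];
    fn minor_indices(&self) -> &[usize];
    fn values(&self) -> &[T];
}

/// Owned storage (hypothetical name).
pub struct OwnedCsStorage<T> {
    pub offsets: Vec<usize>,
    pub minor_indices: Vec<usize>,
    pub values: Vec<T>,
}

/// Borrowed storage, e.g. wrapping arrays owned by another library.
pub struct BorrowedCsStorage<'a, T> {
    pub offsets: &'a [usize],
    pub minor_indices: &'a [usize],
    pub values: &'a [T],
}

impl<T> CsStorage<T> for OwnedCsStorage<T> {
    fn offsets(&self) -> &[usize] { &self.offsets }
    fn minor_indices(&self) -> &[usize] { &self.minor_indices }
    fn values(&self) -> &[T] { &self.values }
}

impl<'a, T> CsStorage<T> for BorrowedCsStorage<'a, T> {
    fn offsets(&self) -> &[usize] { self.offsets }
    fn minor_indices(&self) -> &[usize] { self.minor_indices }
    fn values(&self) -> &[T] { self.values }
}

pub struct CsMatrix<T, S> {
    storage: S,
    _marker: PhantomData<T>,
}

impl<T, S: CsStorage<T>> CsMatrix<T, S> {
    pub fn from_storage(storage: S) -> Self {
        CsMatrix { storage, _marker: PhantomData }
    }

    /// Implemented once for every storage type. Assumes the minor indices
    /// in each lane are sorted and unique, so binary search is valid.
    pub fn get_entry(&self, major: usize, minor: usize) -> Option<&T> {
        let offsets = self.storage.offsets();
        let (begin, end) = (*offsets.get(major)?, *offsets.get(major + 1)?);
        let lane = &self.storage.minor_indices()[begin..end];
        let pos = lane.binary_search(&minor).ok()?;
        self.storage.values().get(begin + pos)
    }
}
```

A mutable counterpart would follow the same pattern, with a `CsStorageMut` trait and an `impl` block bounded on it.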
I'm not so sure that
Well, yes and no. You have to index according to the same type as the upper size of your dimension type. The abstraction between runtime / compile time size is something that's unnecessary here though. In any case, if you wanted to extend what I had above, I think it would be possible, but not actually necessary. As far as extending it, you'd change
Indices and sizes should not be signed. So while I guess I'd agree that I'm going to continue down the above path and submit a PR if I can accomplish slicing sparse matrices. Further discussion can continue in the meantime, but I reckon it'll be easier to demonstrate with actual code.
My primary motivation for working with borrowed data is not to enable slicing, but rather to enable interoperability with other libraries. Supporting similar zero-cost general slices/views with sparse matrices is generally not possible, certainly not without significant changes to the CSR/CSC formats. The current formats for CSR/CSC are also very deliberate: the choices made so far ensure maximum compatibility with other libraries. (This is a huge usability problem with, for example, Eigen's sparse matrices. Sometimes you have a reference

Are you sure you really need to have a slice/view, and not just be able to copy out parts of a matrix? I have been thinking about a "selection API" along the following lines:

```rust
let selection = matrix.select(rows, cols).clone_owned();
output.select_mut(output_rows, output_cols).assign(&matrix.select(rows, cols));
```
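The `clone_owned` half of such a selection API can be implemented eagerly, without any view types. The sketch below uses a minimal hypothetical `Csr` struct (not nalgebra-sparse's `CsrMatrix`) and a hypothetical `select_clone_owned` method. It assumes `rows` and `cols` are in bounds and `cols` contains no duplicates. The selected columns may come in any order, so each output row is re-sorted to keep its column indices sorted.

```rust
/// Minimal owned CSR matrix, used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Csr<T> {
    pub nrows: usize,
    pub ncols: usize,
    pub row_offsets: Vec<usize>,
    pub col_indices: Vec<usize>,
    pub values: Vec<T>,
}

impl<T: Clone> Csr<T> {
    fn row(&self, i: usize) -> (&[usize], &[T]) {
        let (b, e) = (self.row_offsets[i], self.row_offsets[i + 1]);
        (&self.col_indices[b..e], &self.values[b..e])
    }

    /// Eager equivalent of `matrix.select(rows, cols).clone_owned()`:
    /// copies the selected rows and columns, in the given order, into a new
    /// owned matrix.
    pub fn select_clone_owned(&self, rows: &[usize], cols: &[usize]) -> Csr<T> {
        // Map each original column to its position in the output, if selected.
        let mut col_map = vec![None; self.ncols];
        for (new_j, &j) in cols.iter().enumerate() {
            col_map[j] = Some(new_j);
        }
        let mut row_offsets = vec![0];
        let mut col_indices = Vec::new();
        let mut values = Vec::new();
        let mut lane: Vec<(usize, T)> = Vec::new();
        for &i in rows {
            let (cols_i, vals_i) = self.row(i);
            for (&j, v) in cols_i.iter().zip(vals_i) {
                if let Some(new_j) = col_map[j] {
                    lane.push((new_j, v.clone()));
                }
            }
            // Reordering columns can break sortedness, so re-sort each row.
            lane.sort_by_key(|&(j, _)| j);
            for (j, v) in lane.drain(..) {
                col_indices.push(j);
                values.push(v);
            }
            row_offsets.push(col_indices.len());
        }
        Csr { nrows: rows.len(), ncols: cols.len(), row_offsets, col_indices, values }
    }
}
```

The `assign` direction is not covered here. It would additionally need a policy for selected entries that are missing from `output`'s sparsity pattern.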
There are two different concerns with somewhat different requirements here: creating CSR/CSC matrices from borrowed data, and slicing. In both cases, however, the pattern itself needs to have borrowed data; otherwise you'd have to clone the entire sparsity pattern to create a slice, for example. Moreover, we need to expose a well-defined low-level format that applications can work with. For example, for FEM assembly (which is a large part of my motivation for building
I personally agree, and the Rust ecosystem largely agrees. However, existing sparse linear algebra software generally does not. Indexing in C libraries is commonly signed, so we need to account for that in order to support interoperability with existing libraries (unfortunately).
Sounds good, though for me there are a number of open questions we need to answer. Perhaps these can be partially answered with the help of prototyping. The most pressing question, I think, is how to proceed (if at all) with respect to slicing.
Partly addresses issue dimforge#878 by generalizing over how data is stored within `CsMatrix`. I also took the liberty of making the naming of some functions consistent (e.g. `get_entry_from_slices_mut`, `into_pattern_and_values`). The function `take_pattern_and_values` was literally identical to `into_pattern_and_values`, and I'm not sure why it existed.
This issue describes two issues that are, in principle, disjoint, but a design that addresses either needs to take care to accommodate both. They are:
`usize`. For the vast majority of applications, 32-bit indices suffice, and using them can be expected to have a significant impact on performance due to reduced memory bandwidth requirements. We should therefore generalize the index type.

There are nuances to both these issues, and in the end they must remain compatible. Our primary goal for the design of both these features, however, should be to reduce the added complexity of a larger and increasingly generic API surface. The library should remain easy to use: the documentation should be easy to browse and the API easy to understand.
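To illustrate the memory argument: on a 64-bit target `usize` takes 8 bytes, so storing the offsets and column indices as `u32` halves the space used by the index arrays. The sketch below is generic over a hypothetical `CsIndex` trait (not an existing nalgebra-sparse trait). It builds a pattern from `usize` indices and rejects any index that does not fit the chosen type.

```rust
use std::convert::{TryFrom, TryInto};
use std::mem::size_of;

/// Hypothetical index trait: a primitive integer that can be converted from
/// and to `usize` (fallibly, since not every value fits).
pub trait CsIndex: Copy + TryFrom<usize> + TryInto<usize> {}
impl CsIndex for u32 {}
impl CsIndex for i32 {}
impl CsIndex for usize {}

pub struct CsrPattern<I: CsIndex> {
    pub row_offsets: Vec<I>,
    pub col_indices: Vec<I>,
}

impl<I: CsIndex> CsrPattern<I> {
    /// Bytes used by the index arrays alone (ignoring `Vec` headers).
    pub fn index_bytes(&self) -> usize {
        (self.row_offsets.len() + self.col_indices.len()) * size_of::<I>()
    }

    /// Converts `usize` indices to `I`, returning `None` if any index
    /// does not fit in `I`.
    pub fn try_from_usize(row_offsets: &[usize], col_indices: &[usize]) -> Option<Self> {
        let conv = |xs: &[usize]| {
            xs.iter()
                .map(|&x| I::try_from(x).ok())
                .collect::<Option<Vec<I>>>()
        };
        Some(CsrPattern {
            row_offsets: conv(row_offsets)?,
            col_indices: conv(col_indices)?,
        })
    }
}
```

The values array is unaffected by this choice. For `f64` values with 64-bit indices, the column indices alone take as much memory as the values.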
To this end, one possibility would be to add another Storage generic parameter to `CsrMatrix`/`CscMatrix` and make it default to owned storage. The existing `CsMatrix` struct, which is currently only an implementation detail used to reduce repetition between the CSR and CSC implementations, might be repurposed to fit the role of an `OwnedCsStorage`.
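Here is a minimal sketch of the default-parameter idea. It uses hypothetical names (`OwnedCsStorage`, `ViewCsStorage`, `as_view`, `from_slices`) and a simplified CSR-only layout. Because `S` defaults to owned storage, existing code that writes `CsrMatrix<f64>` would keep referring to the owned matrix. Borrowed matrices get a distinct type that can be created from foreign slices, or from an owned matrix without copying.

```rust
use std::marker::PhantomData;

/// Owned storage: the hypothetical default for `S`.
pub struct OwnedCsStorage<T> {
    pub offsets: Vec<usize>,
    pub indices: Vec<usize>,
    pub values: Vec<T>,
}

/// Borrowed storage over arrays owned elsewhere (e.g. by a C library).
pub struct ViewCsStorage<'a, T> {
    pub offsets: &'a [usize],
    pub indices: &'a [usize],
    pub values: &'a [T],
}

/// Because `S` has a default, `CsrMatrix<f64>` still names the owned matrix.
pub struct CsrMatrix<T, S = OwnedCsStorage<T>> {
    nrows: usize,
    ncols: usize,
    pub storage: S,
    _marker: PhantomData<T>,
}

// Methods that do not touch the storage can be shared by all storage types.
impl<T, S> CsrMatrix<T, S> {
    pub fn shape(&self) -> (usize, usize) {
        (self.nrows, self.ncols)
    }
}

// `CsrMatrix<T>` here means `CsrMatrix<T, OwnedCsStorage<T>>`.
impl<T> CsrMatrix<T> {
    pub fn new(nrows: usize, ncols: usize, storage: OwnedCsStorage<T>) -> Self {
        CsrMatrix { nrows, ncols, storage, _marker: PhantomData }
    }

    /// Borrows the owned matrix as a view; no data is copied.
    pub fn as_view(&self) -> CsrMatrix<T, ViewCsStorage<'_, T>> {
        let s = &self.storage;
        CsrMatrix {
            nrows: self.nrows,
            ncols: self.ncols,
            storage: ViewCsStorage { offsets: &s.offsets, indices: &s.indices, values: &s.values },
            _marker: PhantomData,
        }
    }
}

impl<'a, T> CsrMatrix<T, ViewCsStorage<'a, T>> {
    /// Wraps borrowed arrays (validation omitted in this sketch).
    pub fn from_slices(
        nrows: usize,
        ncols: usize,
        offsets: &'a [usize],
        indices: &'a [usize],
        values: &'a [T],
    ) -> Self {
        CsrMatrix { nrows, ncols, storage: ViewCsStorage { offsets, indices, values }, _marker: PhantomData }
    }
}
```

In a fuller version, the shared `impl<T, S>` block would be bounded on a storage trait so that read-only accessors are written once. The default parameter only affects how the type is spelled; it does not add methods by itself.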
Storing the indices raises a similar host of issues. For example, we would like to be able to use signed integers as indices, because other software might use signed integers for this purpose. However, we need to make sure that we can soundly convert back and forth between `usize` and the index type on demand. That is, we need to ensure that all indices stored in a valid CSR/CSC matrix are actually convertible to `usize` without overflow/underflow.
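One way to make the conversion sound is to validate every stored index once at construction, so that later conversions to `usize` cannot fail. The sketch below assumes a hypothetical `CsIndex` trait implemented for `i32` and `u32`, and a hypothetical `CsrPattern::try_new` with simplified checks. Offsets must start at zero, be non-decreasing and end at the number of stored indices, and column indices must be in bounds. It does not check sortedness or duplicates.

```rust
use std::convert::TryFrom;

/// Hypothetical index trait: a conversion to `usize` that fails instead of
/// wrapping (e.g. for negative `i32` values).
pub trait CsIndex: Copy {
    fn to_usize(self) -> Option<usize>;
}

impl CsIndex for i32 {
    fn to_usize(self) -> Option<usize> {
        usize::try_from(self).ok()
    }
}

impl CsIndex for u32 {
    fn to_usize(self) -> Option<usize> {
        usize::try_from(self).ok()
    }
}

#[derive(Debug, PartialEq)]
pub enum PatternError {
    NotConvertible,
    InvalidOffsets,
    ColumnOutOfBounds,
}

pub struct CsrPattern<I> {
    offsets: Vec<I>,
    col_indices: Vec<I>,
    ncols: usize,
}

impl<I: CsIndex> CsrPattern<I> {
    /// Validates every stored index once, up front.
    pub fn try_new(offsets: Vec<I>, col_indices: Vec<I>, ncols: usize) -> Result<Self, PatternError> {
        let offs = offsets
            .iter()
            .map(|&o| o.to_usize())
            .collect::<Option<Vec<usize>>>()
            .ok_or(PatternError::NotConvertible)?;
        let monotone = offs.windows(2).all(|w| w[0] <= w[1]);
        if offs.first() != Some(&0) || !monotone || offs.last() != Some(&col_indices.len()) {
            return Err(PatternError::InvalidOffsets);
        }
        for &c in &col_indices {
            let c = c.to_usize().ok_or(PatternError::NotConvertible)?;
            if c >= ncols {
                return Err(PatternError::ColumnOutOfBounds);
            }
        }
        Ok(CsrPattern { offsets, col_indices, ncols })
    }

    pub fn nrows(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn ncols(&self) -> usize {
        self.ncols
    }

    /// Column indices of row `i` as `usize`. The `expect`s cannot fire for a
    /// pattern built by `try_new`, since every index was checked there.
    pub fn row(&self, i: usize) -> impl Iterator<Item = usize> + '_ {
        let begin = self.offsets[i].to_usize().expect("validated in try_new");
        let end = self.offsets[i + 1].to_usize().expect("validated in try_new");
        self.col_indices[begin..end]
            .iter()
            .map(|&c| c.to_usize().expect("validated in try_new"))
    }
}
```

With this approach, signed index arrays coming from C libraries can be stored as-is. Negative values are rejected once, at the boundary, instead of being checked on every access.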