Store linearized indices instead of full coordinates #164

Open
hameerabbasi opened this issue Jul 3, 2018 · 6 comments
@hameerabbasi (Collaborator) commented Jul 3, 2018

I was thinking of replacing the full coordinates (coords) with just a 1-D array of linearised indices.

Pros:

  • More memory efficient (this gives back a lot of the benefits of #158, "A range of fixes relating to small coords dtypes").
  • Some O(N) operations become O(1), such as reshape and flatnonzero (see the sketch at the end of this comment).
  • COO becomes a special case of the GCRS/GCCS scheme, which means we can drop CSD and go with GCRS/GCCS, but with the difference that any collection of axes (not just rows/columns) can be compressed.
  • CSR/CSC are represented exactly as in SciPy/other libraries (instead of having 2-D indices)

Cons:

  • Adds a small (my intuition says almost negligible) performance penalty in many operations.
  • Some O(1) operations become O(N), such as nonzero and the one-argument version of where.

cc @mrocklin, @stefanv
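
For illustration, a minimal NumPy sketch of the idea (not the library's actual internals): per-axis coordinates are flattened with np.ravel_multi_index and recovered on demand with np.unravel_index, which is why reshape becomes O(1) while recovering full coordinates becomes O(N).

```python
import numpy as np

shape = (4, 5, 6)

# Per-axis coordinates of three nonzeros (the current `coords` layout).
coords = np.array([[0, 1, 3],   # axis 0
                   [2, 0, 4],   # axis 1
                   [5, 1, 0]])  # axis 2

# Linearize: one integer per nonzero instead of ndim integers per nonzero.
flat = np.ravel_multi_index(tuple(coords), shape)
print(flat)  # [ 17  31 114]

# reshape is O(1): the flat indices are already valid for any shape with the
# same number of elements, so only the stored shape has to change.
new_shape = (20, 6)
assert np.prod(new_shape) == np.prod(shape)

# Recovering full coordinates (e.g. for nonzero) becomes O(N): unravel on demand.
assert np.array_equal(np.array(np.unravel_index(flat, shape)), coords)
```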

@ahwillia (Contributor) commented Jul 3, 2018 via email (quoted in the reply below)

@hameerabbasi (Collaborator, Author)

> Would it be tenable to support both indexing schemes and give users the option to choose?

COO arrays will be indexed exactly as before. This is about storage, not indexing.

@ahwillia (Contributor) commented Jul 3, 2018

Yes, but if there are performance tradeoffs then it might be nice to support both. Though maybe those tradeoffs are negligible?

@hameerabbasi (Collaborator, Author) commented Jul 3, 2018 via email

@perimosocordiae

FWIW, this is the approach I used for sparray: https://github.com/perimosocordiae/sparray/blob/master/sparray/flat.py#L20

I found it to be quite convenient, though it does mean you may need more integer bits to represent the flat indices.
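
A small illustration of that tradeoff, using a hypothetical helper (not part of sparray or sparse): the flat index has to be able to address prod(shape) - 1, so a very large shape can overflow a narrow index dtype even when every per-axis coordinate fits.

```python
import numpy as np

def flat_index_fits(shape, dtype=np.intp):
    """Return True if every linearized index of `shape` fits in `dtype`.

    Hypothetical helper for illustration only.
    """
    max_flat = int(np.prod(shape, dtype=np.uint64)) - 1
    return max_flat <= np.iinfo(dtype).max

print(flat_index_fits((10**5, 10**5), np.int32))  # False: 10**10 - 1 overflows int32
print(flat_index_fits((10**5, 10**5), np.int64))  # True
```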

@hameerabbasi (Collaborator, Author)

@perimosocordiae Currently, we use np.intp anyway, but due to the way things are designed, the maximum supported size is 2 ** 64 - 1, and I predict it might decrease to 2 ** 63 - 1.

So it's a win for memory either way. In any case it seems like you're +1 on this. :-) Thanks a bunch for the feedback.
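
As a rough back-of-the-envelope (an assumption-level sketch, not a measurement of sparse itself), the memory win comes from storing one index per nonzero instead of ndim:

```python
import numpy as np

ndim, nnz = 3, 1_000_000
itemsize = np.dtype(np.intp).itemsize   # 8 bytes on a 64-bit platform

coords_bytes = ndim * nnz * itemsize    # full coords: ndim integers per nonzero
flat_bytes = nnz * itemsize             # linearized: one integer per nonzero
print(coords_bytes, flat_bytes)         # 24000000 8000000
```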
