Store linearized indices instead of full coordinates #164

Open
hameerabbasi opened this issue Jul 3, 2018 · 6 comments
@hameerabbasi (Collaborator) commented Jul 3, 2018

I was thinking of replacing the full coordinates (coords) with just a 1-D array of linearised indices.

Pros:

  • More memory efficient (this gives back a lot of the benefits of #158, "A range of fixes relating to small coords dtypes").
  • Some O(N) operations become O(1), such as reshape and flatnonzero (see the sketch at the end of this comment).
  • COO becomes a special case of the GCRS/GCCS scheme, which means we can drop CSD and go with GCRS/GCCS, but with the difference that any collection of axes (not just rows/columns) can be compressed.
  • CSR/CSC are represented exactly as in SciPy/other libraries (instead of having 2-D indices)

Cons:

  • Adds a small (my intuition says almost negligible) performance penalty in many operations.
  • Some O(1) operations become O(N), such as nonzero and the one-argument version of where.

cc @mrocklin, @stefanv
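
For illustration, a minimal NumPy sketch of the idea (not the library's actual internals): per-axis coordinates are flattened with np.ravel_multi_index and recovered on demand with np.unravel_index, which is why reshape becomes O(1) while recovering full coordinates becomes O(N).

```python
import numpy as np

shape = (4, 5, 6)

# Per-axis coordinates of three nonzeros (the current `coords` layout).
coords = np.array([[0, 1, 3],   # axis 0
                   [2, 0, 4],   # axis 1
                   [5, 1, 0]])  # axis 2

# Linearize: one integer per nonzero instead of ndim integers per nonzero.
flat = np.ravel_multi_index(tuple(coords), shape)
print(flat)  # [ 17  31 114]

# reshape is O(1): the flat indices are already valid for any shape with the
# same number of elements, so only the stored shape has to change.
new_shape = (20, 6)
assert np.prod(new_shape) == np.prod(shape)

# Recovering full coordinates (e.g. for nonzero) becomes O(N): unravel on demand.
assert np.array_equal(np.array(np.unravel_index(flat, shape)), coords)
```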

@ahwillia (Contributor) commented Jul 3, 2018 via email (quoted in the reply below)

@hameerabbasi (Collaborator, Author)

> Would it be tenable to support both indexing schemes and give users the option to choose?

COO arrays will be indexed exactly as before. This is about storage, not indexing.

@ahwillia (Contributor) commented Jul 3, 2018

Yes, but if there are performance tradeoffs then it might be nice to support both. Though maybe those tradeoffs are negligible?

@hameerabbasi (Collaborator, Author) commented Jul 3, 2018 via email

@perimosocordiae

FWIW, this is the approach I used for sparray: https://github.com/perimosocordiae/sparray/blob/master/sparray/flat.py#L20

I found it to be quite convenient, though it does mean you may need more integer bits to represent the flat indices.
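
A small illustration of that tradeoff, using a hypothetical helper (not part of sparray or sparse): the flat index has to be able to address prod(shape) - 1, so a very large shape can overflow a narrow index dtype even when every per-axis coordinate fits.

```python
import numpy as np

def flat_index_fits(shape, dtype=np.intp):
    """Return True if every linearized index of `shape` fits in `dtype`.

    Hypothetical helper for illustration only.
    """
    max_flat = int(np.prod(shape, dtype=np.uint64)) - 1
    return max_flat <= np.iinfo(dtype).max

print(flat_index_fits((10**5, 10**5), np.int32))  # False: 10**10 - 1 overflows int32
print(flat_index_fits((10**5, 10**5), np.int64))  # True
```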

@hameerabbasi (Collaborator, Author)

@perimosocordiae Currently, we use np.intp anyway, but due to the way things are designed, the maximum supported size is 2 ** 64 - 1, and I predict it might decrease to 2 ** 63 - 1.

So it's a win for memory either way. In any case it seems like you're +1 on this. :-) Thanks a bunch for the feedback.
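
As a rough back-of-the-envelope (an assumption-level sketch, not a measurement of sparse itself), the memory win comes from storing one index per nonzero instead of ndim:

```python
import numpy as np

ndim, nnz = 3, 1_000_000
itemsize = np.dtype(np.intp).itemsize   # 8 bytes on a 64-bit platform

coords_bytes = ndim * nnz * itemsize    # full coords: ndim integers per nonzero
flat_bytes = nnz * itemsize             # linearized: one integer per nonzero
print(coords_bytes, flat_bytes)         # 24000000 8000000
```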
