
Support for DOK slicing and advanced indexing #298

Open
stschroe opened this issue Nov 6, 2019 · 9 comments


@stschroe

stschroe commented Nov 6, 2019

Since DOK arrays are so versatile, I use them frequently in my projects. I needed support for slicing and advanced indexing and implemented it myself; see the gist.

The indexing supports slicing, advanced (fancy) indexing, and, as I need it most of the time, "array indexing". This is a fast variant of advanced indexing: only vectors (1-d sequences) are accepted as indices. The getitem method always returns dense ndarrays, in contrast to normal advanced indexing of sparse arrays, which creates new sparse-array copies. The setitem method accepts only vectors and scalars as values. This allows fast calculations via ndarray methods without the expensive creation of temporary sparse arrays between operations. This is of course only useful for small result vectors, which fit easily into memory.
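To make the behavior described above concrete, here is a minimal, hypothetical sketch of a dict-of-keys array with vector-based "array indexing" (the class and its methods are illustrative assumptions, not the gist's actual implementation): getitem takes one index vector per axis and returns a dense ndarray, setitem accepts a scalar or a vector of values.

```python
import numpy as np

class TinyDOK:
    """Hypothetical 2-d dict-of-keys sparse array sketch, illustrating
    vector-based fancy indexing that returns dense ndarrays."""

    def __init__(self, shape, fill_value=0.0):
        self.shape = shape
        self.fill_value = fill_value
        self.data = {}  # maps (row, col) coordinate tuples to stored values

    def __setitem__(self, index, value):
        # index: one 1-d integer sequence per axis; value: scalar or vector
        rows, cols = np.asarray(index[0]), np.asarray(index[1])
        values = np.broadcast_to(value, rows.shape)
        for r, c, v in zip(rows, cols, values):
            if v == self.fill_value:
                self.data.pop((int(r), int(c)), None)  # keep the dict sparse
            else:
                self.data[(int(r), int(c))] = float(v)

    def __getitem__(self, index):
        # Returns a dense ndarray rather than a new sparse array, so the
        # result feeds directly into NumPy operations with no temporaries.
        rows, cols = np.asarray(index[0]), np.asarray(index[1])
        return np.array([self.data.get((int(r), int(c)), self.fill_value)
                         for r, c in zip(rows, cols)])

a = TinyDOK((4, 4))
a[[0, 1, 2], [1, 2, 3]] = [5.0, 6.0, 7.0]   # vector assignment
print(a[[0, 1, 2, 3], [1, 2, 3, 0]])        # dense ndarray, missing -> fill
```

Because getitem yields a plain ndarray, follow-up arithmetic like `a[rows, cols] * 2` runs through NumPy directly, which is the point of avoiding intermediate sparse arrays for small index vectors.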

If there is interest in adding this to pydata/sparse, I could help. But since I have never contributed to an open-source project, I have no idea whether I could handle the necessary amount of work.

@hameerabbasi
Collaborator

Hi! First of all, thanks for the offer to contribute! It means a lot.

Second, your code looks pretty much perfect, except that one thing I'd like is for parts of it to be ported to Numba, along with some of the other indexing logic.

If you're willing to accept that burden, it'd be great to have your contribution in.

@stschroe
Author

stschroe commented Nov 6, 2019

While working on the indexing code I also experimented with speeding it up with Numba, but I decided not to use Numba for now and to let it mature for some time. In particular, Numba's typed list and dict objects, as well as its slice support, are under active development at the moment, so it may be much less work in the near future.

@hameerabbasi
Collaborator

Fair enough. Were the experiments successful or are you not using them for some other reason? I'm also willing to accept your code and speed it up in the future.

@stschroe
Author

stschroe commented Nov 6, 2019

I ran my experiments to evaluate whether all the functionality could be written in nopython mode, and the conclusion is yes. But as I wrote the code, it turned out that the Numba version would result in a much bigger code base and therefore much more work. For example, I currently use the slice.indices method and the itertools.product function, and replacing the use of itertools.product alone would not be easy. And as you can imagine, at the start I had no idea how complex the whole indexing logic is.
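For readers unfamiliar with the two stdlib tools mentioned above, here is a small sketch (the helper name and its exact behavior are assumptions for illustration, not code from the gist) showing how slice.indices and itertools.product combine to expand a mixed index of slices and integers into explicit coordinates:

```python
import itertools

def expand_index(index, shape):
    """Hypothetical helper: turn a tuple of slices/ints into the list of
    coordinate tuples it selects, one axis at a time."""
    axes = []
    for idx, dim in zip(index, shape):
        if isinstance(idx, slice):
            # slice.indices(dim) resolves None/negative start, stop, step
            # against the axis length, clamping them to valid bounds
            axes.append(range(*idx.indices(dim)))
        else:
            axes.append((int(idx),))  # a plain integer selects one position
    # itertools.product yields every coordinate in the cross product of axes
    return list(itertools.product(*axes))

print(expand_index((slice(1, 3), 0), (4, 4)))  # [(1, 0), (2, 0)]
```

Reimplementing this in nopython mode is harder than it looks: itertools.product has no direct Numba equivalent, so the cross product would need hand-written nested-loop or odometer-style iteration.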

@hameerabbasi
Collaborator

In that case, feel free to make a PR.

@k-a-mendoza

Is this by chance related to this issue found over at xarray? Or should I post a separate issue?

@hameerabbasi
Collaborator

@El-minadero It seems the issue you're looking for is #114.

@DragaDoncila
Contributor

@hameerabbasi I'd like to pick up this issue again and, if possible, turn @stschroe's gist into a PR. Fancy indexing for item access and assignment on DOK arrays would be very useful for us over at napari to support sparse labels layers (see the issue on the napari repo here).

Is this something you would still be open to? If so, I think we can get a PR in very soon.

@hameerabbasi
Collaborator

Of course. If it doesn’t have a wontfix label, we’re open to PRs.
