Handling sparse matrices #346
You're starting with 1.5GiB chunk sizes. So I would also try installing
Running this locally, I also spot a dask scheduling bug where it doesn't treat
Ah, I keep forgetting this. I think this is a dask scheduling issue.
In my real world use case, I get this just loading data from a zarr store.
Me too, but I'm not sure why flox seems to be triggering it. In the dask issue I show that other tree aggregations with this array (
Your last comment is important context! (The zarr bit in particular.) I would add that to the other issue.
Reproducer here: https://gist.github.com/ivirshup/eb4f5beb1bb33724b8c11bd0eacf03a6 from dask/dask#11026 (comment)

This works on my laptop with 32GB RAM with some spilling:

```python
res, codes = flox.groupby_reduce(
    X_dense.T,
    by,
    func="sum",
    fill_value=0,
)
```

If you turn on logging with

```python
import logging

logger = logging.getLogger("flox")
logger.setLevel("DEBUG")
console_handler = logging.StreamHandler()
logger.addHandler(console_handler)
```

you'll see what it automatically chooses.

EDIT: your choice of
EDIT2: I guess you're densifying on your own...

EDIT3: Memory issues can be controlled by using:

```python
res, codes = flox.groupby_reduce(
    X_dense.T,
    by,
    func="nansum",
    fill_value=0,
    engine="numbagg",
)
```
How would flox handle sparse matrices directly?
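One way this could work (a sketch under assumptions, not flox's actual implementation): a grouped sum over the rows of a sparse matrix can be expressed as a product with a one-hot indicator matrix, so nothing ever needs to be densified. The helper name `groupby_sum_sparse` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def groupby_sum_sparse(X, by, ngroups):
    """Sum rows of a sparse matrix by group label, without densifying.

    Builds a (ngroups x nrows) one-hot indicator matrix and multiplies;
    the result is itself sparse. Hypothetical helper, not flox's API.
    """
    nrows = X.shape[0]
    indicator = sp.csr_matrix(
        (np.ones(nrows), (by, np.arange(nrows))),
        shape=(ngroups, nrows),
    )
    return indicator @ X  # sparse, shape (ngroups, ncols)

X = sp.random(6, 4, density=0.5, format="csr", random_state=0)
by = np.array([0, 1, 0, 2, 1, 0])
out = groupby_sum_sparse(X, by, ngroups=3)

# matches the dense computation
dense = np.zeros((3, 4))
np.add.at(dense, by, X.toarray())
assert np.allclose(out.toarray(), dense)
```

Peak memory here scales with the number of stored nonzeros rather than the dense shape, which is the property that would let chunk sizes grow.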
@Intron7 may have some thoughts here, as he wrote some CUDA kernels for these aggregations in memory.
I was thinking of using it through
Oh nice! We could move your code into flox as an "engine", or alternatively tie in to scanpy with

For reference, "engine"s handle the in-memory part of the aggregation. And we already have the
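As a rough illustration of that contract (a simplified sketch, not flox's exact engine signature), an in-memory engine takes integer group labels and values and reduces the values into one output slot per group:

```python
import numpy as np

def engine_sum(group_idx, array, size):
    """Simplified sketch of an in-memory grouped-sum "engine":
    scatter-add each value into its group's slot.
    Illustration only; not flox's actual signature.
    """
    out = np.zeros(size, dtype=array.dtype)
    np.add.at(out, group_idx, array)  # unbuffered scatter-add
    return out

vals = np.array([1.0, 2.0, 3.0, 4.0])
idx = np.array([0, 1, 0, 1])
print(engine_sum(idx, vals, size=2))  # [4. 6.]
```

A sparse or CUDA engine would implement the same contract with different kernels, which is why slotting one in alongside the existing engines is plausible.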
The wrapper is Apache-2.0, but it pulls in graphblas, which is GPL. So effectively the same distribution issues, as I understand it.
That's how I've been thinking this sort of thing would work. Would it be weird to have a "sparse" engine that only works with sparse chunks? Or would you want to add it to another engine?
I am trying to remember why I was using reindex=True. I think it may have just been so memory usage was something I could estimate, as I had a lot of trouble getting this to work without running out of memory at all. FWIW, if there is sparse support, then we can increase the chunk sizes, which means we end up hitting more groups. Also, we don't really have set expectations about the distribution of groups per chunk, so I'd like to be sure this works in the worst case where the labels are well shuffled.
Don't think so. In general, I'd like users to not set it at all (the default is
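For readers following along, here is a toy illustration of the trade-off reindex controls (the helper `chunk_groupby_sum` is hypothetical, not flox's API): materializing one slot per global group in every chunk gives predictable, estimable memory, while keeping only the groups actually present in a chunk stays small when labels are clustered but varies when they are shuffled:

```python
import numpy as np

def chunk_groupby_sum(values, labels, all_labels=None):
    """Per-chunk grouped sum, sketching the reindex trade-off.

    all_labels given  -> like reindex=True: one slot per global
                         group (dense, predictable memory).
    all_labels=None   -> like reindex=False: output only covers
                         groups seen in this chunk.
    Hypothetical helper for illustration only.
    """
    if all_labels is None:
        present, idx = np.unique(labels, return_inverse=True)
        out = np.zeros(len(present))
        np.add.at(out, idx, values)
        return present, out
    out = np.zeros(len(all_labels))
    pos = np.searchsorted(all_labels, labels)
    np.add.at(out, pos, values)
    return all_labels, out

vals = np.array([1.0, 2.0, 3.0])
labs = np.array([5, 5, 9])
print(chunk_groupby_sum(vals, labs))                 # 2 slots: groups 5 and 9
print(chunk_groupby_sum(vals, labs, np.arange(10)))  # 10 slots, sums at 5 and 9
```

With well-shuffled labels nearly every chunk touches nearly every group, so the two strategies converge and the per-chunk output is large either way.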
Yeah, that would be nice. But it would also be good to get a real example. We might instead look to speed up the
I'm getting unexpectedly high memory usage with flox. Here's what I've been doing:
This always warns about memory usage and then fails on my dev machine with 64 GB of memory. However, I'm able to do plenty of other operations with an array this size (e.g. PCA, simple reductions). To me, a tree reduction here should be more than capable of handling an array of this size.
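For context on that expectation, a tree reduction combines per-chunk partial results pairwise, so peak memory per combine step is a couple of partial results rather than anything proportional to the full array; a minimal sketch:

```python
import numpy as np

def tree_reduce(partials, combine):
    """Pairwise (tree) combine of per-chunk partial results.

    Each round halves the number of live partials, and each combine
    touches only two of them at a time; this is the access pattern a
    scheduler should be able to run without holding all chunks in
    memory at once. Illustration only.
    """
    while len(partials) > 1:
        partials = [
            combine(partials[i], partials[i + 1]) if i + 1 < len(partials)
            else partials[i]
            for i in range(0, len(partials), 2)
        ]
    return partials[0]

# per-chunk grouped sums over 8 chunks, 3 groups each
chunks = [np.ones(3) * i for i in range(8)]
total = tree_reduce(chunks, np.add)
print(total)  # [28. 28. 28.]
```

Whether dask actually schedules it that way is exactly the scheduling question discussed above.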
Is this just me and my compute being odd, or do I have an incorrect expectation here?
cc: @ilan-gold