Ungridded to ungridded performance #9
Comments
My current thinking is that we would be better off converting the sample-points loop to C, via Cython or PyPy (or Numba?), rather than trying to vectorise it, because I don't think we'll get the flexibility we want that way.
Profiling an example CloudSat-onto-aircraft collocation shows that the biggest bottleneck is actually the constraint step within the loop. Underneath we use the query_ball_point method to find nearby points for each sample point, but the query_ball_tree method should be much faster, as we can pass it all the sample points in one go; this moves the constraint step outside the sample-point loop. I've created a branch for this work here: https://github.com/cedadev/cis/tree/optimize_ug_col
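As a rough sketch of the difference (made-up 2-D points stand in for real coordinates; the actual CIS constraint code is more involved than this):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data_points = rng.random((10000, 2))   # stand-in for the measurement points
sample_points = rng.random((500, 2))   # stand-in for the sample points
r = 0.05                               # stand-in for the constraint radius

data_tree = cKDTree(data_points)

# Original pattern: one query per sample point, inside the Python loop
per_point = [data_tree.query_ball_point(p, r) for p in sample_points]

# Tree-to-tree query done once, outside the loop
sample_tree = cKDTree(sample_points)
batched = sample_tree.query_ball_tree(data_tree, r)

# Both give the same neighbour indices for each sample point
assert all(sorted(a) == sorted(b) for a, b in zip(per_point, batched))
```

Either way the result is a list of index lists of varying length, i.e. the jagged structure discussed below.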
I've had a bit more of a play with this today. Using query_ball_tree does help (a speed-up of about 50%) but not as much as I'd hoped - the bottleneck just moves to the loop over the constrained points. I've looked at ways of dealing with the staggered (jagged) array you end up with. I think a scipy sparse array is probably the way to do this, but how to construct it and then actually use it to calculate the kernel values needs a bit more thought. The other option is to calculate the whole sparse_distance_matrix in the first place. I've created an implementation of it, but it's very slow once you have many constrained points to calculate distances for. We'll need this when calculating weighted averages anyway, so I'll probably concentrate on making it faster - using Cython for the main loop, I think.
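For the sparse route, one possible shape of the weighted-average step (a sketch only: the inverse-distance kernel and all the array names here are hypothetical, not the CIS implementation):

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data_points = rng.random((2000, 2))    # hypothetical measurement coordinates
data_values = rng.random(2000)         # hypothetical measurement values
sample_points = rng.random((100, 2))   # hypothetical sample coordinates
r = 0.05

sample_tree = cKDTree(sample_points)
data_tree = cKDTree(data_points)

# Sparse matrix of distances for every sample/data pair within r
dist = sample_tree.sparse_distance_matrix(data_tree, r).tocoo()

# Example kernel: inverse-distance weights, guarding against zero distance
weights = sparse.coo_matrix(
    (1.0 / np.maximum(dist.data, 1e-12), (dist.row, dist.col)),
    shape=dist.shape).tocsr()

# Weighted mean per sample point as sparse matrix-vector products,
# with NaN where a sample point has no constrained data points
numer = weights @ data_values
denom = np.asarray(weights.sum(axis=1)).ravel()
weighted_mean = np.where(denom > 0, numer / np.maximum(denom, 1e-12), np.nan)
```

This keeps the kernel evaluation out of the Python loop, at the cost of building the distance matrix up front.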
This originated here: https://jira.ceh.ac.uk/browse/JASCIS-305
One way to speed up collocation (and perhaps other operations) is to output the set of values as a ragged array, then perform the kernel as an array operation.
So e.g. output a numpy array with one element per output value, where each element is itself a numpy array of the values to operate on, then apply the kernel to those values as a numpy array-wise operation. Not sure how much time this would save, but it could be much quicker in some situations.
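A minimal sketch of that ragged-array idea, with made-up values and a plain mean standing in for the kernel (note the per-element call is still Python-level, so the gain here is mainly structural):

```python
import numpy as np

# Hypothetical constrained values for each of four output points (ragged)
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([4.0]),
          np.array([], dtype=float),   # a sample point with no neighbours
          np.array([5.0, 6.0])]

# Object array of length "output values"; each element is an array to reduce
ragged = np.empty(len(groups), dtype=object)
ragged[:] = groups

# Apply the kernel (here a mean) to every element in one vectorised call
kernel = np.frompyfunc(
    lambda vals: vals.mean() if vals.size else np.nan, 1, 1)
result = kernel(ragged).astype(float)
# result -> [2.0, 4.0, nan, 5.5]
```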