One way to greatly reduce overhead is to add bulk insert and query APIs built around NumPy arrays. At present, the most efficient way to create an index is to pass in a generator, which is then converted to a callback that is handed to the C API. In the non-object case this is considerably less efficient than passing a NumPy array of (N,) idxs and an (N, 2*ndim) array of coordinates. The array form would need only a single foreign function call taking two pointers and a count.
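To make the "two pointers and a count" idea concrete, here is a minimal sketch of the Python-side preparation such a call might need. The helper name `prepare_bulk_arguments` is hypothetical, as are the choices of `int64` idxs and `float64` coordinates; no bulk entry point in libspatialindex is assumed to exist, so the sketch stops short of the foreign call itself.

```python
import ctypes

import numpy as np


def prepare_bulk_arguments(ids, coords, ndim=2):
    """Normalise inputs for a hypothetical bulk-insert C call.

    Returns the (possibly copied) arrays together with raw pointers and the
    element count. The arrays must be kept alive for as long as the pointers
    are in use.
    """
    # Assumed dtypes: 64-bit integer idxs and double-precision bounds.
    ids = np.ascontiguousarray(ids, dtype=np.int64)
    coords = np.ascontiguousarray(coords, dtype=np.float64)

    if ids.ndim != 1:
        raise ValueError("ids must have shape (N,)")
    if coords.shape != (ids.shape[0], 2 * ndim):
        raise ValueError("coords must have shape (N, 2*ndim)")

    ids_ptr = ids.ctypes.data_as(ctypes.POINTER(ctypes.c_int64))
    coords_ptr = coords.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    return ids, coords, ids_ptr, coords_ptr, ids.shape[0]
```

A single foreign function call could then receive `ids_ptr`, `coords_ptr` and the count, instead of one callback invocation per item.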
Similar benefits could be had on the query side, where one could pass an (N, 2*ndim) array of coordinates to the C API. The resulting idxs could then be returned as a pair of arrays: one of size (M,) holding the idxs themselves, and another of size (N + 1,) holding displacements into this idx array (indicating where the results for each query begin). This would again greatly reduce the time spent in Python/ctypes and also simplify memory management.
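The idxs-plus-displacements layout can be illustrated with a small brute-force stand-in for the index. This is only a sketch of the proposed return format, not of how libspatialindex would compute it: the function name `brute_force_bulk_query` is hypothetical, and each row is assumed to hold all minimum bounds followed by all maximum bounds.

```python
import numpy as np


def brute_force_bulk_query(ids, boxes, queries):
    """Return (flat_ids, offsets) for N query boxes.

    flat_ids has shape (M,), offsets has shape (N + 1,). The hits for
    query i are flat_ids[offsets[i]:offsets[i + 1]].
    """
    ids = np.asarray(ids)
    boxes = np.asarray(boxes, dtype=np.float64)
    queries = np.asarray(queries, dtype=np.float64)
    ndim = boxes.shape[1] // 2

    hits = []
    offsets = np.zeros(len(queries) + 1, dtype=np.int64)
    for i, q in enumerate(queries):
        # Two boxes intersect when they overlap along every axis.
        overlap = np.all(
            (boxes[:, :ndim] <= q[ndim:]) & (boxes[:, ndim:] >= q[:ndim]),
            axis=1,
        )
        found = ids[overlap]
        hits.append(found)
        offsets[i + 1] = offsets[i] + len(found)

    flat_ids = np.concatenate(hits) if hits else np.empty(0, dtype=ids.dtype)
    return flat_ids, offsets


ids = np.array([10, 11, 12])
boxes = np.array([[0, 0, 1, 1], [2, 2, 3, 3], [0.5, 0.5, 2.5, 2.5]])
queries = np.array([[0, 0, 0.6, 0.6], [5, 5, 6, 6], [2, 2, 3, 3]])

flat_ids, offsets = brute_force_bulk_query(ids, boxes, queries)
# flat_ids -> [10, 12, 11, 12]; offsets -> [0, 2, 2, 4]
```

A query with no hits simply yields two equal consecutive displacements, so no per-query Python objects or sentinel values are needed.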
From an implementation standpoint this would require some minor additions to libspatialindex, but they should be relatively straightforward (simple loops that call the C++ routines would suffice as a first pass). The result would be a 'fast path' for cases where one does not need object support and is working with data already in a native array format.