[POC] first try of multiprocessing for reprojection 2/2 #648
Comments
(Commenting here for both this issue and #647) I have two main remarks at this point (we can discuss more during the call next week!):
2.1. Regarding task order management, I hadn't realized this was an issue. Creating a flag to enable the next steps could be a solution.
2.2. For memory management, we could temporarily store the result in a NumPy array and convert it as needed once all computations are completed. I'm not an expert, so feel free to propose something else.
OK for 1 and 2.1. For 2.2: we can't hold the whole NumPy array in memory at once, so we'll need to add a function that writes the output array to a raster file chunk by chunk with Rasterio.
If we make sure that each worker returns its result and that the main process handles the writing, then yes, we will write to the files with Rasterio.
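The division of labour described in this comment (workers return results, the main process writes) could look roughly like the sketch below. It uses only the standard library; `iter_tiles`, `process_tile`, `run_tiled` and the `write_tile` callback are hypothetical names, not geoutils functions, and nested lists stand in for NumPy arrays. With Rasterio, `write_tile` would presumably wrap a windowed `dst.write(array, 1, window=Window(col_off, row_off, width, height))` call.

```python
import multiprocessing as mp
from typing import Callable, Iterator, List, Tuple

# (row_off, col_off, height, width): a hypothetical tile layout for this sketch
Tile = Tuple[int, int, int, int]


def iter_tiles(height: int, width: int, tile_size: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles that together cover a height x width grid."""
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            yield (row, col, min(tile_size, height - row), min(tile_size, width - col))


def process_tile(tile: Tile) -> Tuple[Tile, List[List[float]]]:
    """Placeholder for the per-tile reprojection (assumption: any pure function)."""
    row, col, h, w = tile
    data = [[float((row + i) * 1000 + (col + j)) for j in range(w)] for i in range(h)]
    # Returning the tile with its data lets the writer place it in any order.
    return tile, data


def run_tiled(
    height: int,
    width: int,
    tile_size: int,
    write_tile: Callable[[Tile, List[List[float]]], None],
    n_workers: int = 2,
) -> int:
    """Process tiles in worker processes; write each result from the main process."""
    tiles = iter_tiles(height, width, tile_size)
    written = 0
    if n_workers <= 1:
        # Serial fallback, handy for debugging.
        for tile, data in map(process_tile, tiles):
            write_tile(tile, data)
            written += 1
        return written
    with mp.Pool(n_workers) as pool:
        # Results arrive in completion order and are written as they come in,
        # so the full output array is never assembled in memory.
        for tile, data in pool.imap_unordered(process_tile, tiles):
            write_tile(tile, data)
            written += 1
    return written


if __name__ == "__main__":
    height, width = 4, 5
    raster = [[None] * width for _ in range(height)]  # stand-in for the output file

    def write_to_raster(tile, data):
        row, col, h, w = tile
        for i in range(h):
            raster[row + i][col:col + w] = data[i]

    n = run_tiled(height, width, tile_size=2, write_tile=write_to_raster, n_workers=2)
    print(n, "tiles written")
```

Because only the main process touches the output, no file locking between workers is needed in this layout. Carrying the tile coordinates with each result is also one possible answer to the ordering concern in 2.1, since the write position no longer depends on completion order.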
Ticket No. 2
Context
The goal of this ticket is to implement tiled reprojection to facilitate the processing of "heavy" datasets, such as a 40,000 x 40,000 DSM from CARS.
Proposal
We have observed that this need was already anticipated with the creation of the file [delayed.py](https://github.com/GlacioHack/geoutils/blob/main/geoutils/raster/delayed.py). Unfortunately, this file relies on Dask, which enables distributed computing. Based on past experience, however, the CS 3D development team does not feel comfortable maintaining such a component for ICC needs, for the following reasons:
For these reasons, we propose implementing a `delayed_multiproc` module as an alternative to the current delayed implementation.
Implementation (continued)
`delayed_multiproc.py` (7 functions in total). Example:
`dask.compute()` calls: these calls execute `dask.delayed()` functions, so they need to be replaced with equivalent multiprocessing implementations (4 instances in total).
Tests
Adapt the dask tests to multiprocessing
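One way an adapted test could be framed, sketched here with the standard library only: check that a chunked multiprocessing run reproduces the single-process result for several worker counts and chunk sizes. `transform_block`, `run_serial` and `run_parallel` are hypothetical stand-ins for the real reprojection code, and the doubling operation is only a placeholder computation.

```python
import multiprocessing as mp


def split_rows(n_rows, chunk):
    """Split range(n_rows) into consecutive (start, stop) blocks."""
    return [(s, min(s + chunk, n_rows)) for s in range(0, n_rows, chunk)]


def transform_block(block):
    """Hypothetical per-block computation: doubles every value."""
    start, rows = block
    return start, [[2 * v for v in row] for row in rows]


def run_serial(data):
    """Single-process reference result."""
    return [[2 * v for v in row] for row in data]


def run_parallel(data, chunk, n_workers):
    """Same computation, split into row blocks and spread over worker processes."""
    blocks = [(s, data[s:e]) for s, e in split_rows(len(data), chunk)]
    out = [None] * len(data)
    with mp.Pool(n_workers) as pool:
        # Completion order is arbitrary, so each result carries its start row.
        for start, rows in pool.imap_unordered(transform_block, blocks):
            out[start:start + len(rows)] = rows
    return out


def test_parallel_matches_serial():
    """pytest-style check: the output must not depend on workers or chunking."""
    data = [[r * 10 + c for c in range(7)] for r in range(23)]
    expected = run_serial(data)
    for n_workers in (1, 2, 4):
        for chunk in (1, 5, 23, 50):
            assert run_parallel(data, chunk, n_workers) == expected


if __name__ == "__main__":
    test_parallel_matches_serial()
    print("parallel output matches serial output")
```

Chunk sizes that do not divide the row count, and one that is larger than the input, are included on purpose, since edge blocks are where tiling code tends to go wrong.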
Documentation
Update the documentation