Description
Hi!
I am trying to use ZFP compression for weather forecast data in zarr.
The problem:
My data has five dimensions: (forecast_reference_time, valid_time, height, lat, lon).
I would like to use ZFP for lossy compression, but even when I try to write a single 2D image with dimensions (1, 1, 1, 1000, 1000), the ZFPY codec fails with
RuntimeError: Greater than 4 dimensions not supported
This prevents me from using ZFP at all for this data, even if I want to leverage the spatial correlation only in the last two dimensions.
Even if my data lacked, say, the 'height' dimension and were therefore only 4-dimensional (which ZFP supports), the result would still be unsatisfactory: there is very little correlation along the first two dimensions, so including them would sabotage the compression.
IMHO, the correct way to handle this would be to flatten the array in all but the last two dimensions before passing it to ZFP.
So my current workaround would be to define a custom "Flatten" filter that concatenates along the first three dimensions and returns a 2D array with good correlation along both axes, so that ZFP can compress it properly.
Shouldn't such a "Flatten" or "Reshape" filter be part of numcodecs (similar to "AsType")? Some thought would have to go into how to unflatten the array back into its original shape; maybe that is easy, I am not sure.
Alternatively, this flattening could be part of the ZFPY codec and its configuration.
I feel like the ZFPY codec is actually useless for zarr as-is for these reasons, or am I using it incorrectly?
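A minimal sketch of what such a "Flatten" filter might look like, assuming only NumPy. The class name `FlattenLeading`, its `codec_id` and its `shape`/`dtype` parameters are hypothetical. The class only mimics the `encode`/`decode`/`get_config` methods of a numcodecs codec; to use it as a real zarr filter it would presumably have to subclass `numcodecs.abc.Codec` and be registered with `numcodecs.register_codec`. One possible answer to the unflatten question is to store the chunk shape in the filter's config, since the flattened array alone does not record how it was folded.

```python
import numpy as np


class FlattenLeading:
    """Hypothetical reshape filter; NOT part of numcodecs.

    Folds a chunk of shape (..., lat, lon) into a 2D array of shape
    (prod(...) * lat, lon), i.e. the 2D fields are stacked vertically,
    so a 2D lossy compressor such as ZFP only sees the spatial axes.
    """

    codec_id = "flatten_leading"  # assumed id, not registered anywhere

    def __init__(self, shape, dtype="<f4"):
        if len(shape) < 2:
            raise ValueError("need at least two dimensions")
        # The original (chunk) shape lives in the config, because the
        # flattened array alone cannot tell how to unfold it again.
        self.shape = tuple(int(s) for s in shape)
        self.dtype = np.dtype(dtype)

    def _as_array(self, buf):
        # Accept raw bytes-like objects as well as ndarrays.
        if isinstance(buf, (bytes, bytearray, memoryview)):
            return np.frombuffer(buf, dtype=self.dtype)
        return np.asarray(buf, dtype=self.dtype)

    def encode(self, buf):
        arr = self._as_array(buf).reshape(self.shape)
        # e.g. (t0, t1, h, lat, lon) -> (t0 * t1 * h * lat, lon)
        return arr.reshape(-1, self.shape[-1])

    def decode(self, buf, out=None):
        # `out` is accepted for interface compatibility but ignored here.
        return self._as_array(buf).reshape(self.shape)

    def get_config(self):
        return {"id": self.codec_id, "shape": list(self.shape),
                "dtype": self.dtype.str}


chunk = np.random.default_rng(0).random((2, 3, 4, 8, 8), dtype=np.float32)
f = FlattenLeading(chunk.shape, dtype=chunk.dtype)
flat = f.encode(chunk)
print(flat.shape)  # (192, 8)
assert np.array_equal(f.decode(flat), chunk)
```

Stacking along `lat` rather than building a 3D (fields, lat, lon) array keeps ZFP working in 2D, where the poorly correlated leading axes never enter a block. Since ZFP compresses in blocks of 4 values per dimension, boundaries between stacked fields fall on block boundaries whenever `lat` is a multiple of 4 (which holds for 1000), so no block mixes two fields.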
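If the flattening lived in the compressor configuration instead, it could be a thin wrapper around the ZFPY codec, sketched below. `ReshapedCompressor` is hypothetical, not an existing numcodecs option. It assumes the inner codec's `decode` returns an ndarray that NumPy can reshape, and, like a filter, it still needs the original chunk shape as a parameter.

```python
import numpy as np


class ReshapedCompressor:
    """Hypothetical wrapper; NOT an existing numcodecs option.

    Reshapes a chunk of shape (..., lat, lon) to 2D before handing it to
    an inner compressor (intended: a numcodecs.ZFPY instance) and restores
    the original shape after decompression.
    """

    def __init__(self, inner, shape):
        self.inner = inner
        self.shape = tuple(int(s) for s in shape)

    def encode(self, buf):
        arr = np.asarray(buf).reshape(self.shape)
        # Ensure a C-contiguous 2D array of shape (prod(...) * lat, lon).
        flat = np.ascontiguousarray(arr.reshape(-1, self.shape[-1]))
        return self.inner.encode(flat)

    def decode(self, buf, out=None):
        # `out` is accepted for interface compatibility but ignored here.
        arr = np.asarray(self.inner.decode(buf))
        return arr.reshape(self.shape)


# Intended usage (not executed here, ZFPY arguments elided):
# codec = ReshapedCompressor(numcodecs.ZFPY(...), shape=(1, 1, 1, 1000, 1000))
```

The trade-off against a separate filter is mostly about where the configuration lives: a wrapper keeps "reshape, then ZFP" as one unit, while a filter could be combined with any compressor.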
Thanks!