
Feature request: flatten filter (for ZFP), or flatten_dimensions configuration field for ZFPY codec #751

Closed
@vladidobro


Hi!

I am trying to use ZFP compression for weather forecast data in zarr.

The problem:
My data has five dimensions: (forecast_reference_time, valid_time, height, lat, lon).
I would like to use ZFP for lossy compression, but even when I write a single 2D image with shape (1, 1, 1, 1000, 1000), the ZFPY codec fails with

RuntimeError: Greater than 4 dimensions not supported

This prevents me from using ZFP at all for this data, even though I only want to exploit the spatial correlation in the last two dimensions.
Even if my data lacked, say, the 'height' dimension and were therefore only 4-dimensional (which ZFP accepts), the result would still be unsatisfactory: there is very little correlation along the first two dimensions, so including them would sabotage the compression.

IMHO, the correct way to handle this would be to flatten all but the last two dimensions before passing the array to ZFP.
So my current workaround is to define a custom "Flatten" filter that concatenates along the first three dimensions and returns a 2D array with good correlation in both dimensions, so that ZFP can compress it properly.

Shouldn't a "Flatten" or "Reshape" filter like this be part of numcodecs (similar to "AsType")? Some thought would have to go into how to unflatten the array back into its original shape; maybe that is easy, I am not sure.
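A minimal sketch of such a "Flatten" filter and its inverse, assuming NumPy. The class name `Flatten`, the `codec_id`, and the `chunk_shape`/`ndim_out` parameters are hypothetical. The methods only mirror the `encode`/`decode`/`get_config` shape of a numcodecs codec; a real filter would subclass `numcodecs.abc.Codec` and be registered with `numcodecs.register_codec`, which is not done here. Since a zarr array has a fixed chunk shape, keeping that shape in the filter config is one simple way to make unflattening a plain reshape.

```python
import numpy as np


class Flatten:
    """Hypothetical reshape filter: merge leading axes so ZFP sees <= 4 dims.

    Mirrors the encode/decode/get_config methods of a numcodecs codec;
    it does not subclass numcodecs.abc.Codec here (assumption).
    """

    codec_id = "flatten"  # hypothetical codec id

    def __init__(self, chunk_shape, ndim_out=2):
        chunk_shape = tuple(int(n) for n in chunk_shape)
        if not 1 <= ndim_out <= len(chunk_shape):
            raise ValueError("ndim_out must be between 1 and len(chunk_shape)")
        self.chunk_shape = chunk_shape  # needed to unflatten on decode
        self.ndim_out = ndim_out

    def encode(self, buf):
        arr = np.ascontiguousarray(buf)
        # Keep the trailing (ndim_out - 1) axes and merge everything before
        # them into one leading axis. With ndim_out=2 and C order, each
        # (lat, lon) image becomes a contiguous block of lat rows.
        tail = arr.shape[arr.ndim - self.ndim_out + 1:]
        return arr.reshape((-1,) + tail)

    def decode(self, buf, out=None):
        # Unflattening is just the inverse reshape to the stored chunk shape.
        arr = np.asarray(buf).reshape(self.chunk_shape)
        if out is not None:
            np.copyto(out, arr.reshape(out.shape))
            return out
        return arr

    def get_config(self):
        return {
            "id": self.codec_id,
            "chunk_shape": list(self.chunk_shape),
            "ndim_out": self.ndim_out,
        }
```

For the chunk shape in the question, `Flatten(chunk_shape=(1, 1, 1, 1000, 1000))` would hand ZFP a (1000, 1000) array. Since ZFP rejects more than 4 dimensions, `ndim_out` would have to stay between 1 and 4 in practice.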

Alternatively, this flattening could be part of the ZFPY codec and its configuration.
I feel like the ZFPY codec is practically useless for zarr as-is for these reasons, or am I using it incorrectly?
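If the flattening lived inside the codec instead (the `flatten_dimensions` idea from the title), the codec could record the original shape itself. The sketch below is hypothetical throughout: the class `FlattenThenCompress`, the assumed meaning of `flatten_dimensions` (number of leading axes merged into one), the header layout, and the `RawFloat64` stand-in for a real inner compressor such as ZFPY are all assumptions, not numcodecs API.

```python
import math
import struct

import numpy as np


class RawFloat64:
    """Stand-in for a real inner compressor such as ZFPY (assumption).

    It stores raw little-endian float64 bytes and forgets the array shape.
    """

    def encode(self, arr):
        return np.asarray(arr, dtype="<f8").tobytes()

    def decode(self, buf):
        return np.frombuffer(buf, dtype="<f8")


class FlattenThenCompress:
    """Hypothetical codec: merge the leading `flatten_dimensions` axes into one,
    compress with `inner`, and prefix a header holding the original shape."""

    def __init__(self, inner, flatten_dimensions):
        self.inner = inner
        self.flatten_dimensions = flatten_dimensions

    def flatten(self, arr):
        k = self.flatten_dimensions
        if not 1 <= k <= arr.ndim:
            raise ValueError("flatten_dimensions must be between 1 and arr.ndim")
        # e.g. k=4 turns (t0, t1, height, lat, lon) into (t0*t1*height*lat, lon)
        return arr.reshape((math.prod(arr.shape[:k]),) + arr.shape[k:])

    def encode(self, buf):
        arr = np.ascontiguousarray(buf)
        # Header: 1 byte for ndim, then each dimension as little-endian uint64.
        header = struct.pack(f"<B{arr.ndim}Q", arr.ndim, *arr.shape)
        return header + bytes(self.inner.encode(self.flatten(arr)))

    def decode(self, buf):
        data = bytes(buf)
        (ndim,) = struct.unpack_from("<B", data, 0)
        shape = struct.unpack_from(f"<{ndim}Q", data, 1)
        payload = data[1 + 8 * ndim:]
        # Whatever shape the inner codec returns, reshape restores the original.
        return np.asarray(self.inner.decode(payload)).reshape(shape)
```

Storing the shape in the encoded bytes makes each chunk self-describing, so the unflatten step needs no extra configuration, at the cost of a small per-chunk header. With numcodecs installed, `inner` would presumably be a `numcodecs.ZFPY` instance; that combination is untested here.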

Thanks!
