Parent groups automatically created in Zarr-Python 3 #2772
In Zarr-Python 3, this code:

    import zarr

    store = "/tmp/foo.zarr"
    shape = (64, 64, 4)
    # Create a v3 array nested two levels below the store root.
    za = zarr.create_array(
        store,
        shape=shape,
        dtype=float,
        zarr_format=3,
        name="foo/bar",
    )

initializes the directory tree
This is a departure from writing V3 format in
Is this an intended change in behavior? This subtlety is important in the case of unsynchronized, concurrent writers. It used to be the case that if two writers were writing to different Zarr groups, there was a guarantee that they would not touch the same files. Now, if one writer wrote
would be written to by both writers. While the write is idempotent, we have a use-case where, like Icechunk, we have an abstraction layer that handles write consistency, and part of that requires us to detect whether two writes mutate the same files.
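To make the overlap concrete, here is a minimal sketch of the kind of check such a consistency layer might run. It assumes the v3 layout in which each node's metadata lives in a `zarr.json` key under the node's path, and it assumes that creating an array also writes `zarr.json` for every ancestor group up to and including the root. The helper names `metadata_keys_written` and `conflicting_keys`, and the paths `a/x` and `b/y`, are hypothetical and not part of zarr-python.

```python
from pathlib import PurePosixPath


def metadata_keys_written(node_path: str, create_parents: bool = True) -> set[str]:
    """Return the metadata keys one array creation is assumed to write.

    Assumption: Zarr v3 layout (one ``zarr.json`` per node). When
    ``create_parents`` is true, every ancestor group, including the
    root, is assumed to get its own ``zarr.json`` as well.
    """
    parts = PurePosixPath(node_path.strip("/")).parts
    # The node's own metadata document is always written.
    keys = {"/".join((*parts, "zarr.json"))}
    if create_parents:
        # depth 0 is the root group, depth 1 the first ancestor, and so on.
        for depth in range(len(parts)):
            keys.add("/".join((*parts[:depth], "zarr.json")))
    return keys


def conflicting_keys(path_a: str, path_b: str, create_parents: bool = True) -> set[str]:
    """Return the metadata keys that both creations would write."""
    return metadata_keys_written(path_a, create_parents) & metadata_keys_written(
        path_b, create_parents
    )


# Two writers creating arrays in different top-level groups.
overlap_with_parents = conflicting_keys("a/x", "b/y")
overlap_without_parents = conflicting_keys("a/x", "b/y", create_parents=False)
```

Under these assumptions the two writers share only the root `zarr.json` when parents are created automatically, and share nothing when they are not. This models metadata keys only; chunk keys are not considered.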
Replies: 1 comment 2 replies
Thanks for this report. The story here is kind of complicated, because the Zarr specs don't actually contain a complete definition of what a Zarr hierarchy is. The v2 spec says this:
Similarly, the v3 spec says:
So although we are consistent with the specs, I think the specs are on weak ground here, and I think we should provide APIs for users to create "isolated" arrays and groups if they need this.
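As a rough illustration of what "isolated" could mean in practice, the sketch below scans a listing of store keys and reports nodes whose ancestors have no metadata document. It reads "isolated" as "at least one ancestor path lacks a `zarr.json`", which is an interpretation rather than a definition from the specs, and it assumes the v3 key layout. The function name `nodes_missing_parents` is hypothetical and not a zarr-python API.

```python
def nodes_missing_parents(keys: set[str]) -> dict[str, list[str]]:
    """Map each node path to the ancestor paths that have no zarr.json.

    Assumption: Zarr v3 layout, where a node at path ``p`` is marked by
    the key ``p/zarr.json`` and the root node by the key ``zarr.json``.
    The root path is represented as the empty string.
    """
    marker = "zarr.json"
    # Collect every path that has a metadata document.
    node_paths = {
        key[: -len(marker)].rstrip("/")
        for key in keys
        if key == marker or key.endswith("/" + marker)
    }
    missing: dict[str, list[str]] = {}
    for node in sorted(node_paths):
        parts = node.split("/") if node else []
        # Walk the ancestors from the root down to the direct parent.
        ancestors = ["/".join(parts[:depth]) for depth in range(len(parts))]
        absent = [path for path in ancestors if path not in node_paths]
        if absent:
            missing[node] = absent
    return missing


# A store holding only the array's own metadata and one chunk key.
isolated = nodes_missing_parents({"foo/bar/zarr.json", "foo/bar/c/0/0/0"})
# A store where every ancestor group also has metadata.
complete = nodes_missing_parents(
    {"zarr.json", "foo/zarr.json", "foo/bar/zarr.json"}
)
```

A check like this only inspects key names; it does not read the metadata documents, so it cannot tell arrays from groups.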
You might also be interested in some of the functions I'm adding over in #2665. The goal is to make creating groups and arrays much more functional and explicit. There is a