feat/batch creation #2665
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Adds functions for concurrently creating multiple arrays and groups. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,17 +7,19 @@ | |
import zarr.api.asynchronous as async_api | ||
import zarr.core.array | ||
from zarr._compat import _deprecate_positional_args | ||
from zarr.abc.store import Store | ||
from zarr.core.array import Array, AsyncArray | ||
from zarr.core.group import Group | ||
from zarr.core.sync import sync | ||
from zarr.core.group import Group, GroupMetadata, _parse_async_node | ||
from zarr.core.sync import _collect_aiterator, sync | ||
|
||
if TYPE_CHECKING: | ||
from collections.abc import Iterable | ||
from collections.abc import Iterable, Iterator | ||
|
||
import numpy as np | ||
import numpy.typing as npt | ||
|
||
from zarr.abc.codec import Codec | ||
from zarr.abc.store import Store | ||
from zarr.api.asynchronous import ArrayLike, PathLike | ||
from zarr.core.array import ( | ||
CompressorsLike, | ||
|
@@ -36,6 +38,7 @@ | |
ShapeLike, | ||
ZarrFormat, | ||
) | ||
from zarr.core.metadata import ArrayV2Metadata, ArrayV3Metadata | ||
from zarr.storage import StoreLike | ||
|
||
__all__ = [ | ||
|
@@ -46,10 +49,14 @@ | |
"copy_store", | ||
"create", | ||
"create_array", | ||
"create_hierarchy", | ||
"create_nodes", | ||
"create_rooted_hierarchy", | ||
"empty", | ||
"empty_like", | ||
"full", | ||
"full_like", | ||
"get_node", | ||
"group", | ||
"load", | ||
"ones", | ||
|
@@ -1132,3 +1139,141 @@ def zeros_like(a: ArrayLike, **kwargs: Any) -> Array: | |
The new array. | ||
""" | ||
return Array(sync(async_api.zeros_like(a, **kwargs))) | ||
|
||
|
||
def create_hierarchy( | ||
store: Store, | ||
path: str, | ||
nodes: dict[str, GroupMetadata | ArrayV2Metadata | ArrayV3Metadata], | ||
overwrite: bool = False, | ||
allow_root: bool = True, | ||
) -> Iterator[Group | Array]: | ||
""" | ||
Create a complete zarr hierarchy from a collection of metadata objects. | ||
|
||
Groups that are implicitly defined by the input will be created as needed. | ||
|
||
This function takes a parsed hierarchy dictionary and creates all the nodes in the hierarchy | ||
concurrently. Arrays and Groups are yielded in the order they are created. | ||
Review comment: Is the creation order deterministic? If not, then perhaps state that the order isn't guaranteed. |
||
|
||
Parameters | ||
---------- | ||
store : Store | ||
The storage backend to use. | ||
path : str | ||
The name of the root of the created hierarchy. Every key in ``nodes`` will be prefixed with | ||
``path`` prior to creating nodes. | ||
nodes : dict[str, GroupMetadata | ArrayV3Metadata | ArrayV2Metadata] | ||
Review comment: The usage example (and I guess the type) will probably make this clear, but it'd be good to clarify whether this is the flat or nested representation. IIUC, it's the flat representation so the keys are like |
Review comment: Also the exact syntax of whether or not leading or trailing slashes are expected would be helpful too. |
||
A dictionary defining the hierarchy. The keys are the paths of the nodes | ||
in the hierarchy, and the values are the metadata of the nodes. The | ||
metadata must be either an instance of GroupMetadata, ArrayV3Metadata | ||
or ArrayV2Metadata. | ||
allow_root : bool | ||
Whether to allow a root node to be created. If ``False``, attempting to create a root node | ||
will result in an error. Use this option when calling this function as part of a method | ||
defined on ``AsyncGroup`` instances, because in this case the root node has already been | ||
created. | ||
|
||
Yields | ||
------ | ||
Group | Array | ||
The created nodes in the order they are created. | ||
""" | ||
Review comment: I think it would be worth adding a usage example here. |
||
coro = async_api.create_hierarchy( | ||
store=store, path=path, nodes=nodes, overwrite=overwrite, allow_root=allow_root | ||
) | ||
|
||
for result in sync(_collect_aiterator(coro)): | ||
yield _parse_async_node(result) | ||
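Note on implicit groups: the docstring says that groups implied by the input are created as needed, and the review thread asks whether ``nodes`` is a flat mapping. The sketch below is plain Python, not zarr's implementation. It assumes flat keys with no leading or trailing slashes and uses the empty string for the root; ``implicit_groups`` is a hypothetical helper name. It lists the parent paths that a flat set of keys implies but does not contain.

```python
from pathlib import PurePosixPath


def implicit_groups(keys: set[str]) -> set[str]:
    """Return parent paths implied by ``keys`` but not listed in it.

    Hypothetical helper for illustration only; not part of zarr.
    """
    implied: set[str] = set()
    for key in keys:
        # PurePosixPath("a/b/c").parents yields "a/b", "a" and "." (the root).
        for parent in PurePosixPath(key).parents:
            # Assumption: the root of the hierarchy is spelled as "".
            parent_key = "" if str(parent) == "." else str(parent)
            if parent_key not in keys:
                implied.add(parent_key)
    return implied


flat_keys = {"a", "a/b/c", "d/e"}
print(sorted(implicit_groups(flat_keys)))  # ['', 'a/b', 'd']
```

Under these assumptions, a caller who passes only ``a``, ``a/b/c`` and ``d/e`` would expect three additional groups (the root, ``a/b`` and ``d``) to exist afterwards.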
|
||
|
||
def create_nodes( | ||
*, store: Store, path: str, nodes: dict[str, GroupMetadata | ArrayV2Metadata | ArrayV3Metadata] | ||
) -> Iterator[Group | Array]: | ||
"""Create a collection of arrays and / or groups concurrently. | ||
|
||
Note: no attempt is made to validate that these arrays and / or groups collectively form a | ||
Review comment: Is this the main / only difference between |
Review comment: If so, we could just use IIUC the advantage of I'm not too worried about using "unsafe" API in xarray because |
Review comment: yes. |
||
valid Zarr hierarchy. It is the responsibility of the caller of this function to ensure that | ||
the ``nodes`` parameter satisfies any correctness constraints. | ||
|
||
Parameters | ||
---------- | ||
store : Store | ||
The storage backend to use. | ||
path : str | ||
The name of the root of the created hierarchy. Every key in ``nodes`` will be prefixed with | ||
``path`` prior to creating nodes. | ||
nodes : dict[str, GroupMetadata | ArrayV3Metadata | ArrayV2Metadata] | ||
A dictionary defining the hierarchy. The keys are the paths of the nodes | ||
in the hierarchy, and the values are the metadata of the nodes. The | ||
metadata must be either an instance of GroupMetadata, ArrayV3Metadata | ||
or ArrayV2Metadata. | ||
|
||
Yields | ||
------ | ||
Group | Array | ||
The created nodes. | ||
""" | ||
coro = async_api.create_nodes(store=store, path=path, nodes=nodes) | ||
|
||
for result in sync(_collect_aiterator(coro)): | ||
yield _parse_async_node(result) | ||
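Note on ordering: both synchronous wrappers drain an async iterator and then yield the collected results. The sketch below shows that pattern with the standard library only. ``create_all`` and ``collect_aiterator`` are hypothetical stand-ins, and ``asyncio.run`` stands in for zarr's ``sync`` helper, which may work differently. It also illustrates why the yield order should not be relied on when tasks run concurrently.

```python
import asyncio
from collections.abc import AsyncIterator


async def create_all(names: list[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for concurrent node creation."""

    async def create_one(name: str) -> str:
        await asyncio.sleep(0)  # placeholder for a write to the store
        return name

    tasks = [asyncio.create_task(create_one(n)) for n in names]
    # as_completed yields results in completion order, not submission order.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect_aiterator(aiter: AsyncIterator[str]) -> tuple[str, ...]:
    """Drain an async iterator into a tuple."""
    return tuple([item async for item in aiter])


created = asyncio.run(collect_aiterator(create_all(["a", "a/b", "c"])))
# Compare as a set, because the order of `created` is not guaranteed.
assert set(created) == {"a", "a/b", "c"}
```

If callers need a stable order, sorting the collected results by path (or building a ``dict`` keyed by path) is one option that does not depend on completion order.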
|
||
|
||
def create_rooted_hierarchy( | ||
Review comment: I don't understand the use case for this function. |
Review comment: this function returns a single zarr array or group (the root of the hierarchy); |
Review comment: the use case is for when someone wants to create an entire hierarchy and get, as a return value, a handle to the root of that hierarchy. I suspect this is actually more typical than users wanting an iterator over everything in the hierarchy. |
Review comment: I think I agree with @jhamman - |
Review comment: Getting the root is easy, in a computer science sense, but it's also tedious. Look at the source code for |
||
*, | ||
store: Store, | ||
path: str, | ||
nodes: dict[str, GroupMetadata | ArrayV2Metadata | ArrayV3Metadata], | ||
overwrite: bool = False, | ||
) -> Group | Array: | ||
""" | ||
Create a Zarr hierarchy with a root, and return the root node, which could be a ``Group`` | ||
or ``Array`` instance. | ||
|
||
Parameters | ||
---------- | ||
store : Store | ||
The storage backend to use. | ||
path : str | ||
The name of the root of the created hierarchy. Every key in ``nodes`` will be prefixed with | ||
``path`` prior to creating nodes. | ||
nodes : dict[str, GroupMetadata | ArrayV3Metadata | ArrayV2Metadata] | ||
A dictionary defining the hierarchy. The keys are the paths of the nodes | ||
in the hierarchy, and the values are the metadata of the nodes. The | ||
metadata must be either an instance of GroupMetadata, ArrayV3Metadata | ||
or ArrayV2Metadata. | ||
overwrite : bool | ||
Whether to overwrite existing nodes. Default is ``False``. | ||
|
||
Returns | ||
------- | ||
Group | Array | ||
""" | ||
async_node = sync( | ||
async_api.create_rooted_hierarchy(store=store, path=path, nodes=nodes, overwrite=overwrite) | ||
) | ||
return _parse_async_node(async_node) | ||
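Note on finding the root: the thread above discusses whether returning the root is worth a dedicated function. As a rough illustration of the bookkeeping involved, the sketch below picks the root out of a flat set of keys. It is not zarr's implementation; ``find_root`` is a hypothetical helper, and it assumes that the empty string denotes the top-level root and that a rooted hierarchy has exactly one key that is an ancestor of all others.

```python
def find_root(keys: set[str]) -> str:
    """Return the single key that is an ancestor of every other key.

    Hypothetical helper for illustration only; not part of zarr.
    """
    if not keys:
        raise ValueError("empty hierarchy has no root")

    def depth(key: str) -> int:
        # Assumption: the empty string is the top-level root (depth 0).
        return 0 if key == "" else key.count("/") + 1

    # The shallowest key is the only possible root.
    root = min(keys, key=depth)
    prefix = "" if root == "" else root + "/"
    for key in keys - {root}:
        if not key.startswith(prefix):
            raise ValueError(f"{key!r} is not under candidate root {root!r}")
    return root


print(find_root({"x", "x/y", "x/y/z"}))  # x
```

A set of keys such as ``{"a", "b"}`` has two candidates at the same depth, so this sketch raises ``ValueError`` rather than guessing.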
|
||
|
||
def get_node(store: Store, path: str, zarr_format: ZarrFormat) -> Array | Group: | ||
Review comment: Same here. This seems like a helper function, but one that we may not want to include as part of the public API. |
Review comment: I can remove it, but I think a function for getting an array or group is pretty useful to end users. |
||
""" | ||
Get an Array or Group from a path in a Store. | ||
|
||
Parameters | ||
---------- | ||
store : Store | ||
The store-like object to read from. | ||
path : str | ||
The path to the node to read. | ||
zarr_format : {2, 3} | ||
The zarr format of the node to read. | ||
|
||
Returns | ||
------- | ||
Array | Group | ||
""" | ||
|
||
return _parse_async_node( | ||
sync(async_api.get_node(store=store, path=path, zarr_format=zarr_format)) | ||
) |
Review comment: I think that overwrite is undocumented here. In other functions it's documented as
Could you update that description to say what happens when an existing node is found with overwrite=False? Is an error raised, or is the node not updated?