document dtype extension #3157

d-v-b · 2025-06-19T17:23:42Z

This PR adds a working example of custom dtype creation and registration. Because it's a lot of code, I put it in a new top-level directory called examples, which contains the executable Python file dtype_example.py. This file uses PEP 723 metadata to declare an ml_dtypes dependency, and it uses a local zarr install, which means it can be tested properly against local changes.

I also expanded the current dtype docs in the user guide to include content about the data type resolution process.

TODO:

- Add unit tests and/or doctests in docstrings
- Add docstrings and API docs for any new/modified user-facing classes and functions
- New/modified features documented in docs/user-guide/*.rst
- Changes documented as a new file in changes/
- GitHub Actions have all passed
- Test coverage is 100% (Codecov passes)

d-v-b · 2025-06-19T17:25:39Z

cc @nenb @ianhi, since y'all were the most involved in this example over in the main dtypes PR.

ianhi

Nice, this is a great improvement. I left some comments and suggested improvements. The other thing I'd wish for is to format the lines to 80 characters. In general I like a 100-character line length, but as it currently stands, when the example is rendered on the docs page you have to scroll horizontally to read the code.

ianhi · 2025-06-20T13:10:47Z

examples/custom_dtype.py

+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#   "zarr @ {root}",


Is there any way around doing this? When this is being tested, is it being called with run, or just run via pytest? If the latter, would it be possible to change this to just zarr? I think that would be a small but nice improvement, as this is first and foremost for documentation, not testing.

It's essential that these examples be tested as part of the regular test suite. Otherwise, the examples will break, and users will complain, and we will have to fix them, and to check that our fix worked, we will need to write tests...

But I agree that the "zarr @ {root}" declaration sucks. So here is an idea: our test suite doesn't test this file directly. Instead, at test time we generate a copy of this file with a transformed /// script header, in which the zarr dependency has been replaced with a reference to the local development version of zarr.
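The test-time rewrite proposed here could look roughly like the sketch below. It assumes the example's header lists a bare "zarr" requirement (optionally with a version specifier) and that the test suite knows the path to the local checkout. The function name localize_zarr_dependency is hypothetical, and a real implementation might prefer a TOML-aware approach over regular expressions.

```python
import re
from pathlib import Path

# A PEP 723 "script" block: from "# /// script" to the first "# ///" line.
_SCRIPT_BLOCK = re.compile(r"(?m)^# /// script$\n(?:^#(?: .*)?$\n)*?^# ///$")

# A bare zarr requirement such as "zarr" or "zarr>=3.1", quotes included.
# It does not match "zarr @ ..." or other packages like "zarrita".
_ZARR_DEP = re.compile(r'"zarr(?:[<>=!~][^"]*)?"')


def localize_zarr_dependency(source: str, root: Path) -> str:
    """Return a copy of `source` whose script header points zarr at `root`.

    Hypothetical helper: only the PEP 723 block is rewritten; the rest of
    the file is returned unchanged.
    """
    match = _SCRIPT_BLOCK.search(source)
    if match is None:
        raise ValueError("no PEP 723 script block found")
    # A direct reference to the local checkout, e.g. "zarr @ file:///...".
    local_ref = f'"zarr @ {root.resolve().as_uri()}"'
    # Use a function as the replacement so the URI is inserted literally.
    header = _ZARR_DEP.sub(lambda _m: local_ref, match.group(0))
    return source[: match.start()] + header + source[match.end():]
```

A test could write the returned text to a temporary file and execute that copy, leaving the published example with a plain zarr dependency for readers.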

ianhi · 2025-06-20T13:11:34Z

examples/custom_dtype.py

+
+class Int2(ZDType[int2_dtype_cls, int2_scalar_cls]):
+    """
+    This class provides a Zarr compatibility layer around the int2 data type and the int2


Is there a nice link explaining the difference between these? I think I've inferred it, but it would be nice to make it explicit.

No, I don't actually think there is a good link that explains the difference between the data type and the scalar type. The NumPy docs should explain this, but they don't. I can add something to our docs.
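As a rough illustration of that distinction, the sketch below shows the data type class, a data type instance, and the scalar type side by side. It uses NumPy's built-in int8 in place of int2 so that no extra dependency is needed. The variable names are illustrative. The reading that Int2 is parameterized by the int2 analogues of dtype_cls and scalar_cls is inferred from ZDType[int2_dtype_cls, int2_scalar_cls] in the diff above.

```python
import numpy as np

# A data type *instance* describes the memory layout of array elements.
dtype = np.dtype("int8")

# The data type *class* is the class of that instance. On NumPy >= 1.25 it
# is also exposed as np.dtypes.Int8DType.
dtype_cls = type(dtype)

# The *scalar type* is the class of a single element taken out of an array.
scalar_cls = dtype.type

arr = np.array([1, 2, 3], dtype=dtype)
element = arr[0]  # an instance of the scalar type, not of the dtype class
```

In short, the dtype describes the array, and the scalar type describes the individual values you get back from it.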

ianhi · 2025-06-20T13:13:13Z

examples/custom_dtype.py

+
+    def to_json_scalar(self, data: object, *, zarr_format: ZarrFormat) -> int:
+        """Convert a python object to a scalar."""
+        return int(data)


Can this be more specific to the example? E.g., explain something to the effect of "this needs to be an int to be JSON-compatible", and mention int2 somewhere.
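Here is a sketch of what a more int2-specific version might say. It is written as a standalone function rather than a ZDType method so that it runs without zarr or ml_dtypes, with np.int8 standing in for the int2 scalar type. The name int2_to_json_scalar and the range check are illustrative assumptions, not part of the PR.

```python
import json

import numpy as np

# A signed 2-bit integer can only hold -2, -1, 0 and 1.
INT2_MIN, INT2_MAX = -2, 1


def int2_to_json_scalar(data: object) -> int:
    """Convert an int2 scalar to a JSON-serializable Python int.

    NumPy scalar types (np.int8 here, and presumably the ml_dtypes int2
    scalar as well) are rejected by json.dumps, so the value is converted
    to a plain Python int before it is written to Zarr metadata.
    """
    value = int(data)
    # Illustrative check: reject values that cannot be represented in int2.
    if not INT2_MIN <= value <= INT2_MAX:
        raise ValueError(f"{value} is out of range for int2")
    return value
```

For example, json.dumps(int2_to_json_scalar(np.int8(-1))) produces "-1", whereas json.dumps(np.int8(-1)) raises TypeError.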

ianhi · 2025-06-20T13:25:03Z

examples/custom_dtype.py

+    def _check_scalar(self, data: object) -> TypeGuard[int]:
+        """Check if a python object is a valid scalar"""
+        return isinstance(data, (int, int2_scalar_cls))


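Why does this check against int instead of just int2?

Because _check_scalar(0) should be OK: a plain Python int is an acceptable scalar input, even though it is not an instance of the int2 scalar class.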

ianhi · 2025-06-20T13:25:34Z

docs/user-guide/data_types.rst

+Zarr Python defines a collection of Zarr data types. This collection, called a "data type registry",
+is essentially a dict where the keys are strings (a canonical name for each data type), and the values are
+the data type classes themselves. Dynamic data type resolution entails iterating over these data
+type classes, invoking a special class constructor defined on each one, and returning a concrete


Maybe mention the name of the method, or link to it on ZDType.
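The resolution process described in the docs excerpt above can be sketched with a toy registry. ToyZDType, from_native, DataTypeMismatch and resolve are all hypothetical names, not zarr's actual API. The sketch also assumes resolution returns the first class whose constructor accepts the dtype, which may differ from how zarr handles multiple matches.

```python
import numpy as np


class DataTypeMismatch(ValueError):
    """Raised when a data type class cannot wrap the given NumPy dtype."""


class ToyZDType:
    """Hypothetical base class; not the actual zarr ZDType API."""

    native: np.dtype

    @classmethod
    def from_native(cls, dtype) -> "ToyZDType":
        # The "special class constructor": succeed only for a matching dtype.
        if np.dtype(dtype) != cls.native:
            raise DataTypeMismatch(f"{dtype!r} is not {cls.native}")
        return cls()


class Int8(ToyZDType):
    native = np.dtype("int8")


class Float32(ToyZDType):
    native = np.dtype("float32")


# The registry: canonical name -> data type class.
registry: dict[str, type[ToyZDType]] = {"int8": Int8, "float32": Float32}


def resolve(dtype) -> ToyZDType:
    """Try each registered class in turn and return the first match."""
    for zdtype_cls in registry.values():
        try:
            return zdtype_cls.from_native(dtype)
        except DataTypeMismatch:
            continue
    raise ValueError(f"no registered data type matches {dtype!r}")
```

In this toy model, registering a custom data type is just adding another entry to the dict. After that, resolve picks it up like any built-in type.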

d-v-b added 4 commits June 19, 2025 18:46

add int2 example, and expand dtype docs

c1d1550

specify zarr with a direct local file reference for the dtype example

6e4a938

add comment on pep-723 metadata

8d18eed

ignore future warning in docs

bfb2088

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) on Jun 19, 2025

Merge branch 'main' into docs/dtype-docs

60a1e30

dstansby added this to the 3.1.0 milestone Jun 20, 2025

Merge branch 'main' into docs/dtype-docs

5b2a601

ianhi reviewed Jun 20, 2025
