Skip to content

document dtype extension #3157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Jun 19, 2025

This PR adds a working example of custom dtype creation and registration. because it's a lot of code, I put this in a new top-level directory called examples, which contains the executable python file dtype_example.py. This file uses PEP-723 metadata to declare a ml_dtypes dependency, and it uses a local zarr install, which means it can be tested properly against local changes.

I also expanded the current dtype docs in the user guide to include content about the data type resolution process.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jun 19, 2025
@d-v-b
Copy link
Contributor Author

d-v-b commented Jun 19, 2025

cc @nenb @ianhi, since yall were the most involved in this example over in the main dtypes PR.

@dstansby dstansby added this to the 3.1.0 milestone Jun 20, 2025
Copy link
Contributor

@ianhi ianhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is a great improvment. I left some comments and suggested improvements. The other thing I'd wish for is to the format the lines to to 80 characters long. In general I like 100 line length, but when rendered on the docs page you as it currently stands you have to horizontally scroll to read the example code.

# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr @ {root}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way around doing this? When this is being tested is it being called with run or just run via pytest? If the latter then would it be possible to change this to to just zarr? I think that would a small but nice improvement as this is first and foremore for documentation, not testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's essential that these examples be tested as part of the regular test suite. Otherwise, the examples will break, and users will complain, and we will have to fix them, and to check that our fix worked, we will need to write tests...

But I agree that the "zarr @ {root}" declaration sucks. So here is an idea -- our test suite doesn't test this file. Instead, at test time we generate a copy of this file with a transformed /// script header, where the zarr version has been replaced with a reference to the local development version of zarr.


class Int2(ZDType[int2_dtype_cls, int2_scalar_cls]):
"""
This class provides a Zarr compatibility layer around the int2 data type and the int2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a nice link explaining the difference between these? I think I've inferred it but would be nice to make it explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I don't actually think there is a nice link that explains the data type / scalar type difference. The numpy docs should explain this, but they don't. I can add something to our docs.


def to_json_scalar(self, data: object, *, zarr_format: ZarrFormat) -> int:
"""Convert a python object to a scalar."""
return int(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be more specific to the example? e.g. explain something to the effect of "needs to be int to be compatible with json." and mention int2 somewhere.

Comment on lines +111 to +113
def _check_scalar(self, data: object) -> TypeGuard[int]:
"""Check if a python object is a valid scalar"""
return isinstance(data, (int, int2_scalar_cls))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this check against int instead of just int2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because _check_scalar(0) should be OK

Zarr Python defines a collection of Zarr data types. This collection, called a "data type registry",
is essentially a dict where the keys are strings (a canonical name for each data type), and the values are
the data type classes themselves. Dynamic data type resolution entails iterating over these data
type classes, invoking a special class constructor defined on each one, and returning a concrete
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention the name of method, or link to on ZDType.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs release notes Automatically applied to PRs which haven't added release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants