-
-
Notifications
You must be signed in to change notification settings - Fork 331
document dtype extension #3157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
document dtype extension #3157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this is a great improvment. I left some comments and suggested improvements. The other thing I'd wish for is to the format the lines to to 80 characters long. In general I like 100 line length, but when rendered on the docs page you as it currently stands you have to horizontally scroll to read the example code.
# /// script | ||
# requires-python = ">=3.11" | ||
# dependencies = [ | ||
# "zarr @ {root}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way around doing this? When this is being tested is it being called with run
or just run via pytest? If the latter then would it be possible to change this to to just zarr
? I think that would a small but nice improvement as this is first and foremore for documentation, not testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's essential that these examples be tested as part of the regular test suite. Otherwise, the examples will break, and users will complain, and we will have to fix them, and to check that our fix worked, we will need to write tests...
But I agree that the "zarr @ {root}" declaration sucks. So here is an idea -- our test suite doesn't test this file. Instead, at test time we generate a copy of this file with a transformed /// script
header, where the zarr version has been replaced with a reference to the local development version of zarr.
|
||
class Int2(ZDType[int2_dtype_cls, int2_scalar_cls]): | ||
""" | ||
This class provides a Zarr compatibility layer around the int2 data type and the int2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a nice link explaining the difference between these? I think I've inferred it but would be nice to make it explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I don't actually think there is a nice link that explains the data type / scalar type difference. The numpy docs should explain this, but they don't. I can add something to our docs.
|
||
def to_json_scalar(self, data: object, *, zarr_format: ZarrFormat) -> int: | ||
"""Convert a python object to a scalar.""" | ||
return int(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be more specific to the example? e.g. explain something to the effect of "needs to be int
to be compatible with json." and mention int2
somewhere.
def _check_scalar(self, data: object) -> TypeGuard[int]: | ||
"""Check if a python object is a valid scalar""" | ||
return isinstance(data, (int, int2_scalar_cls)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this check against int
instead of just int2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because _check_scalar(0)
should be OK
Zarr Python defines a collection of Zarr data types. This collection, called a "data type registry", | ||
is essentially a dict where the keys are strings (a canonical name for each data type), and the values are | ||
the data type classes themselves. Dynamic data type resolution entails iterating over these data | ||
type classes, invoking a special class constructor defined on each one, and returning a concrete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention the name of method, or link to on ZDType.
This PR adds a working example of custom dtype creation and registration. because it's a lot of code, I put this in a new top-level directory called
examples
, which contains the executable python filedtype_example.py
. This file uses PEP-723 metadata to declare aml_dtypes
dependency, and it uses a local zarr install, which means it can be tested properly against local changes.I also expanded the current dtype docs in the user guide to include content about the data type resolution process.
TODO:
docs/user-guide/*.rst
changes/