Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an Array Protocol & improve static typing support #589

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

nstarman
Copy link
Contributor

@nstarman nstarman commented Feb 5, 2023

Continuing from #584, I've done a quick refactor of the array (and dtype) objects to make them typing Protocols.
@rgommers, I agree that making Array generic wrt the dtype is good for a followup.

It's not perfect, but mypy doesn't complain a whole lot, so I think this is a good start. I'm happy to make refinements.

Copy link
Member

@honno honno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this :)

The Array protocl stuff looks grand. And personally I like having a DType class just for consistency, even if practically type hinting an object has __eq__ isn't useful.

On namespace type hints (e.g. _CreatoinFuncs, ArrayAPINamespace): IMO I'm not sure on this yet, as they make contributing to the spec less ergonomic given we repeat the signatures, although they are pretty nifty to have. Is there a world where these could be dynamically construct them from the stubs instead? We might want to put these in a follow-up PR if just to ship the concretely non-controversial stuff first—interested in what others think.


@property
def ndim(self: array) -> int:
def ndim(self) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I see methods/properties which don't return arrays don't get any type hint for their self arg—could these be hinted with Array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static type checkers don't like this because the TypeVar is not properly bound — see https://peps.python.org/pep-0484/#scoping-rules-for-type-variables ("Unbound type variables should not appear in the bodies of generic functions, or in the class bodies apart from method definitions:").
In general, there's no need to type hint the self arg unless the return type depends on the type of self — e.g. see https://peps.python.org/pep-0484/#user-defined-generic-types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nice thing about using Protocol is that self is understood by static type checkers to be the Protocol.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this only done for certain methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it for any methods I looked at over the course of the PR. This is by no means comprehensive and should be done: here or elsewhere.

@nstarman
Copy link
Contributor Author

nstarman commented Feb 6, 2023

We might want to put these in a follow-up PR if just to ship the concretely non-controversial stuff first—interested in what others think.

I was thinking the same thing last evening about moving this to a followup PR.

On namespace type hints (e.g. _CreatoinFuncs, ArrayAPINamespace): IMO I'm not sure on this yet, as they make contributing to the spec less ergonomic given we repeat the signatures, although they are pretty nifty to have. Is there a world where these could be dynamically construct them from the stubs instead?

Maybe? I haven't experimented yet but perhaps mypy will be satisfied with:

class ArrayAPINamespace(Protocol):

    zeros_like = staticmethod(creation_funcs.zeros_like)

Also for a future PR: zeros_like could be promoted to a Protocol,

class Array(Protocol):
    def __array_namespace__(self, ...) -> ArrayAPINamespace[Self]: ...


class ArrayAPINamespace(Protocol[Array]):

    zeros_like = staticmethod(zeros_like)


class zeros_like(Protocol[Array]):
    def __call__(self, x: Array, /, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> Array: ...

The advantage of this is that functions can be checked to see if their signature is compatible with zeros_like.

@rgommers rgommers added RFC Request for comments. Feature requests and proposed changes. topic: Static Typing Static typing. labels Feb 20, 2023
@rgommers rgommers changed the title Array Protocol Add an Array Protocol & improve static typing support Feb 20, 2023
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nstarman, and apologies for the delay in reviewing (your PR came in right at the start of my 2 week holiday). This is looking quite good. I pushed a couple of small commits to resolve some Mypy and Sphinx complaints.

This is close to ready I'd say. One thing that isn't nice is that Self now shows up instead of array in the signatures of the html docs. I tried to fiddle with autodoc_type_aliases in src/_array_api_conf.py, but was not successful in mapping that back to array. Using "self" will be pretty confusing in docs for regular methods that return arrays. I think we'll have to figure that one out.

The DType protocol is fine with me - it won't add much to type-checking, but it should be fine to have it here for consistency.

The Array protocol looks good.

@rgommers
Copy link
Member

@BvB93 if you have any feedback on this PR, that would be great to see.

@rgommers
Copy link
Member

@nstarman there are 3 errors left when running mypy array_api_stubs/_draft/array_object.py (from lots more on main). There are more errors in other files - but if we are going to maintain static typing support in this repo, adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?

@nstarman
Copy link
Contributor Author

One thing that isn't nice is that Self now shows up instead of array in the signatures of the html docs.

If the array type is just text in the html docs and not 'clickable', then we can rename Self to array.

adding a CI job running Mypy seems feasible. We could start with this file, and expand it to all files later. WDYT?

That sounds like a good idea. I can try adding to the CI and create an ignore file that can later be whittled down as the package is made more type friendly.

@rgommers
Copy link
Member

If the array type is just text in the html docs and not 'clickable', then we can rename Self to array.

That works. It's not clickable - there is no array object in the standard on purpose, because in the end we don't care whether it's named ndarray, Tensor, or something else.

@nstarman nstarman force-pushed the protocol branch 5 times, most recently from c4d977e to 9d50eb5 Compare August 12, 2023 21:11
@nstarman
Copy link
Contributor Author

nstarman commented Aug 12, 2023

@nstarman nstarman marked this pull request as ready for review August 12, 2023 21:21
@asmeurer
Copy link
Member

Sphinx is trying to create links for the type hints. I think to fix that you need to update the list(s) here https://github.com/data-apis/array-api/blob/main/src/_array_api_conf.py#L52-L77

@nstarman
Copy link
Contributor Author

nstarman commented Aug 14, 2023

I can squash commits if you don't use Squash & Merge (assuming this is approved, of course :) ).
Thanks @asmeurer for the pointer.

@nstarman nstarman requested review from rgommers, honno and kgryte August 14, 2023 20:15
Copy link
Member

@honno honno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!


@property
def shape(self: array) -> Tuple[Optional[int], ...]:
def shape(self) -> tuple[int | None, ...]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to say more for myself than anyone else—I initially wasn't too sure if we wanted to use the py3.9+ type hint features, but:

  • Forgot that from future import __annotations__ makes this work for py3.8
  • py3.8 doesn't get updates end of next year
  • The scientific ecosystem already is adopting py3.9+

@nstarman nstarman requested a review from honno August 18, 2023 03:00
Copy link
Member

@honno honno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on my end!

I think pre-commit is a good addition (would love to add hooks for other spec things myself), but probably not too useful until we get a workflow that runs it. That said, I'm not too sure if we'd want to do that just yet 🤔

@nstarman
Copy link
Contributor Author

nstarman commented Aug 18, 2023

I think pre-commit is a good addition (would love to add hooks for other spec things myself), but probably not too useful until we get a workflow that runs it. That said, I'm not too sure if we'd want to do that just yet 🤔

pre-commit.ci can be turned on, sans additional GH workflows.
black, and ruff would be wonderful as well!

@nstarman
Copy link
Contributor Author

Looks good on my end!

Thanks @honno for the reviews and discussion.

@asmeurer
Copy link
Member

Any reason to change array to Array? I think we might prefer to keep it as array like it is now.

@nstarman
Copy link
Contributor Author

nstarman commented Nov 24, 2024

@rgommers. I'd be happy to finish this PR. I don't recall why this stalled out.

nstarman and others added 7 commits November 24, 2024 16:09
Sphinx doesn't set `TYPE_CHECKING`, but does use the type annotations.
`Self` is unknown to Sphinx, so should be filtered out to prevent lots
of errors.
@nstarman nstarman force-pushed the protocol branch 3 times, most recently from 9c47664 to fba7da1 Compare November 24, 2024 21:44
@nstarman
Copy link
Contributor Author

Rebased and applied fixes for things that had changed.

@kgryte
Copy link
Contributor

kgryte commented Nov 25, 2024

@nstarman Would you be interested in coming to the community meeting this week (Nov 28) to discuss static typing and next steps? It would be great to have some perspectives from static typing experts!

@nstarman
Copy link
Contributor Author

nstarman commented Nov 25, 2024

I would be very interested to come! It is Thanksgiving 🦃, but I should be able to join.

@rgommers
Copy link
Member

@rgommers. I'd be happy to finish this PR. I don't recall why this stalled out.

I'm not 100% sure, but I think a combination of some of the question now being discussed in gh-863 and a lack of bandwidth and typing expertise on the part of the main maintainers of this repo. Certainly not due to your lack of effort here!

@NeilGirdhar
Copy link

This PR is awesome. Really looking forward to it being exposed.

Just one question: Why is the preference for using protocols instead of ordinary inheritance?

It's not like there are Array API clients that are going to be reluctant to inherit from your Array base class.

The problem with using protocols is that if you ever change any definition in the protocol, a client array may no longer see itself as compatible, and then a user program will report type errors at every Array API use.

Also, for the clients, they have to be very careful implementing their class to match the protocol or else it won't be compatible. With inheritance, they will instead get type errors if they make a mistake.

Finally, protocols are slower for type checkers to check than ordinary class inheritance.

@TomNicholas
Copy link

Presumably so that array implementation packages can avoid an extra dependency.

@NeilGirdhar
Copy link

NeilGirdhar commented Feb 6, 2025

Presumably so that array implementation packages can avoid an extra dependency.

Right, but are the costs worth it?

And what's the benefit of of avoiding dependencies? Most users are going to depend on the Array API type stubs anyway.

@jorenham
Copy link

jorenham commented Feb 6, 2025

Just one question: Why is the preference for using protocols instead of ordinary inheritance?

Inheritance hurts performance, and the additional import that this would require also would. Array libraries are usually heavily optimized, so this is not a realistic ask. And I'm not sure whether it would even be possible in case of e.g. numpy.ndarray, which is written in C.

@rgommers
Copy link
Member

rgommers commented Feb 6, 2025

It's not performance, it's that a required dependency is simply unacceptable for most libraries. NumPy has zero runtime dependencies, and it's highly unlikely that any will be added over the next few years - and this kind of thing certainly doesn't meet the bar. Other libraries may have similar constraints, plus there's potential issues with requiring different minimum versions of the same dependency.

@jorenham
Copy link

jorenham commented Feb 6, 2025

Also, for the clients, they have to be very careful implementing their class to match the protocol or else it won't be compatible. With inheritance, they will instead get type errors if they make a mistake.

You'll also get a type-error in this case if somewhere in your codebase you try to assign your array type to the array protocol, e.g.

_: Array = MyArray()  # test for array-api compat

Finally, protocols are slower for type checkers to check than ordinary class inheritance.

That's true, but in practice it's done a lot, and you don't even notice that, performance-wise. E.g. Mapping, Sequence, SupportsIndex, etc. are used all over the place in the typeshed stdlib stubs. So there's no need to worry about that.

@ntessore
Copy link

ntessore commented Feb 6, 2025

The problem with using protocols is that if you ever change any definition in the protocol, a client array may no longer see itself as compatible, and then a user program will report type errors at every Array API use.

Isn't this exactly the same either way?

Also, for the clients, they have to be very careful implementing their class to match the protocol or else it won't be compatible. With inheritance, they will instead get type errors if they make a mistake.

Again, isn't this exactly the same either way? The only difference being an assert_type(MyArray, ArrayApiArray)?

Generally speaking, I think having Array API protocols is valuable: it allows the user to take responsibility when using an implementation that is not (yet) fully compliant, instead of putting it solely in the hands of the implementation to declare that it is compliant.

It also means implementations can gradually transition to becoming Array API compliant, and they get told exactly where they currently aren't. I found this very useful with @nstarman's repo.

@NeilGirdhar
Copy link

Sorry, I didn't expect to generate so many replies and flood this PR. I've created an issue to continue the conversation: #899

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comments. Feature requests and proposed changes. topic: Static Typing Static typing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.