✨: add CanArrayX protocols #32

Open
wants to merge 33 commits into
base: main

Conversation

nstarman
Collaborator

@nstarman nstarman commented Jun 22, 2025

No description provided.

@nstarman
Collaborator Author

OK, this PR is doing too much. Let me pare it down to just a few Protocols and do the rest as a series of follow-ups.

@nstarman nstarman force-pushed the has_x branch 5 times, most recently from 96067a4 to a1be18e Compare June 23, 2025 19:19
@nstarman nstarman marked this pull request as ready for review June 23, 2025 21:59
@nstarman nstarman requested a review from jorenham June 23, 2025 21:59
@nstarman
Collaborator Author

Ping @NeilGirdhar, given related discussions.

@nstarman nstarman changed the title ✨: add HasArrayX protocols ✨: add CanArrayX protocols Jun 23, 2025
...


class CanArrayAdd(Protocol):
Collaborator Author

I was thinking about parametrizing by dtype: self, other, and output. It's a bit of a mess. Maybe tackle parametrizing as a follow-up?

@nstarman
Collaborator Author

Should all the Protocols inherit from HasArrayNamespace?
Also, should it be renamed to CanArrayNamespace?

@NeilGirdhar
Contributor

NeilGirdhar commented Jun 23, 2025

Should all the Protocols inherit from HasArrayNamespace?
Also, should it be renamed to CanArrayNamespace?

I don't know what Joren will say, but I would guess no and no? (I think you got it right in this PR?)

Also, I'm guessing you're aware that int | float is float, and you're intentionally specifying both?

@nstarman
Collaborator Author

nstarman commented Jun 24, 2025

don't know what Joren will say, but I would guess no

My thought was that building stuff like the following is wrong:

class Positive(Protocol):
    def __call__(self, array: CanArrayPos, /) -> CanArrayPos: ...

It should be something like

class Positive(Protocol):
    def __call__(self, array: HasArrayNamespace, /) -> HasArrayNamespace: ...

But I think we want

class Positive(Protocol):
    def __call__(self, array: CanArrayPos, /) -> HasArrayNamespace: ...

Which I think works best if it's

class CanArrayPos(HasArrayNamespace, Protocol): ...

Also, I'm guessing you're aware that int | float is float, and you're intentionally specifying both?

Yes. :).

@NeilGirdhar
Contributor

I see, you're kind of using it as a poor man's intersection?

Also, I'm guessing you're aware that int | float is float, and you're intentionally specifying both?

Yes. :).

Okay, is that because you're going to generate some documentation from these annotations? Or you find it less confusing?

Also, are you going to add complex to the union?

@nstarman
Collaborator Author

Okay, is that because you're going to generate some documentation from these annotations? Or you find it less confusing?

It's for two reasons: the Array API does it in its docs, and I think the Python numerical tower is a mess. Since int and float aren't subclasses of each other, it makes little sense for them to be interchangeable at the static type level. 😤😆

Also, are you going to add complex to the union?

Worth discussing. The array api does not.

@NeilGirdhar
Contributor

NeilGirdhar commented Jun 24, 2025

It's for 2 reasons: the array api does it in their docs

The docs are that way to help beginners who might be confused. (At least that was the argument that was presented.) But you aren't expecting beginners to read your code, are you?

And, you aren't using this repo to build docs?

The downside of populating the unions unnecessarily is overcomplicated type errors. So from a user standpoint, I think this is worse.

From a developer standpoint, it's a matter of taste. Personally, I think more succinct is easier to understand.

because I think the Python numerical tower is a mess and since ints and floats aren't subclasses of each other, it makes little sense for them to be interchangeable at the static type level.

As much as you might like to turn back time and change the typing decisions that were made, the fact is that type checkers accept the static type int wherever float is expected (in effect treating int as a subtype of float), and that will not change for the foreseeable future.

I think I understand what you're doing and why. I spent years writing if x != 0 for a similar reason. But I think this is a fact that you just have to accept even if you dislike it.

Worth discussing. The array api does not.

Does it not?

array.__add__(other: int | float | complex | array, /) → array

Have I misunderstood the documentation?

@nstarman
Collaborator Author

nstarman commented Jun 24, 2025

Ah. We're building towards v2021 first.
A release branch for every major version.
The versions have almost been entirely additive, so it's not too onerous.
This also makes backporting easier.

Member

@jorenham jorenham left a comment

It might be easier to use optype for this, as it already provides single-method generic protocols for each of the special dunders:

https://github.com/jorenham/optype/blob/master/optype/_core/_can.py

There's even documentation: https://github.com/jorenham/optype#binary-operations

And of course it's tested and thoroughly type-checked and stuff
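
For readers unfamiliar with the pattern, this is roughly what a single-method generic protocol looks like. It is a hand-written stand-in for illustration, not optype's actual source, and `add_one` is a hypothetical helper.

```python
from typing import Protocol, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)
R_co = TypeVar("R_co", covariant=True)
R = TypeVar("R")


class CanAdd(Protocol[T_contra, R_co]):
    """Anything whose `__add__` accepts `T_contra` and returns `R_co`."""

    def __add__(self, rhs: T_contra, /) -> R_co: ...


def add_one(x: CanAdd[int, R]) -> R:
    # The return type is whatever the operand's `__add__` produces for an int.
    return x + 1
```

The two type parameters let a caller state exactly which right-hand operands are accepted and what comes back, without committing to a concrete class.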

@nstarman
Collaborator Author

nstarman commented Jun 24, 2025

Sounds good to me...
It's good to have in-house expertise.


@nstarman
Collaborator Author

nstarman commented Jul 1, 2025

@jorenham is this prep for using optype?

@nstarman nstarman mentioned this pull request Jul 1, 2025
@jorenham
Member

jorenham commented Jul 1, 2025

@jorenham is this prep for using optype?

Yea, pretty much.

@jorenham
Member

jorenham commented Jul 9, 2025

@jorenham do you want to switch some of these to be optype objects, or do the Self and docstring issues mean we should go ahead with rolling our own Protocols?

I've thought about this, but I'm not sure what the best approach is. I considered four approaches:

  1. Use optype but monkeypatch the __doc__ of the protocols. The downside is that we'd pollute these protocols, which might be annoying for users that use optype for other things as well.
  2. Bundle optype as a git submodule, so that we can monkeypatch __doc__ without polluting the "actual" optype protocols.
  3. We write our own protocols (copy-pasting those of optype). This won't pollute optype, but we'd have to do quite a lot of work to write, test, and maintain them.
  4. Use optype, but ignore the docstrings. If we later want docstrings after all, then we can revisit the 3 options above.

Now that I've written these down, I lean most toward option 4. As far as I'm concerned, docstrings are a "should-have", not a "must-have" (MoSCoW jargon). By postponing the docstring question, we can focus on building the actual functionality first. This feels like the most agile approach to me.

Thoughts?

@nstarman
Collaborator Author

For magic dunder methods I agree we can start with 4.

What about doing

@modify_docstring("", __float__="")
class CanFloat(opt.CanFloat): ...

@modify_docstring("", __int__="")
class CanInt(opt.CanInt[R]): ...
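
One way such a decorator could work is sketched below. `modify_docstring` is hypothetical (its implementation is not shown in this thread), and `CanFloat` here is a local stand-in for `opt.CanFloat`. The sketch copies each inherited method onto the subclass before setting `__doc__`, so the parent protocol's docstrings are left alone, which is the pollution concern raised in option 1.

```python
import types
from typing import Protocol


def modify_docstring(class_doc=None, /, **method_docs):
    """Hypothetical decorator: set a class docstring and per-method docstrings."""

    def decorator(cls):
        if class_doc is not None:
            cls.__doc__ = class_doc
        for name, doc in method_docs.items():
            inherited = getattr(cls, name)
            # Copy the function object so the parent's docstring is not overwritten.
            copied = types.FunctionType(
                inherited.__code__,
                inherited.__globals__,
                inherited.__name__,
                inherited.__defaults__,
                inherited.__closure__,
            )
            copied.__kwdefaults__ = inherited.__kwdefaults__
            copied.__annotations__ = dict(inherited.__annotations__)
            copied.__qualname__ = f"{cls.__qualname__}.{name}"
            copied.__doc__ = doc
            setattr(cls, name, copied)
        return cls

    return decorator


class CanFloat(Protocol):
    """Local stand-in for `opt.CanFloat`."""

    def __float__(self) -> float: ...


@modify_docstring(
    "Array that converts to a Python float.",
    __float__="Convert a zero-dimensional array to a Python `float`.",
)
class CanArrayFloat(CanFloat, Protocol): ...
```

Note that this sketch lists `Protocol` explicitly in the subclass bases so that the subclass stays a protocol rather than becoming a concrete class.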

@jorenham
Member

For magic dunder methods I agree we can start with 4.

What about doing

@modify_docstring("", __float__="")
class CanFloat(opt.CanFloat): ...

@modify_docstring("", __int__="")
class CanInt(opt.CanInt[R]): ...

I like that!

@nstarman
Collaborator Author

nstarman commented Jul 11, 2025

We still have the problem of Self in the type annotations.

E.g.

class CanArrayAdd(Protocol):
    def __add__(self, other: Self | int | float, /) -> Self: ...

which isn't compatible with optype.CanAdd.

Edit: the closest I can get is

opt.CanAdd["HasArrayNamespace[NS_contra] | int | float", "Array[NS_contra]"],

Doing

opt.CanAdd["Array[NS_X] | int | float", "Array[NS_X]"]

doesn't seem to work.

@nstarman nstarman closed this Jul 11, 2025
@nstarman nstarman reopened this Jul 11, 2025
@jorenham
Member

jorenham commented Jul 11, 2025

We still have the problem of Self in the type annotations.

E.g.

class CanArrayAdd(Protocol):
    def __add__(self, other: Self | int | float, /) -> Self: ...

which isn't compatible with optype.CanAdd.

I'll add them to optype then


update

https://github.com/jorenham/optype/releases/tag/v0.12.0

@nstarman
Collaborator Author

nstarman commented Jul 11, 2025

I'll add them to optype then

Awesome, so then it'll be...

CanAddSelf[T, R=Self] = CanAdd[Self | T, Self | R]

so we can do CanAddSelf[int | float] ?

@jorenham
Member

Something like this, @nstarman?

class CanAddSelf(Protocol[_T_contra]):
    def __add__(self, rhs: Self | _T_contra, /) -> Self: ...

@nstarman
Collaborator Author

Great! I guess the return type probably isn't necessary.

@jorenham
Member

Great! I guess the return type probably isn't necessary.

Yea indeed. And if anyone needs it after all, then we can always add it as optional type parameter later on.

@jorenham
Member

E.g.

class CanArrayAdd(Protocol):
    def __add__(self, other: Self | int | float, /) -> Self: ...

BTW, this wouldn't work in case of boolean arrays.

@nstarman
Collaborator Author

E.g.

class CanArrayAdd(Protocol):
    def __add__(self, other: Self | int | float, /) -> Self: ...

BTW, this wouldn't work in case of boolean arrays.

Yeah. I noticed that. It's in the signature of the Array API, but without a way to detect boolean dtypes, how else do we write this statically?

Also we need CanRAddSelf, etc.

@nstarman
Collaborator Author

nstarman commented Jul 11, 2025

I don't think we need to do single-method Protocols now that we're using optype

@docstring_setter(
    __pos__ = """...""",
    ...
)
class Array(
    HasArrayNamespace[NS_co],
    opt.CanPosSelf,
    opt.CanNegSelf,
    opt.CanAddSelf[int | float],
    opt.CanIAddSelf[int | float],
    opt.CanRAddSelf[int | float],
    opt.CanSubSelf[int | float],
    opt.CanISubSelf[int | float],
    opt.CanRSubSelf[int | float],
    opt.CanMulSelf[int | float],
    opt.CanIMulSelf[int | float],
    opt.CanRMulSelf[int | float],
    opt.CanTrueDivSelf[int | float],
    opt.CanRTrueDivSelf[int | float],
    opt.CanFloorDivSelf[int | float],
    opt.CanIFloorDivSelf[int | float],
    opt.CanRFloorDivSelf[int | float],
    opt.CanModSelf[int | float],
    opt.CanIModSelf[int | float],
    opt.CanRModSelf[int | float],
    opt.CanPowSelf[int | float],
    opt.CanIPowSelf[int | float],
    opt.CanRPowSelf[int | float],
    Protocol,
):
    ...

@jorenham
Member

jorenham commented Jul 11, 2025

It's in the signature of the Array API

Then that should be changed 🤷🏻‍♂️

how else do we write this statically?

I'd make it generic:

class CanAddSelf(Protocol[_T_contra]):
    def __add__(self, rhs: Self | _T_contra, /) -> Self: ...

😏

Also we need CanRAddSelf, etc.

Yea I'll add *Self variants for all binops 👌🏻.

But I'm thinking of leaving out the Self as input for the reflected ops, so it'll be

def __radd__(self, rhs: _T_contra, /) -> Self: ...

because it shouldn't be needed, ...right?
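
A quick runtime check of why the reflected method does not normally see an operand of its own type: Python tries the left operand's `__add__` first, and when both operands have the same type it does not fall back to `__radd__`. `Thing` and `Stubborn` are toy classes made up for this sketch.

```python
class Thing:
    def __add__(self, rhs, /):
        if isinstance(rhs, Thing):
            return "Thing.__add__"
        return NotImplemented

    def __radd__(self, lhs, /):
        return f"Thing.__radd__({type(lhs).__name__})"


same_type = Thing() + Thing()  # handled by __add__
reflected = 1 + Thing()  # int.__add__ declines, so Thing.__radd__ runs


class Stubborn:
    """`__add__` always declines, to show `__radd__` is skipped for same-type operands."""

    def __add__(self, rhs, /):
        return NotImplemented

    def __radd__(self, lhs, /):
        return "reflected"


try:
    same_type_fallback = Stubborn() + Stubborn()
except TypeError:
    same_type_fallback = "TypeError"
```

Subclasses that override the reflected method are a separate case, where `__radd__` can receive an instance of a parent class.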

@jorenham
Member

jorenham commented Jul 11, 2025

I don't think we need to do single-method Protocols now that we're using optype

We'll still need some for the non-python dunders like __array_namespace_info__ and attributes like dtype

@nstarman
Collaborator Author

We'll still need some for the non-python dunders like __array_namespace_info__ and attributes like dtype

Yes, ones that don't have a natural fit in optype.

@jorenham
Member

We don't care about __divmod__, right?

@nstarman
Collaborator Author

how else do we write this statically?
I'd make it generic:

That's a good idea. We can define a generic Array[InputT] and then also provide some common-sense defaults, like (names TBD)

Array[InputT]
NumericArray = Array[int | float]
BoolArray = Array[bool]

Signed-off-by: Nathaniel Starkman <[email protected]>
@nstarman
Collaborator Author

nstarman commented Jul 11, 2025

Pushing a commit that won't work since it references non-existent optype classes, but does most of the things we'll need when those exist.



@docstring_setter(
__pos__="""Evaluates `+self_i` for each element of an array instance.
Collaborator Author

@nstarman nstarman Jul 14, 2025

Should we push the docstrings to a JSON that gets read in? It would make this

@docstring_setter(**docstrings_json)

Collaborator Author

I opted for a toml file since it has nicely formatted multiline raw strings.

…strings from TOML file

Signed-off-by: Nathaniel Starkman <[email protected]>
@jorenham
Member

I just released optype 0.12.0 :)

@nstarman
Collaborator Author

@jorenham. It works!

op.CanAddSame[T_contra],
op.CanIAddSelf[T_contra],
op.CanRAddSelf[T_contra],
op.CanSubSame[T_contra],
Member

This won't accept boolean numpy arrays:

>>> import numpy as np
>>> np.array(True) - np.array(False)
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    np.array(True) - np.array(False)
    ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

Collaborator Author

Hm. Suggestions?

op.CanPosSelf,
op.CanNegSelf,
op.CanAddSame[T_contra],
op.CanIAddSelf[T_contra],
Member

+= also works if you just have an __add__ and no __iadd__:

>>> class Thingy:
...     def __add__(self, rhs, /):
...         return self if isinstance(rhs, Thingy) else NotImplemented
...         
>>> a = Thingy()
>>> a + a
<__main__.Thingy object at 0x7f9896498830>
>>> a += a
>>> a
<__main__.Thingy object at 0x7f9896498830>

We already require Can{binop}Same, so can we remove CanI{binop}Self?

Collaborator Author

How do you read https://data-apis.org/array-api/2021.12/API_specification/array_object.html#in-place-operators?
In my reading, I agree that __iadd__ isn't strictly necessary, since x += 2 will fall back to x = x + 2, making a new object.
So yes?
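
The "making a new object" part can be seen in a toy comparison. Both classes are hypothetical. Without `__iadd__`, `+=` rebinds the name to the result of `__add__`, and with it, the existing object is updated in place.

```python
class WithoutIAdd:
    def __init__(self, value):
        self.value = value

    def __add__(self, rhs, /):
        return type(self)(self.value + rhs)


class WithIAdd(WithoutIAdd):
    def __iadd__(self, rhs, /):
        self.value += rhs  # mutate in place
        return self


a = WithoutIAdd(1)
a_alias = a
a += 2  # falls back to a = a + 2, so `a` now names a new object

b = WithIAdd(1)
b_alias = b
b += 2  # uses __iadd__, so `b` is still the same object
```

Any other reference to the original object only observes the update in the `__iadd__` case.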

Member

The "May be implemented" makes it sound like it's optional to me.

Member

So I guess it depends on whether we want xpt.Array to be a flexible utility or an Array API compliance check for static typing.

Collaborator Author

I'd opt for removing them. It's really only good for isinstance checks and not type-flow through a program.

op.CanMulSame[T_contra],
op.CanIMulSelf[T_contra],
op.CanRMulSelf[T_contra],
op.CanTruedivSame[T_contra],
Member

CanTruedivSame requires __truediv__: (Self, Self) -> Self. In NumPy, that only holds for np.inexact dtypes (floating and complex). So this would reject integer and boolean arrays:

>>> import numpy as np
>>> np.array([1]) / np.array([1])
array([1.])
>>> np.array([True]) / np.array([True])
array([1.])

Collaborator Author

So we need to write a more flexible Protocol for Truediv?

Member

It's just that we can't have it return Self, but something like xpt.Array would work, I suppose. Something like op.CanTruediv[int, xpt.CanArray] could work, but there's currently no optype protocol for __truediv__: (Self, Self) -> T.

If you think we'll need that, I wouldn't mind adding such protocols to optype. I'm not sure what to call them though 🤔

Collaborator Author

It's a real shame that Self and TypeVar don't play so nicely together.

Member

Yea, and there's no need for that restriction either: https://discuss.python.org/t/self-as-typevar-default/909

op.CanTruedivSame[T_contra],
op.CanITruedivSelf[T_contra],
op.CanRTruedivSelf[T_contra],
op.CanFloordivSame[T_contra],
Member

This doesn't hold for boolean numpy arrays:

>>> import numpy as np
>>> np.array([True]) // np.array([True])
array([1], dtype=int8)

op.CanFloordivSame[T_contra],
op.CanIFloordivSelf[T_contra],
op.CanRFloordivSelf[T_contra],
op.CanModSame[T_contra],
Member

mod and floordiv have identical signatures in numpy, so this won't work for boolean arrays:

>>> import numpy as np
>>> np.array([True]) % np.array([True])
array([0], dtype=int8)

op.CanModSame[T_contra],
op.CanIModSelf[T_contra],
op.CanRModSelf[T_contra],
op.CanPowSame[T_contra],
Member

poor boolean arrays:

>>> np.array([True]) ** np.array([True])
array([1], dtype=int8)

###
# Ensure that `np.ndarray` instances are assignable to `xpt.Array`.

arr_array: xpt.Array[Any, Any] = arr
Member

What if you set the first type parameter to Never? That way, e.g., __add__ becomes (Self, Self | Never) -> Self, which reduces to (Self, Self) -> Self.

In theory it shouldn't make a difference here. But I know that pyright has a bug where it (incorrectly) reduces Self | Any to Any in certain situations. So I wouldn't be surprised if mypy would also behave incorrectly in this case.

# Ensure that `np.ndarray` instances are assignable to `xpt.Array`.

arr_array: xpt.Array[Any, Any] = arr
arr_floatarray: xpt.Array[float, Any] = arr
Member

I'm also kinda curious if xpt.Array[float, Any] will reject boolean- and integer arrays.

Comment on lines +18 to +20
arr_array: xpt.Array[Any, Any] = arr
arr_floatarray: xpt.Array[float, Any] = arr
arr_boolarray: xpt.Array[bool, Any] = arr
Member

these should probably stay in sync with the ones in test_numpy1.pyi

3 participants