Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: finfo_object, iinfo_object, _array to typing.Protocol #857

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

34j
Copy link

@34j 34j commented Nov 24, 2024

Closes #856

@rgommers rgommers added the topic: Static Typing Static typing. label Nov 24, 2024
@rgommers
Copy link
Member

@34j thanks a lot for working on improving static typing. Did you see gh-589 though? This PR seems to reinvent parts of that PR. It's nontrivial to review a large PR and get static typing right, but I prefer to push gh-589 forward if there is energy for that. Would you be interested in checking if that addresses your issue and needs?

@34j
Copy link
Author

34j commented Nov 24, 2024

@rgommers
Yes, I noticed it an hour ago. I think it is now almost upwardly superior to #589 thanks to @nstarman in that all classes are protocols, with some changes.
I thought stopping the use of Optional and Union should be done by other PRs, so this PR does not incorporate those changes.

@34j 34j force-pushed the fix/dataclass-to-protocol branch 2 times, most recently from 7175de0 to ac20ed5 Compare November 24, 2024 10:13
@lucascolley
Copy link
Contributor

@jorenham you may have opinions on the activity here

@jorenham
Copy link

@jorenham you may have opinions on the activity here

I certainly do. But before I share them, would you mind telling me what the exact purpose is of these "stubs" (which usually refer to .pyi files)?

@lucascolley
Copy link
Contributor

what the exact purpose is of these "stubs" (which usually refer to .pyi files)?

Not sure about the exact purpose, but they aren't (just) type stubs - they're primarily for housing the (sort of typed) signatures and docstrings from which the docs pages for individual functions are generated. I suppose they are "stubs" in the sense of having no function implementations.

@jorenham
Copy link

Not sure about the exact purpose

Then I think that the first step should be to figure that out, and document it.
If you don't know what you are building, then my typing-related opinions will all be based on baseless assumptions and personal preferences, which probably won't be very helpful.
So the only rational opinion I can give you at this point, is that I think that you should at least be able to answer the following questions:

  • Will it be used as a verification suite for the static type annotations of array-api libraries?
  • Will there be a "baseline" package that these protocols should at least be compatible with, e.g. numpy?
  • Should it follow the official python typing specification, and if so, how will that be verified?
  • Will it be distributed as a (standalone- or bundled-) package, and if so, should these types also be usable at runtime?

@lucascolley
Copy link
Contributor

If you don't know what you are building, then my typing-related opinions will all be based on baseless assumptions and personal preferences, which probably won't be very helpful.

Disclaimer: I'm sure the array API maintainers 'know what they are building' - I've contributed to the repo, but not majorly.

@34j
Copy link
Author

34j commented Nov 25, 2024

Perhaps the maintainers are mostly interested in discussing the contents of array-api and this may not be so important.

However, I have noticed a number of issues while working on this implementation and the #858 implementation, and would like to offer some observations.

  • [no-protocol] Frequently abuses TypeVar, as there are so many functions that do not use TypeVar for both arguments and return values; TypeVar could be used correctly (only) if Generic Protocol is used, however according to Array namespace #685 (comment), this has been suspended because it was Protocol is "harder to understand".
    The work that should be done depends on whether the people “difficult to understand” Protocols includes the contributors to this repository.
    • If the contributors can understand the protocol, protocol-based implementation should be done in this repository (the content should be uploaded to PyPI) and then the ci could automatically generate function code and upload them to another “starter” repository.
    • Otherwise, vise versa. (feat: automatically generate Protocol for array-api-namespace #858)
  • [no-separation] If this is the starting point for creating an array-api compatible library, it is bad that the mysterious TypeVars and _type.py with unknown Protocols are included and that no sub-modules other than linalg,fft are hidden (does not have _ prefix unlike array-api-strict). The repository should be split using automatic code generation, preferably as a “template” and stub package for developing array-api.
  • [no-ellipsis] Ellipsis should be added if there is no internal implementation of the function.
  • [no-pypi] Again, since array-api is not uploaded to PyPI, it cannot be used as a stub package or to check if a library is array-api compatible.
  • [pre-commit] pre-commit is not set up properly, making it hard to develop; pyupgrade (for Python3.9+ typing like list[str]) and ruff (or autoflake and isort) should be added.
    image
    https://github.com/asottile/pyupgrade?tab=readme-ov-file#pep-585-typing-rewrites

Thanks for taking a look anyway

@betatim
Copy link
Member

betatim commented Nov 25, 2024

Big 👍 to @jorenham suggestion of answering the question: "What problem does this solve (for users of the array API standard)?" For me "users of the standard" are both people creating array providing libraries like Numpy or array-api-strict, as well as people using such a library to build something else (e.g. scikit-learn).

@asmeurer
Copy link
Member

Just to be clear, the primary function of these "stubs" is so that they can be automatically included in the standard via Sphinx autodoc. We used to just have everything in RST files, but using autodoc is nicer because it lets us write the signatures in pure Python. It also allows people to just copy-paste a signature when implementing a function (e.g., I do this all the time when adding something to array-api-strict or array-api-compat). The test suite also uses this package to automatically generate some tests, although that's something that can be refactored concurrently.

Reusing them for typing is probably fine, as long as that particular usage is maintained. This module isn't really ever "imported" or anything. This package could be refactored into something installable from PyPI (this was discussed previously at #472). It's not clear to me how that would or should work since it's not a runnable package. And there's also complexity there since there are multiple versions of the standard.

@@ -147,3 +145,1229 @@ def dtypes(
"max rank": Optional[int],
},
)


class Array(Protocol[array, "dtype", "device", PyCapsule]): # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a good reason to move this from array_object.py.

@asmeurer
Copy link
Member

Also, since this is primarily used for documentation, the other thing we need to make sure is that the type signatures always remain readable. That means that we should always spell out types exactly (e.g., it would not be preferred to split out common union types into variables since those would be opaque in the documentation), and ideally the types used in the signatures should always match the types spelled out in the docstring.

@@ -90,7 +88,7 @@ def __len__(self, /) -> int:
...


class Info(Protocol):
class Info(Protocol[device]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what the [device] here means after Protocol? The Info namespace object itself does not depend on the device (__array_namespace_info__ does not take a device parameter).

@jorenham
Copy link

It also allows people to just copy-paste a signature when implementing a function

Then it's probably a good idea to conform to the official typing specification, and validate this using static type checkers like basedpyright (a stricter pyright fork) and basedmypy (a mypy fork that is less broken)

It's not clear to me how that would or should work since it's not a runnable package. And there's also complexity there since there are multiple versions of the standard.

In an ideal scenario, it could be used like this:

# roughly based on the `scipy.fft._helper` stubs in `scipy-stubs`:
# https://github.com/jorenham/scipy-stubs/blob/master/scipy-stubs/fft/_helper.pyi

from typing import Unpack, overload
from array_api.typing.v2023 as xpt

### utility type aliases

# generic dtype protocol (requires a non-trivial spec), with an
# (invariant) type parameter for its "kind"
type NumericDType = xpt.DType[int] | xpt.DType[float] | xpt.DType[complex]

# shape-types are integer tuples, which is what numpy currently
# uses to annotate the shape-type of `ndarray`
type Size = int  # positive integer
type AtLeast0D = tuple[Size, ...]
type AtLeast1D = tuple[Size, *AtLeast0D]  # rejects `()`

# runtime-checkable generic array protocol, so that instances 
# of e.g. `numpy.ndarray` are assignable to it
type NumericTensor[DeviceT: xpt.Device = xpt.Device] = xpt.Array[
    AtLeast1D,     # shape-type argument
    NumericDType,  # dtype-type argument
    DeviceT,       # device-type _parameter_ (optional)
]

### public signatures 

@overload
def fftshift[T: NumericTensor](x: T, axes: Axes | None = None) -> T: ...
# <four other overloads for numpy array-likes>


# currently impossible; requires higher-kinded-typing
# https://github.com/python/typing/issues/548
@overload
def fftfreq[
    SizeT: Size,
    ArrayT: xpt.Array,
    DTypeT: xpt.DType,
    DeviceT: xpt.DeviceT,
](
    n: SizeT, 
    d: float,
    *,
    # obtain the array- dtype-, and device-types by matching 
    # against the generic array-namespace Protocol, and binding
    # to its (hypothetically generic) type-parameters
    xp: xpt.Namespace[ArrayT, DTypeT, DeviceT],
    device: DeviceT,
) -> ArrayT[        # the array-type from `xp`, that is
    tuple[SizeT],   # 1-dimensional and of size `n`,
    DTypeT[float],  # with a real floating-point dtype, and
    DeviceT[None],  # allocated on the default device
]: ...
# <two other overloads for the default `xp=None` case>

caution: completely off-topic rant ahead

So you could probably already tell; but I've actually been thinking quite a lot about this 😅. I've been planning on building something like this in optype, so that I can add array-api support in scipy-stubs.

But unfortunately, my first attempt at building this has kinda failed, mostly because I didn't realize that the current spec doesn't include a way to distinguish the different kinds of dtypes from one another using static typing. So there's no way to type something like the DType[float] from the example.

Either way, I'll probably give it another go in a couple of months or so. And if this time it'll actually succeed, I'll make sure to open an issue or PR so that we can figure out the next steps.

TLDR-ish

So I guess I'm trying to say that typing the array-api is very difficult, and that I think it would help if the spec itself would be more aware of the static typing challenges. So in that sense, these "stubs" might actually be a very good way to go achieve this. For example by trying to use these "stubs" to annotate a simple toy example (like the one in my example), and in such a way that static type-checkers understand it.

I realize that this might come over as if I'm trying to make this into some school exercise or something. But the reason I'm suggesting this, is because there have been way too many times where I thought that I understood some python-typing concept after reading about it, followed by a realization that I completely misunderstood it (often after introducing several bugs along the way) 😅.

@asmeurer
Copy link
Member

We should consider whether it would more sense to put the installable typing stuff in array-api-strict (or even in a completely new separate package). That way it remains separate from the "stubs" in the standard, which, as I noted, exist primarily for documentation. It would imply some manual copying from the standard into whatever package, but that is already what we have been doing to create array-api-strict and array-api-compat (and even the test suite), and it has worked reasonably well.

The advantage of keeping them separate is we wouldn't have to worry about potential conflicts between typing correctness and readability of the stubs as standard documentation, or potential issues that might arise with Sphinx. It also keeps the stubs here very simple (just functions with basic docstrings), and keeps the code complexity implied by typing stuff elsewhere.

It would also mean that the (English) text of the standard remains the single source of truth about what is specified. If the Python typing stubs end up disagreeing with that somehow, either because of a bug or because Python typing doesn't support some feature, the standard would be the thing that is correct. I think this is important because if there are two sources of truth (the text and typing "implementation" of the standard), there should be one final one in the case of any ambiguity, and I strongly feel that the final source of truth should be English text, not some reference code.

Having it in a separate package would also allow people who are the experts wrt typing to be able to maintain that package more effectively, without necessarily having to have their changes go through the review process of this repo, which tends to be a little more stringent/slower.

@jorenham
Copy link

I totally agree with you on that @asmeurer. And having independent release cycles seems also like a good idea in this case 😛.

@jorenham
Copy link

jorenham commented Nov 25, 2024

Oh and while we're still off-topic:
I think I just figured out a way around the "untype-able dtype-type-type and device-type problem" (without having to change the spec, that is): jorenham/optype#25

@34j 34j marked this pull request as draft November 26, 2024 04:50
@betatim
Copy link
Member

betatim commented Nov 26, 2024

It would also mean that the (English) text of the standard remains the single source of truth about what is specified.

👍 to this. We should not craft the spec in a certain way just because we can't write it down with Python's static type system.

@jorenham
Copy link

It would also mean that the (English) text of the standard remains the single source of truth about what is specified.

👍 to this. We should not craft the spec in a certain way just because we can't write it down with Python's static type system.

Not necessarily, no. But it could help to actively take it into consideration when making design decisions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: Static Typing Static typing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why _types.finfo_object and _types.iinfo_object, _array_object._array does not inherit Protocol ?
6 participants