Unparameterized np.ndarray typings produce "Type of ... is partially unknown" Pyright type errors. #309

Open
DylanLukes opened this issue Apr 15, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@DylanLukes

Problem

The type np.ndarray is stubbed in this library as:

class ndarray(_ArrayOrScalarCommon, Generic[_ShapeType, _DType_co]):
    ...

Throughout these stubs, np.ndarray is used without type arguments, seemingly with the expectation that it is treated as np.ndarray[Any, Any] (or, more properly, np.ndarray[object, object]). However, Pyright in strict mode instead interprets it as np.ndarray[Unknown, Unknown].
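To make the bare-versus-parameterized distinction concrete, here is a minimal stdlib-only sketch. The names Array, ShapeT, DTypeT, takes_bare and takes_explicit are hypothetical stand-ins, not the real NumPy stubs; the comments describe what I would expect Pyright in strict mode to report, under that assumption.

```python
from typing import Any, Generic, TypeVar, get_args, get_type_hints

# Hypothetical stand-ins for the stub's _ShapeType and _DType_co parameters.
ShapeT = TypeVar("ShapeT")
DTypeT = TypeVar("DTypeT")


class Array(Generic[ShapeT, DTypeT]):
    """Toy two-parameter generic, shaped like the ndarray stub."""


def takes_bare(a: Array) -> None:
    # Bare generic: no type arguments are recorded. A checker such as Pyright
    # fills them in as Unknown, which strict mode reports where the type is used.
    ...


def takes_explicit(a: Array[Any, Any]) -> None:
    # Explicit Any: both arguments are stated, so nothing is left Unknown.
    ...


# Inspect what the annotations actually carry at runtime.
bare_args = get_args(get_type_hints(takes_bare)["a"])
explicit_args = get_args(get_type_hints(takes_explicit)["a"])
print(bare_args)
print(explicit_args)
```

The runtime view mirrors the static one: the bare annotation carries no arguments at all, while the explicit one carries two Any arguments.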

As a result, every method that involves np.ndarray, or a type alias that includes it, produces a partially-unknown-type error.

Example Reproduction

For example, if we take the following simple file example.py...

import numpy as np

from typings.sklearn.linear_model import LinearRegression


def example():
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 4, 6, 8, 10])

    linreg = LinearRegression()
    linreg.fit(x, y)

❯ pyright src/path/to/example.py

src/path/to/example.py
  src/path/to/example/example.py:11:5 - error: Type of "fit" is partially unknown
    Type of "fit" is "(X: ndarray[Unknown, Unknown] | DataFrame | spmatrix | Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y: ndarray[Unknown, Unknown] | DataFrame | spmatrix | Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], sample_weight: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None) -> LinearRegression" 

In this case, the error occurs because the type of fit is:

    def fit(
        self: LinearRegression_Self,
        X: MatrixLike | ArrayLike,
        y: MatrixLike | ArrayLike,
        sample_weight: None | ArrayLike = None,
    ) -> LinearRegression_Self:
        ...

And in turn, MatrixLike is a (private) type alias that resolves to:

MatrixLike = np.ndarray | pd.DataFrame | spmatrix

Resolution

At least for this example, changing that type alias as follows resolves the type error.

MatrixLike = np.ndarray[object, object] | pd.DataFrame | spmatrix

System Details:

OS: MacOS Sonoma 14.1.2
Python: CPython 3.12.1
Pyright: 1.1.358

Pyright configuration:

[tool.pyright]
include = ["./src", "./tests"]
stubPath = "./typings"

typeCheckingMode = "strict"
reportMissingImports = true
reportMissingTypeStubs = true

pythonVersion = "3.12"
@DylanLukes
Author

DylanLukes commented Apr 15, 2024

Two other things:

  1. It may be that np.ndarray[???, PythonScalar], where ??? is something more appropriate for _ShapeType, is a better choice. It appears that NumPy defines NDArray = ndarray[Any, dtype[_ScalarType_co]].
  2. There are many other places in the codebase where a bare np.ndarray is used; this is just one that covers a lot of cases because it is a widely used type alias.

I've also tried:

PythonScalar = str | int | float | bool

ArrayLike = numpy.typing.ArrayLike
MatrixLike = numpy.typing.NDArray[PythonScalar] | pd.DataFrame | spmatrix

But this also produces an unknown type, ndarray[Any, dtype[Unknown]], apparently because PythonScalar does not satisfy the bound on _ScalarType_co:

_ScalarType_co = TypeVar("_ScalarType_co", bound=generic, covariant=True)  # <-- this bit
_DType = TypeVar("_DType", bound=dtype[Any])
_DType_co = TypeVar("_DType_co", covariant=True, bound=dtype[Any])

NDArray = ndarray[Any, dtype[_ScalarType_co]]

So, np.ndarray[object, object] it is...

@debonte
Contributor

debonte commented May 7, 2024

When I change MatrixLike to:

MatrixLike = np.ndarray[object, object] | pd.DataFrame | spmatrix

Pylance gives me a diagnostic on the second type argument:

Type "object" cannot be assigned to type variable "_DType_co@ndarray"
  Type "object" is incompatible with bound type "dtype[Any]" for type variable "_DType_co@ndarray"
    "object" is incompatible with "dtype[Any]" Pylance[reportInvalidTypeArguments]

How about using Any, which would be equivalent to what we have today but would eliminate the reportUnknownMemberType diagnostic?

MatrixLike = np.ndarray[Any, Any] | pd.DataFrame | spmatrix

Feel free to submit a PR.
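A rough way to see why object is rejected for the second type argument while Any passes is sketched below. It uses hypothetical stand-ins (FakeDType, FakeArray, violates_bound, bound_violations) rather than NumPy's real stubs, and it models only the simplest part of a checker's bound test, so treat it as an illustration, not a description of Pyright's implementation.

```python
from typing import Any, Generic, List, TypeVar, get_args, get_origin


class FakeDType:
    """Hypothetical stand-in for the dtype[Any] bound in the real stub."""


ShapeT = TypeVar("ShapeT")
DTypeT = TypeVar("DTypeT", bound=FakeDType, covariant=True)


class FakeArray(Generic[ShapeT, DTypeT]):
    """Toy generic whose second parameter is bounded, like ndarray's."""


def violates_bound(arg: object, tv: TypeVar) -> bool:
    # Simplified model: Any is compatible with every bound, an unbounded
    # TypeVar accepts anything, otherwise the argument must subclass the bound.
    bound = tv.__bound__
    if arg is Any or bound is None:
        return False
    return not (isinstance(arg, type) and issubclass(arg, bound))


def bound_violations(alias: Any) -> List[str]:
    # Pair each type parameter of the generic with the argument supplied.
    params = get_origin(alias).__parameters__
    return [tv.__name__ for tv, arg in zip(params, get_args(alias))
            if violates_bound(arg, tv)]


print(bound_violations(FakeArray[object, object]))
print(bound_violations(FakeArray[Any, Any]))
```

Under this model only the bounded parameter objects to object, which matches the diagnostic being reported on the second type argument only.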

@debonte debonte added the bug Something isn't working label May 7, 2024
@heejaechang
Contributor

we might need better documentation for these errors? (https://microsoft.github.io/pyright/#/configuration?id=type-check-diagnostics-settings)

reportUnknownParameterType [boolean or string, optional]: Generate or suppress diagnostics for input or return parameters for functions or methods that have an unknown type. The default value for this setting is "none".

reportUnknownArgumentType [boolean or string, optional]: Generate or suppress diagnostics for call arguments for functions or methods that have an unknown type. The default value for this setting is "none".

reportUnknownLambdaType [boolean or string, optional]: Generate or suppress diagnostics for input or return parameters for lambdas that have an unknown type. The default value for this setting is "none".

reportUnknownVariableType [boolean or string, optional]: Generate or suppress diagnostics for variables that have an unknown type. The default value for this setting is "none".

reportUnknownMemberType [boolean or string, optional]: Generate or suppress diagnostics for class or instance variables that have an unknown type. The default value for this setting is "none".

This doc might help in understanding these errors:

https://microsoft.github.io/pyright/#/typed-libraries?id=examples-of-known-ambiguous-and-unknown-types
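As a small illustration of where an unknown type comes from in ordinary code, here is a sketch with two hypothetical functions, scale and scale_typed. My understanding is that strict mode would flag the unannotated values parameter under reportUnknownParameterType, but treat the exact rule name reported as an assumption.

```python
from typing import get_type_hints


def scale(values, factor=2):
    # No annotation on `values`: a checker has nothing to infer its type
    # from, so the parameter type is unknown.
    return [v * factor for v in values]


def scale_typed(values: list[float], factor: float = 2.0) -> list[float]:
    # Fully annotated: every parameter and the return type are known.
    return [v * factor for v in values]


print(get_type_hints(scale))
print(get_type_hints(scale_typed))
```

Both functions behave the same at runtime; the difference is only in how much a type checker can say about them.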
