-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Narrowing For Truthiness Checks (if x
or if not x
)
#14687
base: main
Are you sure you want to change the base?
[red-knot] Narrowing For Truthiness Checks (if x
or if not x
)
#14687
Conversation
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
W291 | 1 | 1 | 0 | 0 | 0 |
W292 | 1 | 1 | 0 | 0 | 0 |
// TODO: tuple literal should be included | ||
// .add(Type::tuple(db, &[])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure if this is the correct way to represent ()
.
It also seems like ()
hasn’t been considered yet in is_disjoint_from
.
I think it make sense to implement support for ()
first. Adding this code as-is makes the mdtest results much harder to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure if this is the correct way to represent
()
.
I think it is!
It also seems like
()
hasn’t been considered yet inis_disjoint_from
.
That's very possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Ah, my point was that the current implementation without TupleLiteral produces clean results like A | B
. However, if we uncomment the TupleLiteral, it results like A & ~Tuple[()] | B & ~Tuple[()]
, which makes the tests harder to read.
I’ll try to resolve this in a different PR before we uncomment Tuple Literal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a full review yet, sorry -- just some simplification opportunites I spotted from skimming through!
Type::Union(union) => { | ||
let mut builder = UnionBuilder::new(db); | ||
|
||
for element in union.elements(db) { | ||
builder = builder.add(element.exclude_always_truthy(db)); | ||
} | ||
|
||
builder.build() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type::Union(union) => { | |
let mut builder = UnionBuilder::new(db); | |
for element in union.elements(db) { | |
builder = builder.add(element.exclude_always_truthy(db)); | |
} | |
builder.build() | |
} | |
Type::Union(union) => union.map(db, |element| element.exclude_always_truthy(db)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!! This PR, and writeup, is indeed very clarifying.
Submitting a partial review here because something occurred to me mid-review: I think perhaps we can simplify this by going back to Type::Truthy
and Type::Falsy
types, but with a better definition? That is, define Type::Truthy
as what you've called AlwaysTruthy
(the set of all objects known to always be truthy, or in other words, the super-set of all types whose bool()
method returns Truthiness::AlwaysTrue
) and Type::Falsy
as AlwaysFalsy
(the set of all objects known to always be falsy, the super-set of all types whose bool()
method returns Truthiness::AlwaysFalse
).
It seems like if we do that, and define the appropriate handling in Type::is_subtype_of
(any type whose .bool()
evaluates to AlwaysFalsy
is a subtype of Falsy
, and any type whose .bool()
evaluates to AlwaysTruthy
is a subtype of Truthy
) and Type::is_disjoint_from
(Truthy
is disjoint from Falsy
or any subtype of it; Falsy
is disjoint from Truthy
or any subtype of it), we can get the same behavior as in this PR, but with many fewer new codepaths. It will just reuse the existing implementation of union and intersection simplification, and we don't need to add all the additional machinery of DeferredType
etc.
The unsafety in the previous attempt to define Falsy
and Truthy
types arose because we defined those types too broadly: we tried to include any object that evaluates as truthy at any time in the Truthy
type (e.g. a non-empty list, which could become falsy at any time), and we tried to narrow with positive intersections (that is, if x
would narrow by intersecting with Truthy
), which results in conclusions that can easily become unsound due to mutation.
But if we define the Truthy
and Falsy
types in the minimal way (only objects known to always be truthy/falsy), and we intersect negatively (if x:
narrows by intersecting with ~Falsy
), that's sound. And it allows us to do even a bit more than we can in this PR, because we can safely preserve the intersection with Truthy
or Falsy
(for instance, if not x:
when x
is of an ambiguously-truthy type X can be typed as X & ~Truthy
.
Perhaps it will be clearer if we actually name these types Type::AlwaysTruthy
and Type::AlwaysFalsy
.
In most cases not involving truthiness (Type::member
, for example), these two types should both be treated as if they were object
.
def foo() -> list | set | dict: | ||
return list() | ||
|
||
x = foo() | ||
|
||
if x: | ||
reveal_type(x) # revealed: list | set | dict | ||
else: | ||
reveal_type(x) # revealed: list | set | dict | ||
|
||
if x and not x: | ||
reveal_type(x) # revealed: list | set | dict | ||
else: | ||
reveal_type(x) # revealed: list | set | dict | ||
|
||
if x or not x: | ||
reveal_type(x) # revealed: list | set | dict | ||
else: | ||
reveal_type(x) # revealed: list | set | dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small preference for just using class A: ...
, class B: ...
here, so these tests don't need updating when we add generics support.
We can customize the truthiness of a class using a metaclass. Thus, class literals are generally | ||
included in the category of AmbiguousTruthiness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could respect a case where a class has a metaclass with a __bool__
method that returns Literal[True]
or Literal[False]
. Class literal types don't include subclasses, and even with subclasses included (the Type::SubclassOf
case), the metaclass of a subclass must be a subclass of the metaclass of the base class, which can't override __bool__
in an incompatible way.
That said, I think this case is likely extremely rare and I don't think it's important that we handle it; I think it's fine to treat all class literals as ambiguously truthy. But it might be worth updating this comment to recognize that we could be more precise in some cases, we just haven't chosen to bother.
debug_assert!( | ||
!intersection.positive(db).is_empty(), | ||
"negative-only-intersections can not use `exclude_always_truthy`" | ||
); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems brittle (it's the responsibility of every caller to ensure we never call this method on a negation type?) and unnecessary. The result of calling this method on a negation type is not wrong, it's just imprecise; we don't actually exclude anything.
debug_assert!( | |
!intersection.positive(db).is_empty(), | |
"negative-only-intersections can not use `exclude_always_truthy`" | |
); |
// Case 2: Positives are empty | ||
// ~B & Falsy | ||
// = ~(B | AlwaysTruthy) | ||
// ≈ ~(B.include_always_truthy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is confusing because we don't have an include_always_truthy()
method, and it's not self-evident what it would do.
That is a very clear approach! I realised my understanding of Ok! I'll try implementing it using the approach you suggested. Terminology IssuesI think the terms Truthy and Falsy are too easy to misunderstand. Perhaps it would be better to drop the term Truthy altogether and instead use terms like:
This might help reduce confusion. So, I think it would be better to define I also think the name A Slightly Unrelated QuestionWith this approach, it looks like we could have an intersection like class A: ...
x = A()
if x and not x:
y = x
reveal_type(y) # revealed: A & ~Falsy & ~Truthy |
Due to some unexpected events where I live 😢 , PR updates might be delayed by 4–6 days. |
No problem, please take all the time you need! Really appreciate your excellent contributions. Whatever is going on, I hope you are safe and out of harm's way.
Yes, that's the idea! Regarding terminology, I mostly agree with your comments (in particular, that it will be clearer to use
It is possible that we'll need to improve our display of some intersection types, if they occur frequently. This includes both negative intersections with |
This turned out to be a really interesting but much trickier task than I expected. I thought this would be straightforward, but it was not.
While I am unsure if this PR will ultimately be merged, I believe it offers valuable insights.
Before diving in, I recommend reading the comment organized by Carljm (link to the comment).
Most of the content in this document comes from the ideas outlined in that comment.
Notation
We often use the terms "Truthy" and "Falsy," but I think we might interpret them a little differently.
To make sure we’re all on the same page, let’s start by defining these terms. I’ll also include some basic set theory here, but don’t panic—grab a towel, and I’ll keep it as simple as possible.
if x
.if not x
.These sets can include a infinite number of elements. But here are two key takeaways:
1. AmbiguousTruthiness Exists
There are instances that can pass both
if x
andif not x
. Mutable objects likelist
andset
are good examples.We can define AmbiguousTruthiness as:
if x
andif not x
.2.
int & Falsy != Literal[0, False]
Surprisingly, the intersection of
int
andFalsy
cannot be represented by{0, False}
.Why? Because the
int
also includes instances of subclasses, and these subclasses can override__bool__
. Some of them can even return random results. For example:Derived Sets
We can use operations on
Truthy
,Falsy
, andAmbiguousTruthiness
to define some useful sets:These definitions reinforce our intuition
Also:
And:
Key Takeaways
1. Truthy & Falsy Isn’t Empty
The intersection of
Truthy
andFalsy
isn’t an empty set. For example, whileif x and not x
might look like it should reject everything, it can still pass with types likeRandomBooleanInt
.Here’s another example:
2. ~Truthy ≠ Falsy
The complement of
Truthy
isn’t the same asFalsy
.This is because
Truthy
includes ambiguous instances, so the complement ofTruthy
is AlwaysFalsy.This might feel counterintuitive. It even looks like it conflicts with Python code:
But actually, there’s no conflict. Python’s
not
keyword flips the boolean value of an object; it doesn’t mean the set-theoretic complement.Here’s how it works instead:
x
,(not x)
is also ambiguous.x
,(not x)
is Falsy.If we think of
not
as applying to every element of a set, it works like this:So:
Just be careful to distinguish between complements and the
not
operator.On Implementation
The theory sounds great, but it’s not that easy to implement in practice.
For example, instead of directly extracting
int & Falsy
, it’s easier to focus on(int & ~AlwaysTruthy)
—removing the AlwaysTruthy elements from int. A simpler approach is to use Truthy and Falsy Literals.To calculate
A & Truthy
, we can try this:This shows that
A & Truthy
is a subset of(A & ~FalsyLiterals)
. While(A & ~Literals)
might add some extra elements, that’s not a big problem since we can’t fully determine the boolean behavior of non-literal objects anyway.Limitations
Truthy
andFalsy
directly as variants inType
would make the system unnecessarily complicated. As Carljm pointed out, it’s best to keep their use limited to the Narrowing phase.(A & Falsy)
as(A & ~TruthyLiterals)
isn’t even possible. TruthyLiterals is basically an infinite set, Without encountering a proper A, we cannot clearly represent(A & ~TruthyLiterals)
as a type enum. For example, something like~Literal[0] & Falsy
=(~Literal[0] & ~TruthyLiterals)
is just too tricky to represent as a simple variant in Type.More generally, negative-only intersections because
(~B & ~TruthyLiterals)
=~(B | TruthyLiterals)
essentially requires you to union together all the TruthyLiterals. Thankfully, if we stick to using this only during the Narrowing phase, it won’t be a huge problem since the type we’re narrowing will not be negative-only intersections.What This PR Includes
NarrowingConstraints
to handle Truthy and Falsy.binding_ty
.Type::exclude_always_truthy()
andType::exclude_always_falsy()
to implement Truthy and Falsy logic.if x
andif not x
Areas to Review Carefully
salsa
is limited, so I’d appreciate careful reviews of salsa-related code.DeferredType
for lazy evaluation, along withNarrowingUnion
andNarrowingIntersection
. If you have ideas for a better approach, let’s discuss!2 - 1. Another option could be to pass
base_type
directly when creatingNarrowingConstraintsBuilder::new
, which would let us evaluate Truthy or Falsy eagerly. This would eliminate the need forDeferredType
,Union
, andIntersection
, but it might significantly hurt performance by changing the cache key from(expression)
to(base_type, expression)
.