[red-knot] Narrowing For Truthiness Checks (`if x` or `if not x`) #14687

cake-monotone · 2024-11-30T07:21:04Z

This turned out to be a really interesting but much trickier task than I expected. I thought this would be straightforward, but it was not.
While I am unsure if this PR will ultimately be merged, I believe it offers valuable insights.

Before diving in, I recommend reading the comment organized by Carljm (link to the comment).
Most of the content in this document comes from the ideas outlined in that comment.

Notation

We often use the terms "Truthy" and "Falsy," but I think we might interpret them a little differently.
To make sure we’re all on the same page, let’s start by defining these terms. I’ll also include some basic set theory here, but don’t panic—grab a towel, and I’ll keep it as simple as possible.

Truthy: Instances that pass if x.
Falsy: Instances that pass if not x.

These sets can include a infinite number of elements. But here are two key takeaways:

1. AmbiguousTruthiness Exists

There are instances that can pass both if x and if not x. Mutable objects like list and set are good examples.

We can define AmbiguousTruthiness as:

The set of instances that can pass both if x and if not x.

2. `int & Falsy != Literal[0, False]`

Surprisingly, the intersection of int and Falsy cannot be represented by {0, False}.
Why? Because the int also includes instances of subclasses, and these subclasses can override __bool__. Some of them can even return random results. For example:

class RandomBooleanInt(int):
    def __bool__(self) -> bool:
        import random
        return bool(random.randint(0, 1))

Derived Sets

We can use operations on Truthy, Falsy, and AmbiguousTruthiness to define some useful sets:

AlwaysTruthy = Truthy & ~AmbiguousTruthiness
AlwaysFalsy = Falsy & ~AmbiguousTruthiness

These definitions reinforce our intuition

Truthy = AlwaysTruthy | AmbiguousTruthiness
Falsy = AlwaysFalsy | AmbiguousTruthiness

Also:

Truthy & Falsy = AmbiguousTruthiness
Truthy | Falsy = U (U is the Universal Set in set theory)

And:

~Truthy = AlwaysFalsy
~Falsy = AlwaysTruthy

Key Takeaways

1. Truthy & Falsy Isn’t Empty

The intersection of Truthy and Falsy isn’t an empty set. For example, while if x and not x might look like it should reject everything, it can still pass with types like RandomBooleanInt.
Here’s another example:

x = [1]

if x and (x.clear() is None) and not x:
    ...

2. ~Truthy ≠ Falsy

The complement of Truthy isn’t the same as Falsy.
This is because Truthy includes ambiguous instances, so the complement of Truthy is AlwaysFalsy.

This might feel counterintuitive. It even looks like it conflicts with Python code:

x = list()
if x:
    reveal_type(x)  # revealed: list
else:
    reveal_type(x)  # revealed: list

But actually, there’s no conflict. Python’s not keyword flips the boolean value of an object; it doesn’t mean the set-theoretic complement.

Here’s how it works instead:

For any ambiguous x, (not x) is also ambiguous.
For any Truthy x, (not x) is Falsy.

If we think of not as applying to every element of a set, it works like this:

A = {1, 2, 3, ...}
not A = {not 1, not 2, not 3, ...}

So:

not Truthy = Falsy
not Ambiguous = Ambiguous

Just be careful to distinguish between complements and the not operator.

On Implementation

The theory sounds great, but it’s not that easy to implement in practice.

For example, instead of directly extracting int & Falsy, it’s easier to focus on (int & ~AlwaysTruthy)—removing the AlwaysTruthy elements from int. A simpler approach is to use Truthy and Falsy Literals.
To calculate A & Truthy, we can try this:

A & ~FalsyLiterals = A & ~(AlwaysFalsy & Literals)
                    = A & (~AlwaysFalsy | ~Literals)
                    = A & (Truthy | ~Literals)
                    = (A & Truthy) | (A & ~Literals)

This shows that A & Truthy is a subset of (A & ~FalsyLiterals). While (A & ~Literals) might add some extra elements, that’s not a big problem since we can’t fully determine the boolean behavior of non-literal objects anyway.

Limitations

Adding Truthy and Falsy directly as variants in Type would make the system unnecessarily complicated. As Carljm pointed out, it’s best to keep their use limited to the Narrowing phase.
Sometimes, evaluating (A & Falsy) as (A & ~TruthyLiterals) isn’t even possible. TruthyLiterals is basically an infinite set, Without encountering a proper A, we cannot clearly represent (A & ~TruthyLiterals) as a type enum. For example, something like ~Literal[0] & Falsy = (~Literal[0] & ~TruthyLiterals) is just too tricky to represent as a simple variant in Type.
More generally, negative-only intersections because (~B & ~TruthyLiterals) = ~(B | TruthyLiterals) essentially requires you to union together all the TruthyLiterals. Thankfully, if we stick to using this only during the Narrowing phase, it won’t be a huge problem since the type we’re narrowing will not be negative-only intersections.

What This PR Includes

Expanded NarrowingConstraints to handle Truthy and Falsy.
Deferred evaluation of Truthy and Falsy constraints until they’re applied to binding_ty.
Added Type::exclude_always_truthy() and Type::exclude_always_falsy() to implement Truthy and Falsy logic.
Supporting Narrowing for if x and if not x

Areas to Review Carefully

My understanding of salsa is limited, so I’d appreciate careful reviews of salsa-related code.
I used DeferredType for lazy evaluation, along with NarrowingUnion and NarrowingIntersection. If you have ideas for a better approach, let’s discuss!
2 - 1. Another option could be to pass base_type directly when creating NarrowingConstraintsBuilder::new, which would let us evaluate Truthy or Falsy eagerly. This would eliminate the need for DeferredType, Union, and Intersection, but it might significantly hurt performance by changing the cache key from (expression) to (base_type, expression).

Test Plan

New Markdown test for truthiness narrowing narrow/truthiness.md
unit tests in types.rs and builders.rs (cargo test --package red_knot_python_semantic --lib -- types)

github-actions · 2024-11-30T07:26:56Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

crates/red_knot_python_semantic/src/types.rs

AlexWaygood

Not a full review yet, sorry -- just some simplification opportunites I spotted from skimming through!

crates/red_knot_python_semantic/src/types.rs

crates/red_knot_python_semantic/src/types/narrow.rs

crates/red_knot_python_semantic/src/types.rs

carljm

Thank you!! This PR, and writeup, is indeed very clarifying.

Submitting a partial review here because something occurred to me mid-review: I think perhaps we can simplify this by going back to Type::Truthy and Type::Falsy types, but with a better definition? That is, define Type::Truthy as what you've called AlwaysTruthy (the set of all objects known to always be truthy, or in other words, the super-set of all types whose bool() method returns Truthiness::AlwaysTrue) and Type::Falsy as AlwaysFalsy (the set of all objects known to always be falsy, the super-set of all types whose bool() method returns Truthiness::AlwaysFalse).

It seems like if we do that, and define the appropriate handling in Type::is_subtype_of (any type whose .bool() evaluates to AlwaysFalsy is a subtype of Falsy, and any type whose .bool() evaluates to AlwaysTruthy is a subtype of Truthy) and Type::is_disjoint_from (Truthy is disjoint from Falsy or any subtype of it; Falsy is disjoint from Truthy or any subtype of it), we can get the same behavior as in this PR, but with many fewer new codepaths. It will just reuse the existing implementation of union and intersection simplification, and we don't need to add all the additional machinery of DeferredType etc.

The unsafety in the previous attempt to define Falsy and Truthy types arose because we defined those types too broadly: we tried to include any object that evaluates as truthy at any time in the Truthy type (e.g. a non-empty list, which could become falsy at any time), and we tried to narrow with positive intersections (that is, if x would narrow by intersecting with Truthy), which results in conclusions that can easily become unsound due to mutation.

But if we define the Truthy and Falsy types in the minimal way (only objects known to always be truthy/falsy), and we intersect negatively (if x: narrows by intersecting with ~Falsy), that's sound. And it allows us to do even a bit more than we can in this PR, because we can safely preserve the intersection with Truthy or Falsy (for instance, if not x: when x is of an ambiguously-truthy type X can be typed as X & ~Truthy.

Perhaps it will be clearer if we actually name these types Type::AlwaysTruthy and Type::AlwaysFalsy.

In most cases not involving truthiness (Type::member, for example), these two types should both be treated as if they were object.

crates/red_knot_python_semantic/resources/mdtest/narrow/truthiness.md

crates/red_knot_python_semantic/src/types.rs

cake-monotone · 2024-12-05T18:07:59Z

That is a very clear approach! I realised my understanding of Type was a bit limited. This is really interesting, and it makes sense. If I've understood correctly, we wouldn’t even need Type::exclude_always_truthy(), right? is_subtype_of and is_disjoint_from would work properly within the Union and automatically remove all Truthy or Falsy Literals.

Ok! I'll try implementing it using the approach you suggested.

Terminology Issues

I think the terms Truthy and Falsy are too easy to misunderstand.

Perhaps it would be better to drop the term Truthy altogether and instead use terms like:

CanBeTruthy (passes if x)
AlwaysTruthy (bool() is always True)
RandomTruthiness (bool() is undecidable)

This might help reduce confusion. So, I think it would be better to define Type::AlwaysTruthy instead of Type::Truthy in this context.

I also think the name AmbiguousTruthiness wasn’t the quite right. I think RandomTruthiness would be a better fit.
The word "Ambiguous" is already used in Truthiness::Ambiguous within Type::bool() to represent the union of Truthiness::AlwaysTruthy, Truthiness::AlwaysFalsy, and RandomTruthiness. But actually, RandomTruthiness is the intersection of CanBeTruthy and CanBeFalsy.

A Slightly Unrelated Question

With this approach, it looks like we could have an intersection like A & ~AlwaysTruthy & ~AlwaysFalsy for a type A.
Would it be correct to bind such a Negative Intersection type directly to a symbol?
If a new user encounters this type, they might not immediately understand its meaning.

class A: ...

x = A()

if x and not x:
    y = x
    reveal_type(y)  # revealed: A & ~Falsy & ~Truthy

cake-monotone · 2024-12-05T18:30:26Z

Due to some unexpected events where I live 😢 , PR updates might be delayed by 4–6 days.

carljm · 2024-12-05T22:41:21Z

Due to some unexpected events where I live 😢 , PR updates might be delayed by 4–6 days.

No problem, please take all the time you need! Really appreciate your excellent contributions. Whatever is going on, I hope you are safe and out of harm's way.

If I've understood correctly, we wouldn’t even need Type::exclude_always_truthy(), right? is_subtype_of and is_disjoint_from would work properly within the Union and automatically remove all Truthy or Falsy Literals.

Yes, that's the idea!

Regarding terminology, I mostly agree with your comments (in particular, that it will be clearer to use Type::AlwaysTruthy and Type::AlwaysFalsy instead of Type::Truthy and Type::Falsy). I don't love the term RandomTruthiness; would probably prefer MutableTruthiness. But I don't think we necessarily will need to use any term for this "set of objects whose truthiness can change over time" -- it's probably sufficient to call it object & ~AlwaysTruthy & ~AlwaysFalsy if we need to refer to it. I also don't think we'll need to use the term CanBeTruthy or CanBeFalsy in the code, though I don't mind those terms.

With this approach, it looks like we could have an intersection like A & ~AlwaysTruthy & ~AlwaysFalsy for a type A.
Would it be correct to bind such a Negative Intersection type directly to a symbol?

It is possible that we'll need to improve our display of some intersection types, if they occur frequently. This includes both negative intersections with AlwaysTruthy and AlwaysFalsy, as well as possibly intersections with Any/Unknown. I'd rather leave this as a separate topic for future improvement in a holistic way. For now let's just focus on getting the core semantics right, and not worry about seeing some complex types in tests.

…s type narrowing

cake-monotone · 2024-12-14T09:05:00Z

Thanks for waiting!

I had to follow up on some merged changes, so this took a bit longer than expected. Additionally, besides the mdtest, tests have also been added in builder.rs and types.rs.

…for-if-x

carljm

This is excellent, thank you so much! Really no substantive notes, just suggestions of a few testing additions, and a couple additional operations we can support easily.

crates/red_knot_python_semantic/resources/mdtest/narrow/truthiness.md

crates/red_knot_python_semantic/src/types.rs

…for-if-x

carljm

Great work.

cake-monotone requested review from carljm, MichaReiser, AlexWaygood and sharkdp as code owners November 30, 2024 07:21

cake-monotone commented Nov 30, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

AlexWaygood added red-knot Multi-file analysis & type inference great writeup A wonderful example of a quality contribution labels Nov 30, 2024

AlexWaygood reviewed Nov 30, 2024

View reviewed changes

AlexWaygood reviewed Dec 1, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

carljm reviewed Dec 3, 2024

View reviewed changes

Add Type::AlwaysTruthy, Type::AlwaysFalsy and implement truthines…

356c8de

…s type narrowing

cake-monotone force-pushed the red-knot-narrowing-for-if-x branch from 4077d49 to 356c8de Compare December 14, 2024 08:59

cake-monotone added 3 commits December 15, 2024 11:46

Additional Tests

a874707

Merge remote-tracking branch 'upstream/main' into red-knot-narrowing-…

7995c4b

…for-if-x

Appending an empty tuple literal to the mdtest

bbf80f5

carljm reviewed Dec 16, 2024

View reviewed changes

cake-monotone added 2 commits December 17, 2024 17:59

Merge remote-tracking branch 'upstream/main' into red-knot-narrowing-…

dcdd92b

…for-if-x

refactor mdtest and resolve some todo for AlwaysTruthy, AlwaysFalsy

7ac1cd1

cake-monotone requested review from carljm and AlexWaygood December 17, 2024 12:00

carljm approved these changes Dec 17, 2024

View reviewed changes

carljm merged commit f463fa7 into astral-sh:main Dec 17, 2024
21 checks passed

This was referenced Dec 17, 2024

[red-knot] more precise types from chained boolean expressions #13632

Closed

[red-knot] add support for more type narrowing forms #13694

Open

cake-monotone deleted the red-knot-narrowing-for-if-x branch December 18, 2024 14:54

sharkdp mentioned this pull request Jan 10, 2025

[red-knot] Simplify unions of T and ~T #15400

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Narrowing For Truthiness Checks (`if x` or `if not x`) #14687

[red-knot] Narrowing For Truthiness Checks (`if x` or `if not x`) #14687

cake-monotone commented Nov 30, 2024 •

edited

Loading

github-actions bot commented Nov 30, 2024 •

edited

Loading

AlexWaygood left a comment

carljm left a comment •

edited

Loading

cake-monotone commented Dec 5, 2024

cake-monotone commented Dec 5, 2024 •

edited

Loading

carljm commented Dec 5, 2024

cake-monotone commented Dec 14, 2024 •

edited

Loading

carljm left a comment

carljm left a comment

[red-knot] Narrowing For Truthiness Checks (if x or if not x) #14687

[red-knot] Narrowing For Truthiness Checks (if x or if not x) #14687

Conversation

cake-monotone commented Nov 30, 2024 • edited Loading

Notation

1. AmbiguousTruthiness Exists

2. int & Falsy != Literal[0, False]

Derived Sets

Key Takeaways

1. Truthy & Falsy Isn’t Empty

2. ~Truthy ≠ Falsy

On Implementation

Limitations

What This PR Includes

Areas to Review Carefully

Test Plan

github-actions bot commented Nov 30, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

AlexWaygood left a comment

Choose a reason for hiding this comment

carljm left a comment • edited Loading

Choose a reason for hiding this comment

cake-monotone commented Dec 5, 2024

Terminology Issues

A Slightly Unrelated Question

cake-monotone commented Dec 5, 2024 • edited Loading

carljm commented Dec 5, 2024

cake-monotone commented Dec 14, 2024 • edited Loading

carljm left a comment

Choose a reason for hiding this comment

carljm left a comment

Choose a reason for hiding this comment

[red-knot] Narrowing For Truthiness Checks (`if x` or `if not x`) #14687

[red-knot] Narrowing For Truthiness Checks (`if x` or `if not x`) #14687

cake-monotone commented Nov 30, 2024 •

edited

Loading

2. `int & Falsy != Literal[0, False]`

github-actions bot commented Nov 30, 2024 •

edited

Loading

`ruff-ecosystem` results

carljm left a comment •

edited

Loading

cake-monotone commented Dec 5, 2024 •

edited

Loading

cake-monotone commented Dec 14, 2024 •

edited

Loading