Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Narrowing For Truthiness Checks (if x or if not x) #14687

Merged
merged 6 commits into from
Dec 17, 2024

Conversation

cake-monotone
Copy link
Contributor

@cake-monotone cake-monotone commented Nov 30, 2024

This turned out to be a really interesting but much trickier task than I expected. I thought this would be straightforward, but it was not.
While I am unsure if this PR will ultimately be merged, I believe it offers valuable insights.

Before diving in, I recommend reading the comment organized by Carljm (link to the comment).
Most of the content in this document comes from the ideas outlined in that comment.


Notation

We often use the terms "Truthy" and "Falsy," but I think we might interpret them a little differently.
To make sure we’re all on the same page, let’s start by defining these terms. I’ll also include some basic set theory here, but don’t panic—grab a towel, and I’ll keep it as simple as possible.

  • Truthy: Instances that pass if x.
  • Falsy: Instances that pass if not x.

These sets can include a infinite number of elements. But here are two key takeaways:

1. AmbiguousTruthiness Exists

There are instances that can pass both if x and if not x. Mutable objects like list and set are good examples.

We can define AmbiguousTruthiness as:

  • The set of instances that can pass both if x and if not x.

2. int & Falsy != Literal[0, False]

Surprisingly, the intersection of int and Falsy cannot be represented by {0, False}.
Why? Because the int also includes instances of subclasses, and these subclasses can override __bool__. Some of them can even return random results. For example:

class RandomBooleanInt(int):
    def __bool__(self) -> bool:
        import random
        return bool(random.randint(0, 1))

Derived Sets

We can use operations on Truthy, Falsy, and AmbiguousTruthiness to define some useful sets:

  • AlwaysTruthy = Truthy & ~AmbiguousTruthiness
  • AlwaysFalsy = Falsy & ~AmbiguousTruthiness

These definitions reinforce our intuition

  • Truthy = AlwaysTruthy | AmbiguousTruthiness
  • Falsy = AlwaysFalsy | AmbiguousTruthiness

Also:

  • Truthy & Falsy = AmbiguousTruthiness
  • Truthy | Falsy = U (U is the Universal Set in set theory)

And:

  • ~Truthy = AlwaysFalsy
  • ~Falsy = AlwaysTruthy

Key Takeaways

1. Truthy & Falsy Isn’t Empty

The intersection of Truthy and Falsy isn’t an empty set. For example, while if x and not x might look like it should reject everything, it can still pass with types like RandomBooleanInt.
Here’s another example:

x = [1]

if x and (x.clear() is None) and not x:
    ...

2. ~Truthy ≠ Falsy

The complement of Truthy isn’t the same as Falsy.
This is because Truthy includes ambiguous instances, so the complement of Truthy is AlwaysFalsy.

This might feel counterintuitive. It even looks like it conflicts with Python code:

x = list()
if x:
    reveal_type(x)  # revealed: list
else:
    reveal_type(x)  # revealed: list

But actually, there’s no conflict. Python’s not keyword flips the boolean value of an object; it doesn’t mean the set-theoretic complement.

Here’s how it works instead:

  • For any ambiguous x, (not x) is also ambiguous.
  • For any Truthy x, (not x) is Falsy.

If we think of not as applying to every element of a set, it works like this:

A = {1, 2, 3, ...}
not A = {not 1, not 2, not 3, ...}

So:

  • not Truthy = Falsy
  • not Ambiguous = Ambiguous

Just be careful to distinguish between complements and the not operator.


On Implementation

The theory sounds great, but it’s not that easy to implement in practice.

For example, instead of directly extracting int & Falsy, it’s easier to focus on (int & ~AlwaysTruthy)—removing the AlwaysTruthy elements from int. A simpler approach is to use Truthy and Falsy Literals.
To calculate A & Truthy, we can try this:

A & ~FalsyLiterals = A & ~(AlwaysFalsy & Literals)
                    = A & (~AlwaysFalsy | ~Literals)
                    = A & (Truthy | ~Literals)
                    = (A & Truthy) | (A & ~Literals)

This shows that A & Truthy is a subset of (A & ~FalsyLiterals). While (A & ~Literals) might add some extra elements, that’s not a big problem since we can’t fully determine the boolean behavior of non-literal objects anyway.


Limitations

  • Adding Truthy and Falsy directly as variants in Type would make the system unnecessarily complicated. As Carljm pointed out, it’s best to keep their use limited to the Narrowing phase.
  • Sometimes, evaluating (A & Falsy) as (A & ~TruthyLiterals) isn’t even possible. TruthyLiterals is basically an infinite set, Without encountering a proper A, we cannot clearly represent (A & ~TruthyLiterals) as a type enum. For example, something like ~Literal[0] & Falsy = (~Literal[0] & ~TruthyLiterals) is just too tricky to represent as a simple variant in Type.
    More generally, negative-only intersections because (~B & ~TruthyLiterals) = ~(B | TruthyLiterals) essentially requires you to union together all the TruthyLiterals. Thankfully, if we stick to using this only during the Narrowing phase, it won’t be a huge problem since the type we’re narrowing will not be negative-only intersections.

What This PR Includes

  1. Expanded NarrowingConstraints to handle Truthy and Falsy.
  2. Deferred evaluation of Truthy and Falsy constraints until they’re applied to binding_ty.
  3. Added Type::exclude_always_truthy() and Type::exclude_always_falsy() to implement Truthy and Falsy logic.
  4. Supporting Narrowing for if x and if not x

Areas to Review Carefully

  1. My understanding of salsa is limited, so I’d appreciate careful reviews of salsa-related code.
  2. I used DeferredType for lazy evaluation, along with NarrowingUnion and NarrowingIntersection. If you have ideas for a better approach, let’s discuss!
    2 - 1. Another option could be to pass base_type directly when creating NarrowingConstraintsBuilder::new, which would let us evaluate Truthy or Falsy eagerly. This would eliminate the need for DeferredType, Union, and Intersection, but it might significantly hurt performance by changing the cache key from (expression) to (base_type, expression).

Test Plan

  • New Markdown test for truthiness narrowing narrow/truthiness.md
  • unit tests in types.rs and builders.rs (cargo test --package red_knot_python_semantic --lib -- types)

Copy link
Contributor

github-actions bot commented Nov 30, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@AlexWaygood AlexWaygood added red-knot Multi-file analysis & type inference great writeup A wonderful example of a quality contribution labels Nov 30, 2024
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review yet, sorry -- just some simplification opportunites I spotted from skimming through!

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!! This PR, and writeup, is indeed very clarifying.

Submitting a partial review here because something occurred to me mid-review: I think perhaps we can simplify this by going back to Type::Truthy and Type::Falsy types, but with a better definition? That is, define Type::Truthy as what you've called AlwaysTruthy (the set of all objects known to always be truthy, or in other words, the super-set of all types whose bool() method returns Truthiness::AlwaysTrue) and Type::Falsy as AlwaysFalsy (the set of all objects known to always be falsy, the super-set of all types whose bool() method returns Truthiness::AlwaysFalse).

It seems like if we do that, and define the appropriate handling in Type::is_subtype_of (any type whose .bool() evaluates to AlwaysFalsy is a subtype of Falsy, and any type whose .bool() evaluates to AlwaysTruthy is a subtype of Truthy) and Type::is_disjoint_from (Truthy is disjoint from Falsy or any subtype of it; Falsy is disjoint from Truthy or any subtype of it), we can get the same behavior as in this PR, but with many fewer new codepaths. It will just reuse the existing implementation of union and intersection simplification, and we don't need to add all the additional machinery of DeferredType etc.

The unsafety in the previous attempt to define Falsy and Truthy types arose because we defined those types too broadly: we tried to include any object that evaluates as truthy at any time in the Truthy type (e.g. a non-empty list, which could become falsy at any time), and we tried to narrow with positive intersections (that is, if x would narrow by intersecting with Truthy), which results in conclusions that can easily become unsound due to mutation.

But if we define the Truthy and Falsy types in the minimal way (only objects known to always be truthy/falsy), and we intersect negatively (if x: narrows by intersecting with ~Falsy), that's sound. And it allows us to do even a bit more than we can in this PR, because we can safely preserve the intersection with Truthy or Falsy (for instance, if not x: when x is of an ambiguously-truthy type X can be typed as X & ~Truthy.

Perhaps it will be clearer if we actually name these types Type::AlwaysTruthy and Type::AlwaysFalsy.

In most cases not involving truthiness (Type::member, for example), these two types should both be treated as if they were object.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
@cake-monotone
Copy link
Contributor Author

That is a very clear approach! I realised my understanding of Type was a bit limited. This is really interesting, and it makes sense. If I've understood correctly, we wouldn’t even need Type::exclude_always_truthy(), right? is_subtype_of and is_disjoint_from would work properly within the Union and automatically remove all Truthy or Falsy Literals.

Ok! I'll try implementing it using the approach you suggested.

Terminology Issues

I think the terms Truthy and Falsy are too easy to misunderstand.

Perhaps it would be better to drop the term Truthy altogether and instead use terms like:

  • CanBeTruthy (passes if x)
  • AlwaysTruthy (bool() is always True)
  • RandomTruthiness (bool() is undecidable)

This might help reduce confusion. So, I think it would be better to define Type::AlwaysTruthy instead of Type::Truthy in this context.

I also think the name AmbiguousTruthiness wasn’t the quite right. I think RandomTruthiness would be a better fit.
The word "Ambiguous" is already used in Truthiness::Ambiguous within Type::bool() to represent the union of Truthiness::AlwaysTruthy, Truthiness::AlwaysFalsy, and RandomTruthiness. But actually, RandomTruthiness is the intersection of CanBeTruthy and CanBeFalsy.


A Slightly Unrelated Question

With this approach, it looks like we could have an intersection like A & ~AlwaysTruthy & ~AlwaysFalsy for a type A.
Would it be correct to bind such a Negative Intersection type directly to a symbol?
If a new user encounters this type, they might not immediately understand its meaning.

class A: ...

x = A()

if x and not x:
    y = x
    reveal_type(y)  # revealed: A & ~Falsy & ~Truthy

@cake-monotone
Copy link
Contributor Author

cake-monotone commented Dec 5, 2024

Due to some unexpected events where I live 😢 , PR updates might be delayed by 4–6 days.

@carljm
Copy link
Contributor

carljm commented Dec 5, 2024

Due to some unexpected events where I live 😢 , PR updates might be delayed by 4–6 days.

No problem, please take all the time you need! Really appreciate your excellent contributions. Whatever is going on, I hope you are safe and out of harm's way.

If I've understood correctly, we wouldn’t even need Type::exclude_always_truthy(), right? is_subtype_of and is_disjoint_from would work properly within the Union and automatically remove all Truthy or Falsy Literals.

Yes, that's the idea!

Regarding terminology, I mostly agree with your comments (in particular, that it will be clearer to use Type::AlwaysTruthy and Type::AlwaysFalsy instead of Type::Truthy and Type::Falsy). I don't love the term RandomTruthiness; would probably prefer MutableTruthiness. But I don't think we necessarily will need to use any term for this "set of objects whose truthiness can change over time" -- it's probably sufficient to call it object & ~AlwaysTruthy & ~AlwaysFalsy if we need to refer to it. I also don't think we'll need to use the term CanBeTruthy or CanBeFalsy in the code, though I don't mind those terms.

With this approach, it looks like we could have an intersection like A & ~AlwaysTruthy & ~AlwaysFalsy for a type A.
Would it be correct to bind such a Negative Intersection type directly to a symbol?

It is possible that we'll need to improve our display of some intersection types, if they occur frequently. This includes both negative intersections with AlwaysTruthy and AlwaysFalsy, as well as possibly intersections with Any/Unknown. I'd rather leave this as a separate topic for future improvement in a holistic way. For now let's just focus on getting the core semantics right, and not worry about seeing some complex types in tests.

@cake-monotone cake-monotone force-pushed the red-knot-narrowing-for-if-x branch from 4077d49 to 356c8de Compare December 14, 2024 08:59
@cake-monotone
Copy link
Contributor Author

cake-monotone commented Dec 14, 2024

Thanks for waiting!

I had to follow up on some merged changes, so this took a bit longer than expected. Additionally, besides the mdtest, tests have also been added in builder.rs and types.rs.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent, thank you so much! Really no substantive notes, just suggestions of a few testing additions, and a couple additional operations we can support easily.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work.

@carljm carljm merged commit f463fa7 into astral-sh:main Dec 17, 2024
21 checks passed
@cake-monotone cake-monotone deleted the red-knot-narrowing-for-if-x branch December 18, 2024 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
great writeup A wonderful example of a quality contribution red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants