Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Narrowing For Truthiness Checks (if x or if not x) #14687

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cake-monotone
Copy link
Contributor

This turned out to be a really interesting but much trickier task than I expected. I thought this would be straightforward, but it was not.
While I am unsure if this PR will ultimately be merged, I believe it offers valuable insights.

Before diving in, I recommend reading the comment organized by Carljm (link to the comment).
Most of the content in this document comes from the ideas outlined in that comment.


Notation

We often use the terms "Truthy" and "Falsy," but I think we might interpret them a little differently.
To make sure we’re all on the same page, let’s start by defining these terms. I’ll also include some basic set theory here, but don’t panic—grab a towel, and I’ll keep it as simple as possible.

  • Truthy: Instances that pass if x.
  • Falsy: Instances that pass if not x.

These sets can include a infinite number of elements. But here are two key takeaways:

1. AmbiguousTruthiness Exists

There are instances that can pass both if x and if not x. Mutable objects like list and set are good examples.

We can define AmbiguousTruthiness as:

  • The set of instances that can pass both if x and if not x.

2. int & Falsy != Literal[0, False]

Surprisingly, the intersection of int and Falsy cannot be represented by {0, False}.
Why? Because the int also includes instances of subclasses, and these subclasses can override __bool__. Some of them can even return random results. For example:

class RandomBooleanInt(int):
    def __bool__(self) -> bool:
        import random
        return bool(random.randint(0, 1))

Derived Sets

We can use operations on Truthy, Falsy, and AmbiguousTruthiness to define some useful sets:

  • AlwaysTruthy = Truthy & ~AmbiguousTruthiness
  • AlwaysFalsy = Falsy & ~AmbiguousTruthiness

These definitions reinforce our intuition

  • Truthy = AlwaysTruthy | AmbiguousTruthiness
  • Falsy = AlwaysFalsy | AmbiguousTruthiness

Also:

  • Truthy & Falsy = AmbiguousTruthiness
  • Truthy | Falsy = U (U is the Universal Set in set theory)

And:

  • ~Truthy = AlwaysFalsy
  • ~Falsy = AlwaysTruthy

Key Takeaways

1. Truthy & Falsy Isn’t Empty

The intersection of Truthy and Falsy isn’t an empty set. For example, while if x and not x might look like it should reject everything, it can still pass with types like RandomBooleanInt.
Here’s another example:

x = [1]

if x and (x.clear() is None) and not x:
    ...

2. ~Truthy ≠ Falsy

The complement of Truthy isn’t the same as Falsy.
This is because Truthy includes ambiguous instances, so the complement of Truthy is AlwaysFalsy.

This might feel counterintuitive. It even looks like it conflicts with Python code:

x = list()
if x:
    reveal_type(x)  # revealed: list
else:
    reveal_type(x)  # revealed: list

But actually, there’s no conflict. Python’s not keyword flips the boolean value of an object; it doesn’t mean the set-theoretic complement.

Here’s how it works instead:

  • For any ambiguous x, (not x) is also ambiguous.
  • For any Truthy x, (not x) is Falsy.

If we think of not as applying to every element of a set, it works like this:

A = {1, 2, 3, ...}
not A = {not 1, not 2, not 3, ...}

So:

  • not Truthy = Falsy
  • not Ambiguous = Ambiguous

Just be careful to distinguish between complements and the not operator.


On Implementation

The theory sounds great, but it’s not that easy to implement in practice.

For example, instead of directly extracting int & Falsy, it’s easier to focus on (int & ~AlwaysTruthy)—removing the AlwaysTruthy elements from int. A simpler approach is to use Truthy and Falsy Literals.
To calculate A & Truthy, we can try this:

A & ~FalsyLiterals = A & ~(AlwaysFalsy & Literals)
                    = A & (~AlwaysFalsy | ~Literals)
                    = A & (Truthy | ~Literals)
                    = (A & Truthy) | (A & ~Literals)

This shows that A & Truthy is a subset of (A & ~FalsyLiterals). While (A & ~Literals) might add some extra elements, that’s not a big problem since we can’t fully determine the boolean behavior of non-literal objects anyway.


Limitations

  • Adding Truthy and Falsy directly as variants in Type would make the system unnecessarily complicated. As Carljm pointed out, it’s best to keep their use limited to the Narrowing phase.
  • Sometimes, evaluating (A & Falsy) as (A & ~TruthyLiterals) isn’t even possible. TruthyLiterals is basically an infinite set, Without encountering a proper A, we cannot clearly represent (A & ~TruthyLiterals) as a type enum. For example, something like ~Literal[0] & Falsy = (~Literal[0] & ~TruthyLiterals) is just too tricky to represent as a simple variant in Type.
    More generally, negative-only intersections because (~B & ~TruthyLiterals) = ~(B | TruthyLiterals) essentially requires you to union together all the TruthyLiterals. Thankfully, if we stick to using this only during the Narrowing phase, it won’t be a huge problem since the type we’re narrowing will not be negative-only intersections.

What This PR Includes

  1. Expanded NarrowingConstraints to handle Truthy and Falsy.
  2. Deferred evaluation of Truthy and Falsy constraints until they’re applied to binding_ty.
  3. Added Type::exclude_always_truthy() and Type::exclude_always_falsy() to implement Truthy and Falsy logic.
  4. Supporting Narrowing for if x and if not x

Areas to Review Carefully

  1. My understanding of salsa is limited, so I’d appreciate careful reviews of salsa-related code.
  2. I used DeferredType for lazy evaluation, along with NarrowingUnion and NarrowingIntersection. If you have ideas for a better approach, let’s discuss!
    2 - 1. Another option could be to pass base_type directly when creating NarrowingConstraintsBuilder::new, which would let us evaluate Truthy or Falsy eagerly. This would eliminate the need for DeferredType, Union, and Intersection, but it might significantly hurt performance by changing the cache key from (expression) to (base_type, expression).

Copy link
Contributor

github-actions bot commented Nov 30, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+2 -0 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

python/typeshed (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select E,F,FA,I,PYI,RUF,UP,W

+ stdlib/asyncio/unix_events.pyi:189:29: W291 Trailing whitespace
+ stdlib/asyncio/unix_events.pyi:189:30: W292 No newline at end of file

Changes by rule (2 rules affected)

code total + violation - violation + fix - fix
W291 1 1 0 0 0
W292 1 1 0 0 0

Comment on lines 1779 to 1780
// TODO: tuple literal should be included
// .add(Type::tuple(db, &[]))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure if this is the correct way to represent ().
It also seems like () hasn’t been considered yet in is_disjoint_from.

I think it make sense to implement support for () first. Adding this code as-is makes the mdtest results much harder to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure if this is the correct way to represent ().

I think it is!

It also seems like () hasn’t been considered yet in is_disjoint_from.

That's very possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Ah, my point was that the current implementation without TupleLiteral produces clean results like A | B. However, if we uncomment the TupleLiteral, it results like A & ~Tuple[()] | B & ~Tuple[()], which makes the tests harder to read.

I’ll try to resolve this in a different PR before we uncomment Tuple Literal.

@AlexWaygood AlexWaygood added red-knot Multi-file analysis & type inference great writeup A wonderful example of a quality contribution labels Nov 30, 2024
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review yet, sorry -- just some simplification opportunites I spotted from skimming through!

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/narrow.rs Outdated Show resolved Hide resolved
Comment on lines +1749 to +1757
Type::Union(union) => {
let mut builder = UnionBuilder::new(db);

for element in union.elements(db) {
builder = builder.add(element.exclude_always_truthy(db));
}

builder.build()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type::Union(union) => {
let mut builder = UnionBuilder::new(db);
for element in union.elements(db) {
builder = builder.add(element.exclude_always_truthy(db));
}
builder.build()
}
Type::Union(union) => union.map(db, |element| element.exclude_always_truthy(db)),

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!! This PR, and writeup, is indeed very clarifying.

Submitting a partial review here because something occurred to me mid-review: I think perhaps we can simplify this by going back to Type::Truthy and Type::Falsy types, but with a better definition? That is, define Type::Truthy as what you've called AlwaysTruthy (the set of all objects known to always be truthy, or in other words, the super-set of all types whose bool() method returns Truthiness::AlwaysTrue) and Type::Falsy as AlwaysFalsy (the set of all objects known to always be falsy, the super-set of all types whose bool() method returns Truthiness::AlwaysFalse).

It seems like if we do that, and define the appropriate handling in Type::is_subtype_of (any type whose .bool() evaluates to AlwaysFalsy is a subtype of Falsy, and any type whose .bool() evaluates to AlwaysTruthy is a subtype of Truthy) and Type::is_disjoint_from (Truthy is disjoint from Falsy or any subtype of it; Falsy is disjoint from Truthy or any subtype of it), we can get the same behavior as in this PR, but with many fewer new codepaths. It will just reuse the existing implementation of union and intersection simplification, and we don't need to add all the additional machinery of DeferredType etc.

The unsafety in the previous attempt to define Falsy and Truthy types arose because we defined those types too broadly: we tried to include any object that evaluates as truthy at any time in the Truthy type (e.g. a non-empty list, which could become falsy at any time), and we tried to narrow with positive intersections (that is, if x would narrow by intersecting with Truthy), which results in conclusions that can easily become unsound due to mutation.

But if we define the Truthy and Falsy types in the minimal way (only objects known to always be truthy/falsy), and we intersect negatively (if x: narrows by intersecting with ~Falsy), that's sound. And it allows us to do even a bit more than we can in this PR, because we can safely preserve the intersection with Truthy or Falsy (for instance, if not x: when x is of an ambiguously-truthy type X can be typed as X & ~Truthy.

Perhaps it will be clearer if we actually name these types Type::AlwaysTruthy and Type::AlwaysFalsy.

In most cases not involving truthiness (Type::member, for example), these two types should both be treated as if they were object.

Comment on lines +76 to +94
def foo() -> list | set | dict:
return list()

x = foo()

if x:
reveal_type(x) # revealed: list | set | dict
else:
reveal_type(x) # revealed: list | set | dict

if x and not x:
reveal_type(x) # revealed: list | set | dict
else:
reveal_type(x) # revealed: list | set | dict

if x or not x:
reveal_type(x) # revealed: list | set | dict
else:
reveal_type(x) # revealed: list | set | dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small preference for just using class A: ..., class B: ... here, so these tests don't need updating when we add generics support.

Comment on lines +110 to +111
We can customize the truthiness of a class using a metaclass. Thus, class literals are generally
included in the category of AmbiguousTruthiness.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could respect a case where a class has a metaclass with a __bool__ method that returns Literal[True] or Literal[False]. Class literal types don't include subclasses, and even with subclasses included (the Type::SubclassOf case), the metaclass of a subclass must be a subclass of the metaclass of the base class, which can't override __bool__ in an incompatible way.

That said, I think this case is likely extremely rare and I don't think it's important that we handle it; I think it's fine to treat all class literals as ambiguously truthy. But it might be worth updating this comment to recognize that we could be more precise in some cases, we just haven't chosen to bother.

Comment on lines +1742 to +1746
debug_assert!(
!intersection.positive(db).is_empty(),
"negative-only-intersections can not use `exclude_always_truthy`"
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems brittle (it's the responsibility of every caller to ensure we never call this method on a negation type?) and unnecessary. The result of calling this method on a negation type is not wrong, it's just imprecise; we don't actually exclude anything.

Suggested change
debug_assert!(
!intersection.positive(db).is_empty(),
"negative-only-intersections can not use `exclude_always_truthy`"
);

// Case 2: Positives are empty
// ~B & Falsy
// = ~(B | AlwaysTruthy)
// ≈ ~(B.include_always_truthy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is confusing because we don't have an include_always_truthy() method, and it's not self-evident what it would do.

@cake-monotone
Copy link
Contributor Author

That is a very clear approach! I realised my understanding of Type was a bit limited. This is really interesting, and it makes sense. If I've understood correctly, we wouldn’t even need Type::exclude_always_truthy(), right? is_subtype_of and is_disjoint_from would work properly within the Union and automatically remove all Truthy or Falsy Literals.

Ok! I'll try implementing it using the approach you suggested.

Terminology Issues

I think the terms Truthy and Falsy are too easy to misunderstand.

Perhaps it would be better to drop the term Truthy altogether and instead use terms like:

  • CanBeTruthy (passes if x)
  • AlwaysTruthy (bool() is always True)
  • RandomTruthiness (bool() is undecidable)

This might help reduce confusion. So, I think it would be better to define Type::AlwaysTruthy instead of Type::Truthy in this context.

I also think the name AmbiguousTruthiness wasn’t the quite right. I think RandomTruthiness would be a better fit.
The word "Ambiguous" is already used in Truthiness::Ambiguous within Type::bool() to represent the union of Truthiness::AlwaysTruthy, Truthiness::AlwaysFalsy, and RandomTruthiness. But actually, RandomTruthiness is the intersection of CanBeTruthy and CanBeFalsy.


A Slightly Unrelated Question

With this approach, it looks like we could have an intersection like A & ~AlwaysTruthy & ~AlwaysFalsy for a type A.
Would it be correct to bind such a Negative Intersection type directly to a symbol?
If a new user encounters this type, they might not immediately understand its meaning.

class A: ...

x = A()

if x and not x:
    y = x
    reveal_type(y)  # revealed: A & ~Falsy & ~Truthy

@cake-monotone
Copy link
Contributor Author

cake-monotone commented Dec 5, 2024

Due to some unexpected events where I live 😢 , PR updates might be delayed by 4–6 days.

@carljm
Copy link
Contributor

carljm commented Dec 5, 2024

Due to some unexpected events where I live 😢 , PR updates might be delayed by 4–6 days.

No problem, please take all the time you need! Really appreciate your excellent contributions. Whatever is going on, I hope you are safe and out of harm's way.

If I've understood correctly, we wouldn’t even need Type::exclude_always_truthy(), right? is_subtype_of and is_disjoint_from would work properly within the Union and automatically remove all Truthy or Falsy Literals.

Yes, that's the idea!

Regarding terminology, I mostly agree with your comments (in particular, that it will be clearer to use Type::AlwaysTruthy and Type::AlwaysFalsy instead of Type::Truthy and Type::Falsy). I don't love the term RandomTruthiness; would probably prefer MutableTruthiness. But I don't think we necessarily will need to use any term for this "set of objects whose truthiness can change over time" -- it's probably sufficient to call it object & ~AlwaysTruthy & ~AlwaysFalsy if we need to refer to it. I also don't think we'll need to use the term CanBeTruthy or CanBeFalsy in the code, though I don't mind those terms.

With this approach, it looks like we could have an intersection like A & ~AlwaysTruthy & ~AlwaysFalsy for a type A.
Would it be correct to bind such a Negative Intersection type directly to a symbol?

It is possible that we'll need to improve our display of some intersection types, if they occur frequently. This includes both negative intersections with AlwaysTruthy and AlwaysFalsy, as well as possibly intersections with Any/Unknown. I'd rather leave this as a separate topic for future improvement in a holistic way. For now let's just focus on getting the core semantics right, and not worry about seeing some complex types in tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
great writeup A wonderful example of a quality contribution red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants