[red-knot] Introduce `TypeBound` for property tests #16862

cake-monotone · 2025-03-20T06:57:07Z

Summary

This feature was suggested by @AlexWaygood.

Problem with the Previous Approach

In property tests with a premise, the test could not perform valid checks until it randomly generated a type that satisfied the premise. This led to unnecessary computational waste.

This was particularly problematic in tests with three arguments, significantly reducing the efficiency of generating valid test cases.

For example:

type_property_test!(
    union_equivalence_not_order_dependent, db,
    forall types s, t, u.
        s.is_fully_static(db) && t.is_fully_static(db) && u.is_fully_static(db) => <property>
);

Improvements

The type_property_test macro now explicitly requires both the identifiers and their corresponding ArbitraryRule.

type_property_test!(
    intersection_equivalence_not_order_dependent, db,
    forall (s: FullyStaticTy, t: FullyStaticTy, u: FullyStaticTy).
        <property>
);

Now, s, t, and u—which are declared as FullyStaticTy—will only be generated from types that satisfy ty.is_fully_static().

The rules AnyTy, FullyStaticTy, and SingletonTy each maintain their own random pools for generating Ty instances.
For more details, please refer to the changes in arbitrary.rs.

Test Plan

A new ensure_rules module has been added to verify that the types generated under each rule satisfy the expected conditions.

You can run the following command to validate the implementation:

cargo test -p red_knot_python_semantic -- --ignored ensure_rules

github-actions · 2025-03-20T07:02:59Z

`mypy_primer` results

No ecosystem changes detected ✅

carljm

Thank you for the PR!

I see a significant impact on an individual test using SingletonTy (~75% improvement), a smaller impact on tests using FullyStaticTy (~15% improvement), and a fairly small impact on overall runtime of the stable tests (~4% improvement).

This perf improvement is nice, but perhaps not as large overall as I might have hoped for from this change, I think mostly because most of the tests still use AnyTy anyway.

Personally I am not sure this perf improvement is worth the coverage drawbacks I discuss in my inline comment. Would like to hear other opinions.

carljm · 2025-03-20T22:13:25Z

crates/red_knot_python_semantic/src/types/property_tests/arbitrary.rs

+            Ty::Never,
+            Ty::BuiltinsFunction("chr"),
+            Ty::BuiltinsFunction("ascii"),
+            Ty::BuiltinsBoundMethod {
+                class: "str",
+                method: "isascii",
+            },
+            Ty::BuiltinsBoundMethod {
+                class: "int",
+                method: "bit_length",
+            },
+            int_lit,
+            bool_lit,
+            Ty::StringLiteral(""),
+            Ty::StringLiteral("a"),
+            Ty::LiteralString,
+            Ty::BytesLiteral(""),
+            Ty::BytesLiteral("\x00"),
+            Ty::KnownClassInstance(KnownClass::Object),
+            Ty::KnownClassInstance(KnownClass::Str),
+            Ty::KnownClassInstance(KnownClass::Int),
+            Ty::AlwaysFalsy,
+            Ty::AlwaysTruthy,


It looks to me like there are some fully-static types in the above list that are not included here, silently reducing our type coverage for tests using FullyStaticTy. (To pick just one example, BuiltinClassLiteral('str') is both fully-static and a singleton, but is not included either here or below.)

I think this is a significant drawback of this PR. Rather than having a single pool of candidate types, with the actual implemention of is_fully_static and is_singleton deciding which types fall into which categories, we now have to manually categorize the candidate types. And while the new ensure_rules tests help validate that we don't include any type in a category to which it doesn't belong, there's nothing helping us validate that we aren't silently losing coverage by omitting types wrongly.

If we do decide to go ahead with this PR, I would at least prefer if we could not repeat ourselves. That is, have a single list of non-fully-static types, a single list of static types, and a single list of singleton types. FullyStaticTy would combine the latter two lists, AnyTy would combine all three, SingletonTy would use only the latter list.

First of all, I think I should explain that my initial intuition was off. I had assumed that using is_fully_static would make it very unlikely for valid tests—but it turns out that’s not really the case.

From my simulation, the probability that all three variables of AnyTy satisfy the is_fully_static premise is around 65%. That means, out of 100 test runs, about 65 actually test the property, which is higher than I expected.

The key point I was trying to focus on wasn’t performance—it was to remove meaningless tests. If the probability of satisfying a property’s premise approaches zero, the test becomes meaningless. This PR was based on that assumption, so the exact ratio of AnyTy to FullyStaticTy wasn’t my main concern.
But as you've pointed out, FullyStaticTy doesn’t seem to be as effective as I initially thought.

In conclusion, I think it's totally fine to close this PR.

Here are a few more stats from the simulation, which might be helpful for future property test reviews:

Probability that t.is_singleton() holds: ~9%

Probability that !t.is_fully_static() holds: ~10%

Probability that s.is_equivalent_to(db, t) && t.is_equivalent_to(db, u) holds: ~0.1%

Probability that s.is_gradual_equivalent_to(db, t) holds: ~1%

These probabilities will likely decrease as we add more types to the random pool. But for now, it seems we can compensate by increasing the number of tests. If we move forward with similar PRs in the future, I think we should aim to eliminate premises with very low probability, not fully static.

Aha, I hadn't fully processed the fact that if we have a premise of low probability, that doesn't just impact performance of the test, it also impacts usefulness of the test; a premise of 10% probability means we need 10x the QUICKCHECK_TESTS value to get the same number of actual tests of the implication. I agree this is a stronger motivation for the PR, and could be worth the risk of missing coverage of some type that we fail to include in the right list.

I would consider going ahead with this PR if we implement the above suggestion to avoid repetition in the type lists (so each type is listed exactly once, making it easier to evaluate whether each type is in the right place). We could also consider eliminating FullyStaticTy and just separating SingletonTy, but I kind of think if we go down this route we may as well do it fully.

But I would also like to hear @sharkdp 's opinion, so maybe no need to do any updates here until he is back.

This PR looks great! It's something I've been meaning to implement as well. I kind of liked the generic and simple premise => property form, but with the syntactic change to the macro, this reads even better, I think.

with the actual implemention of is_fully_static and is_singleton deciding which types fall into which categories, we now have to manually categorize the candidate types.

Maybe we can maintain that property? We could have a single pool of types and then categorize them into (static, LazyLock) specialized lists for usage in those dedicated Arbitrary implementations?

Sounds good! I’ll go ahead with the PR.

As sharkdp suggested, I think keeping a single pool and letting the Arbitrary implementations handle the categorization makes sense — it should make it easier to support more premises in the future.

sharkdp

Thank you for working on this!

crates/red_knot_python_semantic/src/types/property_tests.rs

crates/red_knot_python_semantic/src/types/property_tests/arbitrary.rs

cake-monotone · 2025-03-26T05:25:28Z

crates/red_knot_python_semantic/src/types/property_tests/type_bound.rs

+    fn generate_type_recursively(g: &mut Gen, size: u32) -> Ty {
+        if size == 0 {
+            Self::generate_singular_type(g)
+        } else {
+            match u32::arbitrary(g) % 4 {
+                0 => Self::generate_singular_type(g),
+                1 => Ty::Union(
+                    (0..*g.choose(&[2, 3]).unwrap())
+                        .map(|_| Self::generate_type_recursively(g, size - 1))
+                        .collect(),
+                ),
+                2 => Ty::Tuple(
+                    (0..*g.choose(&[0, 1, 2]).unwrap())
+                        .map(|_| Self::generate_type_recursively(g, size - 1))
+                        .collect(),
+                ),
+                3 => Ty::Intersection {
+                    pos: (0..*g.choose(&[0, 1, 2]).unwrap())
+                        .map(|_| Self::generate_type_recursively(g, size - 1))
+                        .collect(),
+                    neg: (0..*g.choose(&[0, 1, 2]).unwrap())
+                        .map(|_| Self::generate_type_recursively(g, size - 1))
+                        .collect(),
+                },
+                _ => unreachable!(),


The types returned by generate_singular_type are guaranteed to satisfy a predicate (e.g., is_fully_static).
However, the result of generate_type_recursively may not necessarily satisfy it.
One option is to keep generating random types until the condition is met, but I'm not sure if that's a good approach.

For is_fully_static specifically, that should not be a problem, right? Unions, intersections, and tuples of fully static types are still fully static. And for singleton types, you special-cased generate_type_recursively. So I'm assuming you are worried about future extensions?

Maybe it would make sense to add something like a check_property :: Ty -> bool function to this trait, which we could then use as a post-condition assertion for the return type of generate_type_recursively in BoundedType<Bound>::arbitrary before feeding a type to the actual property test?

You're right — it's not a problem for FullyStaticTy and SingletonTy. The issue came up while I was working on NonFullyStaticTy, where an empty tuple was being generated unexpectedly. I'll apply your suggestion and commit it for now. Thanks!

The issue came up while I was working on NonFullyStaticTy, where an empty tuple was being generated unexpectedly.

I see. Is this why you implemented it as a re-try mechanism now instead of a hard assertion? I wonder if we could turn this into a hard assertion if we would specialize the tuple generation for this specific type? To avoid having to overload generate_type_recursively completely, we could potentially split it up into four functions (five soon, when we add Callables), and then only specialize the tuple-generation part?

I wonder if we also need to take care of this for shrinking? Could we currently accidentally shrink a non-fully static counterexample type to an empty tuple (which would not fulfill the precondition of the test anymore)? And then erroneously report the empty tuple as a counterexample to a test that requires non-fully static types as inputs?

I wonder if we could turn this into a hard assertion if we would specialize the tuple generation for this specific type?

Ah, I see — that does sound like a better approach. I’ll give it a try!

Could we currently accidentally shrink a non-fully static counterexample type to an empty tuple (which would not fulfill the precondition of the test anymore)?

While it's pretty clear that this edge case won’t happen at the moment, I think it’s still something that could totally happen down the line, depending on how the type system evolves. So I think it’s worth ensuring we guard against it now.
In conclusion I think just filtering in the Arbitrary shrink, like in this PR, should be sufficient

Hmm... it's not quite as simple as I thought.

For NonFullyStaticTy, there's another counterexample: Intersection.

Ty::Intersection uses IntersectionBuilder to generate a Type, so the result isn't always an actual intersection type. For example:

object instance from Ty::Intersection(pos: [], neg: [])

Type::Never from cases like:

Ty::Intersection { pos: vec![ Ty::Tuple(vec![Ty::SubclassOfAny]), Ty::Tuple(vec![Ty::Unknown, Ty::Any]), ], neg: vec![], }

...and many more combinations.

So it’s hard to determine whether the result of generate_type_recursively satisfies the precondition, and overloading it probably won’t help here.

To be really honest, I'm starting to think if we should just leave everything as is? This is a significant increase in code complexity and we have identified a few pitfalls that might turn out to cause problems in the future. What is your impression, @cake-monotone?

I totally agree — I had the same thought when I realized this structure keeps introducing more complexity.
Maybe the main takeaway from this PR is that we need to be more mindful of the premises when creating or reviewing property tests.

Going back to the original goal; the main purpose of this PR was to improve the valid-testing rate of property test, right? So let's simply double the QUICKCHECK_TEST, since we already got a 75% performance boost from the earlier PR.

Let's close this PR for now!

Ok — Thank you very much for performing this experiment!

crates/red_knot_python_semantic/src/types/property_tests/type_bound.rs

…trary-rule-for-property-tests

cake-monotone added 3 commits March 20, 2025 15:15

introduce ArbitraryRule for property tests

3da46f0

append ensure_rules module

3fdcae9

fix missed fullystatic premise

9da2be5

cake-monotone requested review from carljm, AlexWaygood, sharkdp and dcreager as code owners March 20, 2025 06:57

cake-monotone changed the title ~~Introduce ArbitraryRule for property tests~~ [red-knot] Introduce ArbitraryRule for property tests Mar 20, 2025

MichaReiser added the red-knot Multi-file analysis & type inference label Mar 20, 2025

carljm reviewed Mar 20, 2025

View reviewed changes

sharkdp reviewed Mar 24, 2025

View reviewed changes

refactor: derive type pool from predicate instead of manual list

40bb5e2

cake-monotone commented Mar 26, 2025

View reviewed changes

cake-monotone requested a review from sharkdp March 26, 2025 05:29

sharkdp reviewed Mar 26, 2025

View reviewed changes

crates/red_knot_python_semantic/src/types/property_tests/type_bound.rs Outdated Show resolved Hide resolved

AlexWaygood removed their request for review March 26, 2025 13:27

cake-monotone added 2 commits March 27, 2025 16:26

add property_checks to TypeBound

1c0d359

Merge remote-tracking branch 'upstream/main' into cake/introduce-arbi…

7488b13

…trary-rule-for-property-tests

cake-monotone changed the title ~~[red-knot] Introduce ArbitraryRule for property tests~~ [red-knot] Introduce TypeBound for property tests Mar 27, 2025

cake-monotone requested a review from sharkdp March 27, 2025 07:31

sharkdp closed this Mar 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Introduce `TypeBound` for property tests #16862

[red-knot] Introduce `TypeBound` for property tests #16862

cake-monotone commented Mar 20, 2025 •

edited

Loading

github-actions bot commented Mar 20, 2025 •

edited

Loading

carljm left a comment •

edited

Loading

carljm Mar 20, 2025 •

edited

Loading

cake-monotone Mar 21, 2025 •

edited

Loading

carljm Mar 21, 2025

sharkdp Mar 24, 2025

cake-monotone Mar 25, 2025

sharkdp left a comment

cake-monotone Mar 26, 2025

sharkdp Mar 26, 2025

cake-monotone Mar 27, 2025

sharkdp Mar 27, 2025

cake-monotone Mar 28, 2025 •

edited

Loading

cake-monotone Mar 28, 2025 •

edited

Loading

sharkdp Mar 28, 2025

cake-monotone Mar 29, 2025

sharkdp Mar 29, 2025

[red-knot] Introduce TypeBound for property tests #16862

[red-knot] Introduce TypeBound for property tests #16862

Conversation

cake-monotone commented Mar 20, 2025 • edited Loading

Summary

Problem with the Previous Approach

Improvements

Test Plan

github-actions bot commented Mar 20, 2025 • edited Loading

mypy_primer results

carljm left a comment • edited Loading

Choose a reason for hiding this comment

carljm Mar 20, 2025 • edited Loading

Choose a reason for hiding this comment

cake-monotone Mar 21, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sharkdp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cake-monotone Mar 28, 2025 • edited Loading

Choose a reason for hiding this comment

cake-monotone Mar 28, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

[red-knot] Introduce `TypeBound` for property tests #16862

[red-knot] Introduce `TypeBound` for property tests #16862

cake-monotone commented Mar 20, 2025 •

edited

Loading

github-actions bot commented Mar 20, 2025 •

edited

Loading

`mypy_primer` results

carljm left a comment •

edited

Loading

carljm Mar 20, 2025 •

edited

Loading

cake-monotone Mar 21, 2025 •

edited

Loading

cake-monotone Mar 28, 2025 •

edited

Loading

cake-monotone Mar 28, 2025 •

edited

Loading