-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Statically known branches #14759
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #14759 will degrade performances by 21.62%Comparing Summary
Benchmarks breakdown
|
This comment was marked as resolved.
This comment was marked as resolved.
caa4c88
to
66524e7
Compare
## Summary `typing_extensions` has a `>=3.13` re-export for the `typing.NoDefault` singleton, but not for `typing._NoDefaultType`. This causes problems as soon as we understand `sys.version_info` branches, so we explicity switch to `typing._NoDefaultType` for Python 3.13 and later. This is a part of #14759 that I thought might make sense to break out and merge in isolation. ## Test Plan New test that will become more meaningful with #12700 --------- Co-authored-by: Micha Reiser <[email protected]>
66524e7
to
fa5d2b5
Compare
4ae7a5b
to
4bd513d
Compare
581f787
to
668fb66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressively comprehensive tests, thank you!!
Per request, I only looked at the mdtests.
Just a few comments, nothing significant.
crates/red_knot_python_semantic/resources/mdtest/annotations/never.md
Outdated
Show resolved
Hide resolved
crates/red_knot_python_semantic/resources/mdtest/known_constants.md
Outdated
Show resolved
Hide resolved
crates/red_knot_python_semantic/resources/mdtest/known_constants.md
Outdated
Show resolved
Hide resolved
crates/red_knot_python_semantic/resources/mdtest/statically-known-branches.md
Outdated
Show resolved
Hide resolved
crates/red_knot_python_semantic/resources/mdtest/statically-known-branches.md
Outdated
Show resolved
Hide resolved
crates/red_knot_python_semantic/resources/mdtest/statically-known-branches.md
Outdated
Show resolved
Hide resolved
5ae6a58
to
4ac3737
Compare
match 0: | ||
case 1: | ||
y = 2 | ||
case _: | ||
y = 3 | ||
|
||
reveal_type(y) # revealed: Literal[2, 3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are "too smart" now and infer Literal[3]
here, as we can see that the first pattern can never match. To keep the original intent of the test, I changed the match target to an unknown int
.
crates/red_knot_python_semantic/resources/mdtest/loops/while_loop.md
Outdated
Show resolved
Hide resolved
|
||
# TODO: ideally, this should be Literal[2] | ||
reveal_type(x) # revealed: Literal[2, 3] | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this is problematic in cases with multiple "always True" patterns. At runtime, we'll take the first of these branches. But in our current implementation here, since we always walk definitions backwards, we'll take the last instead:
x = 1
match "a":
case "a" if 1 == 1:
x = 2
case "a":
x = 3
reveal_type(x) # we infer Literal[3], but it should be Literal[2]
Before I attempt to fix this, it would be good to get some feedback on whether or not we want static-truthiness checking in match
statements at all. I thought it was a nice stretch-task, but maybe it causes more trouble than it's worth. The examples presented here are not particularly useful, but there might be some use cases involving enums? Not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should spend a lot of extra time on implementation work that is specific to statically-known branches with match patterns; it's nice to have, but not clear how often it will be relevant.
I'm curious why this seems to affect match statements with multiple always-matching cases, but not elif
chains with multiple always-true branches? The two would seem to have a lot in common. In both cases the key fact is that an always-true condition in an elif
or case
does not mean "we definitely take this path" (we might have already taken a previous one instead). It only means "we definitely do not take any branch after this one."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this seems to affect match statements with multiple always-matching cases, but not
elif
chains with multiple always-true branches?
It currently works for elif
chains but not for match patterns because of negative constraints. The way it currently works is that we record all constraints (including negative ones) when a binding is created:
if flag:
x = 1 # conditional on flag
elif True:
x = 2 # conditional on ~flag and True
elif True:
x = 3 # conditional on ~flag, ~True and True
The x = 3
binding is therefore not unconditionally visible, which currently leads us to correctly infer Literal[1, 2]
for this case.
But now that I write it out like this, I also realize that this reveals a problem with my approach. Consider this example:
x = 1
if flag:
x = 2 # conditional on flag
elif True:
x = 3 # conditional on ~flag, True
Here, we currently infer Literal[1, 2, 3]
, while it would be more correct to infer Literal[2, 3]
. The problem is that we have the negative ~flag
condition in the x = 3
binding, which prevents us from detecting it as "unconditionally visible". And if we would detect it as unconditionally visible, we would run into the problem you mentioned. We would block off any previous bindings and incorrectly infer Literal[3]
.
We could clearly do better.
I'm currently thinking that there are two problems:
- It seems to me like the set of constraints that we apply for static truthiness analysis needs to be different from the set of constraints that we apply for narrowing. For narrowing, we need that
~flag
constraint. But for static truthiness analysis, we don't care ifflag
has ambiguous truthiness or not, we always take anelif True
branch. - When iterating the list of bindings in reverse, we should not stop at the first "unconditionally visible" binding, we need to go back to the earliest possible "unconditionally visible" binding.
14b357c
to
58a07ae
Compare
f82d2b6
to
0a0e5e6
Compare
Summary
Rendered version of the test suite including a proper introduction to the topic / motivation: click.
This changeset adds support for precise type-inference and boundness-handling of definitions inside control-flow branches with statically-known conditions, i.e. test-expressions whose truthiness we can unambiguously infer as always false or always true. In code:
and:
One option to implement this would have been to add special handling for a limited set of test-expressions in semantic index-building. We would then analyze expressions like
sys.version_info >= (x, y)
,typing.TYPE_CHECKING
,True
andFalse
without any type inference and consequently close down (or unconditionally open) branches whose truthiness we can analyze in this way. This would simplify the implementation, but is much less general than the approach taken here.Instead, we collect all necessary information during semantic index building, and then re-analyze the control flow during type-checking. The way this works is by recording the list of 'active' branching conditions for each introduced binding. Consider this example:
In semantic-index building, when we reach the
use(x)
statement, we have three live bindings. But when we later analyze the test-conditions in type inference, we may potentially conclude that some of them are not present/visible.One possibility to do so is to detect test-conditions to be always false. For example, if
test3
is statically known to be false, we can infer that thex = 3
branch is never taken. This binding is then not considered in type inference and in boundness-handling. For thex = 2
binding, we need to consider the possibility that eithertest1
ortest2
can be statically false. In summary, if any of the active branching conditions is statically false, the binding is cancelled.Another possibility is that we can infer a test-condition to be always true. This can also hide bindings. For example, if we can infer
test2
to always be true, we know that thex = 1
binding is never visible. This is slightly more complex to handle, as we haven't even seen thetest2
condition when we analyze thex = 1
binding in semantic-index building.The way we solve this in type-inference is by going through all live bindings in reverse order (starting at
x = 3
), stopping whenever we detect a binding that is unconditionally visible. A binding is unconditionally visible if all of its active conditions are statically known to be true. This is slightly complicated by the fact that we have unconditional branching as well. For example, consider:This binding is not unconditionally visible, even if
test1
is always true, as the loop body may never be executed. This is currently handled by adding aBranchingCondition::Unconditional
marker to the active conditions of the binding, which always evaluates to an ambiguous truthiness.Some of this might sound similar to what we achieve with the
may_be_unbound
andmay_be_undeclared
flags. However, the new mechanism does not allow us to get rid of these flags, as there are also cases like this:The symbol
x
is always bound, but seeing thex = 1
orx = 2
definitions with a branching condition oftest: bool
is not enough to see that. We also need the global information that these two branches will be merged (where we logically OR the twomay_be_unbound = false
flags to a globalmay_be_unbound = false
).This branch also includes:
typing.TYPE_CHECKING
while
loopstarget-version
requirements in some Markdown tests which were now required due to the understanding ofsys.version_info
branches.closes #12700
closes #14170
closes #14861
Test plan
statically-known-branches.md
typing.TYPE_CHECKING
inknown_constants.md
while
-loop narrowing innarrow/while.md
EllipsisType
,Never
, etcPerformance investigation
(thanks to @MichaReiser for some input!)
Measurements
I can reproduce the regression locally. It's not as pronounced as on codspeed, which could be attributed to the fact that the codspeed results do not involve any I/O overhead (they are based on CPU counters, not wall time). So the local results might be "diluted" a bit:
tomllib, -13%
./red_knot_main
./red_knot_feature
Black codebase, -7%
./red_knot_main
./red_knot_feature
Analysis (ongoing)
Type::bool
on it (which may involve looking up other symbols). This can entail loading entire new modules. For example: in thetomllib
benchmark, we do not need to resolvesys
on main, but we do on this branch. The reason for this are varioussys.version_info >= (M, m)
conditions inbuiltins.pyi
which now need to be resolved. I tried to account for this by adding aimport sys; sys.version_info
line in a local copy oftomllib
, so the version on main would also have to importsys
. This reduces the difference from 13% to 11%, but the comparison is still not fair. There are a lot more symbol-lookups on this feature branch (mostly__bool__
).To do