-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] partially migrate to new Diagnostic
types, delete old diagnostic code
#17130
Conversation
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thank you
|
||
fn severity(&self) -> Severity { | ||
self.severity | ||
pub(crate) fn to_diagnostic(&self) -> Diagnostic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be into_diagnostic
to avoid cloning self
or is it to_diagnostic
because most salsa queries only return refs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Diagnostic
API doesn't really support accepting ownership of things, so you have to clone anyway. For example:
/// Attach a message to this annotation.
///
/// An annotation without a message will still have a presence in
/// rendering. In particular, it will highlight the span association with
/// this annotation in some way.
///
/// When a message is attached to an annotation, then it will be associated
/// with the highlighted span in some way during rendering.
pub fn message<'a>(self, message: impl std::fmt::Display + 'a) -> Annotation {
let message = Some(message.to_string().into_boxed_str());
Annotation { message, ..self }
}
We may want to add alternative APIs to Diagnostic
(and its constituent types) that accept ownership to avoid clones, but this isn't doing any more work AFAIK than the status quo:
ruff/crates/red_knot_python_semantic/src/types/context.rs
Lines 153 to 160 in a15404a
self.diagnostics.borrow_mut().push(TypeCheckDiagnostic { | |
file: self.file, | |
id, | |
message: message.to_string(), | |
range: ranged.range(), | |
severity, | |
secondary_messages, | |
}); |
I know we've been talking about an IntoDiagnostic
trait, but it might actually make more sense for it to be a ToDiagnostic
trait. Or, alternatively, an IntoDiagnostic
trait if we decide to add ownership consuming construction APIs to Diagnostic
. I think I'd prefer that decision to be lead by profiling/benchmarking. (I guess this could matter in cases where Red Knot reports an avalanche of diagnostics. But even then, I'm skeptical that these sorts of copies are going to be anything but marginal compared to whatever other work Red Knot is doing in cases like that. But I'm mostly just guessing.)
let config = ruff_db::diagnostic::DisplayDiagnosticConfig::default(); | ||
for d in &*self.diagnostics.borrow_mut() { | ||
d.print(self.db.upcast(), &config, vec![]).unwrap(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary anymore, now that we removed the "drop bomb" from diagnostics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! I thought I caught everything. This was actually scaffolding code I was using to aide the refactor. Nice catch. Removed.
pub(crate) fn push(&mut self, diagnostic: TypeCheckDiagnostic) { | ||
self.diagnostics.push(Arc::new(diagnostic)); | ||
pub(crate) fn push(&mut self, diagnostic: Diagnostic) { | ||
self.diagnostics.push(diagnostic); | ||
} | ||
|
||
pub(super) fn extend(&mut self, other: &TypeCheckDiagnostics) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the solution for this is but it may still be desired to wrap the Diagnostic
s in self.diagnostics
in an Arc
because we now create multiple deep-clones of every diagnostic, which I suspect can be fairly expensive (diagnostics aren't lightweight structs).
The diagnostics in Red Knot are collected as part of type inference and type inference is done at three different levels:
-
Per scope (calls into expressions and definition inference)
-
Per definition (class, function, variable), calls into expression inference
-
Per expression (only some expressions)
-
and 2. both have to "copy over" the diagnostics from the inner inferences operations. That means, a diagnostic created when infering an expression may be copied twice: Once to aggregate the diagnostics at the definition level and once when aggregating the diagnostics at a scope level.
We later perform one last copy (clone) inside check_types
This is why the old implementation used an Arc
to make those clones as cheap as possible. Alternatives are to use salsa::interned
(but that comes with extra bookkeeping but it has the advantage that it's arena allocated) or salsa accumulated (which now avoids cloning internally and only requires cloning at the boundaries)
Unless we say that cloning Diagnostic
s isn't a performance concern relative to the cost of generating and printing them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm through with my review: I see that you made Diagnostic
use an Arc
which addresses this concern. I liked that the "old" new diagnostic was salsa agnositc in the sense that it only used the Arc
where necessary but I think this also makes sense.
It does have the downside that Ruff's diagnostics get Arc
ed where this isn't technically necessary. It will be interesting to see how this impacts performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like sticking with Arc
for now. I do this sort of thing all the time in my Rust libraries. It generally works very well.
You are of course correct that using Arc
for these sorts of things isn't free. But I try to look at what the marginal cost of the Arc
is. You're only getting a new Arc
when you create a diagnostic, and creating a diagnostic is already likely going to require allocs and formatting machinery and all that stuff.
I will be very happy to eat crow and optimize if we can find use cases for which getting rid of these allocs helps perf. But I'll note that, at some point, you do have to actually create strings somewhere in order to create a diagnostic, even if it had ownership accepting APIs. e.g., Diagnostic::new(blah, blah, format_args!("some message about {foo}"))
has to do a to_string()
on that std::fmt::Arguments
somewhere. And unless we want a lifetime parameter infecting the Diagnostic
types (which I would guess would be unworkable), it probably makes the most sense to do it at construction time.
I think even in alternative designs (e.g., a Diagnostic
trait), there have to be clones and formatting happening somewhere, although precisely where might get shifted around a bit.
crates/red_knot_test/src/lib.rs
Outdated
) -> String { | ||
let display_config = DisplayDiagnosticConfig::default().color(false); | ||
|
||
let mut snapshot = String::new(); | ||
let mut snapshot: Vec<u8> = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you mentioned in one of your reviews that it was a very intentional decision to implement print
over std::io::Write
. I don't remember the details but I noticed that it results in many places where it's necessary to use Vec<u8>
in combination with a String::from_utf8().unwrap
which is somewhat unfortunate.
I don't really have a solution for this, it's just something I noticed (e.g. also in my tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... right... good point. The reason was related to the linear type fakery. Namely, using std::io::Write
as a destination makes it "more likely" that printing is actually going to stdout or similar, and not just some string in memory that could get accidentally ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I have any strong feelings about reverting this back to just a Display
impl or some such (the internal renderer already works that way, so it's an overall small change). One note though is that the unwrap()
calls you're referencing I believe only occur in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards offering a Display
impl. But might do that in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note though is that the unwrap() calls you're referencing I believe only occur in tests.
This was wrong! See: #17130 (comment)
@@ -13,7 +13,7 @@ license.workspace = true | |||
[dependencies] | |||
red_knot_python_semantic = { workspace = true, features = ["serde"] } | |||
red_knot_vendored = { workspace = true } | |||
ruff_db = { workspace = true, features = ["testing"] } | |||
ruff_db = { workspace = true, features = ["os", "testing"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what changed that the os
feature is now required or is this a feature we already missed declaring as dependency before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was missed before. It came up when I was trying to run tests via -p red_knot_test
I believe. Sorry, I meant to call this out in its own commit, but it slipped through.
.borrow() | ||
.print(workspace.db.upcast(), &config, &mut buf) | ||
.unwrap(); | ||
String::from_utf8(buf).unwrap().into() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also change the return type to Result
without breaking the JS Api if you want to avoid the unwrap
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and this is a non-test instance of the unwrap()
that I previously claimed was only in tests. Nice.
OK, that convinces me to switch back to a Display
impl, kinda like how the old diagnostics work.
But I'll do that in a follow-up PR if that's okay with you. (Which will fix this too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
/// at time of writing, `ruff_db` depends on `ruff_python_parser` instead of | ||
/// the other way around. And since we want to do this conversion in a couple | ||
/// places, it makes sense to centralize it _somewhere_. So it's here for now. | ||
pub fn create_parse_diagnostic(file: File, err: &ruff_python_parser::ParseError) -> Diagnostic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A use case for your IntoDiagnostic
trait ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. I don't think this on its own asks for generics. If it weren't for the dependency graph (which we briefly talked about fixing at the on-site), then this could "just" be a concrete method on ParseError
and things would be fine.
I removed this to see how much code was depending internally on the `&[Arc<TypeCheckDiagnostic>]` representation. Thankfully, it was just one place. So I just removed the `Deref` impl in favor of adding an explicit `iter` method. In general, I think using `Deref` for things like this is _somewhat_ of an abuse. The tip-off is if there are `&self` or `&mut self` methods on the type, then it's probably not a good candidate for `Deref`.
I did this mostly because it wasn't buying us much, and I'm trying to simplify the public API of the types I'd like to refactor in order to make the refactor simpler. If we really want something like this, we can re-add it later.
…ages This is temporary to scaffold the refactor. The main idea is that we want to take the `InferContext` API, *as it is*, and migrate that to the new diagnostic data model *internally*. Then we can rip out the old stuff and iterate on the API.
The switch to `Arc` was done because Salsa sometimes requires cloning a `Diagnostic` (or something that contains a `Diagnostic`). And so it probably makes sense to make this cheap. Since `Diagnostic` exposes a mutable API, we adopt "clone on write" semantics. Although, it's more like, "clone on write when the `Arc` has more than one reference." In the common case of creating a `Diagnostic` and then immediately mutating it, no additional copies should be made over the status quo. We also drop the linear type fakery. Its interaction with Salsa is somewhat awkward, and it has been suggested that there will be points where diagnostics will be dropped unceremoniously without an opportunity to tag them as having been ignored. Moreover, this machinery was added out of "good sense" and isn't actually motivated by real world problems with accidentally ignoring diagnostics. So that makes it easier, I think, to just kick this out entirely instead of trying to find a way to make it work.
Now that we don't need to update the `printed` flag, this can just be an immutable borrow. (Arguably this should have been an immutable borrow even initially, but I didn't want to introduce interior mutability without a more compelling justification.)
This change just brings diagnostic rendering into parity with the status quo.
Previously, this was only available in the old renderer. To avoid regressions, we just copy it to the new renderer. We don't bother with DRY because the old renderer will be deleted very soon.
This replaces things like `TypeCheckDiagnostic` with the new Diagnostic` type. This is a "surgical" replacement where we retain the existing API of of diagnostic reporting such that _most_ of Red Knot doesn't need to be changed to support this update. But it will enable us to start using the new diagnostic renderer and to delete the old renderer. It also paves the path for exposing the new `Diagnostic` data model to the broader Red Knot codebase.
We do keep around `OldSecondaryDiagnosticMessage`, since that's part of the Red Knot `InferContext` API. But it's a rather simple type, and we'll be able to delete it entirely once `InferContext` exposes the new `Diagnostic` type directly. Since we aren't consuming `OldSecondaryDiagnosticMessage` any more, we can now accept a slice instead of a vec. (Thanks Clippy.)
I put this in its own commit in case all of the information removed here was controversial. But it *looks* stale to me. At the very least, `TypeCheckDiagnostic` no longer exists, so that would need to be fixed. And it doesn't really make sense to me (at this point) to make `Diagnostic` a Salsa struct, particularly since we are keen on using it in Ruff (at some point).
d248dc4
to
3db884d
Compare
This just adds an extra blank line. I think these tests were written against the new renderer before it was used by Red Knot's `main` function. Once I did that, I saw that it was missing a blank line, and so I added it to match the status quo. But that means these snapshots have become stale. So this commit updates them.
It was already using this approach internally, so this is "just" a matter of rejiggering the public API of `Diagnostic`. We were previously writing directly to a `std::io::Write` since it was thought that this worked better with the linear typing fakery. Namely, it increased confidence that the diagnostic rendering was actually written somewhere useful, instead of just being converted to a string that could potentially get lost. For reasons discussed in #17130, the linear type fakery was removed. And so there is less of a reason to require a `std::io::Write` implementation for diagnostic rendering. Indeed, this would sometimes result in `unwrap()` calls when one wants to convert to a `String`.
It was already using this approach internally, so this is "just" a matter of rejiggering the public API of `Diagnostic`. We were previously writing directly to a `std::io::Write` since it was thought that this worked better with the linear typing fakery. Namely, it increased confidence that the diagnostic rendering was actually written somewhere useful, instead of just being converted to a string that could potentially get lost. For reasons discussed in #17130, the linear type fakery was removed. And so there is less of a reason to require a `std::io::Write` implementation for diagnostic rendering. Indeed, this would sometimes result in `unwrap()` calls when one wants to convert to a `String`.
It was already using this approach internally, so this is "just" a matter of rejiggering the public API of `Diagnostic`. We were previously writing directly to a `std::io::Write` since it was thought that this worked better with the linear typing fakery. Namely, it increased confidence that the diagnostic rendering was actually written somewhere useful, instead of just being converted to a string that could potentially get lost. For reasons discussed in #17130, the linear type fakery was removed. And so there is less of a reason to require a `std::io::Write` implementation for diagnostic rendering. Indeed, this would sometimes result in `unwrap()` calls when one wants to convert to a `String`.
It was already using this approach internally, so this is "just" a matter of rejiggering the public API of `Diagnostic`. We were previously writing directly to a `std::io::Write` since it was thought that this worked better with the linear typing fakery. Namely, it increased confidence that the diagnostic rendering was actually written somewhere useful, instead of just being converted to a string that could potentially get lost. For reasons discussed in #17130, the linear type fakery was removed. And so there is less of a reason to require a `std::io::Write` implementation for diagnostic rendering. Indeed, this would sometimes result in `unwrap()` calls when one wants to convert to a `String`.
This "surgically" replaces Red Knot's use of "old" diagnostic types
with the new
Diagnostic
type. It is surgical because it doesn'tchange how Red Knot type checking actually creates diagnostics.
That is, the
InferContext
APIs to report diagnostics all remainthe same. What this PR does is change out the implementation of the
InferContext
APIs to use the newDiagnostic
types. This laysthe ground work for refactoring
InferContext
to expose the newDiagnostic
data model.(I had originally planned for this refactor to include moving Red Knot
over to the new
Diagnostic
data model as well, but this got ratherbig on its own. So I thought it would be better to split the work up.)
As discussed with @MichaReiser, this PR also removes the "fake" linear
typing for
Diagnostic
. That is, you'll no longer get a panic if youdrop a
Diagnostic
without printing it. It is thought that Salsaspecifically can make this akward to deal with, and there may be cases
in the future where diagnostics get dropped without us having an
opportunity to explicitly ignore them. In any case, while it is perhaps
still a good thing to have, it is not well motivated. Notably, Ruff
doesn't seem to have a problem of accidentally ignoring diagnostics. If
we do end up seeing this become a problem, we can think more carefully
about how to re-introduce the linear typing fakery.
Ref #16809