Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] partially migrate to new Diagnostic types, delete old diagnostic code #17130

Merged
merged 11 commits into from
Apr 2, 2025

Conversation

BurntSushi
Copy link
Member

This "surgically" replaces Red Knot's use of "old" diagnostic types
with the new Diagnostic type. It is surgical because it doesn't
change how Red Knot type checking actually creates diagnostics.
That is, the InferContext APIs to report diagnostics all remain
the same. What this PR does is change out the implementation of the
InferContext APIs to use the new Diagnostic types. This lays
the ground work for refactoring InferContext to expose the new
Diagnostic data model.

(I had originally planned for this refactor to include moving Red Knot
over to the new Diagnostic data model as well, but this got rather
big on its own. So I thought it would be better to split the work up.)

As discussed with @MichaReiser, this PR also removes the "fake" linear
typing for Diagnostic. That is, you'll no longer get a panic if you
drop a Diagnostic without printing it. It is thought that Salsa
specifically can make this akward to deal with, and there may be cases
in the future where diagnostics get dropped without us having an
opportunity to explicitly ignore them. In any case, while it is perhaps
still a good thing to have, it is not well motivated. Notably, Ruff
doesn't seem to have a problem of accidentally ignoring diagnostics. If
we do end up seeing this become a problem, we can think more carefully
about how to re-introduce the linear typing fakery.

Ref #16809

@BurntSushi BurntSushi added red-knot Multi-file analysis & type inference diagnostics Related to reporting of diagnostics. labels Apr 1, 2025
Copy link
Contributor

github-actions bot commented Apr 1, 2025

mypy_primer results

No ecosystem changes detected ✅

Copy link
Contributor

github-actions bot commented Apr 1, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you


fn severity(&self) -> Severity {
self.severity
pub(crate) fn to_diagnostic(&self) -> Diagnostic {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be into_diagnostic to avoid cloning self or is it to_diagnostic because most salsa queries only return refs?

Copy link
Member Author

@BurntSushi BurntSushi Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Diagnostic API doesn't really support accepting ownership of things, so you have to clone anyway. For example:

    /// Attach a message to this annotation.
    ///
    /// An annotation without a message will still have a presence in
    /// rendering. In particular, it will highlight the span association with
    /// this annotation in some way.
    ///
    /// When a message is attached to an annotation, then it will be associated
    /// with the highlighted span in some way during rendering.
    pub fn message<'a>(self, message: impl std::fmt::Display + 'a) -> Annotation {
        let message = Some(message.to_string().into_boxed_str());
        Annotation { message, ..self }
    }

We may want to add alternative APIs to Diagnostic (and its constituent types) that accept ownership to avoid clones, but this isn't doing any more work AFAIK than the status quo:

self.diagnostics.borrow_mut().push(TypeCheckDiagnostic {
file: self.file,
id,
message: message.to_string(),
range: ranged.range(),
severity,
secondary_messages,
});

I know we've been talking about an IntoDiagnostic trait, but it might actually make more sense for it to be a ToDiagnostic trait. Or, alternatively, an IntoDiagnostic trait if we decide to add ownership consuming construction APIs to Diagnostic. I think I'd prefer that decision to be lead by profiling/benchmarking. (I guess this could matter in cases where Red Knot reports an avalanche of diagnostics. But even then, I'm skeptical that these sorts of copies are going to be anything but marginal compared to whatever other work Red Knot is doing in cases like that. But I'm mostly just guessing.)

Comment on lines 204 to 208
let config = ruff_db::diagnostic::DisplayDiagnosticConfig::default();
for d in &*self.diagnostics.borrow_mut() {
d.print(self.db.upcast(), &config, vec![]).unwrap();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary anymore, now that we removed the "drop bomb" from diagnostics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! I thought I caught everything. This was actually scaffolding code I was using to aide the refactor. Nice catch. Removed.

pub(crate) fn push(&mut self, diagnostic: TypeCheckDiagnostic) {
self.diagnostics.push(Arc::new(diagnostic));
pub(crate) fn push(&mut self, diagnostic: Diagnostic) {
self.diagnostics.push(diagnostic);
}

pub(super) fn extend(&mut self, other: &TypeCheckDiagnostics) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the solution for this is but it may still be desired to wrap the Diagnostics in self.diagnostics in an Arc because we now create multiple deep-clones of every diagnostic, which I suspect can be fairly expensive (diagnostics aren't lightweight structs).

The diagnostics in Red Knot are collected as part of type inference and type inference is done at three different levels:

  1. Per scope (calls into expressions and definition inference)

  2. Per definition (class, function, variable), calls into expression inference

  3. Per expression (only some expressions)

  4. and 2. both have to "copy over" the diagnostics from the inner inferences operations. That means, a diagnostic created when infering an expression may be copied twice: Once to aggregate the diagnostics at the definition level and once when aggregating the diagnostics at a scope level.

We later perform one last copy (clone) inside check_types

This is why the old implementation used an Arc to make those clones as cheap as possible. Alternatives are to use salsa::interned (but that comes with extra bookkeeping but it has the advantage that it's arena allocated) or salsa accumulated (which now avoids cloning internally and only requires cloning at the boundaries)

Unless we say that cloning Diagnostics isn't a performance concern relative to the cost of generating and printing them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm through with my review: I see that you made Diagnostic use an Arc which addresses this concern. I liked that the "old" new diagnostic was salsa agnositc in the sense that it only used the Arc where necessary but I think this also makes sense.

It does have the downside that Ruff's diagnostics get Arced where this isn't technically necessary. It will be interesting to see how this impacts performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like sticking with Arc for now. I do this sort of thing all the time in my Rust libraries. It generally works very well.

You are of course correct that using Arc for these sorts of things isn't free. But I try to look at what the marginal cost of the Arc is. You're only getting a new Arc when you create a diagnostic, and creating a diagnostic is already likely going to require allocs and formatting machinery and all that stuff.

I will be very happy to eat crow and optimize if we can find use cases for which getting rid of these allocs helps perf. But I'll note that, at some point, you do have to actually create strings somewhere in order to create a diagnostic, even if it had ownership accepting APIs. e.g., Diagnostic::new(blah, blah, format_args!("some message about {foo}")) has to do a to_string() on that std::fmt::Arguments somewhere. And unless we want a lifetime parameter infecting the Diagnostic types (which I would guess would be unworkable), it probably makes the most sense to do it at construction time.

I think even in alternative designs (e.g., a Diagnostic trait), there have to be clones and formatting happening somewhere, although precisely where might get shifted around a bit.

) -> String {
let display_config = DisplayDiagnosticConfig::default().color(false);

let mut snapshot = String::new();
let mut snapshot: Vec<u8> = Vec::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you mentioned in one of your reviews that it was a very intentional decision to implement print over std::io::Write. I don't remember the details but I noticed that it results in many places where it's necessary to use Vec<u8> in combination with a String::from_utf8().unwrap which is somewhat unfortunate.

I don't really have a solution for this, it's just something I noticed (e.g. also in my tests).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... right... good point. The reason was related to the linear type fakery. Namely, using std::io::Write as a destination makes it "more likely" that printing is actually going to stdout or similar, and not just some string in memory that could get accidentally ignored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I have any strong feelings about reverting this back to just a Display impl or some such (the internal renderer already works that way, so it's an overall small change). One note though is that the unwrap() calls you're referencing I believe only occur in tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards offering a Display impl. But might do that in a follow-up PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note though is that the unwrap() calls you're referencing I believe only occur in tests.

This was wrong! See: #17130 (comment)

@@ -13,7 +13,7 @@ license.workspace = true
[dependencies]
red_knot_python_semantic = { workspace = true, features = ["serde"] }
red_knot_vendored = { workspace = true }
ruff_db = { workspace = true, features = ["testing"] }
ruff_db = { workspace = true, features = ["os", "testing"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what changed that the os feature is now required or is this a feature we already missed declaring as dependency before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was missed before. It came up when I was trying to run tests via -p red_knot_test I believe. Sorry, I meant to call this out in its own commit, but it slipped through.

.borrow()
.print(workspace.db.upcast(), &config, &mut buf)
.unwrap();
String::from_utf8(buf).unwrap().into()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also change the return type to Result without breaking the JS Api if you want to avoid the unwrap here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and this is a non-test instance of the unwrap() that I previously claimed was only in tests. Nice.

OK, that convinces me to switch back to a Display impl, kinda like how the old diagnostics work.

But I'll do that in a follow-up PR if that's okay with you. (Which will fix this too.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

/// at time of writing, `ruff_db` depends on `ruff_python_parser` instead of
/// the other way around. And since we want to do this conversion in a couple
/// places, it makes sense to centralize it _somewhere_. So it's here for now.
pub fn create_parse_diagnostic(file: File, err: &ruff_python_parser::ParseError) -> Diagnostic {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A use case for your IntoDiagnostic trait ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I don't think this on its own asks for generics. If it weren't for the dependency graph (which we briefly talked about fixing at the on-site), then this could "just" be a concrete method on ParseError and things would be fine.

@AlexWaygood AlexWaygood removed their request for review April 2, 2025 13:13
I removed this to see how much code was depending internally on the
`&[Arc<TypeCheckDiagnostic>]` representation. Thankfully, it was just
one place. So I just removed the `Deref` impl in favor of adding an
explicit `iter` method.

In general, I think using `Deref` for things like this is _somewhat_ of
an abuse. The tip-off is if there are `&self` or `&mut self` methods on
the type, then it's probably not a good candidate for `Deref`.
I did this mostly because it wasn't buying us much, and I'm
trying to simplify the public API of the types I'd like to
refactor in order to make the refactor simpler.

If we really want something like this, we can re-add it
later.
…ages

This is temporary to scaffold the refactor.

The main idea is that we want to take the `InferContext` API,
*as it is*, and migrate that to the new diagnostic data model
*internally*. Then we can rip out the old stuff and iterate
on the API.
The switch to `Arc` was done because Salsa sometimes requires cloning a
`Diagnostic` (or something that contains a `Diagnostic`). And so it
probably makes sense to make this cheap.

Since `Diagnostic` exposes a mutable API, we adopt "clone on write"
semantics. Although, it's more like, "clone on write when the `Arc` has
more than one reference." In the common case of creating a `Diagnostic`
and then immediately mutating it, no additional copies should be made
over the status quo.

We also drop the linear type fakery. Its interaction with Salsa is
somewhat awkward, and it has been suggested that there will be points
where diagnostics will be dropped unceremoniously without an opportunity
to tag them as having been ignored. Moreover, this machinery was added
out of "good sense" and isn't actually motivated by real world problems
with accidentally ignoring diagnostics. So that makes it easier, I
think, to just kick this out entirely instead of trying to find a way to
make it work.
Now that we don't need to update the `printed` flag, this can just be an
immutable borrow.

(Arguably this should have been an immutable borrow even initially, but
I didn't want to introduce interior mutability without a more compelling
justification.)
This change just brings diagnostic rendering into parity
with the status quo.
Previously, this was only available in the old renderer.
To avoid regressions, we just copy it to the new renderer.
We don't bother with DRY because the old renderer will be
deleted very soon.
This replaces things like `TypeCheckDiagnostic` with the new Diagnostic`
type.

This is a "surgical" replacement where we retain the existing API of
of diagnostic reporting such that _most_ of Red Knot doesn't need to be
changed to support this update. But it will enable us to start using the
new diagnostic renderer and to delete the old renderer. It also paves
the path for exposing the new `Diagnostic` data model to the broader Red
Knot codebase.
We do keep around `OldSecondaryDiagnosticMessage`, since that's part of
the Red Knot `InferContext` API. But it's a rather simple type, and
we'll be able to delete it entirely once `InferContext` exposes the new
`Diagnostic` type directly.

Since we aren't consuming `OldSecondaryDiagnosticMessage` any more, we
can now accept a slice instead of a vec. (Thanks Clippy.)
I put this in its own commit in case all of the information removed here
was controversial. But it *looks* stale to me. At the very least,
`TypeCheckDiagnostic` no longer exists, so that would need to be fixed.
And it doesn't really make sense to me (at this point) to make
`Diagnostic` a Salsa struct, particularly since we are keen on using it
in Ruff (at some point).
@BurntSushi BurntSushi force-pushed the ag/red-knot-new-diagnostics branch from d248dc4 to 3db884d Compare April 2, 2025 13:44
This just adds an extra blank line. I think these tests were written
against the new renderer before it was used by Red Knot's `main`
function. Once I did that, I saw that it was missing a blank line, and
so I added it to match the status quo. But that means these snapshots
have become stale. So this commit updates them.
@BurntSushi BurntSushi merged commit 28c7e72 into main Apr 2, 2025
23 checks passed
@BurntSushi BurntSushi deleted the ag/red-knot-new-diagnostics branch April 2, 2025 14:10
BurntSushi added a commit that referenced this pull request Apr 2, 2025
It was already using this approach internally, so this is "just" a
matter of rejiggering the public API of `Diagnostic`.

We were previously writing directly to a `std::io::Write` since it was
thought that this worked better with the linear typing fakery. Namely,
it increased confidence that the diagnostic rendering was actually
written somewhere useful, instead of just being converted to a string
that could potentially get lost.

For reasons discussed in #17130, the linear type fakery was removed.
And so there is less of a reason to require a `std::io::Write`
implementation for diagnostic rendering. Indeed, this would sometimes
result in `unwrap()` calls when one wants to convert to a `String`.
BurntSushi added a commit that referenced this pull request Apr 2, 2025
It was already using this approach internally, so this is "just" a
matter of rejiggering the public API of `Diagnostic`.

We were previously writing directly to a `std::io::Write` since it was
thought that this worked better with the linear typing fakery. Namely,
it increased confidence that the diagnostic rendering was actually
written somewhere useful, instead of just being converted to a string
that could potentially get lost.

For reasons discussed in #17130, the linear type fakery was removed.
And so there is less of a reason to require a `std::io::Write`
implementation for diagnostic rendering. Indeed, this would sometimes
result in `unwrap()` calls when one wants to convert to a `String`.
BurntSushi added a commit that referenced this pull request Apr 2, 2025
It was already using this approach internally, so this is "just" a
matter of rejiggering the public API of `Diagnostic`.

We were previously writing directly to a `std::io::Write` since it was
thought that this worked better with the linear typing fakery. Namely,
it increased confidence that the diagnostic rendering was actually
written somewhere useful, instead of just being converted to a string
that could potentially get lost.

For reasons discussed in #17130, the linear type fakery was removed.
And so there is less of a reason to require a `std::io::Write`
implementation for diagnostic rendering. Indeed, this would sometimes
result in `unwrap()` calls when one wants to convert to a `String`.
BurntSushi added a commit that referenced this pull request Apr 2, 2025
It was already using this approach internally, so this is "just" a
matter of rejiggering the public API of `Diagnostic`.

We were previously writing directly to a `std::io::Write` since it was
thought that this worked better with the linear typing fakery. Namely,
it increased confidence that the diagnostic rendering was actually
written somewhere useful, instead of just being converted to a string
that could potentially get lost.

For reasons discussed in #17130, the linear type fakery was removed.
And so there is less of a reason to require a `std::io::Write`
implementation for diagnostic rendering. Indeed, this would sometimes
result in `unwrap()` calls when one wants to convert to a `String`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
diagnostics Related to reporting of diagnostics. red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants