Make recording of query cache hits in self-profiler much cheaper #142978
base: master

Conversation
Judging by https://github.com/search?type=code&q=query-cache-hits, it looks like no one used this anyway… 😆
/// With this approach, we don't know the individual thread IDs and timestamps
/// of cache hits, but it has very little overhead on top of `-Zself-profile`.
/// Recording the cache hits as individual events made compilation 3-5x slower.
query_hits: RwLock<FxHashMap<QueryInvocationId, AtomicU64>>,
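The aggregation approach behind this field could be sketched roughly as below. This is a hedged, self-contained illustration, not the actual rustc code: it uses std's `HashMap` in place of `FxHashMap`, a plain `u32` alias for `QueryInvocationId`, and an invented `Profiler` struct.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

// Hypothetical stand-in for rustc's QueryInvocationId.
type QueryInvocationId = u32;

struct Profiler {
    // One atomic counter per query invocation, instead of one event per hit.
    query_hits: RwLock<HashMap<QueryInvocationId, AtomicU64>>,
}

impl Profiler {
    fn increment_query_cache_hit(&self, id: QueryInvocationId) {
        {
            // Fast path: the counter already exists, a read lock suffices and
            // concurrent hits only contend on the atomic, not the map.
            let guard = self.query_hits.read().unwrap();
            if let Some(counter) = guard.get(&id) {
                counter.fetch_add(1, Ordering::Relaxed);
                return;
            }
        }
        // Slow path (first hit for this invocation): take the write lock and
        // insert the counter, tolerating a race with another inserting thread.
        let mut guard = self.query_hits.write().unwrap();
        guard
            .entry(id)
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(1, Ordering::Relaxed);
    }
}
```

The point of the read-lock fast path is that after warm-up almost every hit avoids exclusive locking entirely, which is where the cheapness relative to per-hit instant events comes from.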
Could you switch this to using a dense map, e.g. `IndexVec`? `QueryInvocationId` should be monotonically assigned, I think, so this should end up dense.
Are they allocated monotonically in the order of executed queries, though? 🤔 We don't know before the start of rustc how many invocations there will be (I assume, since it includes queries combined with their unique argument combinations), so we can't preallocate it. So the only thing we could do is `.push()` on demand (if the new ID is one larger than the size of the vec) and look up by index. Is that what you meant?
It doesn't look like they are strictly monotonic:
ID: 2
ID: 2
ID: 4
ID: 1
ID: 0
ID: 4
ID: 7
ID: 8
ID: 1
ID: 9
ID: 10
ID: 11
ID: 12
ID: 13
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
ID: 1
I guess that it depends on whether the invocations are cached or not, loaded from disk, etc. I don't think we can count on them actually arriving in order.
That being said, instead of push, I suppose that we could do something like `query_hits.resize(new_observed_max_id, 0)`. Do you want me to do that?
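The resize-on-demand alternative discussed above might look like this minimal sketch. The function and variable names are illustrative (in rustc an `IndexVec` would likely be used rather than a plain `Vec`):

```rust
// Dense-map alternative: grow a Vec on demand whenever an ID larger than any
// seen so far arrives. IDs may arrive out of order, so we resize rather than
// push; unseen IDs in between simply stay at a count of zero.
fn record_hit(hits: &mut Vec<u64>, id: usize) {
    if id >= hits.len() {
        hits.resize(id + 1, 0);
    }
    hits[id] += 1;
}
```

The trade-off versus the hash map is wasted zero entries for never-hit invocations in exchange for index-based lookup without hashing.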
    |profiler| profiler.query_cache_hit_event_kind,
    query_invocation_id,
);
profiler_ref.profiler.as_ref().unwrap().increment_query_cache_hit(query_invocation_id);
}

if unlikely(self.event_filter_mask.contains(EventFilter::QUERY_CACHE_HITS)) {
Any reason to change the existing event rather than adding a new one that only tracks counts?
I think this is losing the information for the query "tree" that was previously present, right? It used to be possible to generate a flamegraph of queries but now since there's no timing/thread information we can't track the parent relationships.
That doesn't seem consistently useful, but it also doesn't seem useless to me...
Well, I figured that it wasn't really used in practice (I haven't found anything on GitHub code search), and it was quite expensive. A practical reason to avoid adding a new filter event was to avoid having two mask checks in this very hot function. But the cost of that (with `-Zself-profile` enabled) is probably still minuscule in comparison to what was happening before, and without self-profiling we could just check whether `QUERY_CACHE_HITS | QUERY_CACHE_HITS_COUNT` is enabled, to keep a single check in the fast path, so it would probably be fine.
Happy to add a new filter event though; it should be simple enough and wouldn't break backwards compatibility.
How do you generate such a flamegraph that takes query hits into account, btw?
Self-profile can record various types of things; some of them, like query cache hits, are not enabled by default. Rustc currently records cache hits as "instant" measureme events, which record the thread ID and the current timestamp and construct an individual event for each such cache hit. This is incredibly expensive: in a small hello-world benchmark that just depends on serde, it makes compilation with nightly go from ~3s (with `-Zself-profile`) to ~15s (with `-Zself-profile -Zself-profile-events=default,query-cache-hit`).

We'd like to add query cache hits to rustc-perf (rust-lang/rustc-perf#2168), but there we only need the actual cache hit counts, not the timestamp/thread ID metadata associated with them.

This PR changes the behavior of the `query-cache-hit` event. Instead of generating individual instant events, it simply aggregates cache hit counts per query invocation (a combination of a query and its arguments, if I understand it correctly) using an atomic counter. At the end of the compilation session, these counts are then dumped to the self-profile log using integer events (in a similar fashion to how we record artifact sizes). I suppose that we could dedup the query invocations in rustc directly, but I don't think it's really required. In local experiments with the hello world + serde case, the query invocation records generated ~30 KiB more data in the self-profile, a ~10% increase in this case.

This changes the behavior of an existing event, but it's of course unstable, and to be honest I doubt that anyone uses this flag when it makes compilation so much slower. I think that it will be more useful when it actually records the most useful subset of the previously gathered data (the actual query cache hit counts) at a fraction of the overhead. An alternative would be to create a new event. I used a different event kind though, so that old `analyzeme` won't choke on newly generated profiles.

With this PR, the overhead of `-Zself-profile-events=default,query-cache-hit` seems to be minuscule vs just `-Zself-profile`, so I also enabled query cache hits by default when self-profiling is enabled.

We should also modify `analyzeme`, specifically this, and make it load the integer events with query cache hit counts instead. I can do that as a follow-up; it's not required to be done in sync with this PR, and it doesn't require changes in rustc.

CC @cjgillot
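The end-of-session dump described above could look roughly like the following sketch. The `emit_integer_event` callback and the function name are hypothetical stand-ins, not the real measureme API:

```rust
use std::collections::HashMap;

// Hedged sketch: at session end, walk the aggregated per-invocation counts
// and emit one integer event per query invocation, similar in spirit to how
// artifact sizes are recorded. The emitter is a caller-supplied callback here.
fn dump_query_cache_hits(
    hits: &HashMap<u32, u64>,
    mut emit_integer_event: impl FnMut(u32, u64),
) {
    for (&invocation_id, &count) in hits {
        emit_integer_event(invocation_id, count);
    }
}
```

Because only one integer event per invocation is written (rather than one instant event per hit), the extra profile data stays small even for millions of cache hits.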
r? @oli-obk