Generalize structured log parsers #12
Conversation
Force-pushed from 705de9e to d6fce48
@aorenste idk why it doesn't let me add you as a reviewer unless I tag you
Force-pushed from 571c08b to f5cace0
Going to merge this but would still love any comments and nits about the structure/design.
}
}

// TODO: implement this as StructuredLogParser
I think the abstraction you've introduced is good for artifacts, but there's a class of other things we could potentially get from the trace log which don't create a file. For example, imagine if we started emitting traditional traces (start + end events) to the log and then wanted to visualize those together. They would be strewn across many different trace messages, and you wouldn't want to make a file for each one.
I think my primary ask here is to avoid overabstracting, at least for now. There's a decent case to be made for generalizing artifacts (but note that artifacts as currently implemented have some funny problems, e.g., when ddp optimize is on, you'll get multiple copies of the same artifact in the same compile id), but I would hesitate to say that we have an abstraction that works for arbitrary analyses you might want to do. (If you really wanted to design it, you'd probably want some sort of state machine per analysis, with enough structure in the input parsing so you can efficiently dispatch to the analyses that actually care about a given token without doing an O(N) loop through all of them... and that's not getting into whether there's ever shared information that needs to be saved across analyses. It also matters whether we want streaming or whether we can just assume everything fits in RAM.)
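For concreteness, one way the per-analysis dispatch idea above could look is sketched below. This is a hypothetical illustration, not tlparse code: the Analysis trait, Dispatcher, and event-kind strings are all made up.

```rust
use std::collections::HashMap;

// Each analysis keeps its own state across events (a small state machine) and
// declares up front which event kinds it cares about.
trait Analysis {
    fn interested_in(&self) -> &'static [&'static str];
    fn on_event(&mut self, kind: &str, payload: &str);
    fn finish(&mut self);
}

struct Dispatcher {
    analyses: Vec<Box<dyn Analysis>>,
    // event kind -> indices of analyses subscribed to it, so each incoming
    // event only touches the analyses that care about it rather than looping
    // over every analysis.
    by_kind: HashMap<&'static str, Vec<usize>>,
}

impl Dispatcher {
    fn new(analyses: Vec<Box<dyn Analysis>>) -> Self {
        let mut by_kind: HashMap<&'static str, Vec<usize>> = HashMap::new();
        for (i, a) in analyses.iter().enumerate() {
            for &kind in a.interested_in() {
                by_kind.entry(kind).or_default().push(i);
            }
        }
        Dispatcher { analyses, by_kind }
    }

    // Stream events through one at a time; nothing here requires holding the
    // whole log in RAM.
    fn feed(&mut self, kind: &str, payload: &str) {
        if let Some(subscribers) = self.by_kind.get(kind) {
            for &i in subscribers {
                self.analyses[i].on_event(kind, payload);
            }
        }
    }

    fn finish(&mut self) {
        for a in &mut self.analyses {
            a.finish();
        }
    }
}
```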
Yeah, it might be that analyses that need to span multiple envelopes and events, such as this one, won't fit into this model, which is fine. The case I want to cover is the most common one: someone wants to log some type of event that occurs a constant number of times per compilation and can be rendered per event (e.g., compilation metrics).
Trying to abstract the idea of "a log that collects global information into a single template or UI", like the one for the stack trie, gets to the point of being cumbersome.
Box::new(SentinelFileParser::new("aot_joint_graph", |e| e.aot_joint_graph.as_ref())),
Box::new(SentinelFileParser::new("inductor_post_grad_graph", |e| e.inductor_post_grad_graph.as_ref())),
Box::new(DynamoOutputGraphParser),
Box::new(DynamoGuardParser { tt }),
I wonder if the individual analyses should be responsible for their own template instances.
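For illustration, one way a parser could own its template rather than receiving a shared tt at registration is sketched below. This assumes tt is a tinytemplate::TinyTemplate (which tlparse appears to use); the template text, context struct, and constructor are made up for the example.

```rust
use serde::Serialize;
use tinytemplate::TinyTemplate;

// Made-up template text; the real dynamo_guards template lives in tlparse.
static GUARDS_TEMPLATE: &str = "<html><body><pre>{code}</pre></body></html>";

#[derive(Serialize)]
struct GuardsContext<'a> {
    code: &'a str,
}

// The parser builds and owns its own template instance instead of having one
// threaded through from the registration site.
struct DynamoGuardParser {
    tt: TinyTemplate<'static>,
}

impl DynamoGuardParser {
    fn new() -> Self {
        let mut tt = TinyTemplate::new();
        tt.add_template("dynamo_guards", GUARDS_TEMPLATE)
            .expect("static template should parse");
        Self { tt }
    }

    fn render(&self, payload: &str) -> Result<String, tinytemplate::error::Error> {
        self.tt.render("dynamo_guards", &GuardsContext { code: payload })
    }
}
```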
This PR tries to generalize structured log parsers to make it easier for people less familiar with tlparse internals to add new analyses. It also makes it easier for us to have internal-only analyses later.
Basic design:
All analyses implement a trait called StructuredLogParser. To add a new analysis, implement the trait for a new parser type, then register the parser in all_parsers.
I've gone ahead and implemented the simpler parsers quickly and will do the rest in a bit. I know it looks like a fair amount of boilerplate with lifetimes (needing to track the lifetime of the envelope each parser is associated with, for example), but honestly, after this diff, new additions are relatively painless.
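To make the intended workflow concrete, here is a rough, self-contained sketch of the pattern, not the actual tlparse trait: the real signatures carry envelope lifetimes, compile IDs, ranks, and so on, and the types below (Envelope, ParserOutput, MyNewParser) are placeholders. It assumes the anyhow crate is available.

```rust
// Simplified stand-ins for tlparse types; names and signatures are illustrative.
struct Envelope {
    my_new_artifact: Option<String>,
}

// (relative output path, file contents)
type ParserOutput = Vec<(std::path::PathBuf, String)>;

trait StructuredLogParser {
    /// Name of the analysis, used for error messages and stats.
    fn name(&self) -> &'static str;
    /// Return Some(payload) if this envelope has something this parser cares about.
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<&'e str>;
    /// Turn the matched payload into zero or more output files.
    fn parse(&self, payload: &str) -> anyhow::Result<ParserOutput>;
}

struct MyNewParser;

impl StructuredLogParser for MyNewParser {
    fn name(&self) -> &'static str {
        "my_new_artifact"
    }
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<&'e str> {
        e.my_new_artifact.as_deref()
    }
    fn parse(&self, payload: &str) -> anyhow::Result<ParserOutput> {
        // Dump the payload verbatim into one file per matching event.
        Ok(vec![("my_new_artifact.txt".into(), payload.to_string())])
    }
}

// Registration: append the new parser to the list of all parsers.
fn all_parsers() -> Vec<Box<dyn StructuredLogParser>> {
    vec![
        // ...existing parsers...
        Box::new(MyNewParser),
    ]
}
```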
Test:
Run on a complicated job before and after this PR, then diff the two output directories. See no difference.