Generalize structured log parsers #12
Merged
@@ -0,0 +1,208 @@
use crate::types::*;
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};
use tinytemplate::TinyTemplate;

/**
 * StructuredLogParser
 * Parses a structured log and returns a vec of file outputs.
 * Implement this trait to add your own analyses.
 *
 * 'e is the lifetime of the envelope being parsed
 */
pub trait StructuredLogParser {
    // If this returns Some value, the parser will be run on that metadata.
    // Otherwise, it will be skipped.
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>>;

    // Take a log input and the metadata you asked for, return a set of files to write
    fn parse<'e>(
        &self,
        lineno: usize,                  // Line number from log
        metadata: Metadata<'e>,         // Metadata from get_metadata
        rank: Option<u32>,              // Rank of the log
        compile_id: &Option<CompileId>, // Compile ID of the envelope
        payload: &str,                  // Payload from the log (empty string when None)
    ) -> anyhow::Result<ParseOutput>;

    // Name of the parser, for error logging
    fn name(&self) -> &'static str;
}

// Takes a filename and a payload and returns a single file output: the payload,
// to be written to `filename` inside a directory named after the compile ID
// (or `unknown_{lineno}` when the envelope has no compile ID)
fn simple_file_output(
    filename: &str,
    lineno: usize,
    compile_id: &Option<CompileId>,
    payload: &str,
) -> anyhow::Result<ParseOutput> {
    let compile_id_dir: PathBuf = compile_id
        .as_ref()
        .map_or(
            format!("unknown_{lineno}"),
            |CompileId {
                 frame_id,
                 frame_compile_id,
                 attempt,
             }| { format!("{frame_id}_{frame_compile_id}_{attempt}") },
        )
        .into();
    let f = compile_id_dir.join(filename);
    Ok(Vec::from([(f, String::from(payload))]))
}

/**
 * Parser for simple output dumps where the metadata is a sentinel {}
 */
pub struct SentinelFileParser {
    filename: &'static str,
    get_sentinel: fn(&Envelope) -> Option<&EmptyMetadata>,
}
impl SentinelFileParser {
    pub fn new(
        filename: &'static str,
        get_sentinel: fn(&Envelope) -> Option<&EmptyMetadata>,
    ) -> Self {
        Self { filename, get_sentinel }
    }
}
impl StructuredLogParser for SentinelFileParser {
    fn name(&self) -> &'static str {
        self.filename
    }
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>> {
        (self.get_sentinel)(e).map(|m| Metadata::Empty(m))
    }
    fn parse<'e>(
        &self,
        lineno: usize,
        _metadata: Metadata<'e>,
        _rank: Option<u32>,
        compile_id: &Option<CompileId>,
        payload: &str,
    ) -> anyhow::Result<ParseOutput> {
        simple_file_output(&format!("{}.txt", self.filename), lineno, compile_id, payload)
    }
}

// Same as SentinelFileParser, but receives the graph metadata so that it can
// eventually log the size of the graph (not implemented yet; see the TODO below)
pub struct DynamoOutputGraphParser;
impl StructuredLogParser for DynamoOutputGraphParser {
    fn name(&self) -> &'static str {
        "dynamo_output_graph"
    }
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>> {
        e.dynamo_output_graph.as_ref().map(|m| Metadata::DynamoOutputGraph(m))
    }
    fn parse<'e>(
        &self,
        lineno: usize,
        _metadata: Metadata<'e>, // TODO: log size of graph
        _rank: Option<u32>,
        compile_id: &Option<CompileId>,
        payload: &str,
    ) -> anyhow::Result<ParseOutput> {
        simple_file_output("dynamo_output_graph.txt", lineno, compile_id, payload)
    }
}

// Renders the guards payload (a JSON list of DynamoGuard) through the
// `dynamo_guards.html` TinyTemplate and outputs the resulting HTML file
pub struct DynamoGuardParser<'t> {
    tt: &'t TinyTemplate<'t>,
}
impl StructuredLogParser for DynamoGuardParser<'_> {
    fn name(&self) -> &'static str {
        "dynamo_guards"
    }
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>> {
        e.dynamo_guards.as_ref().map(|m| Metadata::Empty(m))
    }
    fn parse<'e>(
        &self,
        lineno: usize,
        _metadata: Metadata<'e>,
        _rank: Option<u32>,
        compile_id: &Option<CompileId>,
        payload: &str,
    ) -> anyhow::Result<ParseOutput> {
        let filename = format!("{}.html", self.name());
        let guards = serde_json::from_str::<Vec<DynamoGuard>>(payload)?;
        let guards_context = DynamoGuardsContext { guards };
        let output = self.tt.render(&filename, &guards_context)?;
        simple_file_output(&filename, lineno, compile_id, &output)
    }
}

// Outputs generated Inductor code. The file is named after the stem of the
// metadata's `filename` when one is present (inductor_output_code_<stem>.txt)
// and falls back to inductor_output_code.txt otherwise
pub struct InductorOutputCodeParser;
impl StructuredLogParser for InductorOutputCodeParser {
    fn name(&self) -> &'static str {
        "inductor_output_code"
    }
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>> {
        e.inductor_output_code.as_ref().map(|m| Metadata::InductorOutputCode(m))
    }

    fn parse<'e>(
        &self,
        lineno: usize,
        metadata: Metadata<'e>,
        _rank: Option<u32>,
        compile_id: &Option<CompileId>,
        payload: &str,
    ) -> anyhow::Result<ParseOutput> {
        if let Metadata::InductorOutputCode(metadata) = metadata {
            let filename = metadata
                .filename
                .as_ref()
                .and_then(|p| Path::file_stem(p))
                .map_or_else(
                    || PathBuf::from("inductor_output_code.txt"),
                    |stem| {
                        let mut r = OsString::from("inductor_output_code_");
                        r.push(stem);
                        r.push(OsStr::new(".txt"));
                        r.into()
                    },
                );
            simple_file_output(&filename.to_string_lossy(), lineno, compile_id, payload)
        } else {
            Err(anyhow::anyhow!("Expected InductorOutputCode metadata"))
        }
    }
}

// Outputs each DDP split child graph to its own file, named after the child's `name`
pub struct OptimizeDdpSplitChildParser;
impl StructuredLogParser for OptimizeDdpSplitChildParser {
    fn name(&self) -> &'static str {
        "optimize_ddp_split_child"
    }
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>> {
        e.optimize_ddp_split_child.as_ref().map(|m| Metadata::OptimizeDdpSplitChild(m))
    }

    fn parse<'e>(
        &self,
        lineno: usize,
        metadata: Metadata<'e>,
        _rank: Option<u32>,
        compile_id: &Option<CompileId>,
        payload: &str,
    ) -> anyhow::Result<ParseOutput> {
        if let Metadata::OptimizeDdpSplitChild(m) = metadata {
            let filename = format!("optimize_ddp_split_child_{}.txt", m.name);
            simple_file_output(&filename, lineno, compile_id, payload)
        } else {
            Err(anyhow::anyhow!("Expected OptimizeDdpSplitChild metadata"))
        }
    }
}

// Register your parser here
pub fn all_parsers<'t>(tt: &'t TinyTemplate<'t>) -> Vec<Box<dyn StructuredLogParser + 't>> {
    // Trait objects are unsized, so each parser is boxed to store different parser
    // types in one Vec. The `+ 't` bound is needed because DynamoGuardParser borrows
    // `tt`; a bare `Box<dyn StructuredLogParser>` would default to `+ 'static`.
    let result: Vec<Box<dyn StructuredLogParser + 't>> = vec![
        Box::new(SentinelFileParser::new("optimize_ddp_split_graph", |e| e.optimize_ddp_split_graph.as_ref())),
        Box::new(SentinelFileParser::new("compiled_autograd_graph", |e| e.compiled_autograd_graph.as_ref())),
        Box::new(SentinelFileParser::new("aot_forward_graph", |e| e.aot_forward_graph.as_ref())),
        Box::new(SentinelFileParser::new("aot_backward_graph", |e| e.aot_backward_graph.as_ref())),
        Box::new(SentinelFileParser::new("aot_joint_graph", |e| e.aot_joint_graph.as_ref())),
        Box::new(SentinelFileParser::new("inductor_post_grad_graph", |e| e.inductor_post_grad_graph.as_ref())),
        Box::new(DynamoOutputGraphParser),
        Box::new(DynamoGuardParser { tt }),
Review comment: I wonder if the individual analyses should be responsible for their own template instances.
        Box::new(InductorOutputCodeParser),
        Box::new(OptimizeDdpSplitChildParser),
    ];

    result
}
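The trait comments state that a parser runs only when `get_metadata` returns `Some`, and that `name` exists for error logging. The driver loop this implies can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions, not the crate's actual main loop: `Envelope` is reduced to two hypothetical optional fields, `parse` drops the `rank` and `compile_id` arguments, `Result<_, String>` stands in for `anyhow::Result`, and `run_parsers` is a hypothetical helper name.

```rust
use std::path::PathBuf;

// Hypothetical, simplified stand-ins for the crate's types.
pub struct EmptyMetadata {}

#[derive(Default)]
pub struct Envelope {
    pub aot_forward_graph: Option<EmptyMetadata>,
    pub aot_backward_graph: Option<EmptyMetadata>,
}

pub enum Metadata<'e> {
    Empty(&'e EmptyMetadata),
}

pub type ParseOutput = Vec<(PathBuf, String)>;

pub trait StructuredLogParser {
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>>;
    fn parse<'e>(&self, lineno: usize, metadata: Metadata<'e>, payload: &str)
        -> Result<ParseOutput, String>;
    fn name(&self) -> &'static str;
}

pub struct SentinelFileParser {
    pub filename: &'static str,
    pub get_sentinel: fn(&Envelope) -> Option<&EmptyMetadata>,
}

impl StructuredLogParser for SentinelFileParser {
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>> {
        (self.get_sentinel)(e).map(Metadata::Empty)
    }
    fn parse<'e>(&self, lineno: usize, _metadata: Metadata<'e>, payload: &str)
        -> Result<ParseOutput, String> {
        // No compile ID in this sketch, so always use the `unknown_{lineno}` directory.
        let path = PathBuf::from(format!("unknown_{lineno}"))
            .join(format!("{}.txt", self.filename));
        Ok(vec![(path, payload.to_string())])
    }
    fn name(&self) -> &'static str {
        self.filename
    }
}

// Hypothetical driver: runs every parser whose metadata is present on the envelope,
// collecting all file outputs plus one error message per failing parser.
pub fn run_parsers(
    parsers: &[Box<dyn StructuredLogParser>],
    lineno: usize,
    envelope: &Envelope,
    payload: &str,
) -> (ParseOutput, Vec<String>) {
    let mut outputs = ParseOutput::new();
    let mut errors = Vec::new();
    for parser in parsers {
        // get_metadata returning None means "skip this parser for this envelope".
        if let Some(metadata) = parser.get_metadata(envelope) {
            match parser.parse(lineno, metadata, payload) {
                Ok(files) => outputs.extend(files),
                // name() lets the error be attributed to the parser that produced it.
                Err(err) => errors.push(format!("{}: {err}", parser.name())),
            }
        }
    }
    (outputs, errors)
}
```

In this sketch each parser returns its own `Result`, so a malformed payload only drops that parser's output for that line, and the error message carries the parser's `name()`.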
I think the abstraction you've introduced is good for artifacts, but there's a class of other things we could potentially get from the trace log that don't create a file. For example, imagine if we started emitting traditional traces (start and end events) to the log and then wanted to visualize them together. These would be strewn across many different trace messages, and you wouldn't want to make a file for each one.
My primary ask here is to avoid over-abstracting, at least for now. There's a decent case to be made for generalizing artifacts (but note that artifacts as currently implemented have some odd problems: for example, when DDP optimization is on, you get multiple copies of the same artifact under the same compile ID), but I would hesitate to say that we have an abstraction that works for arbitrary analyses you might want to do. (If you really wanted to design one, you'd probably want a state machine per analysis, with enough structure in the input parsing to dispatch each token directly to the analyses that actually care about it, instead of looping over all N analyses. And that's before considering whether analyses ever need to share saved information, or whether we need streaming rather than assuming everything fits in RAM.)
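The dispatch structure described above could look roughly like this: each analysis is a stateful object that declares which event kinds it cares about, and an index from kind to analyses means each event triggers one hash lookup and then visits only the interested analyses, rather than all N. This is a hypothetical sketch: `Analysis`, `Dispatcher`, `SpanPairing`, and the event-kind strings are illustrative names that do not exist in the codebase, and it assumes a single in-memory pass (no streaming or shared state between analyses).

```rust
use std::collections::HashMap;

// Hypothetical stateful analysis: it registers the event kinds it cares about,
// accumulates state across many log lines, and produces one result at the end.
pub trait Analysis {
    fn interests(&self) -> &'static [&'static str];
    fn feed(&mut self, kind: &str, payload: &str);
    fn finish(&self) -> String;
}

pub struct Dispatcher {
    analyses: Vec<Box<dyn Analysis>>,
    // Event kind -> indices of the analyses interested in that kind.
    by_kind: HashMap<&'static str, Vec<usize>>,
}

impl Dispatcher {
    pub fn new(analyses: Vec<Box<dyn Analysis>>) -> Self {
        let mut by_kind: HashMap<&'static str, Vec<usize>> = HashMap::new();
        for (i, analysis) in analyses.iter().enumerate() {
            for &kind in analysis.interests() {
                by_kind.entry(kind).or_default().push(i);
            }
        }
        Self { analyses, by_kind }
    }

    // One hash lookup per event, then only the interested analyses are fed.
    pub fn dispatch(&mut self, kind: &str, payload: &str) {
        if let Some(ids) = self.by_kind.get(kind) {
            for &i in ids {
                self.analyses[i].feed(kind, payload);
            }
        }
    }

    pub fn finish(&self) -> Vec<String> {
        self.analyses.iter().map(|a| a.finish()).collect()
    }
}

// Example analysis: pairs "span_start"/"span_end" events that arrive on separate log lines.
#[derive(Default)]
pub struct SpanPairing {
    open: Vec<String>,
    completed: usize,
}

impl Analysis for SpanPairing {
    fn interests(&self) -> &'static [&'static str] {
        &["span_start", "span_end"]
    }
    fn feed(&mut self, kind: &str, payload: &str) {
        match kind {
            "span_start" => self.open.push(payload.to_string()),
            // Only close the innermost open span, and only if the names match.
            "span_end" if self.open.last().map(String::as_str) == Some(payload) => {
                self.open.pop();
                self.completed += 1;
            }
            _ => {}
        }
    }
    fn finish(&self) -> String {
        format!("completed={} unclosed={}", self.completed, self.open.len())
    }
}
```

Unlike the per-envelope parsers in this PR, the span analysis only produces output in `finish`, so many trace messages collapse into a single result instead of one file each.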
Yeah, it might be that analyses that need to span multiple envelopes and events, such as this one, won't fit into this model, which is fine. The case I want to cover is the most common one, where someone wants to log some type of event that occurs a constant number of times per compilation and can be rendered per event (e.g. compilation metrics).
Trying to abstract the idea of "the log collects global information into a single template or UI", like the one for the stack trie, gets cumbersome.
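The mismatch is visible in the trait's signature: `parse` takes `&self` and returns files for a single envelope, so it has no natural place (short of interior mutability) to accumulate state across envelopes. A global aggregate such as a stack trie needs mutable access to one structure for the whole log and is rendered once at the end. A minimal, hypothetical sketch; `StackTrie`, `insert`, and `render` are illustrative names, not the project's implementation:

```rust
use std::collections::BTreeMap;

// Hypothetical stack trie: each node counts how many recorded stacks pass through it.
#[derive(Default)]
pub struct StackTrie {
    count: usize,
    children: BTreeMap<String, StackTrie>,
}

impl StackTrie {
    // Insert one stack, outermost frame first. This needs `&mut self`,
    // unlike `StructuredLogParser::parse(&self, ...)`.
    pub fn insert(&mut self, frames: &[&str]) {
        self.count += 1;
        if let Some((first, rest)) = frames.split_first() {
            self.children.entry(first.to_string()).or_default().insert(rest);
        }
    }

    // Render the whole trie once, after every envelope has been seen.
    // Children are visited in sorted order because they live in a BTreeMap.
    pub fn render(&self, indent: usize, out: &mut String) {
        for (frame, child) in &self.children {
            out.push_str(&format!("{}{} ({})\n", "  ".repeat(indent), frame, child.count));
            child.render(indent + 1, out);
        }
    }
}
```

Every envelope contributes to the same trie, and there is exactly one rendered output for the whole log, which is the "global information into a single template" shape that the per-event parsers do not model.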