Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compiled_autograd_id to CompileId #83

Merged
merged 5 commits into from
Dec 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 5 additions & 9 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -378,14 +378,7 @@ pub fn parse_path(path: &PathBuf, config: ParseConfig) -> anyhow::Result<ParseOu
let compile_id_dir: PathBuf = e
.compile_id
.as_ref()
.map_or(
format!("unknown_{lineno}"),
|CompileId {
frame_id,
frame_compile_id,
attempt,
}| { format!("{frame_id}_{frame_compile_id}_{attempt}") },
)
.map_or(format!("unknown_{lineno}"), |cid| cid.as_directory_name())
.into();
let parser: Box<dyn StructuredLogParser> =
Box::new(crate::parsers::CompilationMetricsParser {
Expand Down Expand Up @@ -450,7 +443,10 @@ pub fn parse_path(path: &PathBuf, config: ParseConfig) -> anyhow::Result<ParseOu
}
let mut cid = e.compile_id.clone();
if let Some(c) = cid.as_mut() {
c.attempt = 0;
if let Some(_frame_id) = c.frame_compile_id {
// data migration for old logs that don't have attempt
c.attempt = Some(0);
}
}
metrics_index.entry(cid).or_default().push(m.clone());
}
Expand Down
14 changes: 5 additions & 9 deletions src/parsers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -56,14 +56,7 @@ fn simple_file_output(
) -> anyhow::Result<ParserResults> {
let compile_id_dir: PathBuf = compile_id
.as_ref()
.map_or(
format!("unknown_{lineno}"),
|CompileId {
frame_id,
frame_compile_id,
attempt,
}| { format!("{frame_id}_{frame_compile_id}_{attempt}") },
)
.map_or(format!("unknown_{lineno}"), |cid| cid.as_directory_name())
.into();
let subdir = PathBuf::from(compile_id_dir);
let f = subdir.join(filename);
Expand Down Expand Up @@ -380,7 +373,10 @@ impl StructuredLogParser for CompilationMetricsParser<'_> {
.map_or("(unknown) ".to_string(), |c| format!("{cid} ", cid = c));
let mut cid = compile_id.clone();
if let Some(c) = cid.as_mut() {
c.attempt = 0;
if let Some(_frame_id) = c.frame_compile_id {
// data migration for old logs that don't have attempt
c.attempt = Some(0);
}
}
let stack_html = self
.stack_index
Expand Down
5 changes: 5 additions & 0 deletions src/templates.rs
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,11 @@ common cause of recompilation is a graph break in an inlined function call, whic
and avoid inlining the function in the first place.
</p>
<p>
When compiled autograd is enabled, the compile id will include a prefix signifier <code>[!a/x/y]</code>,
where a is the <strong>compiled autograd id</strong>. For instance, <code>[!0/-/-]</code> refers
to the first graph captured by compiled autograd. It is then traced by torch.compile as <code>[!0/x/y_z]</code>.
</p>
<p>
Here is a high level description of PT2's compilation phases, and the intermediate products each
phase generates:
</p>
Expand Down
41 changes: 35 additions & 6 deletions src/types.rs
Original file line number Diff line number Diff line change
Expand Up @@ -128,21 +128,50 @@ impl StackTrieNode {

#[derive(Eq, PartialEq, Hash, Deserialize, Serialize, Debug, Clone)]
pub struct CompileId {
pub frame_id: u32,
pub frame_compile_id: u32,
pub attempt: u32,
pub compiled_autograd_id: Option<u32>,
pub frame_id: Option<u32>,
pub frame_compile_id: Option<u32>,
pub attempt: Option<u32>,
}

impl fmt::Display for CompileId {
// NOTE: If you want to elide an id e.g. attempt, compiled_autograd_id, you need to ensure
// the representation remains unique. One way is to use a unique prefix.

fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "[{}/{}", self.frame_id, self.frame_compile_id)?;
if self.attempt != 0 {
write!(f, "_{}", self.attempt)?;
write!(f, "[")?;
if let Some(compiled_autograd_id) = self.compiled_autograd_id {
write!(f, "!{}/", compiled_autograd_id)?;
}
let frame_id = self.frame_id.map_or("-".to_string(), |v| v.to_string());
let frame_compile_id = self
.frame_compile_id
.map_or("-".to_string(), |v| v.to_string());
write!(f, "{}/{}", frame_id, frame_compile_id)?;
if let Some(attempt) = self.attempt {
if attempt != 0 {
write!(f, "_{}", attempt)?;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so this is for the display, and I think it's worth bikeshedding this part a little. My main concern is that if we render as 0/1/0_0, people are not going to know what this means. In the old days, I designed the compile id to be compact so that I could put it in front of every log message so you knew what compile id it was associated with. But now in the tlparse era it is less important for it to be compact (though I guess it is still nice for TORCH_TRACE to be compact). I feel like there are two ways we can address this. One is to have some expanded syntax for compiled autograd like "compiled autograd 0: 1/0_0" which makes it obvious it's compiled autograd. The other is to make this an HTML renderer and then add a tooltip so when you mouse over you see the complete thing. I'm happy to defer to implementor's privilege here but at the very minimum, if you are going to stick to this syntax, you need to update the help text in tlparse to explain what it does.

By the way, when all three latter fields are empty, I think it's still helpful to just elide the rhs entirely, so you just have "compiled autograd 0". You can't do this without a special compiled autograd sigil though, because o/w it's ambiguous!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should render the id as hierarchical in the tlparse, and the different id fields would be displayed separately:

image

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok too! But it is nice to have a globally unique id too

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want a globally unique id that's visible to the user, and that's consistent with how we save it in our datastores, i think we should just keep using [x/y/z_a] and [y/z_a] out of simplicity.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have another question for you about future proofing. Another place we may need to insert another level of hierarchy is for DDPOptimizer, which splits a single Dynamo graph into multiple subgraphs. Here, this would be done most logically by adding another level of hierarchy below z, i.e. [y/z_a/h]. When a is 0 (and therefore elided), how do we disambiguate between [x/y/z] and [y/z/h]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok regathering the requirements:

  1. We want CompileId to be a program-level uid
  2. We want this displayed to the user so it should be short. It's not necessary for tlparse, but it's very nice to have for TORCH_TRACE and internal dashboards
  3. We want this used in downstream data pipelines so it should be easily parsable from a string

One option is only allow eliding arguments that use prefix/suffix unique tokens: [!0/1/0] vs [0/1/@0]. The compact form is favorable to be directly used by internal dashboards widegets where it might be hard to have custom rendering

We probably want compile directory to also be unique, but special tokens don't work well with filesystems, so we could continue with a no eliding approach: 0_0_0_0_0

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the filesystem directory I really don't care haha, just as long as it's unique

}
write!(f, "]")
}
}

impl CompileId {
pub fn as_directory_name(&self) -> String {
let compiled_autograd_id_str = self
.compiled_autograd_id
.map_or("-".to_string(), |v| v.to_string());
let frame_id_str = self.frame_id.map_or("-".to_string(), |v| v.to_string());
let frame_compile_id_str = self
.frame_compile_id
.map_or("-".to_string(), |v| v.to_string());
let attempt_str = self.attempt.map_or("-".to_string(), |v| v.to_string());

format!("{compiled_autograd_id_str}_{frame_id_str}_{frame_compile_id_str}_{attempt_str}")
}
}

#[derive(Default, Debug)]
pub struct Stats {
pub ok: u64,
Expand Down
Loading
Loading