Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compiled_autograd_id to CompileId #83

Merged
merged 5 commits into from
Dec 21, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -381,10 +381,22 @@ pub fn parse_path(path: &PathBuf, config: ParseConfig) -> anyhow::Result<ParseOu
.map_or(
format!("unknown_{lineno}"),
|CompileId {
compiled_autograd_id,
frame_id,
frame_compile_id,
attempt,
}| { format!("{frame_id}_{frame_compile_id}_{attempt}") },
}| {
let frame_id_str = frame_id.map_or("-".to_string(), |v| v.to_string());
let frame_compile_id_str =
frame_compile_id.map_or("-".to_string(), |v| v.to_string());
let attempt_str = attempt.map_or("-".to_string(), |v| v.to_string());

if let Some(ca_id) = compiled_autograd_id {
format!("{ca_id}_{frame_id_str}_{frame_compile_id_str}_{attempt_str}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the file naming here isn't load bearing, I wonder if we should do something wordier like "compiled_autograd_{ca_id}_..." so it's obvious that compiled autograd is involved here. Well, I guess the filename doesn't matter that much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use abbreviations like ca{id}_... to keep things compact and to stick under the 255 char limit for windows

} else {
format!("{frame_id_str}_{frame_compile_id_str}_{attempt_str}")
}
},
)
.into();
let parser: Box<dyn StructuredLogParser> =
Expand Down Expand Up @@ -450,7 +462,10 @@ pub fn parse_path(path: &PathBuf, config: ParseConfig) -> anyhow::Result<ParseOu
}
let mut cid = e.compile_id.clone();
if let Some(c) = cid.as_mut() {
c.attempt = 0;
if let Some(_frame_id) = c.frame_compile_id {
// data migration for old logs that don't have attempt
c.attempt = Some(0);
}
}
metrics_index.entry(cid).or_default().push(m.clone());
}
Expand Down
19 changes: 17 additions & 2 deletions src/parsers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -59,10 +59,22 @@ fn simple_file_output(
.map_or(
format!("unknown_{lineno}"),
|CompileId {
compiled_autograd_id,
frame_id,
frame_compile_id,
attempt,
}| { format!("{frame_id}_{frame_compile_id}_{attempt}") },
}| {
let frame_id_str = frame_id.map_or("-".to_string(), |v| v.to_string());
let frame_compile_id_str =
frame_compile_id.map_or("-".to_string(), |v| v.to_string());
let attempt_str = attempt.map_or("-".to_string(), |v| v.to_string());

if let Some(ca_id) = compiled_autograd_id {
format!("{ca_id}_{frame_id_str}_{frame_compile_id_str}_{attempt_str}")
} else {
format!("{frame_id_str}_{frame_compile_id_str}_{attempt_str}")
}
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and your earlier site have to be updated in lockstep, so now that it is this complicated you should actually factor this out to a dedicated function.

)
.into();
let subdir = PathBuf::from(compile_id_dir);
Expand Down Expand Up @@ -380,7 +392,10 @@ impl StructuredLogParser for CompilationMetricsParser<'_> {
.map_or("(unknown) ".to_string(), |c| format!("{cid} ", cid = c));
let mut cid = compile_id.clone();
if let Some(c) = cid.as_mut() {
c.attempt = 0;
if let Some(_frame_id) = c.frame_compile_id {
// data migration for old logs that don't have attempt
c.attempt = Some(0);
}
}
let stack_html = self
.stack_index
Expand Down
23 changes: 17 additions & 6 deletions src/types.rs
Original file line number Diff line number Diff line change
Expand Up @@ -128,16 +128,27 @@ impl StackTrieNode {

#[derive(Eq, PartialEq, Hash, Deserialize, Serialize, Debug, Clone)]
pub struct CompileId {
pub frame_id: u32,
pub frame_compile_id: u32,
pub attempt: u32,
pub compiled_autograd_id: Option<u32>,
pub frame_id: Option<u32>,
pub frame_compile_id: Option<u32>,
pub attempt: Option<u32>,
}

impl fmt::Display for CompileId {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "[{}/{}", self.frame_id, self.frame_compile_id)?;
if self.attempt != 0 {
write!(f, "_{}", self.attempt)?;
write!(f, "[")?;
if let Some(compiled_autograd_id) = self.compiled_autograd_id {
write!(f, "{}/", compiled_autograd_id)?;
}
let frame_id = self.frame_id.map_or("-".to_string(), |v| v.to_string());
let frame_compile_id = self
.frame_compile_id
.map_or("-".to_string(), |v| v.to_string());
write!(f, "{}/{}", frame_id, frame_compile_id)?;
if let Some(attempt) = self.attempt {
if attempt != 0 {
write!(f, "_{}", attempt)?;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so this is for the display, and I think it's worth bikeshedding this part a little. My main concern is that if we render as 0/1/0_0, people are not going to know what this means. In the old days, I designed the compile id to be compact so that I could put it in front of every log message so you knew what compile id it was associated with. But now in the tlparse era it is less important for it to be compact (though I guess it is still nice for TORCH_TRACE to be compact). I feel like there are two ways we can address this. One is to have some expanded syntax for compiled autograd like "compiled autograd 0: 1/0_0" which makes it obvious it's compiled autograd. The other is to make this an HTML renderer and then add a tooltip so when you mouse over you see the complete thing. I'm happy to defer to implementor's privilege here but at the very minimum, if you are going to stick to this syntax, you need to update the help text in tlparse to explain what it does.

By the way, when all three latter fields are empty, I think it's still helpful to just elide the rhs entirely, so you just have "compiled autograd 0". You can't do this without a special compiled autograd sigil though, because o/w it's ambiguous!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should render the id as hierarchical in the tlparse, and the different id fields would be displayed separately:

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok too! But it is nice to have a globally unique id too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want a globally unique id that's visible to the user, and that's consistent with how we save it in our datastores, i think we should just keep using [x/y/z_a] and [y/z_a] out of simplicity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have another question for you about future proofing. Another place we may need to insert another level of hierarchy is for DDPOptimizer, which splits a single Dynamo graph into multiple subgraphs. Here, this would be done most logically by adding another level of hierarchy below z, i.e. [y/z_a/h]. When a is 0 (and therefore elided), how do we disambiguate between [x/y/z] and [y/z/h]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok regathering the requirements:

  1. We want CompileId to be a program-level uid
  2. We want this displayed to the user so it should be short. It's not necessary for tlparse, but it's very nice to have for TORCH_TRACE and internal dashboards
  3. We want this used in downstream data pipelines so it should be easily parsable from a string

One option is only allow eliding arguments that use prefix/suffix unique tokens: [!0/1/0] vs [0/1/@0]. The compact form is favorable to be directly used by internal dashboards widegets where it might be hard to have custom rendering

We probably want compile directory to also be unique, but special tokens don't work well with filesystems, so we could continue with a no eliding approach: 0_0_0_0_0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the filesystem directory I really don't care haha, just as long as it's unique

}
write!(f, "]")
}
Expand Down
Loading
Loading