Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect and print/save timings #276

Merged
merged 13 commits into from
Oct 9, 2024
Merged

Collect and print/save timings #276

merged 13 commits into from
Oct 9, 2024

Conversation

johnomotani
Copy link
Collaborator

@johnomotani johnomotani commented Oct 8, 2024

Useful to monitor for performance regressions.

Timers are printed at the end of a run (this can be disabled by setting display_timing_info=false in the [output] section), and saved to the output file. They can be plotted using makie_post_processing, see the docs.

Also, the Profile package seemed to be interacting badly with MPI in one case I tried - the run was much slower with profiling on, with a lot of time spent in MPI calls - so 'standard' profiling isn't necessarily representative and this feature provides an alternative. Hopefully this is a temporary bug in some external package(s) which will go away at some point, because sampling profiles are still pretty useful!

Provides an option to switch on extra 'debug timers', see the docs.

@johnomotani johnomotani added the enhancement New feature or request label Oct 8, 2024
@johnomotani
Copy link
Collaborator Author

One thing to be aware of with this PR - in order to save all the timers that get created to the output files, we have to gather a list of them, possibly update the list for the first couple of output steps, and MPI-communicate the list to all the MPI ranks that have to write to the output file. The code that does all that could be a little bit fragile if something happens that wasn't anticipated when it was written. Hopefully in getting all the tests to pass I've worked out those kinks, but if you get errors that come from write_timing_data() in file_io.jl or from format_global_timer() in timer_utils.jl, then this PR was almost certainly the cause. At worst I think those errors would cause a crash, or write incorrect timing data, there shouldn't be anything in this PR that could modify physics output of the code!

@johnomotani johnomotani merged commit 9284bd9 into master Oct 9, 2024
21 checks passed
@johnomotani johnomotani deleted the timings branch October 9, 2024 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collect timing information by default?
1 participant