Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect timing information by default? #273

Open
johnomotani opened this issue Oct 3, 2024 · 7 comments · Fixed by #276
Open

Collect timing information by default? #273

johnomotani opened this issue Oct 3, 2024 · 7 comments · Fixed by #276

Comments

@johnomotani
Copy link
Collaborator

Proposal: collect and print some high-level timing data by default. TimerOutputs.jl (which we've used a little bit, but not routinely) has some nice features, like being able to annotate function definitions to gather timing data each time that function is called.

Output from a run might look something like this:

julia> run_moment_kinetics("examples/recycling-fraction/wall-bc_recyclefraction0.5_split3_SSPRK4.toml")
Starting setup   13:04:10
beginning time advance   13:04:11
writing moments output 2     t = 0.001    nstep = 100      13:04:24    residuals:  density 1.962       density_neutral 11.14      
writing distribution functions output 2     t = 0.001    nstep = 100      13:04:24
writing moments output 3     t = 0.002    nstep = 200      13:04:37    residuals:  density 1.891       density_neutral 9.027      
writing distribution functions output 3     t = 0.002    nstep = 200      13:04:37
writing moments output 4     t = 0.003    nstep = 300      13:04:50    residuals:  density 1.81        density_neutral 9.435      
writing distribution functions output 4     t = 0.003    nstep = 300      13:04:50
writing moments output 5     t = 0.004    nstep = 400      13:05:04    residuals:  density 1.822       density_neutral 9.399      
writing distribution functions output 5     t = 0.004    nstep = 400      13:05:04
writing moments output 6     t = 0.005    nstep = 500      13:05:17    residuals:  density 1.805       density_neutral 9.449      
writing distribution functions output 6     t = 0.005    nstep = 500      13:05:17
writing moments output 7     t = 0.006    nstep = 600      13:05:30    residuals:  density 1.812       density_neutral 9.521      
writing distribution functions output 7     t = 0.006    nstep = 600      13:05:30
writing moments output 8     t = 0.007    nstep = 700      13:05:43    residuals:  density 1.789       density_neutral 9.225      
writing distribution functions output 8     t = 0.007    nstep = 700      13:05:43
writing moments output 9     t = 0.008    nstep = 800      13:05:56    residuals:  density 1.793       density_neutral 9.333      
writing distribution functions output 9     t = 0.008    nstep = 800      13:05:56
writing moments output 10    t = 0.009    nstep = 900      13:06:10    residuals:  density 1.76        density_neutral 9.758      
writing distribution functions output 10    t = 0.009    nstep = 900      13:06:10
writing moments output 11    t = 0.01     nstep = 1000     13:06:23    residuals:  density 1.749       density_neutral 8.969      
writing distribution functions output 11    t = 0.01     nstep = 1000     13:06:23
finished file io         13:06:23
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                    Time                    Allocations      
                                                           ───────────────────────   ────────────────────────
                     Tot / % measured:                           133s / 100.0%           1.99GiB / 100.0%    

 Section                                           ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────
 moment_kinetics                                        1     133s  100.0%    133s   1.99GiB  100.0%  1.99GiB
   setup_moment_kinetics                                1    335ms    0.3%   335ms   89.3MiB    4.4%  89.3MiB
   time_advance!                                        1     132s   99.7%    132s   1.90GiB   95.6%  1.90GiB
     ssp_rk!                                        1.00k     132s   99.6%   132ms   1.87GiB   93.9%  1.92MiB
       euler_time_advance!                          4.00k     104s   78.2%  25.9ms   1.66GiB   83.6%   436KiB
         vpa_advection!                             4.00k    14.1s   10.6%  3.52ms    688KiB    0.0%     176B
         z_advection!                               4.00k    16.3s   12.3%  4.07ms    688KiB    0.0%     176B
         source_terms!                              4.00k    1.30s    1.0%   325μs    688KiB    0.0%     176B
         neutral_advection_z!                       4.00k    14.3s   10.8%  3.58ms    688KiB    0.0%     176B
         neutral_advection_vz!                      4.00k    13.5s   10.2%  3.37ms    688KiB    0.0%     176B
         source_terms_neutral!                      4.00k    771ms    0.6%   193μs    688KiB    0.0%     176B
         ion_charge_exchange_collisions_1V!         4.00k    12.7s    9.6%  3.18ms    565MiB   27.7%   145KiB
         neutral_charge_exchange_collisions_1V!     4.00k    9.49s    7.2%  2.37ms    565MiB   27.7%   145KiB
         ion_ionization_collisions_1V!              4.00k    12.7s    9.5%  3.16ms    565MiB   27.7%   145KiB
         krook_collisions!                          4.00k    3.99s    3.0%   998μs    688KiB    0.0%     176B
         external_ion_source!                       4.00k    4.41s    3.3%  1.10ms    688KiB    0.0%     176B
       apply_all_bcs_constraints_update_moments!    4.00k    22.7s   17.1%  5.67ms    201MiB    9.9%  51.5KiB
     write_data_to_ascii                               10   3.35μs    0.0%   335ns     0.00B    0.0%    0.00B
     write_all_moments_data_to_binary                  10   68.3ms    0.1%  6.83ms   5.20MiB    0.3%   532KiB
     write_all_dfns_data_to_binary                     10    116ms    0.1%  11.6ms   28.2MiB    1.4%  2.82MiB
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────

To get that output I've filtered out some function calls that took a very short amount of time, to avoid cluttering up the screen.

Questions:

  • would people find this useful or just annoying?
  • should it be something that we turn on optionally, or just printed by default?
  • do we want more detailed timing, e.g. include lower level functions? If we go too far with this, the overhead of doing timing might become noticeable though...
  • Is the output too long? We could exclude everything that takes less that 1% of the total run time for example.

With the current setup, the extra cost of timing is not noticable. I did an @benchmark on a run that takes about 13.5s, and the I couldn't see any difference difference with/without timing, on a sample of 18 runs of each.

There is an option for 'debug timers' that use a macro that will be removed by the compiler if 'debug timing' is not explicitly activated (so 'debug timers' have zero overhead for production runs). I'll add some of these to (optionally) time some MPI calls and LU-solves.

If we add this as a feature, I'll probably try to write out the timing information for each MPI process into the output file in case we want to look at it later. That will require a bit of work to gather all the timing info and write it in a sensible format to HDF5.

The motivation for this was that in preparation for trying to parallelise the kinetic electron solver a bit better, I wanted to be able to profile the code. Using Julia's @profile with MPI was giving me strange results - the run with profiling was a lot slower than a normal run, with an unexpectedly large amount of time spent in MPI calls. I think I've used @profile before and not had this problem, so not sure what's going on, but in thinking about a workaround it seemed like collecting some timing data might be generically useful.

@mrhardman
Copy link
Collaborator

mrhardman commented Oct 3, 2024

Questions:

  • would people find this useful or just annoying?

This looks very useful for expert users. However, I don't think this information is useful for a user that isn't also developing -- unless we at some point get to problems where specific choices of processor count are faster than others because of different splits. I don't think we are there yet because the distributed memory MPI is only for r and z and only one set of magic numbers are permitted.

  • should it be something that we turn on optionally, or just printed by default?

I would go for printing this optionally.

  • do we want more detailed timing, e.g. include lower level functions? If we go too far with this, the overhead of doing timing might become noticeable though...

I would not do this by default, and I would also be concerned that the timer would slow down the code.

  • Is the output too long? We could exclude everything that takes less that 1% of the total run time for example.

The present output looks acceptable. If data is removed, perhaps work noting that anything <1% is hidden.

@LucasMontoya4
Copy link
Collaborator

I agree with Michael. I do really like the idea of having this as an option (though I suppose the default should be that it is turned off). I sometimes worry about whether I've written an inefficient function or caused a memory leak, and this would be a very quick way of checking that compared to using @profile!

@johnomotani
Copy link
Collaborator Author

There is the point of view that at the moment all users are developers, so we could print by default for now, and make it optional at some point in the future. If you are a developer and regularly changing code, the timings are a useful sanity check that you didn't do something awful!

@mrhardman
Copy link
Collaborator

That depends on how difficult it is to make it optional. If you really want this as a default feature, it would be nice to have an option to turn off this output so that any future changes are limited to input option defaults.

@johnomotani
Copy link
Collaborator Author

That depends on how difficult it is to make it optional. If you really want this as a default feature, it would be nice to have an option to turn off this output so that any future changes are limited to input option defaults.

That would be an easy and sensible compromise. I'd probably stick an extra flag in the [output] section.

@johnomotani johnomotani linked a pull request Oct 8, 2024 that will close this issue
@mrhardman
Copy link
Collaborator

Not sure where best to record this question, so reopening this issue:

Why does the timer present strange information regarding total runtime in the two examples below? Specifically, why is the total measured field seemingly filled with numbers modified from 100% by an undetermined prefactor? Everything in the details moment_kinetics rows and below, seems consistent with the physical time of the run, but the header information does not seem clear.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                         Time                    Allocations
                                                                ───────────────────────   ────────────────────────
                       Tot / % measured:                             17.1s / 875.4%           5.92GiB /  21.9%

Section                                                 ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
moment_kinetics                                              1     149s  100.0%    149s   1.29GiB  100.0%  1.29GiB
────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                   Time                    Allocations
                                                          ───────────────────────   ────────────────────────
                    Tot / % measured:                        5124096h /   0.0%          5.67GiB /  18.4%

Section                                           ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────────────────────────────
moment_kinetics                                        1    22.2s  100.0%   22.2s   1.04GiB  100.0%  1.04GiB

@mrhardman mrhardman reopened this Nov 27, 2024
@johnomotani
Copy link
Collaborator Author

Don't know, don't think I've ever seen that...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants