Performance scaling #94
From the internal UM timer diagnostics, e.g.
Calculated standard deviations of the routine times. Sorted values are:

It's notable that most of the standard deviation comes from just two relatively inexpensive routines; excluding these removes most of the variation. SFEXCH includes boundary layer calculations without any communications, so it's a mystery why it should be so variable.
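For reference, a minimal sketch of the kind of calculation behind these numbers, assuming the per-PE routine times have already been extracted from the UM timer output into a simple table (the input file and column names here are hypothetical):

```python
import pandas as pd

# Hypothetical input: one row per (routine, PE) with the elapsed time in seconds,
# extracted beforehand from the UM timer diagnostics.
times = pd.read_csv("routine_times.csv")  # columns: routine, pe, elapsed

# Mean and standard deviation of each routine's elapsed time across PEs,
# sorted so the most variable routines appear first.
stats = (times.groupby("routine")["elapsed"]
              .agg(["mean", "std"])
              .sort_values("std", ascending=False))
print(stats.head(10))

# Spread of the total time across PEs, with and without the two most variable
# routines, to see how much of the variation they account for.
print("std of total:", times.groupby("pe")["elapsed"].sum().std())
top2 = stats.index[:2]
print("std excluding top two:",
      times[~times["routine"].isin(top2)].groupby("pe")["elapsed"].sum().std())
```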
Occasional extreme cases can occur. The following results were for a month-long simulation configured to write separate UM output each day, run on 2024-10-04. The exact same simulation was then repeated on 2024-10-08.

2024-10-04:

2024-10-08:
Differences in mean routine times across the PEs are
Plots of the same thing:

It's mostly the same routines from before causing the slowdown. However …
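A sketch of how such a comparison can be made, assuming the mean routine times for each run have already been extracted into per-run tables (the file names and column name are hypothetical):

```python
import pandas as pd

# Hypothetical inputs: mean time per routine (averaged across PEs) for each run.
run_a = pd.read_csv("means_2024-10-04.csv", index_col="routine")  # column: mean_elapsed
run_b = pd.read_csv("means_2024-10-08.csv", index_col="routine")

# Difference in mean routine time between the two runs, largest slowdowns first.
diff = (run_b["mean_elapsed"] - run_a["mean_elapsed"]).sort_values(ascending=False)
print(diff.head(10))
```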
@aidanheerdegen suggested some extra sanity checks to test whether writing diagnostics contributes to the wall time variation. The previous simulations were run using the "standard" presets, apart from the last comment, where I used the draft "spinup" preset to save on space.

I've done two additional runs – one with the "standard" preset and one with no output – both of which were started at the same time. Both runs increased the atmosphere to 240 processors, as I'd like to confirm whether this configuration is generally faster. The following shows the wall times for the two runs:

Both were essentially stable, with only two single-year jumps in the diagnostic-saving run. The routine behind the largest jump …

In any case, it looks like under "normal/good" circumstances, increasing the atmosphere to 240 processors does cut down the walltime without a cost in SUs. @MartinDix and @aidanheerdegen, would you be in favour of changing the decomposition in our released configurations?
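As a back-of-envelope check of the SU claim, assuming SU cost scales with cores requested times walltime and considering only the atmosphere cores (the walltime values below are placeholders, not measurements):

```python
# SU cost is taken here to scale with (cores requested) x (walltime); the charge
# rate cancels out of the comparison. Only the atmosphere cores are considered.
def su_ratio(cores_old: int, wall_old: float, cores_new: int, wall_new: float) -> float:
    """Ratio of new SU cost to old SU cost."""
    return (cores_new * wall_new) / (cores_old * wall_old)

# Going from 192 to 240 atmosphere PEs, the walltime must drop to
# 192/240 = 80% of its previous value (a 20% reduction) for the SU cost
# not to increase.
break_even = 192 / 240
print(f"break-even walltime fraction: {break_even:.2f}")                        # 0.80
print(f"SU ratio at that walltime: {su_ratio(192, 1.0, 240, break_even):.2f}")  # 1.00
```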
Seems like a no-brainer to me. Some other points:
Thanks @aidanheerdegen. I took a quick look through the issue:
Sorry @manodeep, but confusingly in the ESM world "diagnostics" often (and in this case) refers to output data fields. They are "diagnosed" (derived) from the physical (prognostic) fields; e.g. for the ocean these prognostic fields are temperature, salinity and velocity (might have missed something). Everything else is derived from them, and there are a huge number of diagnostic variables, so they're never all saved, as some are for very specific use cases. Also, as the resolution increases, the amount of disk space these variables consume increases dramatically, so often only very specific diagnostic variables are saved.
Hahahaha - I should have known better than to assume terminology in an entirely different field 😄! 10% for I/O seems reasonable - presumably the I/O is (somewhat) parallel?
Well, it gets even more confusing, because I know the MOM5 (ocean) model has … It's almost intentionally confusing! :)
Different models handle this differently. The ocean model gathers ranks and outputs a number of tiles from separate PEs in parallel, and then stitches them together in a post-processing step. CICE (the ice model) is serial, and becomes a serious bottleneck at higher resolutions, so the OM2 model uses PIO to parallelise the IO. Not sure about the atmosphere; again, at higher core counts it uses an IO server.

The problem with IO isn't the overhead so much as the incredible variability when the Lustre file system gets overloaded. It means we have to put in some wall time padding to allow for this, which affects the queue time, though I've not quantified how much difference it makes.
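As an illustration only (not the actual collation tool used for the ocean output), per-PE tile files can be stitched back into a single dataset along these lines; the file names, and the assumption that each tile carries its own slice of the global coordinates, are mine:

```python
import xarray as xr

# Hypothetical per-PE tile files, e.g. ocean_daily.nc.0000, ocean_daily.nc.0001, ...
# Each tile holds its own slice of the global lat/lon coordinates, so xarray
# can reassemble the global field by matching coordinates.
tiles = xr.open_mfdataset("ocean_daily.nc.*", combine="by_coords")

# Write the stitched dataset out as a single file.
tiles.to_netcdf("ocean_daily_combined.nc")
```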
I have certainly encountered issues with the metadata servers being overloaded by other jobs, resulting in significant differences in runtime. One option could be to write to node-level filesystems (typically …
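A minimal sketch of that pattern, assuming a node-local scratch directory is exposed through an environment variable such as PBS_JOBFS (falling back to /tmp; the variable name and all paths here are assumptions about the scheduler setup):

```python
import os
import shutil
from pathlib import Path

# Write output to node-local storage first, then copy it back to the shared
# (Lustre) filesystem once at the end of the run, so the metadata servers see
# one large transfer instead of many small writes.
local_dir = Path(os.environ.get("PBS_JOBFS", "/tmp")) / "run_output"
local_dir.mkdir(parents=True, exist_ok=True)

# ... the model writes its output files under local_dir during the run ...

shared_dir = Path("/scratch/project/experiment/output")  # placeholder path
shared_dir.mkdir(parents=True, exist_ok=True)
for f in local_dir.iterdir():
    shutil.copy2(f, shared_dir / f.name)
```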
From Spencer's 20240827-release-preindustrial+concentrations-run with 192 atmosphere, 180 ocean, 16 ice PEs.

Walltime from UM `atm.fort6.pe0` files:

Difference between the PBS time and the UM time is around 60 s. Variation here will make scaling analysis trickier.
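A small helper for that bookkeeping, assuming the PBS walltime is available as an HH:MM:SS string and the UM-reported walltime in seconds has already been pulled out of the `atm.fort6.pe0` file (the extraction itself is not shown, and the values below are placeholders):

```python
def hms_to_seconds(hms: str) -> int:
    """Convert a PBS-style HH:MM:SS walltime string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# Placeholder values: PBS-reported walltime and UM-reported walltime (seconds).
pbs_walltime = hms_to_seconds("01:02:03")
um_walltime = 3660.0

# The difference is start-up/shutdown overhead outside the UM timers (~60 s in
# the runs above); subtracting it keeps the scaling analysis based on the model
# time rather than the whole job.
overhead = pbs_walltime - um_walltime
print(f"overhead outside UM timers: {overhead:.0f} s")
```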