failure to restart-reproduce if using a restart from 15th of month #2588

Open
DeniseWorthen opened this issue Feb 3, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@DeniseWorthen
Collaborator

DeniseWorthen commented Feb 3, 2025

Description

As part of debugging Issue #2562, I was passed a run directory for the SFS C192mx025 by @ShanSunNOAA. While working on that issue, I found I was not able to restart-reproduce if I used a restart file from the middle of the month (specifically on 2005-11-15-00).

I then set up a test case using a modified cpld_control_sfs test and the HR4 tag (fcc9f84). The modifications were to align w/ the run-directory I was debugging for C192-mx025 (no waves, atm-thread=2).

I ran that test case out long enough to capture restarts every 24h through to 2021-04-26-06. I found that I was able to reproduce using the restart at 04-14-06, but not at 04-15-06.

To enable easier debugging, I set up cpld_control_sfs cases using artificially advanced start times, i.e., I set the start year/date to 04-13-06 and wrote restarts every 6 hours. I found I was able to restart-reproduce using the restart at 04-14-18 but not at 04-15-00.
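
For reference, the offsets of those two restart times from the start time can be worked out with a few lines of shell. This is a minimal sketch, assuming GNU `date`; the helper name `hours_since_start` is hypothetical, and the year 2021 is an assumption carried over from the earlier test case (the paragraph above only gives month, day and hour).

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole hours between a start time and a restart time (UTC).
# Assumes GNU date (-d); the year 2021 is an assumption, not stated for this test.
hours_since_start() {
  local start_s restart_s
  start_s=$(date -u -d "$1" +%s)      # start time as seconds since the epoch
  restart_s=$(date -u -d "$2" +%s)    # restart time as seconds since the epoch
  echo $(( (restart_s - start_s) / 3600 ))
}

start="2021-04-13 06:00"
for restart in "2021-04-14 18:00" "2021-04-15 00:00"; do
  echo "${restart} is $(hours_since_start "$start" "$restart") h after start"
done
```

With a 6-hourly restart interval, the last reproducing restart (04-14-18) is 36 h after the start and the first failing one (04-15-00) is 42 h after the start, so the two bracket the change of date to the 15th.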

I repeated the test using an executable built without -D32BIT=ON -DHYDRO=ON, and the run again failed to reproduce when using the restart at 04-15-00.

Using mediator history files, I find that none of the fields imported from the ATM on restart are B4B when using the restart from 04-15-00.

To Reproduce:

Currently all test cases reside in my own sandboxes on hera /scratch1/NCEPDEV/stmp2/Denise.Worthen/sfs.restart

Additional context

I am currently testing the develop branch using the control_c48, control_p8 and the cpld_control_p8 tests.

@DeniseWorthen DeniseWorthen added the bug Something isn't working label Feb 3, 2025
@DeniseWorthen
Collaborator Author

DeniseWorthen commented Feb 4, 2025

I've created a reproducer branch which reproduces this error in the control_p8 test.

https://github.com/DeniseWorthen/ufs-weather-model/tree/bugfix/d15restart

It can be run using ./rt.sh -ek -l rt.rst15 -a nems >output 2>&1 &. This will run a control and then three restart tests, one using the 041418 restarts, one using the 041500 and one using the 041506. It doesn't depend on creating a baseline first, but that means that the files need to be manually compared afterwards. For example:

nccmp -d -S -q -f -g -B --Attribute=checksum --warn=format control_p8_intel/RESTART/20210416.000000.sfc_data.tile1.nc control_restart_p8_1418_intel/RESTART/20210416.000000.sfc_data.tile1.nc

will compare restarts from the control vs the 1418 runs.

And

nccmp -d -S -q -f -g -B --Attribute=checksum --warn=format control_p8_intel/RESTART/20210416.000000.sfc_data.tile1.nc control_restart_p8_1500_intel/RESTART/20210416.000000.sfc_data.tile1.nc

will compare the control and the 1500 runs. In this case, the nccmp result shows

Variable      Group Count          Sum      AbsSum          Min          Max       Range         Mean      StdDev
tsea          /      9193     -21.9127      358.67     -3.74133      3.96841     7.70974  -0.00238363    0.174327
sheleg        /        65    -0.188072    0.959371    -0.191116     0.183119    0.374235  -0.00289341   0.0386586
zorl          /      7314      3.22824      35.123    -0.509531      1.02241     1.53194  0.000441378   0.0325821
canopy        /       929     0.776428     35.4243    -0.525819      1.34609     1.87191  0.000835767   0.0960558
f10m          /      9216   -0.0735146     1.88722   -0.0115725    0.0101339   0.0217064 -7.97684e-06 0.000564475
t2m           /      9216     -28.2394     872.518     -2.74664      2.91128     5.65791  -0.00306417    0.199814
....
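
The manual comparisons above can be looped over all restart runs and tiles. This is a sketch only: the nccmp flags and the 1418 and 1500 directory names are copied from the commands above, while the `control_restart_p8_1506_intel` directory name, the six-tile loop, the `NCCMP` override, and the function name `compare_run` are assumptions. It also assumes nccmp returns a nonzero exit status when the files differ.

```shell
#!/usr/bin/env bash
# Compare the control restart against one restart run, tile by tile.
NCCMP=${NCCMP:-nccmp}     # assumption: override with NCCMP=echo for a dry run
STAMP=20210416.000000     # restart time stamp used in the commands above

compare_run() {
  local run=$1 tile file
  for tile in 1 2 3 4 5 6; do   # assumption: six cubed-sphere tiles
    file="RESTART/${STAMP}.sfc_data.tile${tile}.nc"
    # Flags copied from the manual nccmp commands above.
    "$NCCMP" -d -S -q -f -g -B --Attribute=checksum --warn=format \
      "control_p8_intel/${file}" "control_restart_p8_${run}_intel/${file}" \
      || echo "DIFF: run ${run}, tile ${tile}"
  done
}

# Run from the directory that holds the test run directories.
# The 1506 directory name is assumed by analogy with 1418 and 1500.
if command -v "$NCCMP" >/dev/null 2>&1; then
  for run in 1418 1500 1506; do compare_run "$run"; done
else
  echo "nccmp not found; set NCCMP or load the module first" >&2
fi
```

Running it with `NCCMP=echo` prints the commands instead of executing them, which is a quick way to check the paths before comparing.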

@LarissaReames-NOAA
Collaborator

@yangfanglin Since @DeniseWorthen's tests suggest that this only happens with restarts on the 15th, which is the date climatology fields are read in, do you think this might be some bug related to climo file read logic?

@HelinWei-NOAA
Collaborator

Good catch. In the middle of the month, the GVF is updated with a new value based on the monthly climatology. It is very likely that when you restart from the 15th of the month, the model bypasses that step.


@HelinWei-NOAA
Collaborator

Just as a test: the runs should reproduce if you set wei1m to 1 in sfcsub.f (i.e., do not change any fixed fields on the 15th of the month).

@DeniseWorthen
Collaborator Author

Would you be able to do any debugging on this issue? The reproducer branch is all set up to use control_p8 and then run 3 different FHROT values.

@HelinWei-NOAA
Collaborator

For a warm start, the model assumes it can get everything from the restart files and won't go through sfcsub.f.

@DeniseWorthen
Collaborator Author

These are all restarting using the checkpoint restarts written at a particular time. They are using 'warm_start=true'.

3 participants