Add GEFS regression test suite from EP5r2 configuration/case #2442

Open
wants to merge 105 commits into
base: develop

Conversation

NickSzapiro-NOAA
Collaborator

@NickSzapiro-NOAA NickSzapiro-NOAA commented Sep 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR updates the cpld_bmark_p8 tests to a prototype GEFS test case of fully coupled s2swa+IAU+stochastics in atmosphere and ocean, with configuration and warm starts from restarts of EP5r2 ensemble member 1 for 2021-03-25 06Z. The EP5r2 test case was kindly provided by @bingfu-NOAA via @junwang-noaa with aerosol input data and configurations from @lipan-NOAA.

A separate INPUTDATA_ROOT_BMIC is no longer needed and is removed.

The regression test suite samples basic reproducibility/quality checks, particularly:

  • control reproduces itself
  • restart reproduces control
  • changing number of tasks reproduces control
  • changing number of threads reproduces control
  • Intel debug version reproduces itself
  • GNU debug version reproduces itself
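As a rough illustration of what "reproduces" means in the checks above, the sketch below compares every file in a baseline directory against the file of the same name in a run directory. It assumes a byte-for-byte comparison with `cmp`; the function name `reproduces_baseline` and the directory layout are hypothetical and this is not the rt.sh implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rt.sh): succeeds only if every regular file
# in the baseline directory has a byte-identical counterpart in the run directory.
reproduces_baseline() {
  local baseline_dir="$1" run_dir="$2" status=0 f name
  for f in "${baseline_dir}"/*; do
    [ -f "${f}" ] || continue          # skip subdirectories and an empty glob
    name=$(basename "${f}")
    if [ ! -f "${run_dir}/${name}" ]; then
      echo "MISSING ${name}"
      status=1
    elif ! cmp -s "${f}" "${run_dir}/${name}"; then
      echo "DIFFERS ${name}"
      status=1
    else
      echo "OK ${name}"
    fi
  done
  return "${status}"
}
```

For example, `reproduces_baseline /path/to/baseline /path/to/rundir` would print one line per baseline file and return non-zero if any file is missing or differs.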

Not all tests pass on all platforms. Results are summarized in this regression test suite matrix:
[image: regression test suite matrix]
and in summary slides on pre-test and common issues.

Some tests fail in common. This commit helps share reproducers to follow up on the remaining issues. Users need to uncomment these tests to run them. Failures may require library/platform support. Hopefully, committing this test suite as a work in progress facilitates collaborative development, particularly in:

  • Platforms with GNU support for coupled model
  • Sensitivity to spack-stack updates (including dcp failure & splitting ESMF_MeshCreate in WW3, intel-debug failure, and HDF update)
  • Derecho test failures

Note that there are three intentional differences from the GEFS workflow configuration (please let me know if you see other differences): 1) aerosols are 1-way coupled in diagnostic mode; 2) the wave element mask has been modified as discussed in NOAA-EMC/WW3#1328; 3) the ice restart has been quality controlled as discussed in #2562.

In the future, depending on aerosol coupling, GOCART .rc files and ExtData directory structure may be revised for consistency with global-workflow. This benchmark configuration and case may be updated as well, particularly with GEFS reforecast or UFS case study.

TODO: Scripts need finalizing once filepaths are in shared space.
Input data is currently in user space on hera:
/scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/input_data/gefs.v13/RT_GEFS/
It makes sense to organize 2021032506 and ExtData into a new @[INPUTDATA_ROOT]/GEFS/ subdirectory, and to copy WW3/mesh.glo_025.nc and WW3/mod_def.glo_025 into @[INPUTDATA_ROOT_WW3] for the new mesh.
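The proposed reorganization could be sketched roughly as follows. This assumes the source directory contains `2021032506/`, `ExtData/`, and `WW3/` side by side; the function name `stage_gefs_inputs` and its three arguments are hypothetical, and the real staging would go through whoever manages the shared input data.

```shell
#!/usr/bin/env bash
# Hypothetical staging helper. Arguments (all assumptions for illustration):
#   $1  source directory holding 2021032506/, ExtData/ and WW3/
#   $2  value of INPUTDATA_ROOT
#   $3  value of INPUTDATA_ROOT_WW3
stage_gefs_inputs() {
  local src="$1" inputdata_root="$2" inputdata_root_ww3="$3"
  mkdir -p "${inputdata_root}/GEFS" "${inputdata_root_ww3}" &&
    # Case restarts and aerosol ExtData go under the new GEFS subdirectory.
    cp -r "${src}/2021032506" "${src}/ExtData" "${inputdata_root}/GEFS/" &&
    # The new mesh and mod_def sit with the other WW3 input data.
    cp "${src}/WW3/mesh.glo_025.nc" "${src}/WW3/mod_def.glo_025" \
       "${inputdata_root_ww3}/"
}
```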

Commit Message:

* UFSWM - Add GEFS regression test suite from EP5r2 configuration/case

Priority:

  • High: Intended to support GEFS reforecast

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.

Input data Changes:

  • New input data.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

NickSzapiro-NOAA and others added 30 commits May 6, 2024 06:24
# decomposition gefs test
#

source ${PATHRT}/tests/cpld_control_gefs
Collaborator

@DeniseWorthen DeniseWorthen Jan 27, 2025

This is a pretty big departure from how tests are set up currently. The advantage of the current system is that you can just diff two tests and more easily see the differences from defaults. Here the non-default settings are one step removed from the default_vars. The disadvantage of the current way we do it is that you have a lot of settings carried over. @BrianCurtis-NOAA do you have any comments?

Collaborator Author

Sure. Maybe I should mention that this more closely follows the ORTs:
https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/opnReqTests/dcp.sh
but it's not clear whether the current ORT setup is being, or will be, maintained.
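One possible way to keep tests comparable when one test file sources another is to diff the effective exported settings rather than the files themselves. The following is only a sketch: `effective_settings` and `diff_tests` are hypothetical helper names, not part of rt.sh, and it assumes that any helper functions a test file calls are already defined in the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of rt.sh.

# Print the environment after sourcing a test file. The parentheses run this
# in a subshell, so the sourced exports do not leak into the caller.
effective_settings() {
  ( . "$1" >/dev/null && env | sort )
}

# Diff the effective settings of two test files. Like diff, this returns 0
# when the settings are identical and 1 when they differ.
diff_tests() {
  local tmp_a tmp_b status=0
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
  effective_settings "$1" > "${tmp_a}"
  effective_settings "$2" > "${tmp_b}"
  diff "${tmp_a}" "${tmp_b}" || status=$?
  rm -f "${tmp_a}" "${tmp_b}"
  return "${status}"
}
```

With something like this, `diff_tests tests/cpld_control_gefs tests/cpld_dcp_gefs` would list only the variables whose final values differ, regardless of whether they were set directly or inherited through `source`.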

@NickSzapiro-NOAA
Collaborator Author

@JessicaMeixner-NOAA The rectilinear WW3 mesh.glo_025.nc used here is new for the RTs. Maybe it would be better to put mesh and mod_def in WW3_input_data_20250114 with the rest?

I have the mesh.glo_025.nc (with modified element mask) and ww3_grid.inp.glo_025 on hera at /scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/updateToEP5/WW3_inputdata/fix_2025-01

@JessicaMeixner-NOAA
Collaborator

I just remembered today that I still haven't made those new grids for you to try. Sorry about that. I don't know when I'm going to have time because of the other bugs/issues - how urgent is that?

I have no objections to getting this added with everything else in the RTs. We should probably coordinate getting updated fix files in the global-workflow. The last I heard about that, @sbanihash was waiting to coordinate with @bingfu-NOAA. They are the two people you should probably coordinate with to get this into the rt.sh staged locations.

@NickSzapiro-NOAA
Collaborator Author

@JessicaMeixner-NOAA I don't think testing of new grids is urgent. @sbanihash I had to re-make mod_def.glo_025 (now in /scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/updateToEP5/WW3_inputdata/) for the RTs to run again after the WW3 update. I'm hopeful we can try to commit this PR next week.

I currently have the WW3 mesh and mod_def together with the new GEFS input data, but it makes more sense to keep them with the other WW3_input_data.

@NickSzapiro-NOAA NickSzapiro-NOAA added the labels "New Input Data Req'd" (This PR requires new data to be sync across platforms) and "New Baselines" (New baselines will be added to project.) on Feb 4, 2025
export WW3_MODDEF=mod_def.${WW3_DOMAIN}

# set component and coupling timesteps
export DT_ATMOS=300 #Reset bc of bug in export_ugwpv1
Collaborator

Is this still required? The bug has been fixed, yes?

@@ -0,0 +1,13 @@
COMPILE | s2swa | intel | -DAPP=S2SWA -D32BIT=ON -DCCPP_SUITES=FV3_GFS_v17_coupled_p8_ugwpv1 | | fv3 |
Collaborator

We need to be sure this file isn't committed.

cp ${FV3_IC}/fv*.nc ./INPUT
cp ${FV3_IC}/sfc_data*.nc ./INPUT
cp ${FV3_IC}/phy_data*.nc ./INPUT
#cp ${FV3_IC}/coupler.res ./INPUT
Collaborator

We should clean up what isn't required (the commented-out sections).

@DeniseWorthen
Collaborator

Is the table of failures in the PR still valid, or have some of them been resolved?

@NickSzapiro-NOAA
Collaborator Author

NickSzapiro-NOAA commented Feb 5, 2025

@DeniseWorthen Passes on control and restart are up-to-date. Splitting ESMF_MeshCreate in WW3 fixes dcp failures.

It would be nice to use one hash to complete the HPC x ORT summary. Then I can gather results and use reproducers to follow up. @jkbk2004 Can I ask EPIC to help with staging and runs of suite_gefs.conf at your convenience? I can also run pre-tests.

Some failures on the RT summary table look like open issues. I believe work is in progress to resolve update conflicts in spack-stack. There is an open issue on the PPN variable on Derecho.

@DeniseWorthen
Collaborator

DeniseWorthen commented Feb 5, 2025

@NickSzapiro-NOAA Thanks. I'm most concerned about the failure to pass on debug, after working on the dvice issue. Am I right---your failure is that it runs in debug mode, but does not reproduce?

Regarding the WW3 problem w/ "splitting" the mesh create, I still think this is unexplainable. Have you tried not using ESMF_THREADING=true?

@NickSzapiro-NOAA
Collaborator Author

@DeniseWorthen Yes, intel debug runs but does not reproduce. More confusing is that it did reproduce itself on hera months ago. You're also right to question the ice ICs: there is no ice over land, but there is phantom ice. In particular:

18 indices where (aicen<puny)*(vicen>0)
[[   0  905 1096]
 [   0  964  968]
 [   0  995  983]
 [   0 1076 1229]
 [   1   40  410]
 [   1   60  769]
 [   1   61  768]
 [   1  909 1108]
 [   2   40  410]
 [   2   60  769]
 [   2   61  768]
 [   3   40  410]
 [   3   60  769]
 [   3   61  768]
 [   4   40  410]
 [   4   60  769]
 [   4   61  768]
 [   4  716 1353]]
aicen:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
vicen:  [1.06209860e-09 3.60893860e-10 4.53326883e-09 2.49121786e-07
 2.33271200e-06 2.10496975e-09 2.10497006e-09 7.65360689e-10
 5.68389138e-06 1.89713260e-09 1.89713288e-09 6.30358120e-06
 1.03750615e-09 1.03750630e-09 3.55266707e-06 8.75847901e-09
 8.75848028e-09 5.05040428e-06]
 
Also 10 indices where (aicen>0)*(vicen<puny). Note that hi>hi_min still holds for these, though.
    icat nj  ni
[[   0  114  654]
 [   0  131  741]
 [   0  734  990]
 [   0  788  965]
 [   0  790  556]
 [   0  791  541]
 [   0  802  429]
 [   0  803  518]
 [   0  862  884]
 [   0 1078  804]]
aicen:  [1.24891504e-11 1.06729659e-11 3.23251880e-11 1.22020587e-11
 3.46434786e-10 1.28413583e-11 1.35487054e-11 2.40998011e-10
 1.54399031e-11 2.34170491e-11]
vicen:  [4.37275353e-12 1.94558977e-12 1.69867381e-12 2.45780992e-12
 3.46434786e-12 3.17485665e-13 7.80879222e-13 6.94311616e-12
 9.91289377e-12 4.93844678e-12]

Let me clean up and try another round

If I remember correctly, ESMF_THREADING=false didn't fix anything but I can re-check
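For anyone who wants to repeat the two checks above without a Python environment, here is a minimal sketch that works on a plain-text dump with whitespace-separated columns `icat nj ni aicen vicen`. That table format, the function name `flag_phantom_ice`, and the output labels are assumptions for illustration; the default threshold of `1e-11` is what I understand CICE's `puny` to be, so it is passed as an optional second argument.

```shell
#!/usr/bin/env bash
# Hypothetical check on a text table with columns: icat nj ni aicen vicen.
#   $1  path to the table
#   $2  optional threshold, default 1e-11 (assumed value of CICE's puny)
flag_phantom_ice() {
  awk -v puny="${2:-1e-11}" '
    { a = $4 + 0; v = $5 + 0 }   # force numeric; a header row becomes 0, 0
    a < puny && v > 0 { print "volume_without_area", $1, $2, $3 }
    a > 0 && v < puny { print "area_without_volume", $1, $2, $3 }
  ' "$1"
}
```

The two rules mirror (aicen<puny)*(vicen>0) and (aicen>0)*(vicen<puny); a cell with both values between 0 and the threshold is reported under both labels.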

@NickSzapiro-NOAA
Collaborator Author

@DeniseWorthen As in your tests, the intel debug failures to reproduce are sensitive to phantom ice in the CICE restart file. I've tried a few flavors of your ncap2 QCs. For all the QCs I tried, cpld_control_gefs, cpld_restart_gefs, and cpld_dcp_gefs reproduce their own baselines, but cpld_debug_gefs still has reproducibility problems in GOCART.

I'm hopeful WIP updates to spack-stack + MAPL/GOCART may help as well. I will continue on this

Labels
  • EPIC Support Requested
  • New Baselines: New baselines will be added to project.
  • New Input Data Req'd: This PR requires new data to be sync across platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

  • Update cpld_bmark_p8 with GEFSv13 EP5 configuration
  • Add RT test for gocart_on, gccpp_on, nasa_on
10 participants