can't run coupled LND RTs on WCOSS2 #2598

Open
DeniseWorthen opened this issue Feb 10, 2025 · 18 comments
Labels
bug Something isn't working

Comments

@DeniseWorthen
Collaborator

Description

Coupled configurations which include FV3 and the LND component model cannot run on WCOSS2. The most recent PR impacted was #2387, which reports the same error seen earlier (see #2232, #2319).

PET200 (lnd_comp_io): (read_tiled_file) adding land_frac to FB
PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile1.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile2.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile3.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile4.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile5.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile6.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
PET200.ESMF_LogFile:20250127 170001.779 ERROR            PET200 ESMCI_PIO_Handler.C:617 ESMCI::PIO_Handler::arrayReadOne Unable to read from file  - file not open
PET200.ESMF_LogFile:20250127 170001.779 ERROR            PET200 ESMCI_IO_Handler.C:405 ESMCI::IO_Handler::arrayRead() Unable to read from file  - Internal subroutine call returned Error
PET200.ESMF_LogFile:20250127 170001.779 ERROR            PET200 ESMCI_IO.C:382 ESMCI::IO::read() Unable to read from file  - Internal subroutine call returned Error
PET200.ESMF_LogFile:20250127 170001.779 ERROR            PET200 ESMCI_IO.C:282 ESMCI::IO::read() Unable to read from file  - Internal subroutine call returned Error
PET200.ESMF_LogFile:20250127 170001.779 ERROR            PET200 ESMCI_IO_F.C:210 c_esmc_ioread() Unable to read from file  - Internal subroutine call returned Error
PET200.ESMF_LogFile:20250127 170001.779 ERROR            PET200 ESMF_IO.F90:397 ESMF_IOAddArray() Unable to read from file  - Internal subroutine call returned Error
PET200.ESMF_LogFile:20250127 170001.779 ERROR            PET200 ESMF_FieldBundle.F90:14436 ESMF_FieldBundleRead() Unable to read from file  - Internal subroutine call returned Error
PET200.ESMF_LogFile:20250127 170001.780 ERROR            PET200 lnd_comp_io.F90:869 Unable to read from file  - Passing error in return code
PET200.ESMF_LogFile:20250127 170001.780 ERROR            PET200 lnd_comp_domain.F90:220 Unable to read from file  - Passing error in return code
PET200.ESMF_LogFile:20250127 170001.780 ERROR            PET200 lnd_comp_nuopc.F90:425 Unable to read from file  - Passing error in return code
PET200.ESMF_LogFile:20250127 170001.780 ERROR            PET200 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2901 Unable to read from file  - Phase 'IPDv01p3' Initialize for modelComp 7: LND did not return ESMF_SUCCESS
PET200.ESMF_LogFile:20250127 170001.780 ERROR            PET200 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:1985 Unable to read from file  - Passing error in return code
PET200.ESMF_LogFile:20250127 170001.780 ERROR            PET200 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:489 Unable to read from file  - Passing error in return code
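
Most of the lines in the trace above only propagate the failure up the call stack ("Passing error in return code", "Internal subroutine call returned Error"). A minimal sketch, assuming those two phrases mark propagated entries and that the informative entries contain "Unable to", can filter a log down to the lines that carry an original message. The function name `informative_lines` and the sample list are hypothetical, not part of ESMF or UFS.

```python
# Phrases that (by assumption) mark entries which only pass an error upward.
PROPAGATED = (
    "Passing error in return code",
    "Internal subroutine call returned Error",
)

def informative_lines(lines):
    """Return log lines that report a failure with an original message."""
    kept = []
    for line in lines:
        text = line.rstrip()
        if "Unable to" not in text:
            continue  # not a failure report
        if text.endswith(PROPAGATED):
            continue  # only propagates an earlier failure
        kept.append(text)
    return kept

# Abbreviated entries copied from the trace above.
SAMPLE_LOG = [
    "PET200 (lnd_comp_io): (read_tiled_file) adding land_frac to FB",
    "PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  "
    "Unable to open existing file: INPUT/oro_data.tile1.nc, (PIO/PNetCDF error = "
    "NetCDF: Attempt to use feature that was not turned on when netCDF was built.)",
    "PET200 ESMCI_PIO_Handler.C:617 ESMCI::PIO_Handler::arrayReadOne "
    "Unable to read from file  - file not open",
    "PET200 ESMCI_IO.C:382 ESMCI::IO::read() "
    "Unable to read from file  - Internal subroutine call returned Error",
    "PET200 lnd_comp_io.F90:869 Unable to read from file  - Passing error in return code",
]

if __name__ == "__main__":
    for entry in informative_lines(SAMPLE_LOG):
        print(entry)
```

Applied to a full PET log, the same filter would also keep the NUOPC driver entry that names the failing component and phase.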
@DeniseWorthen DeniseWorthen added the bug Something isn't working label Feb 10, 2025
@DeniseWorthen
Collaborator Author

@Hang-Lei-NOAA Would you please repeat the information you included in the email to @LarissaReames-NOAA so that we can track it here? Thanks.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Feb 14, 2025 via email

@DeniseWorthen
Collaborator Author

@Hang-Lei-NOAA Thanks. Just to be clear: you've confirmed that if you edit rt.conf and remove every "-wcoss2" instance from the configurations that use the LND component, all of those tests run to completion?

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Feb 14, 2025 via email

@DeniseWorthen
Collaborator Author

@DusanJovic-NOAA Could you please comment here? Were you able to turn on and run all the LND configs in rt.conf w/ your esmf/mapl branch?

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Feb 14, 2025 via email

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Feb 14, 2025 via email

@DeniseWorthen
Collaborator Author

These are the tests we need to confirm are now running on WCOSS2 using Dusan's branch:

diff --git a/tests/rt.conf b/tests/rt.conf
index d164e31c..de386106 100644
--- a/tests/rt.conf
+++ b/tests/rt.conf
@@ -57,8 +57,8 @@ RUN | cpld_bmark_p8                                     | - s4 jet acorn noaaclo
 RUN | cpld_restart_bmark_p8                             | - s4 jet acorn noaacloud             |          | cpld_bmark_p8

 COMPILE | s2swal | intel | -DAPP=S2SWAL -DCCPP_SUITES=FV3_GFS_v17_coupled_p8,FV3_GFS_v17_coupled_p8_ugwpv1 | | fv3 |
-RUN | cpld_control_p8_lnd                               | - noaacloud wcoss2                   | baseline |
-RUN | cpld_restart_p8_lnd                               | - noaacloud wcoss2                   |          | cpld_control_p8_lnd
+RUN | cpld_control_p8_lnd                               | - noaacloud                          | baseline |
+RUN | cpld_restart_p8_lnd                               | - noaacloud                          |          | cpld_control_p8_lnd

 # Aerosol, no Wave
 RUN | cpld_s2sa_p8                                      | - noaacloud                          | baseline |
@@ -292,9 +292,9 @@ RUN | datm_cdeps_control_cfsr_faster                    | - wcoss2

 ### CDEPS Data Atmosphere tests with LND ###
 COMPILE | datm_cdeps_land | intel | -DAPP=LND | - wcoss2 | fv3 |
-RUN | datm_cdeps_lnd_gswp3                              | - wcoss2                             | baseline |
-RUN | datm_cdeps_lnd_era5                               | - wcoss2                             | baseline |
-RUN | datm_cdeps_lnd_era5_rst                           | - wcoss2 noaacloud                   |          | datm_cdeps_lnd_era5
+RUN | datm_cdeps_lnd_gswp3                              |                                      | baseline |
+RUN | datm_cdeps_lnd_era5                               |                                      | baseline |
+RUN | datm_cdeps_lnd_era5_rst                           | - noaacloud                          |          | datm_cdeps_lnd_era5

 ### CDEPS Data Atmosphere tests with LM4 ###
 COMPILE | datm_cdeps_lm4 | intel | -DAPP=LND-LM4 | + hera orion gaeac5 | fv3 |
@@ -309,11 +309,11 @@ RUN | atm_ds2s_docn_dice                                | - noaacloud wcoss2 aco

 ### ATM-LND tests ###
 COMPILE | atml | intel | -DAPP=ATML -DCCPP_SUITES=FV3_GFS_v16,FV3_GFS_v16_flake,FV3_GFS_v17_p8,FV3_GFS_v17_p8_rrtmgp,FV3_GFS_v15_thompson_mynn_lam3km,FV3_WoFS_v0,FV3_GFS_v17_p8_mynn,FV3_GFS_v17_p8_ugwpv1 -D32BIT=ON | | fv3 |
-RUN | control_p8_atmlnd                                 | - noaacloud wcoss2                   | baseline |
-RUN | control_restart_p8_atmlnd                         | - noaacloud wcoss2                   |          | control_p8_atmlnd
+RUN | control_p8_atmlnd                                 | - noaacloud                          | baseline |
+RUN | control_restart_p8_atmlnd                         | - noaacloud                          |          | control_p8_atmlnd

 COMPILE | atml_debug | intel | -DAPP=ATML -DCCPP_SUITES=FV3_GFS_v16,FV3_GFS_v16_flake,FV3_GFS_v17_p8,FV3_GFS_v17_p8_rrtmgp,FV3_GFS_v15_thompson_mynn_lam3km,FV3_WoFS_v0,FV3_GFS_v17_p8_mynn,FV3_GFS_v17_p8_ugwpv1 -D32BIT=ON -DDEBUG=ON | | fv3 |
-RUN | control_p8_atmlnd_debug                           | - noaacloud wcoss2                   | baseline |
+RUN | control_p8_atmlnd_debug                           | - noaacloud                          | baseline |
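
To check which tests are still switched off on a given machine, the `rt.conf` lines can be parsed directly. The sketch below assumes the convention visible in the diff: fields are separated by `|`, the third field of a `RUN` line is a machine list, a leading `-` excludes the listed machines, a leading `+` restricts the test to them, and an empty field means the test runs everywhere. The helper names `runs_on` and `excluded_tests` are hypothetical, and the sample is abbreviated from the diff.

```python
def runs_on(machine_field, machine):
    """Decide whether a test runs on `machine` given its rt.conf machine field."""
    tokens = machine_field.split()
    if not tokens:
        return True  # empty field: no restriction
    sign, names = tokens[0], tokens[1:]
    if sign == "-":
        return machine not in names  # exclusion list
    if sign == "+":
        return machine in names  # inclusion list
    raise ValueError(f"unexpected machine field: {machine_field!r}")

def excluded_tests(conf_text, machine):
    """Return the names of RUN tests that are skipped on `machine`."""
    skipped = []
    for line in conf_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if parts[0] != "RUN" or len(parts) < 3:
            continue  # comments, blank lines, COMPILE lines
        if not runs_on(parts[2], machine):
            skipped.append(parts[1])
    return skipped

# Abbreviated lines in the style of tests/rt.conf.
SAMPLE = """\
COMPILE | atml | intel | -DAPP=ATML | | fv3 |
RUN | control_p8_atmlnd | - noaacloud wcoss2 | baseline |
RUN | control_restart_p8_atmlnd | - noaacloud | | control_p8_atmlnd
RUN | datm_cdeps_lnd_gswp3 | | baseline |
"""

if __name__ == "__main__":
    print(excluded_tests(SAMPLE, "wcoss2"))
```

Running it against the real file (for example `excluded_tests(open("tests/rt.conf").read(), "wcoss2")`) would list every test that still carries a WCOSS2 exclusion.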

@DusanJovic-NOAA
Collaborator

I still see this error:

20250214 162208.309 WARNING          PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile1.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20250214 162208.336 WARNING          PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile2.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20250214 162208.346 WARNING          PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile3.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20250214 162208.361 WARNING          PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile4.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20250214 162208.381 WARNING          PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile5.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20250214 162208.396 WARNING          PET200 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF  Unable to open existing file: INPUT/oro_data.tile6.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20250214 162208.396 ERROR            PET200 ESMCI_PIO_Handler.C:617 ESMCI::PIO_Handler::arrayReadOne Unable to read from file  - file not open
20250214 162208.396 ERROR            PET200 ESMCI_IO_Handler.C:405 ESMCI::IO_Handler::arrayRead() Unable to read from file  - Internal subroutine call returned Error
20250214 162208.396 ERROR            PET200 ESMCI_IO.C:382 ESMCI::IO::read() Unable to read from file  - Internal subroutine call returned Error
20250214 162208.396 ERROR            PET200 ESMCI_IO.C:282 ESMCI::IO::read() Unable to read from file  - Internal subroutine call returned Error
20250214 162208.396 ERROR            PET200 ESMCI_IO_F.C:210 c_esmc_ioread() Unable to read from file  - Internal subroutine call returned Error
20250214 162208.396 ERROR            PET200 ESMF_IO.F90:397 ESMF_IOAddArray() Unable to read from file  - Internal subroutine call returned Error
20250214 162208.396 ERROR            PET200 ESMF_FieldBundle.F90:14441 ESMF_FieldBundleRead() Unable to read from file  - Internal subroutine call returned Error
20250214 162208.396 ERROR            PET200 lnd_comp_io.F90:869 Unable to read from file  - Passing error in return code
20250214 162208.396 ERROR            PET200 lnd_comp_domain.F90:220 Unable to read from file  - Passing error in return code
20250214 162208.396 ERROR            PET200 lnd_comp_nuopc.F90:425 Unable to read from file  - Passing error in return code
20250214 162208.396 ERROR            PET200 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2918 Unable to read from file  - Phase 'IPDv01p3' Initialize for modelComp 3: LND did not return ESMF_SUCCESS
20250214 162208.396 ERROR            PET200 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2001 Unable to read from file  - Passing error in return code
20250214 162208.396 ERROR            PET200 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:489 Unable to read from file  - Passing error in return code
20250214 162208.396 ERROR            PET200 UFS.F90:397 Unable to read from file  - Aborting UFS

when I try to run the control_p8_atmlnd_intel test on Cactus.

I've updated my esmf880_mapl2530 branch with changes in modulefiles/ufs_wcoss2.intel.lua to load Hang's libraries. @Hang-Lei-NOAA please take a look and confirm that my changes are correct.

@uturuncoglu
Collaborator

@DusanJovic-NOAA It seems the ESMF installation still has an issue: PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.

@LarissaReames-NOAA
Collaborator

I'm not sure if this explains the issue you're having, but I got this exact NetCDF error once ("NetCDF: Attempt to use feature that was not turned on when netCDF was built.") when attempting to open a CDF5 file (as opposed to NetCDF4) with a NetCDF nf90_open_par command using a NetCDF library that was not built with the --enable-pnetcdf option.
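
Whether a given input file is CDF5 or NetCDF4 can be checked without any netCDF library by reading its leading bytes. A minimal sketch, assuming the signature sits at offset 0 (classic-format files start with `CDF` followed by a version byte, and HDF5-based NetCDF4 files normally start with the 8-byte HDF5 signature); the function names are hypothetical.

```python
# Leading "magic" bytes for the on-disk formats the netCDF library reads.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"
CDF_VERSIONS = {
    1: "CDF-1 (classic)",
    2: "CDF-2 (64-bit offset)",
    5: "CDF-5 (64-bit data)",
}

def classify_header(head):
    """Label a file format from the first bytes of the file."""
    if head[:8] == HDF5_MAGIC:
        return "HDF5 (netCDF-4)"
    if len(head) >= 4 and head[:3] == b"CDF":
        # The fourth byte is the classic-format version number.
        return CDF_VERSIONS.get(head[3], "unknown")
    return "unknown"

def detect_format(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, "->", detect_format(path))
```

For example, `python detect_format.py INPUT/oro_data.tile*.nc` (the script name is arbitrary) would print one label per tile file; `ncdump -k` reports the same information where the netCDF utilities are available.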

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Feb 14, 2025

All the files that the test should attempt to read are the same on all platforms, so I don't think it is an issue w/ the file type.

@Hang-Lei-NOAA

I am in the office now. The VPN is having trouble connecting over the public wireless today. I will check this as soon as I am back at my home office this afternoon.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Feb 14, 2025 via email

@DusanJovic-NOAA
Collaborator

The test hangs and eventually hits the wall-clock limit, but it actually fails in the land component with the above error.

@DusanJovic-NOAA
Collaborator

DusanJovic-NOAA commented Feb 14, 2025

At the beginning of PET200.ESMF_LogFile in my run I see:

$ more PET200.ESMF_LogFile 
20250214 162128.331 INFO             PET200 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.332 INFO             PET200 !!! THE ESMF_LOG IS SET TO OUTPUT ALL LOG MESSAGES !!!
20250214 162128.332 INFO             PET200 !!!     THIS MAY CAUSE SLOWDOWN IN PERFORMANCE     !!!
20250214 162128.332 INFO             PET200 !!! FOR PRODUCTION RUNS, USE:                      !!!
20250214 162128.332 INFO             PET200 !!!                   ESMF_LOGKIND_Multi_On_Error  !!!
20250214 162128.332 INFO             PET200 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.332 INFO             PET200 Running with ESMF Version   : v8.8.0
20250214 162128.332 INFO             PET200 ESMF library build date/time: "Jan 13 2025" "18:47:31"
20250214 162128.332 INFO             PET200 ESMF library build location : /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/pkg/v8.8.0
20250214 162128.332 INFO             PET200 ESMF_COMM                   : mpi
20250214 162128.332 INFO             PET200 ESMF_MOAB                   : enabled
20250214 162128.332 INFO             PET200 ESMF_LAPACK                 : enabled
20250214 162128.332 INFO             PET200 ESMF_NETCDF                 : enabled
20250214 162128.332 INFO             PET200 ESMF_PNETCDF                : enabled
20250214 162128.332 INFO             PET200 ESMF_PIO                    : enabled
20250214 162128.332 INFO             PET200 ESMF_YAMLCPP                : enabled
20250214 162128.332 INFO             PET200 ESMF Profiling Enabled
20250214 162128.332 INFO             PET200 ESMF Trace/Profile clock: REALTIME
20250214 162128.386 WARNING          PET200 ESMF Tracing/Profiling could not load dynamic instrumentation functions.
20250214 162128.386 TRACE            PET200  ESMF_LogSet() !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.386 TRACE            PET200  ESMF_LogSet() !!!       TRACING is disabled         !!!
20250214 162128.386 TRACE            PET200  ESMF_LogSet() !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.540 INFO             PET200 ReadAttributes DRIVER_attributes:: start:
20250214 162128.540 INFO             PET200 ReadAttributes DRIVER_attributes:: end:
20250214 162128.542 INFO             PET200 ReadAttributes ALLCOMP_attributes:: start:

It looks like both PNETCDF and PIO are enabled. The INPUT/oro_data.tile*.nc files are HDF5, so PnetCDF is probably not the issue.
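
The same check can be scripted when comparing builds across machines. The sketch below assumes the header layout shown in the log excerpt above, where each build option appears as `ESMF_<NAME> : <value>` at the end of a line; the function name `esmf_build_flags` is hypothetical.

```python
import re

# Matches "ESMF_PNETCDF                : enabled" at the end of a log line.
FLAG_RE = re.compile(r"\b(ESMF_[A-Z0-9_]+)\s*:\s*(\S+)\s*$")

def esmf_build_flags(lines):
    """Collect ESMF_<NAME> : <value> pairs from ESMF log header lines."""
    flags = {}
    for line in lines:
        match = FLAG_RE.search(line)
        if match:
            flags[match.group(1)] = match.group(2)
    return flags

# Lines copied from the PET200.ESMF_LogFile excerpt above.
SAMPLE_HEADER = [
    "20250214 162128.332 INFO             PET200 !!!                   ESMF_LOGKIND_Multi_On_Error  !!!",
    "20250214 162128.332 INFO             PET200 Running with ESMF Version   : v8.8.0",
    "20250214 162128.332 INFO             PET200 ESMF_COMM                   : mpi",
    "20250214 162128.332 INFO             PET200 ESMF_NETCDF                 : enabled",
    "20250214 162128.332 INFO             PET200 ESMF_PNETCDF                : enabled",
    "20250214 162128.332 INFO             PET200 ESMF_PIO                    : enabled",
]

if __name__ == "__main__":
    for name, value in esmf_build_flags(SAMPLE_HEADER).items():
        print(f"{name}: {value}")
```

Note that these flags describe how ESMF itself was configured; they do not say which features the underlying netCDF library was built with.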

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Feb 14, 2025 via email
