can't run coupled LND RTs on WCOSS2 #2598
Comments
@Hang-Lei-NOAA Would you please repeat the information you included in the email to @LarissaReames-NOAA so that we can track it here? Thanks. |
I am forwarding Dusan's email below. Please use his branch and run the
atmlnd case. It will work fine.
The libs are on Cactus for GDIT verification.
---------- Forwarded message ---------
From: Dusan Jovic
Date: Wed, Jan 29, 2025 at 4:17 PM
Subject: Re: [ufs-community/ufs-weather-model] Test MAPL v2.53.0 in UFS weather model (Issue #2346)
To: ufs-community/ufs-weather-model
Cc: Hang-Lei-NOAA
I used this branch for testing:
https://github.com/DusanJovic-NOAA/ufs-weather-model/tree/esmf880_mapl2530
module use /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12
module load esmf/8.8.0
module show mapl/2.53.0-esmf-8.8.0
/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12/mapl/2.53.0-esmf-8.8.0.lua:
help([[]])
conflict("mapl")
setenv("MAPL_ROOT","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/mapl/2.53.0-esmf-8.8.0")
whatis("Name: mapl")
whatis("Version: 2.53.0-esmf-8.8.0")
whatis("Category: library")
whatis("Description: MAPL is a foundation layer of the GEOS architecture")
Regression test passed on WCOSS2 (Cactus) using esmf/8.8.0 and
mapl/2.53.0-esmf-8.8.0 modules installed in this location.
|
@Hang-Lei-NOAA Thanks. Just to be clear, you've confirmed that if you edit rt.conf and remove all the "-wcoss2" instances from the configurations using the LND component, all tests run to completion? |
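As a rough illustration of the rt.conf edit being asked about, the sketch below removes a wcoss2 exclusion from lines that mention `lnd`. The pipe-delimited layout, the `- machine ...` exclusion syntax, the sample lines, and the function name `enable_on_wcoss2` are all assumptions made for illustration. Matching on the substring `lnd` may also miss or over-match real entries, so check the result against the actual rt.conf.

```python
def enable_on_wcoss2(lines, component="lnd", machine="wcoss2"):
    """Drop `machine` from exclusion fields on lines that mention `component`.

    Assumes pipe-delimited lines where an exclusion field looks like
    "- machineA machineB" (or "-machineA"). This layout is an assumption.
    """
    out = []
    for line in lines:
        fields = line.split("|")
        if component in line.lower():
            for i, field in enumerate(fields):
                text = field.strip()
                if not text.startswith("-"):
                    continue
                names = text[1:].split()
                if machine in names:
                    names.remove(machine)
                    # Keep the remaining exclusions; empty the field if none are left.
                    fields[i] = " - " + " ".join(names) + " " if names else " "
            line = "|".join(fields)
        out.append(line)
    return out

# Hypothetical sample lines, not copied from the real rt.conf.
sample = [
    "RUN | control_c48 | | baseline |",
    "RUN | cpld_atm_lnd_example | - wcoss2 | baseline |",
    "RUN | other_lnd_example | - wcoss2 noaacloud | baseline |",
]
for line in enable_on_wcoss2(sample):
    print(line)
```

Working field by field, rather than deleting the string "- wcoss2" wherever it appears, keeps any other excluded machines on the same line intact.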
I only ran the atmlnd test and several GOCART tests, separately. These had
failed previously. Dusan ran the rest and confirmed in the email.
Please make sure you are loading esmf/8.8.0 and mapl/2.53.0.
|
@DusanJovic-NOAA Could you please comment here? Were you able to turn on and run all the LND configs in rt.conf w/ your esmf/mapl branch? |
Yes, I confirm again that the atmlnd test passed.
|
We need someone to rerun the specific test again to confirm.
|
These are what we need to confirm are now running on WCOSS2 using Dusan's branch:
|
I still see this error:
when I try to run I've updated my esmf880_mapl2530 branch with changes in |
@DusanJovic-NOAA It seems the ESMF installation still has an issue: |
I'm not sure if this explains the issue you're having, but I got this exact NetCDF error once ("NetCDF: Attempt to use feature that was not turned on when netCDF was built.") when attempting to open a CDF5 file (as opposed to NetCDF4) with a NetCDF nf90_open_par command using a NetCDF library that was not built with the |
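One quick way to rule the file type in or out is to look at the leading bytes of each input file. The sketch below distinguishes HDF5-based NetCDF-4 files from the classic CDF-1, CDF-2 and CDF-5 variants by their signatures. The function name `detect_format` and the throwaway demo file are illustrative only, and the check assumes the HDF5 signature sits at offset 0 (no user block).

```python
import os
import tempfile

# Leading bytes that identify each on-disk format.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5 (NetCDF-4)"),
    (b"CDF\x01", "NetCDF classic (CDF-1)"),
    (b"CDF\x02", "NetCDF 64-bit offset (CDF-2)"),
    (b"CDF\x05", "NetCDF 64-bit data (CDF-5)"),
]

def detect_format(path):
    """Return a format label based on the first 8 bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"

# Demonstrate on a throwaway file that carries only a CDF-5 signature.
with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as tmp:
    tmp.write(b"CDF\x05" + b"\x00" * 4)
print(detect_format(tmp.name))
os.remove(tmp.name)
```

Running the same function over the input files a test reads would show whether any of them is CDF-5 rather than HDF5.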
All the files that the test should attempt to read are the same on all platforms, so I don't think it is an issue w/ the file type. |
I am in the office now. The VPN is having trouble connecting over the public wireless today. I will check as soon as I get back to my home office this afternoon. |
I just did a run with Dusan's branch, but this time atmlnd ran over the
walltime. See:
/lfs/h2/emc/eib/noscrub/hang.lei/dusanufs/tests/logs/log_wcoss2
I will diagnose further and let you know. It seems that pnetcdf is not
turned off for atmlnd.
|
Test hangs and eventually hits the wall clock limit, but it actually fails in the land component with the above error. |
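When a run hangs until the wall clock limit, scanning the per-PET ESMF logs for warning or error entries can help locate the component that actually failed. This is a minimal sketch that assumes the line layout seen in the PET200.ESMF_LogFile excerpt in this thread (date, time, level, PET id, message). The function name `flagged_entries` is hypothetical, and the sample lines are taken from that excerpt.

```python
import re

# date  time  level  PET-id  message
LOG_LINE = re.compile(r"^(\d{8}) (\d{6}\.\d+)\s+(\w+)\s+(PET\d+)\s+(.*)$")

def flagged_entries(lines, levels=("WARNING", "ERROR")):
    """Return (timestamp, pet, message) tuples for the given levels, earliest first."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group(3) in levels:
            date, time, _, pet, msg = m.groups()
            hits.append((f"{date} {time}", pet, msg))
    return sorted(hits)

sample = [
    "20250214 162128.332 INFO PET200 ESMF Trace/Profile clock: REALTIME",
    "20250214 162128.386 WARNING PET200 ESMF Tracing/Profiling could not load dynamic instrumentation functions.",
]
for stamp, pet, msg in flagged_entries(sample):
    print(stamp, pet, msg)
```

Feeding it the concatenated lines of all PET logs and reading the earliest entry gives a starting point for diagnosis.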
At the beginning of PET200.ESMF_LogFile in my run I see:
$ more PET200.ESMF_LogFile
20250214 162128.331 INFO PET200 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.332 INFO PET200 !!! THE ESMF_LOG IS SET TO OUTPUT ALL LOG MESSAGES !!!
20250214 162128.332 INFO PET200 !!! THIS MAY CAUSE SLOWDOWN IN PERFORMANCE !!!
20250214 162128.332 INFO PET200 !!! FOR PRODUCTION RUNS, USE: !!!
20250214 162128.332 INFO PET200 !!! ESMF_LOGKIND_Multi_On_Error !!!
20250214 162128.332 INFO PET200 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.332 INFO PET200 Running with ESMF Version : v8.8.0
20250214 162128.332 INFO PET200 ESMF library build date/time: "Jan 13 2025" "18:47:31"
20250214 162128.332 INFO PET200 ESMF library build location : /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/pkg/v8.8.0
20250214 162128.332 INFO PET200 ESMF_COMM : mpi
20250214 162128.332 INFO PET200 ESMF_MOAB : enabled
20250214 162128.332 INFO PET200 ESMF_LAPACK : enabled
20250214 162128.332 INFO PET200 ESMF_NETCDF : enabled
20250214 162128.332 INFO PET200 ESMF_PNETCDF : enabled
20250214 162128.332 INFO PET200 ESMF_PIO : enabled
20250214 162128.332 INFO PET200 ESMF_YAMLCPP : enabled
20250214 162128.332 INFO PET200 ESMF Profiling Enabled
20250214 162128.332 INFO PET200 ESMF Trace/Profile clock: REALTIME
20250214 162128.386 WARNING PET200 ESMF Tracing/Profiling could not load dynamic instrumentation functions.
20250214 162128.386 TRACE PET200 ESMF_LogSet() !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.386 TRACE PET200 ESMF_LogSet() !!! TRACING is disabled !!!
20250214 162128.386 TRACE PET200 ESMF_LogSet() !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20250214 162128.540 INFO PET200 ReadAttributes DRIVER_attributes:: start:
20250214 162128.540 INFO PET200 ReadAttributes DRIVER_attributes:: end:
20250214 162128.542 INFO PET200 ReadAttributes ALLCOMP_attributes:: start:
It looks like both PNETCDF and PIO are enabled. Files INPUT/oro_data.tile*.nc are HDF5, so pnetcdf is probably not an issue. |
The tricky thing here is that pnetcdf should be turned off for the land test to run.
Denise can confirm.
The land model uses HDF5 instead of pnetcdf to process data. The correct
setup should be that we install with pnetcdf but turn it off in ESMF
so that the land test passes.
Even the ESMF team did not give us an explanation of how to achieve this.
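To compare what different ESMF builds report, the `ESMF_<NAME> : <value>` entries in a PET log header can be collected into a dictionary. The sketch below assumes the header layout shown in the PET200.ESMF_LogFile excerpt quoted in this thread. The function name `build_flags` is hypothetical, and the sample lines are copied from that excerpt.

```python
import re

# Matches header entries such as "ESMF_PNETCDF : enabled" at the end of a line.
FLAG_LINE = re.compile(r"\b(ESMF_[A-Z0-9]+)\s*:\s*(\S+)\s*$")

def build_flags(lines):
    """Collect 'ESMF_<NAME> : <value>' entries from log header lines."""
    flags = {}
    for line in lines:
        m = FLAG_LINE.search(line)
        if m:
            flags[m.group(1)] = m.group(2)
    return flags

header = [
    "20250214 162128.332 INFO PET200 Running with ESMF Version : v8.8.0",
    "20250214 162128.332 INFO PET200 ESMF_COMM : mpi",
    "20250214 162128.332 INFO PET200 ESMF_NETCDF : enabled",
    "20250214 162128.332 INFO PET200 ESMF_PNETCDF : enabled",
    "20250214 162128.332 INFO PET200 ESMF_PIO : enabled",
]
flags = build_flags(header)
print(flags["ESMF_PNETCDF"], flags["ESMF_PIO"])
```

Comparing the resulting dictionaries from a platform where the land tests pass and from WCOSS2 would show whether the builds differ in these settings.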
|
Description
Coupled configurations which include FV3 and the LND component model cannot run on WCOSS2. The most recent PR impacted was #2387, which reports the same error seen earlier (see #2232, #2319).