Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SWE repartitioning for CLM #313

Merged
merged 179 commits into from
Feb 3, 2022
Merged

Add SWE repartitioning for CLM #313

merged 179 commits into from
Feb 3, 2022

Conversation

braczka
Copy link
Contributor

@braczka braczka commented Nov 22, 2021

Description:

This adds the capability to repartition snow layer mass and dimensional variables. The repartitioning assures that the adjustments applied to the snow layers are consistent (mass and dimensions are conserved) with the adjustment of the total snow water equivalent. This also resolves the weakness where if a snow layer does not exist for all ensemble members, it is not adjusted for all ensemble members. We allow the user to choose the SWE repartitioning approach within the dart_to_clm namelist within SWE_repartition. A value of '0' provides the previous, default behavior of CLM-DART, a value of '1' repartitions snow to all active snow layers, and a value of '2' repartitions snow to the bottom snow layer only.

This SWE_repartition_fix branch began from clm_main at commit 337686f (8/27/21). Concurrent changes of the clm_main branch have occurred during this time by @hkershaw-brown. I made an attempt to merge the SWE_repartition_fix with the current clm_main at commit a741218 (10/19). This led to expected behavior for all variables except for the clm variable T_SOISNO (snow/soil temperature) which was updated to unrealistic large values for certain snow layers. This led to mass balance errors in CLM. At the suggestion of Helen I reverted the changes to SWE_repartition_fix to the last working version before that attempted merge with clm_main. File changes directly related to adding new SWE repartitioning capability as part of this PR include:

~/models/clm/dart_to_clm.f90
~/models/clm/dart_to_clm.rst
~/models/clm/readme.rst
~/models/clm/dart_to_clm.nml
~/models/clm/work/input.nml
~/models/clm/shell_scripts/cesm2.2/assimilate.csh

Fixes issue

This directly addresses github issue #253 , updating snow in CLM.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Tests

Performed perfect data experiment of CLM5-DART across global domain with daily assimilation across 7 day period. See case /glade/u/home/bmraczka/cases/cesm2.2.0/clm5_SWE1_fix as an example of setup including setup scripts of CLM5_setup_assimilation.original and input.nml. Performed this test for all 3 SWE repartitioning options (clm5_SWE0_fix, clm5_SWE1_fix, clm5_SWE2_fix) that demonstrated: 1) the default implementation (no repartitioning) provides identical bitwise behavior to previous working clm_main version, 2) implementation of repartitioning schemes were mathematically sound (mass/dimensions of snow layers conserved), and 3) Update of CLM restart file is compatible with all internal CLM checks.

I intend to perform further scientific testing of repartitioning script across Western United States domain for Utah
NASA CMS collaboration -- mainly intended to explore impact of SWE repartition options upon soil moisture and surface albedo impacts.

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Version tag

Additional changes have to occur before this PR is ready to merge.

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

For an example set of CLM files for testing dart_to_clm functionality see: /glade/scratch/bmraczka/cesm2.2.0/clm5_SWE_PR/run

dart_to_clm_input_file = 'clm2_0001.r.2011-01-02-00000.nc.original'
dart_to_clm_output_file = 'clm5_SWE_PR.clm2_0001.r.2011-01-02-00000.nc.original'
repartition_vhist_file = 'clm5_SWE_PR.clm2_0001.h2.2011-01-02-00000.nc'
repartition_analysis_file = 'clm_analysis_member_0001_d03.2011-01-02-00000.nc'

timhoar added 30 commits May 20, 2021 09:52
If a variable has both a _FillValue attribute and a missing_value attribute,
make sure the values are identical or we cannot support it with a single DART missing value.
Handle _FillValue and missing_value separately because variables may have one or the other but not both. Adding a couple trivial unrelated changes not worthy of their own pull request,
removing unused variables, making a hardcoded filename a parameter, that sort of thing.
The ultimate goal for models that support missing values is to read
the variables and replace both _FillValue and missing_value items
with the DART missing value. As long as r8 /= r4, there is no loss of
precision when writing BACK to the original files. If DART is built
with r8 = r4, and it reads from a NF90_DOUBLE and writes to a NF90_DOUBLE,
there may be trouble ...
The game plan is to ultimately replace the DART missing value
with whatever is originally in the file with the missing values
in a post-processing 'dart_to_model' program.
Basically, this replaces the use of the progvar structure with the
state_structure. There are some bits in the static_init_model
that have been moved to subroutines to improve readability.
clm_to_dart.f90 is needed to reliably replace the CLM variables
that have missing_value and/or _FillValue attributes but do not actuall
use them. This happens in the restart files for the snow layers, for example.
This is a standalone program and replaces bogus values in layers of snow
with the _FillValue value. It replaces the values in-place - i.e. it
modifies the input file. To use it, you should COPY the clm restart file
to the desired input filename. That filename should be the source of the
variables used in pmo or filter or ...
Explores what happens to related snow variables when there is 'no snow'
according to SNLSNO. T_SOISNO in the layer closest to the ground still
has values even if there is no trace of snow (H2OSNO = 0.0). This seems
like a problem with CLM logic. frac_sno seems like a reliable indicator
to prevent replacing values in the layer thought to be 'empty'.
Using 'simple table' format for namelist which allows specification
of newlines in table cells.
Described in https://rest-sphinx-memo.readthedocs.io/en/latest/ReST.html#tables
levsoi and levdcmp are new dimensions ... and carbon/nitrogen vars use
them, so we need them.
The get_grid_vertval() routine still needs work to fully support
the different vertical coordinates.
Replacing the 'kind' nomenclature when reporting QTY values.
The vector history files have variables declared with the column varying fastest,
i.e. float SOIL1N_vr(time, levdcmp, column) ;
while the other vector format file (the restart) always has vertical varying fastest,
i.e. double H2OSOI_LIQ(column, levtot) ;
That makes a difference when creating the 'model_size' metadata arrays.
Using the netcdf_utilities_mod routines makes it a lot clearer.
get_grid_vertval() still needs work, as does dart_to_clm.f90
The H2OSOI variable (QTY_SOIL_MOISTURE) is currently throwing
a model_interpolate() error ... the COSMOS_NEUTRON_INTENSITY obs
are exercising that FO.  SOIL_MOISTURE is a particularly ugly
interpolation - will deal with that later.

SMINN_vr(time,levsoi,lat,lon)  also uses the 'levsoi' coordinates.
Has the variables necessary for several of the observations in the obs_seq.in
Avoiding soil moisture for now.
The exit from the case statement was in a funny place and probably
not doing what it should have been doing. The code did the right
thing, it just checked extra dimensions.
The dart_to_clm.f90 still needs support for the repartitioning of snow.
Merge remote-tracking branch 'upstream/clm_update_2' into clm_update_2
braczka and others added 4 commits January 10, 2022 16:43
Using integer in repartition_swe >0

Co-authored-by: hkershaw-brown <[email protected]>
Using integer in repartition_swe >0

Co-authored-by: hkershaw-brown <[email protected]>
@braczka
Copy link
Contributor Author

braczka commented Jan 11, 2022

My latest batch of commits are associated with the observation converters mostly used with clm, that never got migrated to the the DART repository during the prior clm_update PRs. These commits include soil moisture and leaf area observations converters coming from NASA (NASA_Earthdata folder) and NSIDC (NSIDC folder). I updated the links to which acquire the raw netcdf or hdf5 data files, tested functionality, and created documentation for each, which was either sparse or non-existent. Changes to HDF5_utilities_mod.f90, obs_seq_utilities.f90, EASE_utilities_mod.f90 were needed to support the obs converters within the NASA_Earthdata and NSIDC folders.

Remove SMAP_L2_to_obs in  NASA_Earthdata doc

Co-authored-by: hkershaw-brown <[email protected]>
@braczka
Copy link
Contributor Author

braczka commented Jan 19, 2022

@hkershaw-brown I re-tested this current version of the branch for all 3 snow re-partitioning schemes and for the most part it looks pretty good, after I added the allow_missing_clm=.true. option within &filter_nml. I am not sure if it is a concern that it is not bit-by-bit identical to a previous version that I have run for the CLM5_setup_assimilation toy script caserun. It starts out identical for the first couple assimilation time steps, then starts to deviate for all snow adjusted CLM variables. Did we add/remove some precision in the DART scripting with these recent updates? Not sure this is expected behavior, and realize sometimes getting bit-by-bit reproducibility can be messy.......

@hkershaw-brown
Copy link
Member

@hkershaw-brown I re-tested this current version of the branch for all 3 snow re-partitioning schemes and for the most part it looks pretty good, after I added the allow_missing_clm=.true. option within &filter_nml. I am not sure if it is a concern that it is not bit-by-bit identical to a previous version that I have run for the CLM5_setup_assimilation toy script caserun. It starts out identical for the first couple assimilation time steps, then starts to deviate for all snow adjusted CLM variables. Did we add/remove some precision in the DART scripting with these recent updates? Not sure this is expected behavior, and realize sometimes getting bit-by-bit reproducibility can be messy.......

Let's check this out.

I'm not sure if CLM is always bit-for-bit idenical, but DART should be.
There is a small bug fix that is not in the clm-swe_pre-release tag, which is io_filenames_mod.f90 should be checking _FillValue not FillValue.

diff --git a/assimilation_code/modules/io/io_filenames_mod.f90 b/assimilation_code/modules/io/io_filenames_mod.f90
index a3a777a15..632cbb3d6 100644
--- a/assimilation_code/modules/io/io_filenames_mod.f90
+++ b/assimilation_code/modules/io/io_filenames_mod.f90
@@ -661,15 +661,15 @@ if ( get_has_FillValue(   domid, varid) ) then
    select case (get_xtype(domid, varid))
       case (NF90_INT)
          call get_FillValue(domid, varid, spvalINT)
-         call check_attribute_value_int(ncFile, filename, ncVarID, 'FillValue', spvalINT)
+         call check_attribute_value_int(ncFile, filename, ncVarID, '_FillValue', spvalINT)
 
       case (NF90_FLOAT)
          call get_FillValue(domid, varid, spvalR4)
-         call check_attribute_value_r4(ncFile, filename, ncVarID, 'FillValue', spvalR4)
+         call check_attribute_value_r4(ncFile, filename, ncVarID, '_FillValue', spvalR4)
 
       case (NF90_DOUBLE)
          call get_FillValue(domid, varid, spvalR8)
-         call check_attribute_value_r8(ncFile, filename, ncVarID, 'FillValue', spvalR8)
+         call check_attribute_value_r8(ncFile, filename, ncVarID, '_FillValue', spvalR8)

I'm wondering if this is the difference between your old and new runs. But it could be something else.
Do you mind if I grab the files you are running with? I just need which directory they are in Cheyenne.

Cheers,
Helen

@braczka
Copy link
Contributor Author

braczka commented Jan 19, 2022

@hkershaw-brown I re-tested this current version of the branch for all 3 snow re-partitioning schemes and for the most part it looks pretty good, after I added the allow_missing_clm=.true. option within &filter_nml. I am not sure if it is a concern that it is not bit-by-bit identical to a previous version that I have run for the CLM5_setup_assimilation toy script caserun. It starts out identical for the first couple assimilation time steps, then starts to deviate for all snow adjusted CLM variables. Did we add/remove some precision in the DART scripting with these recent updates? Not sure this is expected behavior, and realize sometimes getting bit-by-bit reproducibility can be messy.......

Let's check this out.

I'm not sure if CLM is always bit-for-bit idenical, but DART should be. There is a small bug fix that is not in the clm-swe_pre-release tag, which is io_filenames_mod.f90 should be checking _FillValue not FillValue.

diff --git a/assimilation_code/modules/io/io_filenames_mod.f90 b/assimilation_code/modules/io/io_filenames_mod.f90
index a3a777a15..632cbb3d6 100644
--- a/assimilation_code/modules/io/io_filenames_mod.f90
+++ b/assimilation_code/modules/io/io_filenames_mod.f90
@@ -661,15 +661,15 @@ if ( get_has_FillValue(   domid, varid) ) then
    select case (get_xtype(domid, varid))
       case (NF90_INT)
          call get_FillValue(domid, varid, spvalINT)
-         call check_attribute_value_int(ncFile, filename, ncVarID, 'FillValue', spvalINT)
+         call check_attribute_value_int(ncFile, filename, ncVarID, '_FillValue', spvalINT)
 
       case (NF90_FLOAT)
          call get_FillValue(domid, varid, spvalR4)
-         call check_attribute_value_r4(ncFile, filename, ncVarID, 'FillValue', spvalR4)
+         call check_attribute_value_r4(ncFile, filename, ncVarID, '_FillValue', spvalR4)
 
       case (NF90_DOUBLE)
          call get_FillValue(domid, varid, spvalR8)
-         call check_attribute_value_r8(ncFile, filename, ncVarID, 'FillValue', spvalR8)
+         call check_attribute_value_r8(ncFile, filename, ncVarID, '_FillValue', spvalR8)

I'm wondering if this is the difference between your old and new runs. But it could be something else. Do you mind if I grab the files you are running with? I just need which directory they are in Cheyenne.

Cheers, Helen

Sure -- I am currently running with the files within /glade/work/bmraczka/DART. This should be identical, (nearly identical) to the upstream files in NCAR/DART branch SWE_repartition_fix. The directories for the newer case runs are located at /glade/u/home/bmraczka/cases/cesm2.2.0/clm5_SWE0_final, clm5_SWE1_final and clm5_SWE2_final. The run folders for each case respectively are located within /glade/scratch/bmraczka/cesm2.2.0. I have been comparing the output files against the older cases: clm5_SWE0_fix, clm5_SWE1_fix and clm5_SWE2_fix for benchmarking. These older test cases were generated with the clm-swe_pre-release tag.

I have diffused the netcdf restart files for each companion case (the new and older versions) for variable H2OSOI_ICE. Haven't really checked the other variables, but assume the same loss/gain in precision occurs across all variables. This issue is slightly different than previous benchmark testing with DART-CLM in which there were immediate differences with the first set (first time step) of CLM output files -- suggesting CLM was the culprit. In this situation, the first couple of time steps (with assimilation) are identical (as far as I can tell), and differences start to creep in at the 2nd time step after the DART update. This suggests perhaps DART is introducing the differences....?

@braczka
Copy link
Contributor Author

braczka commented Jan 26, 2022

Odds and ends update:

The 156dca3 commit updates the syntax fix to replace 'nlon' with 'lon' with model_mod.f90. This was not causing a problem with any of my recent assimilation run tests -- but this is indeed a typo as brought to our attention from the E3SM-DART user today.

In reference to the reproducibility of SWE_repartition_fix, the slight differences in clm variables are pervasive -- not just the snow related variables that start to deviate from the tag clm-swe_pre-release, but all other variables that I tested as well (e.g. biomass). The deviation in behavior always occurs for portions of the variable state that are updated by DART, suggesting a difference in the DART code, or the way it was compiled. Below are my DART compiler options on Cheyenne:

MPIFC = mpif90
MPILD = mpif90
FC = ifort
LD = ifort

INCS = -I$(NETCDF)/include
LIBS = -L$(NETCDF)/lib -lnetcdff -lnetcdf
FFLAGS = -O -assume buffered_io $(INCS)
LDFLAGS = $(FFLAGS) $(LIBS)

I am going to re-run the test clm5_setup_assimilation with the tag clm-swe_pre-release one more time to see if I can match the behavior. I also plan to re-run the CLM5-DART Western US benchmark simulation with SWE_reparition_fix. This is probably overkill, but won't take too long to check.

@braczka
Copy link
Contributor Author

braczka commented Feb 2, 2022

The CLM5-DART Western US benchmark simulation with SWE_repartition_fix has been completed and is nearly identical to the previous benchmark simulation using the tag clm-swe_pre-release. This is ready to merge as far as I am concerned and am pulling it off draft.

Out of curiosity, I performed some tests running CLM-DART assimilations, with CLM in 'branch' mode (bit by bit reproducible for the start of CLM runs) instead of 'hybrid' mode which is not bit by bit. This didn't explain some of the small differences I was seeing in reproducibility, but doesn't effect the readiness of this PR as far as I am concerned.

@braczka braczka marked this pull request as ready for review February 2, 2022 22:35
Copy link
Member

@hkershaw-brown hkershaw-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

I'll bring this up to date with main (it is just the inflation namelist bug-fix)and bump the version before merging.

@hkershaw-brown hkershaw-brown merged commit c4de64f into main Feb 3, 2022
@hkershaw-brown hkershaw-brown deleted the SWE_repartition_fix branch February 4, 2022 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants