EAMxx: allow multi-output diags #6935

mahf708 · 2025-01-23T15:02:38Z

currently, our AtmoshereDiagnostic class allows for only one field to be output by design; this was the intended use when we first started supporting these diagnostics. For example, currently, a diagnostic can be seen as output1 = function ( { input1, input2, ... } ). This PR allows additional outputs, for example { output1, output2, ...} = = function ( { input1, input2, ... } ).

more design details to follow...

[bfb]

bartgol

This is a good start, but of course, there's more to do. I would add more beef before merging, as it doesn't really add anything (nor does much that we can test) until then.

Here's what I think should be added:

in IO, we need to build diags smartly. Right now, we build one diag object for each output field, but if one obj can compute 2+, we should not build 2+ copies of the same diag. This requires IO to do 2 passes when building diags: the first pass to figure out which diags to build, along with all the input params, and the second to actually build the diags)
I would convert one of the multi-purpose diags to allow computing N fields. E.g., the number path or water path diags, since they are very simple. Then, we can test that they work fine when computing 1 or 2+ fields.
Diag like NumberPath will now have to change their param list organization. E.g., that diag now checks the param Number Kind, to figure out which one to compute. Instead, it should now have 3 inputs, and check them separately. E.g., it should check something like Compute Liq, Compute Ice, and Compute Rain.
We currently use name() from the diag as a proxy for the diagnostic output name. With 2+ fields, this is no longer the case, so we must ensure that we DON'T use that string to refer to the field. Instead, the name() method should refer to the class itself.

For the first point, I envision something like this:

map<string,ParameterList> params;
for (f in diag_fields) {
  diag_factory_key = find_key_for(f);  // this is the key to pass to the diag factory.
  params[diag_factory_key].set(...);
}

for (f in diag_field) {
  diag_factory_key = find_key_for(f);
  create_diag(diag_factory_key,params[diag_factory_key])
}

I didn't do a great job explaining, but basically, the 1st loop is neede b/c different fields will set multiple params for the same diag class (e.g., the Compute Liq/Compute Ice/Compute Rain mentioned above). Only after ALL params have been set, we can proceed to create the diags.

mahf708 · 2025-01-23T16:53:21Z

as it doesn't really add anything (nor does much that we can test) until then.

I would still want to ensure this minor addition isn't breaking some corner test. Could you run the CI when you have a moment?

mahf708 · 2025-01-23T17:00:30Z

We currently use name() from the diag as a proxy for the diagnostic output name. With 2+ fields, this is no longer the case, so we must ensure that we DON'T use that string to refer to the field. Instead, the name() method should refer to the class itself.

I agree with everything, but I am not sure about this one. Let me try to think about it. My goal is to keep edits as noninvasive as possible. However, this part is more or less necessary if we want to compute diags smartly. Let me think more carefully

bartgol · 2025-01-23T19:08:52Z

We currently use name() from the diag as a proxy for the diagnostic output name. With 2+ fields, this is no longer the case, so we must ensure that we DON'T use that string to refer to the field. Instead, the name() method should refer to the class itself.

I agree with everything, but I am not sure about this one. Let me try to think about it. My goal is to keep edits as noninvasive as possible. However, this part is more or less necessary if we want to compute diags smartly. Let me think more carefully

The single-output diags can still use name() internally when creating the diag, I wasn't hinting at that. I just want to make sure that from the outside we are not using AtmosphereDiagnostic::name() and the actual diag field name as they were the same thing.

tcclevenger · 2025-01-24T17:46:19Z

@bartgol I'm confused why your approval doesn't allow the testing to move forward

mahf708 · 2025-01-24T17:59:21Z

@bartgol I'm confused why your approval doesn't allow the testing to move forward

No, don't worry about the approval. This is NOT ready to be merged. I am going to address Luca's comments.

I think Luca approved so that the CI runs.

dismissing review since more work is needed. Will re-request review after I address the mods needed.

mahf708 · 2025-01-26T20:56:31Z

@bartgol PTAL; there's very likely something I am missing on the IO side, especially surrounding the name() part.

A few notes on design choices (feel free to push back or request drastic changes or suggest improvements)

I want us to unify the params for these diags; for multi-out diags, I think we should just force the customer to pass a std::vector of "auxiliary params" they want out (i.e., no flexibility with names such as "temperature kind" or "water kind"), and it is up to the diag impl to interpret that. That will simplify the logic in the IO. In the future, we should move the complex regex to to the diag impl (e.g., the field_at stuff shouldn't belong in IO, it should belong in diag impl)
I created a set_diagnostic functionlaity (similar to create_diagnostic) to differentiate single- vs multi-output diagnostic; this is temporary until we feel comfortable with what we have in terms of multi-out stuff; in the future, we should just phase out single-output diagnostics entirely (after all, they're the trivial case of multi-output diagnostics)
Basic testing passes (sp on pm-gpu) but I will need your help with any issues surrounding the index of m_diagnostics (i.e., the name() shenangians)

In general, I think we should limit functionality to what we want to support and not leave it open-ended up and fragile. Adding Aaron to review as well, since he might be interested in weighing in.

bartgol

Submitting some comments. The biggest concern is the comment in scorpio_output.cpp. Then the offline comment about the create_diagnostic utility.

bartgol · 2025-01-27T16:39:30Z

components/eamxx/src/diagnostics/number_path.cpp

-        "  - valid values: Liq, Ice, Rain\n");
+  m_kinds = m_params.get<std::vector<std::string>>("Number Kinds");
+  // check if "Liq" or "Ice" or "Rain" is in m_kinds
+  EKAT_REQUIRE_MSG(


Imho, this check is a bit redundant with the one below. In the for loop below, you already throw for anything other than "Liq", "Ice", and "Rain". So maybe (in the spirit of DRY) here you could just check that m_kinds.size()>0, deferring to the for loop for the validity of the values.

bartgol · 2025-01-27T17:39:24Z

components/eamxx/src/diagnostics/tests/number_paths_tests.cpp

@@ -196,6 +210,9 @@ void run(std::mt19937_64 &engine) {
        REQUIRE(std::abs(rnp_h(icol) - qndr_prod) < macheps);
      }
    }
+    for(int icol = 0; icol < ncols; icol++) {


What about checking that the 3 fields in the cnp diag match exactly the fields in each of the lnp/inp/rnp diagnostics? You can prob just use views_are_equal on the fields, and be done.

bartgol · 2025-01-27T17:41:48Z

components/eamxx/src/share/io/scorpio_output.cpp

@@ -1268,7 +1269,7 @@ compute_diagnostic(const std::string& name, const bool allow_invalid_fields)
    for (auto f : diag->get_fields_in()) {
      if (not f.get_header().get_tracking().get_time_stamp().is_valid()) {
        // Fill diag with invalid data and return
-        diag->get_diagnostic().deep_copy(m_fill_value);


Can't find another place to add this comment, so I add it here.

A few lines above we do m_diag_computed[name] = true;. This will ensure that we don't recompute name if another diag has it as a dependency. We need to modify that line, so that it sets the value to true for ALL fields computed by diag.

bartgol · 2025-01-27T17:46:40Z

components/eamxx/src/share/io/scorpio_output.cpp

+  scream::set_diagnostic(all_diag_list, single_diag_list, multi_diag_list,
+                         diag_map);
+
+  for(const auto &name : single_diag_list) {


I don't see why you need to distinguish between single- and multiple-output diagnostics. The two loops below are identical, so you could just loop over all the diags in one loop, no?

Agree; i just lazily didn't create a combined list ... the one above contains redundancies

I guess you could see how this was developing in my head but never reached the finished line ... so duplication and such are consequences. Will fix

bartgol · 2025-01-27T17:58:41Z

components/eamxx/src/share/io/scream_io_utils.cpp

-  std::regex pot_temp ("(Liq)?PotentialTemperature$");
-  std::regex vert_layer ("(z|geopotential|height)_(mid|int)$");
-  std::regex horiz_avg ("([A-Za-z0-9_]+)_horiz_avg$");
+void set_diagnostic(const std::vector<std::string> &all_diag_list,


Naser and I had a private conversation on slack. Too long to report here, just noting that we are discussing different takes on how to get a list of diags to create.

mahf708 added enhancement BFB PR leaves answers BFB EAMxx PRs focused on capabilities for EAMxx labels Jan 23, 2025

mahf708 requested a review from bartgol January 23, 2025 15:02

mahf708 added the wip work in progress label Jan 23, 2025

bartgol reviewed Jan 23, 2025

View reviewed changes

bartgol previously approved these changes Jan 23, 2025

View reviewed changes

mahf708 changed the title ~~[WIP] EAMxx: allow multi-output diags~~ EAMxx: allow multi-output diags Jan 24, 2025

mahf708 removed the wip work in progress label Jan 24, 2025

mahf708 mentioned this pull request Jan 24, 2025

EAMxx: PBL entrainment budget diags #6923

Draft

rljacob assigned tcclevenger Jan 24, 2025

mahf708 added 2 commits January 26, 2025 09:21

EAMxx: skeleton to allow multi-output diags

e20ea43

EAMxx: convert numberpath diag to be multi-output

40a88b7

mahf708 force-pushed the mahf708/eamxx/multi-output-diagnostics branch from 54ca49a to 40a88b7 Compare January 26, 2025 17:21

EAMxx: split diags creation in IO

c477c30

mahf708 force-pushed the mahf708/eamxx/multi-output-diagnostics branch from 9a476f1 to c477c30 Compare January 26, 2025 20:48

mahf708 requested a review from bartgol January 26, 2025 20:49

mahf708 requested a review from AaronDonahue January 26, 2025 20:56

EAMxx: fix number path multi-out impl

68188d0

mahf708 force-pushed the mahf708/eamxx/multi-output-diagnostics branch from b5d627c to 68188d0 Compare January 27, 2025 16:09

bartgol reviewed Jan 27, 2025

View reviewed changes

mahf708 marked this pull request as draft February 1, 2025 18:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EAMxx: allow multi-output diags #6935

EAMxx: allow multi-output diags #6935

mahf708 commented Jan 23, 2025 •

edited

Loading

bartgol left a comment •

edited by mahf708

Loading

mahf708 commented Jan 23, 2025

mahf708 commented Jan 23, 2025

bartgol commented Jan 23, 2025

tcclevenger commented Jan 24, 2025

mahf708 commented Jan 24, 2025

mahf708 commented Jan 26, 2025 •

edited

Loading

bartgol left a comment

bartgol Jan 27, 2025

bartgol Jan 27, 2025

bartgol Jan 27, 2025

bartgol Jan 27, 2025

mahf708 Jan 27, 2025

mahf708 Jan 27, 2025

bartgol Jan 27, 2025

EAMxx: allow multi-output diags #6935

Are you sure you want to change the base?

EAMxx: allow multi-output diags #6935

Conversation

mahf708 commented Jan 23, 2025 • edited Loading

bartgol left a comment • edited by mahf708 Loading

Choose a reason for hiding this comment

mahf708 commented Jan 23, 2025

mahf708 commented Jan 23, 2025

bartgol commented Jan 23, 2025

tcclevenger commented Jan 24, 2025

mahf708 commented Jan 24, 2025

mahf708 commented Jan 26, 2025 • edited Loading

bartgol left a comment

Choose a reason for hiding this comment

bartgol Jan 27, 2025

Choose a reason for hiding this comment

bartgol Jan 27, 2025

Choose a reason for hiding this comment

bartgol Jan 27, 2025

Choose a reason for hiding this comment

bartgol Jan 27, 2025

Choose a reason for hiding this comment

mahf708 Jan 27, 2025

Choose a reason for hiding this comment

mahf708 Jan 27, 2025

Choose a reason for hiding this comment

bartgol Jan 27, 2025

Choose a reason for hiding this comment

mahf708 commented Jan 23, 2025 •

edited

Loading

bartgol left a comment •

edited by mahf708

Loading

mahf708 commented Jan 26, 2025 •

edited

Loading