Skip to content

Commit

Permalink
Merge pull request #710 from ImperialCollegeLondon/709-allow-data-to-…
Browse files Browse the repository at this point in the history
…be-configured-in-different-files

Allow `[[core.data.variable]]` to be split across files
  • Loading branch information
davidorme authored Jan 31, 2025
2 parents 97ee94b + e8d197c commit 4cab25c
Show file tree
Hide file tree
Showing 4 changed files with 50 additions and 13 deletions.
7 changes: 3 additions & 4 deletions docs/source/using_the_ve/data/data.md
Original file line number Diff line number Diff line change
Expand Up @@ -210,10 +210,9 @@ file="'../../data/xy_dim.nc'"
var_name="temp"
```

**NOTE**: At the moment,
`core.data.variable` tags cannot be used across multiple toml config files without
causing `ConfigurationError: Duplicated entries in config files: core.data.variable` to
be raised. This means that all variables need to be combined in one `config` file.
You can include `core.data.variable` tags in different files. This can be useful to
group model-specific data with other model configuration options, and allow
configuration files to be swapped in a more modular fashion.

To load configuration data , you will typically use the `cfg_paths` argument
to pass one or more TOML formatted configuration files to create a
Expand Down
23 changes: 23 additions & 0 deletions tests/core/test_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,29 @@
("b.bb.bbb.bbba.bbbaa",),
id="conflict_complex",
),
pytest.param(
{"d1": {"d2": [1, 2, 3]}},
{"d1": {"d2": [4, 5]}},
{"d1": {"d2": [1, 2, 3, 4, 5]}},
(),
id="no_conflict_list_merge",
),
# The next example passes just fine, which is intentional, but the test is here
# to highlight the behaviour
pytest.param(
{"d1": {"d2": [1, 2, 3]}},
{"d1": {"d2": [{"file": "a_path"}]}},
{"d1": {"d2": [1, 2, 3, {"file": "a_path"}]}},
(),
id="no_conflict_list_merge_dubious_content",
),
pytest.param(
{"d1": {"d2": [1, 2, 3]}},
{"d1": {"d2": "a"}},
{"d1": {"d2": "a"}},
("d1.d2",),
id="conflict_list_and_not_list",
),
],
)
def test_config_merge(dest, source, exp_result, exp_conflicts):
Expand Down
23 changes: 19 additions & 4 deletions virtual_ecosystem/core/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,22 @@ def config_merge(
"""Recursively merge two dictionaries detecting duplicated key definitions.
This function returns a copy of the input ``dest`` dictionary that has been extended
recursively with the entries from the input ``source`` dictionary. The two input
dictionaries must not share any key paths and when duplicated key paths are
found, the value from the source dictionary is used and the function extends the
returned ``conflicts`` tuple with the duplicated key path.
recursively with the entries from the input ``source`` dictionary.
In general, if two input dictionaries share complete key paths (that is a set of
nested dictionary keys leading to a value) then that indicates a duplicated setting.
The values might be identical, but the configuration files should not duplicate
settings. When duplicated key paths are found, the value from the source dictionary
is used and the function extends the returned ``conflicts`` tuple with the
duplicated key path.
However an exception is where both entries are lists - resulting from a TOML array
of tables (https://toml.io/en/v1.0.0#array-of-tables). In this case, it is
reasonable to append the source values to the destination values. The motivating
example here are `[[core.data.variable]]` entries, which can quite reasonably be
split across configuration sources. Note that no attempt is made to check that the
combined values are congruent - this is deferred to error handling when the
configuration is used.
Args:
dest: A dictionary to extend
Expand Down Expand Up @@ -66,6 +78,9 @@ def config_merge(
dest[src_key], conflicts = config_merge(
dest_val, src_val, conflicts=conflicts, path=next_path
)
elif isinstance(dest_val, list) and isinstance(src_val, list):
# Both values for this key are lists, so merge the lists
dest[src_key] = [*dest_val, *src_val]
elif dest_val is None:
# The key is not currently in dest, so add the key value pair
dest[src_key] = src_val
Expand Down
10 changes: 5 additions & 5 deletions virtual_ecosystem/core/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -107,11 +107,11 @@
[[core.data.variable]]
var_name="elev"
Data configurations must not contain repeated data variable names. NOTE: At the moment,
```core.data.variable``` tags cannot be used across multiple toml config files without
causing ```ConfigurationError: Duplicated entries in config files: core.data.variable```
to be raised. This means that all variables need to be combined in one ```config```
file.
Data configurations must not contain repeated data variable names.
You can include ```core.data.variable``` tags in different files. This can be useful to
group model-specific data with other model configuration options, and allow
configuration files to be swapped in a more modular fashion.
.. code-block:: python
Expand Down

0 comments on commit 4cab25c

Please sign in to comment.