## Description

Related to #871.
## Background

The PGM supports a couple of data validation options (see https://power-grid-model.readthedocs.io/en/stable/api_reference/python-api-reference.html#validation):

- accumulating: `validate_input_data`, `validate_batch_data`
- throwing: `assert_valid_input_data`, `assert_valid_batch_data`

The throwing versions first run their accumulating equivalent and then throw if the accumulated result is not empty. Below, we therefore focus on the `validate_*` versions.
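The accumulate-then-throw relationship can be sketched as follows. This is a minimal standalone mock that mirrors the shape of the PGM API, not the actual power-grid-model implementation:

```python
# Sketch (not actual PGM code): the throwing validator simply runs the
# accumulating one and raises if any errors were collected.

def validate_input_data(input_data):
    """Accumulate and return a list of error messages (empty if valid)."""
    errors = []
    for component, records in input_data.items():
        for record in records:
            # Toy check: every record needs an id.
            if record.get("id") is None:
                errors.append(f"{component}: missing required attribute 'id'")
    return errors

def assert_valid_input_data(input_data):
    """Run the accumulating equivalent, then raise if anything was found."""
    errors = validate_input_data(input_data)
    if errors:
        raise ValueError("; ".join(errors))
```

Because the throwing variants are thin wrappers, any improvement to the accumulating variants carries over automatically.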
## The issue with the existing approach

When running batch calculations, the following situations may be encountered, especially in performance-critical environments such as production:

- The input data may contain omitted values (NaN) for required but updateable attributes that are in fact provided in the update data. These absent attributes will cause `validate_input_data` to report errors; `validate_batch_data` is required instead.
- Conversely, the update data may only update a small subset of all updateable attributes. Because this affects batch data, `validate_batch_data` is required.
- A combination of the above two situations is also possible.

In all these cases, many values could be checked on the input data alone, either because they are not provided in the update data or because they are not updateable in the first place. As it is, any such errors are reported for every scenario, rather than just once, which results in an excessively large list of issues.
## Example

- Valid values for `tap_nom` of a transformer are `(tap_min <= tap_nom <= tap_max) or (tap_min >= tap_nom >= tap_max)` (taken from https://power-grid-model.readthedocs.io/en/stable/user_manual/components.html#transformer). `tap_nom`, `tap_min` and `tap_max` are not updateable, so validating the input data alone should be enough to catch violations.
- However, suppose the input data contains no `status_from` or `status_to` (which are updateable but required attributes), while the update data contains valid `status_from` and `status_to` values that resolve the issue. Then `validate_input_data` will report errors on these attributes, even though they are resolved when calling `validate_batch_data`.
- Meanwhile, `tap_nom` is not updateable and may therefore still be invalid for every scenario. `validate_batch_data` will correctly report the error on `tap_nom`, but it will do so for every scenario in the batch. That is a lot of unnecessary duplication that could already be caught by `validate_input_data`.
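The duplication can be reproduced with a toy model. This is plain illustrative Python that mirrors the `{scenario index: errors}` output shape of the PGM batch validator, not actual PGM code:

```python
# Toy reproduction of the duplication: an error on the non-updateable
# tap_nom is reported for every scenario, even though checking the
# input data once would have sufficed.

def check_tap_nom(transformer):
    lo, hi = sorted((transformer["tap_min"], transformer["tap_max"]))
    if not (lo <= transformer["tap_nom"] <= hi):
        return ["transformer: tap_nom outside the range [tap_min, tap_max]"]
    return []

def validate_batch_data(input_data, update_scenarios):
    """Return {scenario index: errors}, mirroring the PGM output shape."""
    result = {}
    for i, update in enumerate(update_scenarios):
        # Apply the scenario update on top of the input data.
        merged = {**input_data["transformer"], **update.get("transformer", {})}
        errors = check_tap_nom(merged)
        if errors:
            result[i] = errors
    return result

input_data = {"transformer": {"tap_min": 1, "tap_max": 5, "tap_nom": 9}}
updates = [{"transformer": {"status_from": 1}} for _ in range(3)]
# The identical tap_nom error is reported for scenarios 0, 1 and 2.
print(validate_batch_data(input_data, updates))
```

With thousands of scenarios, the error list grows linearly even though the underlying defect is a single input-data issue.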
## Proposed solutions

### New functionality

Two new types of functionality should be added:

- Extend the validation functionality on input data with partial checks.
  - TBD: either of the following options should be selected (or both):
    - check only non-updateable attributes
    - check only provided attributes
- Extend the validation functionality on batch data with partial checks.
  - TBD: either of the following options should be selected:
    - check only attributes on the input data that are not provided in any of the update data scenarios
    - check homogeneous attributes (that are the same for all update data scenarios)
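To make one of the candidate partial checks concrete, here is a sketch of "check only attributes on the input data that are not provided in any of the update data scenarios". All names are illustrative; this is the idea made concrete, not a proposed API:

```python
# Sketch: validate on the input data only the attributes that no
# update scenario overrides; overridden attributes are left to the
# per-scenario batch validation.

def attributes_updated_anywhere(update_scenarios):
    """Per component, collect every attribute touched by any scenario."""
    touched = {}
    for scenario in update_scenarios:
        for component, attrs in scenario.items():
            touched.setdefault(component, set()).update(attrs)
    return touched

def partial_validate_input(input_data, update_scenarios, check_attribute):
    """Report input errors only for attributes never overridden by updates."""
    touched = attributes_updated_anywhere(update_scenarios)
    errors = []
    for component, attrs in input_data.items():
        for name, value in attrs.items():
            if name in touched.get(component, set()):
                continue  # will be validated per scenario instead
            error = check_attribute(component, name, value)
            if error:
                errors.append(error)
    return errors
```

Such a check reports each input-only error exactly once, while anything that an update scenario might fix (or break) is deferred to the batch validation.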
### Implementation

TBD:

- Add new functions for the above functionality.
  - Pro: not breaking
  - Con: more functions
- Add new keyword arguments to the existing validation functions (both `validate_*` and `assert_valid_*`). The default behavior should be the existing behavior (report all errors for all scenarios).
  - Pro: no new functions
  - Con: how to output?
## Considered and rejected alternatives

The following changes to `validate_batch_data` were considered but would be breaking:

- Adding early returns
  - This removes data from the output
- Changing the output from a dict of scenario index + scenario errors to a dict with an "all" entry for the errors that are the same across all scenarios, plus the scenario-specific errors in the same format as before (scenario index + scenario errors)
  - This changes the output
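For reference, the rejected output change can be sketched in plain Python (the `"all"` key and the helper name are illustrative only):

```python
# Toy before/after of the rejected output change: errors shared by every
# scenario are hoisted under a single "all" key, and only the
# scenario-specific remainder keeps the old {scenario index: errors} shape.

current_output = {
    0: ["tap_nom out of range", "status_from missing"],
    1: ["tap_nom out of range"],
    2: ["tap_nom out of range"],
}

def hoist_common_errors(per_scenario):
    """Move errors present in every scenario under a single 'all' key."""
    common = set.intersection(*(set(v) for v in per_scenario.values()))
    result = {"all": sorted(common)}
    for idx, errors in per_scenario.items():
        specific = [e for e in errors if e not in common]
        if specific:
            result[idx] = specific
    return result
```

This illustrates why the option was rejected: existing consumers that iterate over the dict expecting only integer scenario keys would break on the new `"all"` entry.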