Related to #871 .

Background

The PGM supports a couple of data validation options (see https://power-grid-model.readthedocs.io/en/stable/api_reference/python-api-reference.html#validation ):

- validate_input_data
- validate_batch_data
- assert_valid_input_data
- assert_valid_batch_data

The throwing versions first run the accumulating equivalent and then throw if the accumulated result is not empty. Below, we will therefore focus on the validate_* versions.
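The relationship between the two flavors can be sketched in plain Python. This is an illustrative mock, not the actual PGM implementation: the ValidationException name and the checks inside are invented for the example.

```python
# Illustrative sketch of the accumulate-then-throw pattern used by the
# assert_valid_* functions. Not the actual power-grid-model implementation;
# ValidationException and the concrete check are placeholders.

class ValidationException(Exception):
    """Raised when the accumulated validation result is not empty."""

def validate_input_data(input_data):
    """Accumulating variant: return a list of error messages (possibly empty)."""
    errors = []
    for component, records in input_data.items():
        for record in records:
            if record.get("id") is None:
                errors.append(f"{component}: missing required attribute 'id'")
    return errors

def assert_valid_input_data(input_data):
    """Throwing variant: run the accumulating check, then throw if non-empty."""
    errors = validate_input_data(input_data)
    if errors:
        raise ValidationException(errors)

data = {"transformer": [{"id": 1}, {}]}
print(validate_input_data(data))  # one accumulated error for the second record
```

The same pattern applies to validate_batch_data / assert_valid_batch_data, with the accumulated result keyed by scenario index instead of being a flat list.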
The issue with the existing approach
When running batch calculations, the following situations may be encountered, especially in performance-sensitive environments such as production:
- The input data might contain omitted data (NaN values) for required but updateable attributes that are in fact provided in the update data. These absent attributes will cause validate_input_data to report errors; instead, validate_batch_data is required.
- Conversely, the update data might only update a small subset of all updateable attributes. Because it affects batch data, validate_batch_data is still required.
- A combination of the two situations above is also possible.
In all these cases, many values could be checked on the input data alone, either because they are not provided in the update data or because they are not updateable in the first place. As it is, any such errors are reported for every scenario, rather than just once, which results in an excessively large list of issues.
Example

The validation criteria for the tap_nom of a transformer are: (tap_min <= tap_nom <= tap_max) or (tap_min >= tap_nom >= tap_max) (taken from https://power-grid-model.readthedocs.io/en/stable/user_manual/components.html#transformer ). tap_nom, tap_min and tap_max are not updateable, so a validation on the input data should be enough to capture this.

Suppose, however, that the input data contains no status_from (an updateable but required attribute), while the update data contains valid status_from and status_to values that resolve the issue. Then validate_input_data will report errors on status_from, even though they are resolved when calling validate_batch_data. Conversely, tap_nom is not updateable and may therefore still be invalid for every scenario when calling validate_batch_data.

validate_batch_data will now correctly report the error on tap_nom, but it will do so for every scenario in the batch. That is a lot of unnecessary duplication that could already be caught in validate_input_data.
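The duplication can be illustrated with a small mock: plain dicts instead of the PGM array formats, and simplified stand-ins for the actual checks. The attribute names follow the transformer documentation; everything else is invented for the example.

```python
# Mock of per-scenario batch validation duplicating an input-level error.
# Plain dicts instead of the power-grid-model data formats; the checks are
# simplified stand-ins for the real validation rules.
import math

# tap_nom violates tap_min <= tap_nom <= tap_max; status_from is omitted (NaN).
input_transformer = {"tap_min": -2, "tap_max": 2, "tap_nom": 5,
                     "status_from": math.nan}
update_scenarios = [{"status_from": 1}, {"status_from": 0}, {"status_from": 1}]

def check_scenario(inp, upd):
    merged = {**inp, **upd}  # update data overrides input data
    errors = []
    lo, hi = sorted((merged["tap_min"], merged["tap_max"]))
    if not (lo <= merged["tap_nom"] <= hi):
        errors.append("tap_nom out of range")
    if math.isnan(merged["status_from"]):
        errors.append("status_from missing")
    return errors

batch_errors = {i: check_scenario(input_transformer, upd)
                for i, upd in enumerate(update_scenarios)}
print(batch_errors)
# The non-updateable tap_nom error is repeated once per scenario, even though
# it could have been reported a single time against the input data.
```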
Proposed solutions
New functionality
Two new types of functionality should be added:
- Extend the validation functionality on input data with partial checks. TBD: either of the following options should be selected (or both):
  - check only non-updateable attributes
  - check only provided attributes
- Extend the validation functionality on batch data with partial checks. TBD: either of the following options should be selected (or both):
  - check only attributes in the input data that are not provided in any of the update data scenarios
  - check homogeneous attributes (attributes that are the same for all update data scenarios)
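Both batch-side options boil down to set operations over the update scenarios. A minimal sketch, using plain dicts instead of the PGM update format (the attribute names and data shapes are assumptions for illustration):

```python
# Sketch of the two batch-side partial checks: attributes never provided in
# any update scenario, and attributes that are homogeneous across scenarios.
# Plain dicts stand in for the power-grid-model update data format.

update_scenarios = [
    {"status_from": 1, "tap_pos": 0},
    {"status_from": 1},
    {"status_from": 1, "tap_pos": 2},
]
updatable_attrs = {"status_from", "status_to", "tap_pos"}

# Attributes never touched by any scenario: validate them on the input data only.
provided = set().union(*(s.keys() for s in update_scenarios))
never_provided = updatable_attrs - provided

# Attributes with the same value in every scenario: validate them once, not per scenario.
homogeneous = {a for a in provided
               if len({s.get(a) for s in update_scenarios}) == 1}

print(sorted(never_provided))  # ['status_to']
print(sorted(homogeneous))     # ['status_from']
```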
Implementation
TBD:
- Add new functions for the above functionality.
  - Pro: not breaking
  - Con: more functions
- Add new keyword arguments to the existing validation functions (both validate_* and assert_valid_*). The default behavior should remain the existing behavior (report all errors for all scenarios).
  - Pro: no new functions
  - Con: how to represent the partial results in the output?
Considered and rejected alternatives
The following changes to validate_batch_data were considered but would be breaking:

- Adding early returns: this removes data from the output.
- Changing the output from a dict of scenario index + scenario errors to a dict with an "all" entry for the errors that are the same across all scenarios, plus the scenario-specific errors in the same way as before (scenario index + scenario errors): this changes the output.
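For reference, the rejected output restructuring could look like the following sketch (illustrative shapes only, not the actual validation error types):

```python
# Sketch of the rejected output restructuring: hoist errors shared by every
# scenario under a single "all" key. Strings stand in for real error objects.

per_scenario = {
    0: ["tap_nom out of range", "status_from missing"],
    1: ["tap_nom out of range"],
    2: ["tap_nom out of range"],
}

shared = set.intersection(*(set(v) for v in per_scenario.values()))
restructured = {"all": sorted(shared)}
restructured.update({i: [e for e in errs if e not in shared]
                     for i, errs in per_scenario.items()})
print(restructured)
# Existing consumers that iterate over integer scenario keys would now also
# encounter the "all" key, which is why this change would be breaking.
```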