[FEATURE] Improve batch data validation verbosity #872

Open
10 tasks
mgovers opened this issue Jan 15, 2025 · 0 comments
Labels
feature New feature or request

mgovers commented Jan 15, 2025

Related to #871.

Background

The PGM supports a couple of data validation options (see https://power-grid-model.readthedocs.io/en/stable/api_reference/python-api-reference.html#validation):

  • accumulating
    • validate_input_data
    • validate_batch_data
  • throwing
    • assert_valid_input_data
    • assert_valid_batch_data

The throwing versions first run the accumulating equivalent and then throw if the accumulated result is not empty. Below, we will therefore focus on the validate_* versions.
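
The relationship between the two families can be sketched as follows. This is a minimal illustration on plain dicts, not the PGM implementation: `validate_records`, `assert_valid_records` and `RecordValidationError` are hypothetical names, and the single "missing value" check is a stand-in for the real validation rules.

```python
class RecordValidationError(Exception):
    """Hypothetical exception that carries the accumulated errors."""

    def __init__(self, errors: list[str]):
        super().__init__(f"{len(errors)} validation error(s)")
        self.errors = errors


def validate_records(records: list[dict]) -> list[str]:
    """Accumulating variant: collect every error and return them all."""
    errors = []
    for index, record in enumerate(records):
        if record.get("u_rated") is None:
            errors.append(f"record {index}: u_rated is missing")
    return errors


def assert_valid_records(records: list[dict]) -> None:
    """Throwing variant: run the accumulating check, raise if it found anything."""
    errors = validate_records(records)
    if errors:
        raise RecordValidationError(errors)
```

Because the throwing variant is only a thin wrapper, any reduction in verbosity achieved in the accumulating variant carries over to it.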

The issue with the existing approach

When running batch calculations, the following situations might be encountered, especially in performance-critical environments such as production:

  • The input data might omit values (NaN) for required but updateable attributes that are in fact provided in the update data. These absent attributes cause validate_input_data to report errors, so validate_batch_data is required instead.
  • Conversely, the update data might only update a small subset of all updateable attributes. Because it affects batch data, validate_batch_data is required.
  • A combination of the above two situations is also possible.

In all these cases, a lot of values could be checked on the input data alone, either because they are not provided in the update data or because they are not updateable in the first place. As it is, any such errors are reported for every scenario rather than just once, which results in an excessively long list of issues.

Example

  • Valid values for tap_nom of a transformer are: (tap_min <= tap_nom <= tap_max) or (tap_min >= tap_nom >= tap_max) (taken from https://power-grid-model.readthedocs.io/en/stable/user_manual/components.html#transformer). tap_nom, tap_min and tap_max are not updateable, so a validation on the input data should be enough to catch violations.
  • However, suppose the input data contains no status_from (an updateable but required attribute), while the update data contains valid status_from and status_to values that resolve the issue. Then validate_input_data will report errors on status_from, which disappear when calling validate_batch_data instead.
  • tap_nom, on the other hand, is not updateable and may therefore still be invalid for every scenario when calling validate_batch_data.
  • validate_batch_data will then correctly report the error on tap_nom, but it will do so for every scenario in the batch. That is a lot of unnecessary duplication that validate_input_data could already have caught.
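
The duplication described in this example can be reproduced with a toy model. The sketch below does not use the PGM API: `tap_nom_is_valid` and `validate_scenarios` are hypothetical helpers, a transformer is a plain dict, and each update scenario is merged into the input before checking, which is an assumption about how batch validation behaves conceptually.

```python
def tap_nom_is_valid(transformer: dict) -> bool:
    """tap_nom must lie between tap_min and tap_max, in either direction."""
    low, high = sorted((transformer["tap_min"], transformer["tap_max"]))
    return low <= transformer["tap_nom"] <= high


def validate_scenarios(input_transformer: dict, updates: list[dict]) -> dict[int, list[str]]:
    """Toy batch validation: merge each update into the input, then check the result."""
    errors: dict[int, list[str]] = {}
    for scenario, update in enumerate(updates):
        merged = {**input_transformer, **update}
        found = []
        if merged.get("status_from") is None:
            found.append("status_from is missing")
        if not tap_nom_is_valid(merged):
            found.append("tap_nom is not between tap_min and tap_max")
        if found:
            errors[scenario] = found
    return errors


# Input data: status_from is omitted and tap_nom is invalid (and not updateable).
transformer = {"tap_min": -5, "tap_max": 5, "tap_nom": 7, "status_from": None}
# Update data: every scenario supplies a valid status_from.
updates = [{"status_from": 1}, {"status_from": 0}, {"status_from": 1}]

batch_errors = validate_scenarios(transformer, updates)
# The status_from error is gone, but the tap_nom error appears once per scenario.
```

With three scenarios the same tap_nom message is reported three times; with thousands of scenarios the list grows accordingly, even though one check on the input data would have sufficed.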

Proposed solutions

New functionality

Two new types of functionality should be added:

  • extend validation functionality on input data with partial checks.
    • TBD: Either of the following options should be selected (or both):
      • check only non-updateable attributes
      • check only provided attributes
  • extend validation functionality on batch data with partial checks.
    • TBD: Either of the following options should be selected:
      • check only attributes on the input data that are not provided in any of the update data scenarios
      • check homogeneous attributes (that are the same for all update data scenarios)
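
To make the two batch-side options concrete, here is a minimal sketch of how the relevant attribute sets could be determined. It works on plain dicts rather than PGM datasets, and `attributes_never_updated` and `homogeneous_attributes` are hypothetical helper names, not existing or proposed PGM functions.

```python
def attributes_never_updated(input_record: dict, updates: list[dict]) -> set[str]:
    """Input attributes that are not provided in any update scenario."""
    updated: set[str] = set()
    for update in updates:
        updated.update(update)  # adds the attribute names (dict keys)
    return set(input_record) - updated


def homogeneous_attributes(updates: list[dict]) -> dict:
    """Attributes that every update scenario provides with the same value."""
    if not updates:
        return {}
    first, rest = updates[0], updates[1:]
    return {
        name: value
        for name, value in first.items()
        if all(name in update and update[name] == value for update in rest)
    }


transformer = {"tap_min": -5, "tap_max": 5, "tap_nom": 7, "status_from": None, "status_to": 1}
updates = [{"status_from": 1, "tap_pos": 0}, {"status_from": 0, "tap_pos": 0}]

check_once_on_input = attributes_never_updated(transformer, updates)
check_once_on_update = homogeneous_attributes(updates)
```

In this toy case tap_min, tap_max, tap_nom and status_to never change across scenarios, and tap_pos is updated to the same value everywhere, so checks that involve only those attributes could in principle be run once instead of per scenario.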

Implementation

TBD:

  • Add new functions for the above functionality
    • Pro: not breaking
    • Con: more functions
  • Add new keyword arguments to existing validation functions (both validate_* and assert_valid_*). The default behavior should be the existing behavior (report all errors for all scenarios)
    • Pro: no new functions
    • Con: unclear how to present the output

Considered and rejected alternatives

  • The following changes to validate_batch_data were considered but would be breaking:
    • Adding early returns
      • This removes data from the output
    • Changing the output from a dict of scenario index + scenario errors to a dict with an "all" entry for the errors that are the same across all scenarios, plus the scenario-specific errors in the same form as before (scenario index + scenario errors)
      • This changes the output
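
For reference, the rejected output change could look roughly like the sketch below. It assumes errors are comparable values (plain strings here) keyed by scenario index; `regroup_errors` is a hypothetical name and not part of the PGM API.

```python
def regroup_errors(batch_errors: dict[int, list[str]], n_scenarios: int) -> dict:
    """Move errors shared by all scenarios under "all"; keep the rest per scenario."""
    if n_scenarios > 0 and len(batch_errors) == n_scenarios:
        per_scenario = list(batch_errors.values())
        common = [
            error
            for error in per_scenario[0]
            if all(error in errors for errors in per_scenario[1:])
        ]
    else:
        # At least one scenario has no errors, so nothing is common to all.
        common = []

    regrouped: dict = {"all": common} if common else {}
    for scenario, errors in batch_errors.items():
        specific = [error for error in errors if error not in common]
        if specific:
            regrouped[scenario] = specific
    return regrouped
```

The result mixes a string key with integer keys and drops scenarios whose errors were all common, which illustrates why existing callers that iterate over scenario indices would break.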
@mgovers mgovers added the feature New feature or request label Jan 15, 2025
@mgovers mgovers changed the title [FEATURE] less verbose batch data validation [FEATURE] Improve batch data validation verbosity Jan 15, 2025