Should pa.Check have more built-in checks for common validation tasks? #799

vovavili · 2022-03-25T18:01:07Z

Hello, gentlemen!

I've been using this package for work for quite some time now, and I find it incredibly useful. Compared to something like Great Expectations (another data validation suite that I have to use at work), Pandera is lightweight, easy to set up and extend, and does its job in a highly intuitive way, with fantastic integration with the hypothesis package. With Great Expectations, I had to read the documentation multiple times just to begin to understand how to configure it properly.

However, one thing that I really like about GE is that it has a wide assortment of built-in checks, while working with Pandera sometimes makes me write custom checks for what are seemingly common validation tasks. While this is not too cumbersome, I think Pandera would benefit if as many common validation tasks as possible were bundled with it, leaving custom checks as a fallback for less commonly used operations. I think this would push Pandera's already high ease of use even further.

For example, here are some of the checks from GE that I frequently rely on in my workflow and that I think are common enough to be considered for inclusion in Pandera as well:

One common thread among the checks outlined above is that most of them can be written as rather simple custom wide checks, whereas the built-in Pandera checks all deal with tidy data. I think there is ample room for improvement here as well, since I don't see a reason why the most common validation operations cannot be cross-column.

Would you all agree?
What other common validation operations do you think could be bundled into Pandera?
Would you agree with a proposal that some wide checks should be built in as well?

Thank you all in advance for your input, thoughts and opinions.

cosmicBboy · 2022-03-25T18:11:39Z

cosmicBboy
Mar 25, 2022
Maintainer

Hi @vovavili

Thanks for your detailed post!

Short answer: yes! Let's chip away at getting feature parity in the built-in checks between GE and Pandera.

The 5 checks you enumerated are easily implemented in pandera.

I'll start an issue with these 5!

@vovavili would you mind helping prioritize what GE checks you'd like supported in pandera?

4 replies

cosmicBboy Mar 25, 2022
Maintainer

Maybe we can continue this discussion thread to rank order the next 10 or 20 GE checks?

Other folks in the pandera community can also feel free to put in their priority requests

vovavili Mar 28, 2022
Author

In addition to the 6 checks listed above, I think these would be useful to prioritize first (judging purely by my intuition and by which checks I have used myself at work):

Would you say that this is a fair list to prioritize?

cosmicBboy Mar 29, 2022
Maintainer

thanks @vovavili ! looks like 7 and 8 are the same... did you mean to put another expectation for 8?

vovavili Mar 31, 2022
Author

@cosmicBboy My bad, fixed.
