Hi @vovavili, thanks for your detailed post! Short answer: yes! Let's chip away at getting feature parity in the built-in checks between GE and Pandera. The 5 checks you enumerated are easily implemented in pandera. I'll start an issue with these 5! @vovavili would you mind helping prioritize what GE checks you'd like supported in pandera?
-
Hello, gentlemen!
I've been using this package for my work-related tasks for quite some time now and I find it to be useful beyond belief. Compared to something like Great Expectations (another data validation suite that I have to use at work), Pandera is lightweight, easy to set up and extend, and it does its job in a highly intuitive way, with fantastic integration with the hypothesis package. With Great Expectations I had to read the documentation multiple times just to begin to understand how to configure it properly.
However, one thing that I really like about GE is that it has a wide assortment of built-in checks, while working with Pandera sometimes makes me write custom checks for what are seemingly common validation tasks. While this is not too cumbersome, I think Pandera would benefit if as many common validation tasks as possible were bundled with it, leaving custom checks as a fallback for less commonly used operations. I think this would push Pandera's already high ease of use even further.
For example, here are some of the GE checks that I frequently rely on in my workflow and which I think are common enough to be considered for inclusion in Pandera as well:
Expect a specific format of datetime string in a given column
Expect all values in a column to be unique; also this
The ability to check a column's min, max and average values specifically
For a pair of columns, expect the value in column n1 to be greater than the value in column n2
Checks pertaining to the order of rows, i.e. expect column values to be decreasing/increasing
One common thread among the checks outlined above is that most of them can be written as rather simple custom wide checks, whereas the built-in Pandera checks all deal with tidy data. I think there is ample room for improvement here as well, since I don't see why the most common validation operations cannot be cross-column.
Would you all agree?
What other common validation operations do you think could be bundled into Pandera?
Would you agree with a proposal that some wide checks should be built in as well?
Thank you all in advance for your input, thoughts and opinions.