BUG: fix validate(sample=x) for pl.DataFrame #1923

Open
wants to merge 1 commit into
base: main

m-richards
Collaborator

@m-richards m-richards commented Mar 2, 2025

Fixes #1912.

The fix needs some extra information passed through, which meant adding an argument to either validate or subsample. Either way the signature then differs from the base class implementation. It seemed better to add it to subsample, which is less user-facing.

Not sure if I found the best place to put tests.

codecov bot commented Mar 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.53%. Comparing base (812b2a8) to head (ba4896e).
Report is 199 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1923      +/-   ##
==========================================
- Coverage   94.28%   93.53%   -0.76%     
==========================================
  Files          91      121      +30     
  Lines        7013     9382    +2369     
==========================================
+ Hits         6612     8775    +2163     
- Misses        401      607     +206     

Collaborator

@cosmicBboy cosmicBboy left a comment

Having the backend accept only LazyFrame was an intentional choice here, to force the backend to use only Polars' lazy API when implementing the actual validation logic.

Thinking about this a bit more: instead of introducing an additional argument to the subsample API, what if we did the following?

  • Preserve the polars.DataFrame type and pass it into the backend validate method.
  • Do the subsampling on the polars.DataFrame if that's what the user passed in.
  • Convert it to a LazyFrame.
  • At the end of the backend validate method, we may also want to convert the result back into a polars.DataFrame if that's what the user passed in.

@m-richards
Collaborator Author

I'll have a go and see how this works. I was aware that the LazyFrame-only input was deliberate and wasn't sure whether it made sense to change that.

Successfully merging this pull request may close these issues.

BUG: Polars DataFrameModel.validate crashes with sample specified
2 participants