Allow pydantic model validation via Annotated metadata #1924

Open
wants to merge 2 commits into
base: main

cswartzvi

Hi - this is an attempt to add preliminary schema validation via Annotated metadata (#1333). In its current form, this PR alters the __get_pydantic_core_schema__ method of DataFrameModel to allow metadata validation in a pydantic context. For example:

from typing import Annotated

import pandas as pd
import pandera as pa
from pandera.typing import Series
from pydantic import BaseModel
from pydantic import ConfigDict
from pydantic import validate_call


class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str] = pa.Field(unique=True)

df = pd.DataFrame({"str_col": ["hello", "world"]})


# NOTE: arbitrary_types_allowed is required because pd.DataFrame
# is not a recognized pydantic type
config = ConfigDict(arbitrary_types_allowed=True)


# Case 1. Using pydantic.BaseModel

class AnnotatedDataframe(BaseModel):
    model_config = config
    df: Annotated[pd.DataFrame, SimpleSchema]

model = AnnotatedDataframe(df=df)  # No type error!


# Case 2. Using pydantic.validate_call

@validate_call(config=config, validate_return=True)
def process(
    df: Annotated[pd.DataFrame, SimpleSchema]
) -> Annotated[pd.DataFrame, SimpleSchema]:
    return df.dropna()  # No type error!

filtered_df = process(df)

The benefit of this approach is that AnnotatedDataframe(df=df) and df.dropna() pass type checkers without resorting to a (mypy-only) plugin.

Note that this PR does not currently address changes to check_types. If there is interest in merging this PR, I would be happy to contribute additional changes to check_types. My idea is to use the information already available in pandera.typing.AnnotationInfo and add support for Annotated[DataFrame, Schema] in addition to the current DataFrame[Schema]. Either way, just let me know - Thanks!

codecov bot commented Mar 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.26%. Comparing base (812b2a8) to head (b207b7e).
Report is 202 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1924      +/-   ##
==========================================
- Coverage   94.28%   93.26%   -1.03%     
==========================================
  Files          91      121      +30     
  Lines        7013     9377    +2364     
==========================================
+ Hits         6612     8745    +2133     
- Misses        401      632     +231     

@cosmicBboy
Collaborator

this is awesome, thanks @cswartzvi ! looks like there are unit test issues, let me know if you need any help.

Also, could we add support for Annotated validation with the pandera.check_types decorator?

@cswartzvi
Author

@cosmicBboy,

looks like there are unit test issues, let me know if you need any help.

Seems like a pydantic<2 issue. That's my fault, I should have tested the compatibility more thoroughly 🤦🏻. Probably needs some more guardrails to check for PYDANTIC_V2 - I will clean that up.
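
As a rough illustration of the kind of guardrail meant here, the sketch below only defines the pydantic v2 hook when a v2 install is detected. It is an assumption-laden outline, not the actual fix: major_version, detect_pydantic_v2 and ToyModel are hypothetical names, the flag is computed locally rather than imported from pandera, and the hook body is a placeholder.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '2.10.6'."""
    return int(version_string.split(".")[0])


def detect_pydantic_v2():
    """True only if pydantic is installed and its major version is >= 2."""
    try:
        return major_version(metadata.version("pydantic")) >= 2
    except metadata.PackageNotFoundError:
        return False


# Local stand-in for a PYDANTIC_V2 flag (assumption: computed here for the sketch).
PYDANTIC_V2 = detect_pydantic_v2()


class ToyModel:
    """Hypothetical model class; NOT pandera's DataFrameModel."""

    if PYDANTIC_V2:
        # Only expose the v2 hook when pydantic>=2 is available, so that
        # pydantic<2 (or no pydantic at all) never sees this method.
        @classmethod
        def __get_pydantic_core_schema__(cls, source_type, handler):
            raise NotImplementedError("placeholder for the real schema hook")
```

With this shape, the class body is identical under pydantic<2 apart from the missing hook, which keeps the v1 code path untouched.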

Also, could we add support for Annotated validation with the pandera.check_types decorator?

Sure thing, I was hoping you would say that! It might require some rewiring, but I will get working on that!
