How do I validate a value in a dataframe that depends on other values in the same row? #714
Hello, developers of this wonderful library! Data science newbie here. Suppose I have a .csv which follows this format:
After loading this into pandas/modin, I'd like to perform row-differentiated checks, where:
How would you validate this .csv using Pandera? Sorry if this is a noobish question, but I've read the entire documentation from A to Z and found no straightforward answer to the task at hand. Thank you all very much in advance!
Hi @vovavili

Depending on which API you're using, you can check out the wide checks for the object-based API or dataframe checks for the class-based API.

Note: the code snippets below aren't tested, but should be going in the right direction.

Class-based API:

```python
import pandera as pa
from pandera.typing import Series


class Schema(pa.SchemaModel):
    Name: Series[str]
    Salary: Series[int]
    Department: Series[str]
    Mandatory: Series[str]

    @pa.dataframe_check
    def rob_aviation_check(cls, df) -> Series[bool]:
        # Parenthesize each comparison: & binds tighter than ==.
        is_rob_in_aviation = (df["Name"] == "Rob") & (df["Department"] == "Aviation")
        # One boolean per row: other rows pass, Rob in Aviation needs Salary >= 5000.
        return ~is_rob_in_aviation | (df["Salary"] >= 5000)
```

Object-based API:

```python
import pandera as pa

schema = pa.DataFrameSchema(
    columns={
        "Name": pa.Column(str),
        "Salary": pa.Column(int),
        # ... remaining columns
    },
    checks=[
        # Same row-wise rule as the class-based dataframe check above.
        pa.Check(
            lambda df: ~((df["Name"] == "Rob") & (df["Department"] == "Aviation"))
            | (df["Salary"] >= 5000)
        )
    ],
)
```