Question about pandera
Hi, I am trying to test pandera for schema validation. `print(err.data)` indicates "int_column dtype('int64')" as the possible error, but without a row or index value. According to the documentation page, `err.data  # invalid dataframe` should return only the invalid rows. Is it possible to print only the rows where the data type does not match the schema in the following example?
output as below:
Replies: 4 comments
Hi @Lavi2015. By default, pandera does not check values individually, but checks the dtypes of the columns (i.e. `DataFrame.dtypes`). To know the exact failure cases, you can enable `coerce=True`. Pandera will attempt to coerce the DataFrame to the schema dtypes and will report values that could not be coerced:

```python
import pandas as pd
import pandera as pa
from pandera import Column

schema = pa.DataFrameSchema(
    columns={
        "int_column": Column(int),
        "float_column": Column(float),
        "str_column": Column(str),
    },
    strict=True,
    coerce=True,  # <---- report the individual values that cannot be coerced
)

df = pd.DataFrame(
    {
        "int_column": ["a", 2, 3],  # "a" cannot be coerced to int
        "float_column": [0.0, 1.0, 2.0],
        "str_column": ["a", "b", "c"],
    }
)

try:
    # lazy=True collects every error instead of raising on the first one
    schema.validate(df, lazy=True)
except pa.errors.SchemaErrors as err:
    print("Schema errors and failure cases:")
    print(err.failure_cases)
    print("\nDataFrame object that failed validation:")
    print(err.data)

#> Schema errors and failure cases:
#>   schema_context      column                  check check_number failure_case  \
#> 0         Column  int_column  coerce_dtype('int64')         None            a
#> 1         Column  int_column         dtype('int64')         None       object
#>
#>   index
#> 0     0
#> 1  None
#>
#> DataFrame object that failed validation:
#>   int_column  float_column str_column
#> 0          a           0.0          a
#> 1          2           1.0          b
#> 2          3           2.0          c
```

It's true that this behavior could be made easier to discover. We've talked before about writing a cookbook. I think that would be a good recipe.
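To get back only the offending rows, one option is to use the `index` column of the failure cases to filter the original DataFrame. The following is a minimal sketch, not part of pandera: `invalid_rows` is a hypothetical helper, and `failure_cases` is a hand-built stand-in that copies the columns and values from the printed output in the reply above, so that the snippet runs with pandas alone. In real use you would pass `err.failure_cases` and `err.data` instead.

```python
import pandas as pd


def invalid_rows(failure_cases: pd.DataFrame, data: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `data` whose labels appear in failure_cases["index"]."""
    # Column-level failures (such as the dtype check) carry no row label,
    # so they show up as None in the "index" column and are dropped here.
    labels = failure_cases["index"].dropna().unique().tolist()
    return data.loc[data.index.isin(labels)]


# The DataFrame from the example in the reply.
df = pd.DataFrame(
    {
        "int_column": ["a", 2, 3],
        "float_column": [0.0, 1.0, 2.0],
        "str_column": ["a", "b", "c"],
    }
)

# Hand-built stand-in for err.failure_cases (assumption: same column names
# and values as in the printed output in the reply).
failure_cases = pd.DataFrame(
    {
        "column": ["int_column", "int_column"],
        "check": ["coerce_dtype('int64')", "dtype('int64')"],
        "failure_case": ["a", "object"],
        "index": pd.Series([0, None], dtype=object),
    }
)

bad = invalid_rows(failure_cases, df)
print(bad)
```

Filtering with `isin` rather than `data.loc[labels]` keeps the rows in their original order and does not raise if a label is missing from `data`.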
Hi @jeffzi,
My dataset has around 100 thousand records, of which 90 rows fail due to a dtype mismatch. I tried redirecting the exceptions to a file, but I could still see only the first and last 5 lines. My intention is to find all the failing rows (from the index column) for troubleshooting, as well as to identify all the issues. How can I achieve this? Also, what is the first column about? Thanks.
How did you redirect? Exporting to csv with
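As a rough illustration of the CSV route, here is a sketch that uses only pandas. It assumes that `err.failure_cases` is an ordinary pandas DataFrame, as the printed output earlier in the thread suggests. The frame below is a made-up stand-in with 90 rows, and an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Stand-in for err.failure_cases with 90 failing rows (values are made up).
failure_cases = pd.DataFrame(
    {
        "column": ["int_column"] * 90,
        "failure_case": ["a"] * 90,
        "index": range(90),
    }
)

# print() abbreviates long frames by default; to_string() renders every row.
full_text = failure_cases.to_string()

# to_csv() accepts a file path or any text buffer. index=False omits the
# frame's own row labels, which are not the row labels of the validated data.
buffer = io.StringIO()
failure_cases.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()

print(len(full_text.splitlines()))
print(csv_lines[0])
```

The unlabeled first column in the printed output is the row label of the failure-cases frame itself; the position of the bad value in the validated data is in the `index` column.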
@jeffzi, awesome. It works. Thank you so much.