Support for Logical Data Types #798: These data types check the actual values of the data container at runtime to support data types like "URL", "Name", etc.
Check.unique_values_eq #858: Make sure that all of the values in the data container cover the entire domain of the specified finite set of values.

What's Changed 📈

Lazy SchemaErrors contain schema name by @fleimgruber in 0d10f39
Support for logical data types by @jeffzi in #798
fix for Index of type category fails on validation by @kuutsav in #840
Add new check unique_values_eq by @johnkangw in #858
Add from records to panderas dataframe #850 by @borissmidt in #859
Doc fix: incorrect default value by @plague006 in #862
Handle cases of reset_index level being None or an empty list by @plague006 in #865
fixing unique multi index in SchemaModel by @mattB1989 in #870
Adding description and title to column serializations by @dantheand in #877
Fix modin and pyspark CI by @jeffzi and @cosmicBboy in #886
Add pandas_engine.Date by @jeffzi in #887
fix typo in docs by @jonwiggins in #895
Update strict type-hints by @the-matt-morris in #898
fix strategies ci by @cosmicBboy in #899
Bugfix/882 don't coerce datatypes twice by @ng-henry in #901
bugfix/904: ignore_na only ignores df records if all are Nan by @cosmicBboy in #909
fix sphinx docs by @cosmicBboy in #912
ExtensionDtype path should follow documentation by @pepelovesvim in #915
pin pandas-stubs version, bump mypy by @cosmicBboy in #916
Docs/867 by @the-matt-morris in #919

New Contributors 🎉

@johnkangw made their first contribution in #858
@pepelovesvim made their first contribution in #915
@dantheand made their first contribution in #877
@jonwiggins made their first contribution in #895
@kuutsav made their first contribution in #840
@borissmidt made their first contribution in #859
@plague006 made their first contribution in #862
@mattB1989 made their first contribution in #870
@the-matt-morris made their first contribution in #898
@ng-henry made their first contribution in #901
@pepelovesvim made their first contribution in #915

Full Changelog: v0.11.0...v0.12.0

Contributors

cosmicBboy, borissmidt, and 11 other contributors

Assets 2

0 Join discussion

12 Aug 16:31

cosmicBboy

v0.12.0b0

7a9c6ca

Beta release v0.12.0b0 Pre-release

Pre-release

beta release v0.12.0b0

Assets 2

01 May 00:31

cosmicBboy

v0.11.0

c494ee7

0.11.0: Docs support dark mode, custom names and errors for built-in checks, bug fixes

Big shoutout to the contributors on this release!

Highlights

Docs Gets Dark Mode 🌓

Just a little something for folks who prefer dark mode!

Enhancements

Make DataFrameSchema respect subclassing #830
Feature: Add support for Generic to SchemaModel #810
feat: make schema available in SchemaErrors #831
add support for custom name and error in builtin checks #843

Bugfixes

Make DataFrameSchema respect subclassing #830
fix pandas_engine.DateTime.coerce_value not consistent with coerce #827
fix mypy 9c5eaa3

Documentation Improvements

Dark docs #841

Contributors

tfwillems, fleimgruber, and 2 other contributors

Assets 2

1 Join discussion

30 Apr 13:39

cosmicBboy

v0.11.0b1

9c5eaa3

0.11.0b1: fix mypy error Pre-release

Pre-release

v0.11.0b1

release v0.11.0b1

Assets 2

29 Apr 21:13

cosmicBboy

v0.11.0b0

f302b5a

0.11.0b0: Docs support dark mode, custom names and errors for built-in checks, bug fixes Pre-release

Pre-release

v0.11.0b0

beta release for 0.11.0

Assets 2

04 Apr 03:13

cosmicBboy

v0.10.1

89154b9

0.10.1: Pyspark documentation fixes

v0.10.1

release 0.10.1

Assets 2

01 Apr 13:56

cosmicBboy

v0.10.0

5913499

0.10.0: Pyspark.pandas Support, PydanticModel datatype, Performance Improvements

Highlights

`pandera` now supports pyspark dataframe validation via `pyspark.pandas`

The pandera koalas integration has now been deprecated

You can now pip install pandera[pyspark] and validate pyspark.pandas dataframes:

import pyspark.pandas as ps
import pandas as pd
import pandera as pa

from pandera.typing.pyspark import DataFrame, Series


class Schema(pa.SchemaModel):
    state: Series[str]
    city: Series[str]
    price: Series[int] = pa.Field(in_range={"min_value": 5, "max_value": 20})


# create a pyspark.pandas dataframe that's validated on object initialization
df = DataFrame[Schema](
    {
        'state': ['FL','FL','FL','CA','CA','CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
print(df)

`PydanticModel` DataType Enables Row-wise Validation with a `pydantic` model

Pandera now supports row-wise validation by applying a pydantic model as a dataframe-level dtype:

from pydantic import BaseModel

import pandera as pa


class Record(BaseModel):
    name: str
    xcoord: str
    ycoord: int

import pandas as pd
from pandera.engines.pandas_engine import PydanticModel


class PydanticSchema(pa.SchemaModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(Record)
        coerce = True  # this is required, otherwise a SchemaInitError is raised

⚠️ Warning: This may lead to performance issues for very large dataframes.

Improved conda installation experience

Before this release there were only two conda packages: one to install pandera-core and another to install pandera (which would install all extras functionality)

The conda packaging now supports finer-grained control:

conda install -c conda-forge pandera-hypotheses  # hypothesis checks
conda install -c conda-forge pandera-io          # yaml/script schema io utilities
conda install -c conda-forge pandera-strategies  # data synthesis strategies
conda install -c conda-forge pandera-mypy        # enable static type-linting of pandas
conda install -c conda-forge pandera-fastapi     # fastapi integration
conda install -c conda-forge pandera-dask        # validate dask dataframes
conda install -c conda-forge pandera-pyspark     # validate pyspark dataframes
conda install -c conda-forge pandera-modin       # validate modin dataframes
conda install -c conda-forge pandera-modin-ray   # validate modin dataframes with ray
conda install -c conda-forge pandera-modin-dask  # validate modin dataframes with dask

Enhancements

Bugfixes

Deprecations

Docs Improvements

Testing Improvements

Misc Changes

Contributors

Assets 2

0 Join discussion

09 Feb 00:34

cosmicBboy

v0.9.0

f36cc9b

0.9.0: FastAPI Integration, Support GeoPandas DataFrames

Highlights

FastAPI Integration [Docs]

pandera now integrates with fastapi. You can decorate app endpoint arguments with DataFrame[Schema] types and the endpoint will validate incoming and outgoing data.

from typing import Optional

from pydantic import BaseModel, Field

import pandera as pa


# schema definitions
class Transactions(pa.SchemaModel):
    id: pa.typing.Series[int]
    cost: pa.typing.Series[float] = pa.Field(ge=0, le=1000)

    class Config:
        coerce = True

class TransactionsOut(Transactions):
    id: pa.typing.Series[int]
    cost: pa.typing.Series[float]
    name: pa.typing.Series[str]

class TransactionsDictOut(TransactionsOut):
    class Config:
        to_format = "dict"
        to_format_kwargs = {"orient": "records"}

App endpoint example:

from fastapi import FastAPI, File

app = FastAPI()

@app.post("/transactions/", response_model=DataFrame[TransactionsDictOut])
def create_transactions(transactions: DataFrame[Transactions]):
    output = transactions.assign(name="foo")
    ...  # do other stuff, e.g. update backend database with transactions
    return output

Data Format Conversion [Docs]

The class-based API now supports automatically deserializing/serializing pandas dataframes in the context of @pa.check_types-decorated functions, @pydantic.validate_arguments-decorated functions, and fastapi endpoint functions.

import pandera as pa
from pandera.typing import DataFrame, Series

# base schema definitions
class InSchema(pa.SchemaModel):
    str_col: Series[str] = pa.Field(unique=True, isin=[*"abcd"])
    int_col: Series[int]

class OutSchema(InSchema):
    float_col: pa.typing.Series[float]

# read and validate data from a parquet file
class InSchemaParquet(InSchema):
    class Config:
        from_format = "parquet"

# output data as a list of dictionary records
class OutSchemaDict(OutSchema):
    class Config:
        to_format = "dict"
        to_format_kwargs = {"orient": "records"}

@pa.check_types
def transform(df: DataFrame[InSchemaParquet]) -> DataFrame[OutSchemaDict]:
    return df.assign(float_col=1.1)

The transform function can then take a filepath or buffer containing a parquet file that pandera automatically reads and validates:

import io
import json

buffer = io.BytesIO()
data = pd.DataFrame({"str_col": [*"abc"], "int_col": range(3)})
data.to_parquet(buffer)
buffer.seek(0)

dict_output = transform(buffer)
print(json.dumps(dict_output, indent=4))

Output:

[
    {
        "str_col": "a",
        "int_col": 0,
        "float_col": 1.1
    },
    {
        "str_col": "b",
        "int_col": 1,
        "float_col": 1.1
    },
    {
        "str_col": "c",
        "int_col": 2,
        "float_col": 1.1
    }
]

Data Validation with GeoPandas [Docs]

DataFrameSchemas can now validate geopandas.GeoDataFrame and GeoSeries objects:

import geopandas as gpd
import pandas as pd
import pandera as pa
from shapely.geometry import Polygon

geo_schema = pa.DataFrameSchema({
    "geometry": pa.Column("geometry"),
    "region": pa.Column(str),
})

geo_df = gpd.GeoDataFrame({
    "geometry": [
        Polygon(((0, 0), (0, 1), (1, 1), (1, 0))),
        Polygon(((0, 0), (0, -1), (-1, -1), (-1, 0)))
    ],
    "region": ["NA", "SA"]
})

geo_schema.validate(geo_df)

You can also define SchemaModel classes with a GeoSeries field type annotation to create validated GeoDataFrames, or use then in @pa.check_types-decorated functions for input/output validation:

from pandera.typing import Series
from pandera.typing.geopandas import GeoDataFrame, GeoSeries


class Schema(pa.SchemaModel):
    geometry: GeoSeries
    region: Series[str]


# create a geodataframe that's validated on object initialization
df = GeoDataFrame[Schema](
    {
        'geometry': [
            Polygon(((0, 0), (0, 1), (1, 1), (1, 0))),
            Polygon(((0, 0), (0, -1), (-1, -1), (-1, 0)))
        ],
        'region': ['NA','SA']
    }
)

Enhancements

Support GeoPandas data structures (#732)
Fastapi integration (#741)
add title/description fields (#754)
add nullable float dtypes (#721)

Bugfixes

typed descriptors and setup.py only includes pandera (#739)
@pa.dataframe_check works correctly on pandas==1.1.5 (#735)
fix set_index with MultiIndex (#751)
strategies: correctly handle StringArray null values (#748)

Docs Improvements

fastapi docs, add to ci (#753)

Testing Improvements

Add Python 3.10 to CI matrix (#724)

Contributors

Big shout out to the following folks for your contributions on this release 🎉🎉🎉

Contributors

jamesmyatt, roshcagra, and 3 other contributors

Assets 2

0 Join discussion

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release 0.12.0

Highlights ⭐️

What's Changed 📈

New Contributors 🎉

Contributors

Big shoutout to the contributors on this release!

Highlights

Docs Gets Dark Mode 🌓

Enhancements

Bugfixes

Documentation Improvements

Contributors

Highlights

`pandera` now supports pyspark dataframe validation via `pyspark.pandas`

`PydanticModel` DataType Enables Row-wise Validation with a `pydantic` model

Improved conda installation experience

Enhancements

Bugfixes

Deprecations

Docs Improvements

Testing Improvements

Misc Changes

Contributors

Highlights

FastAPI Integration [Docs]

Data Format Conversion [Docs]

Data Validation with GeoPandas [Docs]

Enhancements

Bugfixes

Docs Improvements

Testing Improvements

Contributors

Contributors

Releases: unionai-oss/pandera

Beta Release: v0.13.0b1

Beta Release v0.13.0b0

Release 0.12.0: Logical Types, New Built-in Check, Bugfixes, Doc Improvements

Release 0.12.0

Highlights ⭐️

What's Changed 📈

New Contributors 🎉

Contributors

Beta release v0.12.0b0

0.11.0: Docs support dark mode, custom names and errors for built-in checks, bug fixes

Big shoutout to the contributors on this release!

Highlights

Docs Gets Dark Mode 🌓

Enhancements

Bugfixes

Documentation Improvements

Contributors

0.11.0b1: fix mypy error

0.11.0b0: Docs support dark mode, custom names and errors for built-in checks, bug fixes

0.10.1: Pyspark documentation fixes

0.10.0: Pyspark.pandas Support, PydanticModel datatype, Performance Improvements

Highlights

pandera now supports pyspark dataframe validation via pyspark.pandas

PydanticModel DataType Enables Row-wise Validation with a pydantic model

Improved conda installation experience

Enhancements

Bugfixes

Deprecations

Docs Improvements

Testing Improvements

Misc Changes

Contributors

0.9.0: FastAPI Integration, Support GeoPandas DataFrames

Highlights

FastAPI Integration [Docs]

Data Format Conversion [Docs]

Data Validation with GeoPandas [Docs]

Enhancements

Bugfixes

Docs Improvements

Testing Improvements

Contributors

Contributors

`pandera` now supports pyspark dataframe validation via `pyspark.pandas`

`PydanticModel` DataType Enables Row-wise Validation with a `pydantic` model