Releases: unionai-oss/pandera

0.8.1: Mypy Plugin, Better Editor Type Annotation Autocomplete, Pickleable SchemaError(s), Improved Error-reporting, Bugfixes

31 Dec 21:43
9448d0a

Enhancements

  • add __all__ declaration to root module for better editor autocompletion 42e60c6
  • fix: expose nullable boolean in pandera.typing 5f9c713
  • type annotations for DataFrameSchema (#700)
  • add head of coerce failure cases (#710)
  • add mypy plugin (#701)
  • make SchemaError and SchemaErrors picklable (#722)

Bugfixes

  • Only concat and drop_duplicates if more than one of {sample,head,tail} are present d3bc974, f756166, 20a631f
  • fix field autocompletion (#702)

Docs Improvements

  • Update contributing documentation: how to add dependencies #696
  • update package description in setup.py eb130b4
  • Fix broken links in dataframe_schemas.rst (#708)

Contributors

Big shout out to the following folks for your contributions on this release 🎉🎉🎉

0.8.0: Integrate with Dask, Koalas, Modin, Pydantic, Mypy

13 Nov 05:03

Community Announcements

Pandera now has a Discord community! Join us if you need help, want to discuss features or bugs, or want to help other community members 🤝

Highlights

Schema support for Dask, Koalas, Modin

We're excited to announce that 0.8.0 is the first release with built-in support for dataframe types beyond pandas: you can now use the exact same DataFrameSchema objects or SchemaModel classes to validate Dask, Modin, and Koalas dataframes.

import dask.dataframe as dd
import pandas as pd
import pandera as pa

from pandera.typing import Series, dask, koalas, modin

class Schema(pa.SchemaModel):
    state: Series[str]
    city: Series[str]
    price: Series[int] = pa.Field(in_range={"min_value": 5, "max_value": 20})

@pa.check_types
def dask_function(ddf: dask.DataFrame[Schema]) -> dask.DataFrame[Schema]:
    return ddf[ddf["state"] == "CA"]

@pa.check_types
def koalas_function(df: koalas.DataFrame[Schema]) -> koalas.DataFrame[Schema]:
    return df[df["state"] == "CA"]

@pa.check_types
def modin_function(df: modin.DataFrame[Schema]) -> modin.DataFrame[Schema]:
    return df[df["state"] == "CA"]

And DataFrameSchema objects will work on all dataframe types:

schema: pa.DataFrameSchema = Schema.to_schema()

schema(dask_df)
schema(modin_df)
schema(koalas_df)

Pydantic Integration

pandera.SchemaModels are fully compatible with pydantic:

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series
import pydantic


class SimpleSchema(pa.SchemaModel):
    str_col: Series[str] = pa.Field(unique=True)


class PydanticModel(pydantic.BaseModel):
    x: int
    df: DataFrame[SimpleSchema]


valid_df = pd.DataFrame({"str_col": ["hello", "world"]})
PydanticModel(x=1, df=valid_df)

invalid_df = pd.DataFrame({"str_col": ["hello", "hello"]})
PydanticModel(x=1, df=invalid_df)

Error:

Traceback (most recent call last):
...
ValidationError: 1 validation error for PydanticModel
df
series 'str_col' contains duplicate values:
1    hello
Name: str_col, dtype: object (type=value_error)

Mypy Integration

Pandera now supports static type-linting of DataFrame types with mypy out of the box so you can catch certain classes of errors at lint-time.

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series

class Schema(pa.SchemaModel):
    id: Series[int]
    name: Series[str]

class SchemaOut(pa.SchemaModel):
    age: Series[int]

class AnotherSchema(pa.SchemaModel):
    foo: Series[int]

def fn(df: DataFrame[Schema]) -> DataFrame[SchemaOut]:
    return df.assign(age=30).pipe(DataFrame[SchemaOut])  # mypy okay

def fn_pipe_incorrect_type(df: DataFrame[Schema]) -> DataFrame[SchemaOut]:
    return df.assign(age=30).pipe(DataFrame[AnotherSchema])  # mypy error
    # error: Argument 1 to "pipe" of "NDFrame" has incompatible type "Type[DataFrame[Any]]";
    # expected "Union[Callable[..., DataFrame[SchemaOut]], Tuple[Callable[..., DataFrame[SchemaOut]], str]]"  [arg-type]  # noqa

schema_df = DataFrame[Schema]({"id": [1], "name": ["foo"]})
pandas_df = pd.DataFrame({"id": [1], "name": ["foo"]})

fn(schema_df)  # mypy okay
fn(pandas_df)  # mypy error
# error: Argument 1 to "fn" has incompatible type "pandas.core.frame.DataFrame";
# expected "pandera.typing.pandas.DataFrame[Schema]"  [arg-type]

Enhancements

Bugfixes

  • 7a98e23 bugfix: support nullable empty strategies (#638)
  • 5ec4611 Fix remaining unrecognized numpy dtypes (#637)
  • 96d6516 Correctly handling single string constraints (#670)

Docs Improvements

  • 1860685 add pyproject.toml, update doc typos
  • 3c086a9 add discord link, update readme, docs (#674)
  • d75298f more detailed docstring of pandera.model_components.Field (#671)
  • 96415a0 Add strictly typed pandas to readme (#649)

Testing Improvements

Internals Improvements

Contributors

Big shout out to the following folks for your contributions on this release 🎉🎉🎉

0.7.2: Bugfixes

25 Sep 02:06

Bugfixes

  • Strategies should not rely on pandas dtype aliases (#620)
  • support timedelta in data synthesis strats (#621)
  • fix multiindex error reporting (#622)
  • Pin pylint (#629)
  • exclude np.float128 type registration in MacM1 (#624)
  • fix numpy_pandas_coercible bug dealing with single element (#626)
  • update pylint (#630)

0.7.1: Add unique option to DataFrameSchema

13 Sep 00:28
f0ddcbf

Enhancements

  • add support for Any annotation in schema model (#594)
  • add support for timezone-aware datetime strategies (#595)
  • unique keyword arg: replace and deprecate allow_duplicates (#580)
  • Add support for empty data type annotation in SchemaModel (#602)
  • support frictionless primary keys with multiple fields (#608)

Bugfixes

  • unify typing.DataFrame class definitions (#576)
  • schemas with multi-index columns correctly report errors (#600)
  • strategies module supports undefined checks in regex columns (#599)
  • fix validation of check raising error without message (#613)

Docs Improvements

  • Tutorial: docs/scaling - Bring Pandera to Spark and Dask (#588)

Repo Improvements

  • use virtualenv instead of conda in ci (#578)

Dependency Changes

  • remove frictionless from core pandera deps (#609)
  • docs/requirements.txt pin setuptools (#611)

Contributors

🎉🎉 Big shout out to all the contributors on this release 🎉🎉

0.7.0: Pandera Type System Overhaul

06 Aug 02:30

Enhancements

  • Add support for frictionless schemas (#454) [docs]
  • decouple pandera and pandas dtypes (#559) [docs]
  • Unify dataframe definitions to fix auto-complete #576
  • Report all failure cases when coercing dtypes fails (#584)

Bugfixes

  • Handle case of pandas.DataFrame with pandas.MultiIndex in pandera.error_formatters.reshape_failure_cases (#560)
  • Add 'ordered.setter' decorator (#567)
  • Fix decorators on classmethods (#568)
  • better handling of datetime/timedelta in serialize/deserialize (#585)

Docs Improvements

  • Update contributing guide ccca82f
  • Add documentation build to contributing guide 361fec0
  • Fix virtualenv instructions in contributing guide ed74a65
  • Feature/coroutines docs (#570)
  • Add frictionless documentation (#579)
  • use python primitive types in docs where possible (#581)

Repo Improvements

  • Add typing to un-annotated functions (#569)
  • use virtualenv instead of conda in ci (#578)

Contributors

Big shout out to ✨ @mattHawthorn, @vinisalazar, @cristianmatache, @TColl, @jeffzi, @admackin, and @benkeesey ✨ for your contributions on this release 🎉🎉🎉

0.6.5: Support coroutines, regex matching on non-str column names, bugfixes

13 Jul 19:21

Enhancements

  • Raise error if check_obj.index is MultiIndex when using pandera.Index (#483)
  • support decorators for coroutines (#546)
  • added py.typed and typed Series descriptor (#543)
  • select non-str column names with regex=True (#551)

Bugfixes

  • check decorators support non-DataFrame types (#510)
  • lazy validation correctly reports all errors (#528)
  • don't drop duplicates for series failure cases (#535)
  • custom dataframe-level checks don't corrupt data-synthesis strategy #550

Contributors

Thanks to @jekwatt @cristianmatache @lkadin for your first-time contributions! 🎉🎉🎉

0.6.4: Support dataframe-level checks in SchemaModel Config, Bugfixes

08 May 16:08

New Features

  • Allow attaching registered dataframe checks by using Config field names (#478)

Bugfixes

  • alias propagation works correctly on empty subclass (#446)
  • Add missing inplace arg to SchemaModel's validate (#450)
  • fix: check_types decorator should return results from validate (#458)
  • Dataframe schemas in yaml do not require any field (#479)
  • coerce=True and pandas_dtype=None should be a noop (#476)

Docs Improvements

  • update documentation css to fit mobile (#447)
  • add copy button to docs (#448)
  • link documentation to github (#449)

Infrastructure Changes

0.6.3: Bugfixes, update docs

28 Mar 02:28

New Features

  • add new method SchemaModel.to_yaml to serialize SchemaModels to yaml #428

Bugfixes

  • preserve pandas extension types during validation (#443)
  • Fix to_yaml serialization dropping global checks (#428) 🎉 first contribution @antonl 🎉
  • fix empty data type not supported for serialization (#435)
  • fix empty SchemaModel (#434)
  • add doc about attributes excluded by SchemaModel (#436) @jeffzi
  • fix DataFrameSchema comparison with non-DataFrameSchema (#431) @jeffzi
  • schema serialization handles non-PandasDtype (#424)
  • pa.Object coerce should preserve object type (#423)

Documentation

0.6.2: SchemaModel and synthesis bugfixes

16 Feb 17:17

New Features

  • Add SchemaModel column name access through class attributes (#388) @jespercodes @jeffzi 🎉
  • Parametrized PandasExtensionType types (#389) @jeffzi 🎉
  • adding filter argument to strict parameter (#401) @ktroutman
  • feature/341: improve str and repr methods for schemas (#413)

Bugfixes

  • fix py3.6 optional + literal dtypes in SchemaModel (#379) @jeffzi 🎉
  • Fix minimally required packaging version (#380) contribution #1️⃣ @probberechts 🎉
  • prevent mypy Check getattr error for registered checks 920a98c
  • Compatibility with numpy 1.20 (#395) @jeffzi
  • dataframe strategies can generate regex columns (#402)
  • bugfix: df data synthesis with size=None, fix CI (#410)
  • bugfix: SeriesSchema raises SchemaErrors on lazy validation (#412)

Repo Improvements

  • improvements to local CI (#409) @jeffzi
  • feature/414: improve contributing docs and add to sphinx docs (#416)

0.6.1: coercion and required column bugfixes

07 Jan 01:03
bfdb118

Bugfix Release

This release contains two bugfixes:

  • coerce nullable str column handles all na (#366)
  • non-required columns that are not in dataframe are not coerced (#368)