Releases · unionai-oss/pandera

31 Dec 21:43

v0.8.1

9448d0a

0.8.1: Mypy Plugin, Better Editor Type Annotation Autocomplete, Pickleable SchemaError(s), Improved Error-reporting, Bugfixes

Enhancements

add __all__ declaration to root module for better editor autocompletion 42e60c6
fix: expose nullable boolean in pandera.typing 5f9c713
type annotations for DataFrameSchema (#700)
add head of coerce failure cases (#710)
add mypy plugin (#701)
make SchemaError and SchemaErrors picklable (#722)
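Why picklability needs explicit support is easy to miss, so here is a minimal stdlib-only sketch. It does not reproduce pandera's implementation: the class names `NaiveSchemaError` and `PicklableSchemaError`, the helper `round_trips`, and the constructor arguments are all hypothetical. It only illustrates the general problem that an exception whose `__init__` takes more arguments than it forwards to `Exception` cannot be rebuilt by `pickle` unless it says how.

```python
import pickle


class NaiveSchemaError(Exception):
    """Hypothetical error: __init__ takes two arguments but forwards only a message."""

    def __init__(self, schema_name, failure_cases):
        super().__init__(f"{schema_name}: {len(failure_cases)} failure case(s)")
        self.schema_name = schema_name
        self.failure_cases = failure_cases


class PicklableSchemaError(NaiveSchemaError):
    """Same error, but it tells pickle how to rebuild itself."""

    def __reduce__(self):
        # Return the class plus the original constructor arguments.
        return (type(self), (self.schema_name, self.failure_cases))


def round_trips(error):
    """Return True if the error survives pickle.dumps followed by pickle.loads."""
    try:
        restored = pickle.loads(pickle.dumps(error))
    except TypeError:
        # Unpickling calls the class with only the message, which fails.
        return False
    return restored.failure_cases == error.failure_cases
```

With these definitions, `round_trips(NaiveSchemaError("schema", [1, 2]))` fails on load, while the `PicklableSchemaError` variant round-trips. This matters whenever an error has to cross a process boundary.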

Bugfixes

Only concat and drop_duplicates if more than one of {sample,head,tail} are present d3bc974, f756166, 20a631f
fix field autocompletion (#702)

Docs Improvements

Update contributing documentation: how to add dependencies #696
update package description in setup.py eb130b4
Fix broken links in dataframe_schemas.rst (#708)

Contributors

Big shout out to the following folks for your contributions on this release 🎉🎉🎉

nickolay, smackesey, and 4 other contributors

13 Nov 05:03

cosmicBboy

v0.8.0

cf37ced

0.8.0: Integrate with Dask, Koalas, Modin, Pydantic, Mypy

Community Announcements

Pandera now has a Discord community! Join us if you need help, want to discuss features or bugs, or want to help other community members 🤝

Highlights

Schema support for Dask, Koalas, Modin

Excited to announce that 0.8.0 is the first release that adds built-in support for additional dataframe types beyond Pandas: you can now use the exact same DataFrameSchema objects or SchemaModel classes to validate Dask, Modin, and Koalas dataframes.

import dask.dataframe as dd
import pandas as pd
import pandera as pa

from pandera.typing import Series, dask, koalas, modin

class Schema(pa.SchemaModel):
    state: Series[str]
    city: Series[str]
    price: Series[int] = pa.Field(in_range={"min_value": 5, "max_value": 20})

@pa.check_types
def dask_function(ddf: dask.DataFrame[Schema]) -> dask.DataFrame[Schema]:
    return ddf[ddf["state"] == "CA"]

@pa.check_types
def koalas_function(df: koalas.DataFrame[Schema]) -> koalas.DataFrame[Schema]:
    return df[df["state"] == "CA"]

@pa.check_types
def modin_function(df: modin.DataFrame[Schema]) -> modin.DataFrame[Schema]:
    return df[df["state"] == "CA"]

And DataFrameSchema objects will work on all dataframe types:

schema: pa.DataFrameSchema = Schema.to_schema()

schema(dask_df)
schema(modin_df)
schema(koalas_df)

Pydantic Integration

pandera.SchemaModels are fully compatible with pydantic:

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series
import pydantic


class SimpleSchema(pa.SchemaModel):
    str_col: Series[str] = pa.Field(unique=True)


class PydanticModel(pydantic.BaseModel):
    x: int
    df: DataFrame[SimpleSchema]


valid_df = pd.DataFrame({"str_col": ["hello", "world"]})
PydanticModel(x=1, df=valid_df)

invalid_df = pd.DataFrame({"str_col": ["hello", "hello"]})
PydanticModel(x=1, df=invalid_df)

Error:

Traceback (most recent call last):
...
ValidationError: 1 validation error for PydanticModel
df
series 'str_col' contains duplicate values:
1    hello
Name: str_col, dtype: object (type=value_error)

Mypy Integration

Pandera now supports static type-linting of DataFrame types with mypy out of the box so you can catch certain classes of errors at lint-time.

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series

class Schema(pa.SchemaModel):
    id: Series[int]
    name: Series[str]

class SchemaOut(pa.SchemaModel):
    age: Series[int]

class AnotherSchema(pa.SchemaModel):
    foo: Series[int]

def fn(df: DataFrame[Schema]) -> DataFrame[SchemaOut]:
    return df.assign(age=30).pipe(DataFrame[SchemaOut])  # mypy okay

def fn_pipe_incorrect_type(df: DataFrame[Schema]) -> DataFrame[SchemaOut]:
    return df.assign(age=30).pipe(DataFrame[AnotherSchema])  # mypy error
    # error: Argument 1 to "pipe" of "NDFrame" has incompatible type "Type[DataFrame[Any]]";
    # expected "Union[Callable[..., DataFrame[SchemaOut]], Tuple[Callable[..., DataFrame[SchemaOut]], str]]"  [arg-type]  # noqa

schema_df = DataFrame[Schema]({"id": [1], "name": ["foo"]})
pandas_df = pd.DataFrame({"id": [1], "name": ["foo"]})

fn(schema_df)  # mypy okay
fn(pandas_df)  # mypy error
# error: Argument 1 to "fn" has incompatible type "pandas.core.frame.DataFrame";
# expected "pandera.typing.pandas.DataFrame[Schema]"  [arg-type]

Enhancements

735e7fe implement dataframe types (#672)
46dc3a2 Support mypy (#650)
02063c8 Add Basic Dask Support (#665)
b7f6516 Modin support (#660)
cdf4667 Add Pydantic support (#659)
12378ea Support Koalas (#658)
62d689d improve lazy validation performance for nullable cases (#655)

Bugfixes

7a98e23 bugfix: support nullable empty strategies (#638)
5ec4611 Fix remaining unrecognized numpy dtypes (#637)
96d6516 Correctly handling single string constraints (#670)

Docs Improvements

1860685 add pyproject.toml, update doc typos
3c086a9 add discord link, update readme, docs (#674)
d75298f more detailed docstring of pandera.model_components.Field (#671)
96415a0 Add strictly typed pandas to readme (#649)

Testing Improvements

0a72a51 update suppression of health checks (#653)

Internals Improvements

fdcdb91 Reuse coerce in engines.utils (#645)
655dd85 remove assumption from nullable strategies (#641)

Contributors

Big shout out to the following folks for your contributions on this release 🎉🎉🎉

@sbrugman
@rbngz
@jeffzi
@bphillips-exos
@thorben-flapo
@tfwillems: special shout out here for contributing a good chunk of the code for the pydantic plugin #659

tfwillems, sbrugman, and 3 other contributors

25 Sep 02:06

cosmicBboy

v0.7.2

1085259

0.7.2: Bugfixes

Bugfixes

Strategies should not rely on pandas dtype aliases (#620)
support timedelta in data synthesis strats (#621)
fix multiindex error reporting (#622)
Pin pylint (#629)
exclude np.float128 type registration on Mac M1 (#624)
fix numpy_pandas_coercible bug dealing with single element (#626)
update pylint (#630)

13 Sep 00:28

cosmicBboy

v0.7.1

f0ddcbf

0.7.1: Add unique option to DataFrameSchema

Enhancements

add support for Any annotation in schema model (#594)
add support for timezone-aware datetime strategies (#595)
unique keyword arg: replace and deprecate allow_duplicates (#580)
Add support for empty data type annotation in SchemaModel (#602)
support frictionless primary keys with multiple fields (#608)
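The `unique` / `allow_duplicates` change above follows a common deprecation pattern, sketched below with the standard library only. This is not pandera's code: the helper name `resolve_unique` and its exact precedence rules are assumptions made for illustration. The one fact it relies on is that the two flags are opposites, so `allow_duplicates=False` means the same as `unique=True`.

```python
import warnings


def resolve_unique(unique=None, allow_duplicates=None):
    """Hypothetical helper: map the deprecated allow_duplicates flag onto unique."""
    if allow_duplicates is not None:
        warnings.warn(
            "allow_duplicates is deprecated, use unique instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if unique is None:
            # The flags are opposites: forbidding duplicates means unique.
            unique = not allow_duplicates
    # Neither flag given falls back to "duplicates allowed".
    return bool(unique)
```

In this sketch an explicit `unique` wins when both arguments are passed, so callers can migrate one call site at a time while the old keyword keeps working with a warning.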

Bugfixes

unify typing.DataFrame class definitions (#576)
schemas with multi-index columns correctly report errors (#600)
strategies module supports undefined checks in regex columns (#599)
fix validation of check raising error without message (#613)

Docs Improvements

Tutorial: docs/scaling - Bring Pandera to Spark and Dask (#588)

Repo Improvements

use virtualenv instead of conda in ci (#578)

Dependency Changes

remove frictionless from core pandera deps (#609)
docs/requirements.txt pin setuptools (#611)

Contributors

🎉🎉 Big shout out to all the contributors on this release 🎉🎉

admackin, tfwillems, and 3 other contributors

06 Aug 02:30

cosmicBboy

v0.7.0

abc817f

0.7.0: Pandera Type System Overhaul

Enhancements

Add support for frictionless schemas (#454) [docs]
decouple pandera and pandas dtypes (#559) [docs]
Unify dataframe definitions to fix auto-complete #576
Report all failure cases when coercing dtypes fails (#584)

Bugfixes

Handle case of pandas.DataFrame with pandas.MultiIndex in pandera.error_formatters.reshape_failure_cases (#560)
Add 'ordered.setter' decorator (#567)
Fix decorators on classmethods (#568)
better handling of datetime/timedelta in serialize/deserialize (#585)

Docs Improvements

Update contributing guide ccca82f
Add documentation build to contributing guide 361fec0
Fix virtualenv instructions in contributing guide ed74a65
Feature/coroutines docs (#570)
Add frictionless documentation (#579)
use python primitive types in docs where possible (#581)

Repo Improvements

Add typing to un-annotated functions (#569)
use virtualenv instead of conda in ci (#578)

Contributors

Big shout out to ✨ @mattHawthorn, @vinisalazar, @cristianmatache, @TColl, @jeffzi, @admackin, and @benkeesey ✨ for your contributions on this release 🎉🎉🎉

admackin, mattHawthorn, and 5 other contributors

13 Jul 19:21

cosmicBboy

v0.6.5

2c30d13

0.6.5: Support coroutines, regex matching on non-str column names, bugfixes

Enhancements

Raise error if check_obj.index is MultiIndex when using pandera.Index (#483)
support decorators for coroutines (#546)
added py.typed and typed Series descriptor (#543)
select non-str column names with regex=True (#551)
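Supporting coroutines in a validation decorator mostly comes down to awaiting the wrapped function before checking its result. The sketch below shows that general technique with the standard library only. It is not pandera's implementation, and `check_output`, `require_positive`, `fetch_price`, and `compute_price` are hypothetical names introduced for illustration.

```python
import asyncio
import functools
import inspect


def check_output(validate):
    """Hypothetical decorator: run validate() on the result of sync or async functions."""

    def decorator(fn):
        if inspect.iscoroutinefunction(fn):

            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                # Await the coroutine first, then validate the actual result.
                return validate(await fn(*args, **kwargs))

            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            return validate(fn(*args, **kwargs))

        return sync_wrapper

    return decorator


def require_positive(value):
    """Stand-in for a schema check: reject non-positive values."""
    if value <= 0:
        raise ValueError(f"expected a positive value, got {value}")
    return value


@check_output(require_positive)
async def fetch_price():
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return 5


@check_output(require_positive)
def compute_price():
    return 7
```

Without the `iscoroutinefunction` branch, the wrapper would pass the un-awaited coroutine object to `validate` instead of the value it eventually produces.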

Bugfixes

check decorators support non-DataFrame types (#510)
lazy validation correctly reports all errors (#528)
don't drop duplicates for series failure cases (#535)
custom dataframe-level checks don't corrupt data-synthesis strategy #550

Contributors

Thanks to @jekwatt @cristianmatache @lkadin for your first-time contributions! 🎉🎉🎉

08 May 16:08

cosmicBboy

v0.6.4

41bd759

0.6.4: Support dataframe-level checks in SchemaModel Config, Bugfixes

New Features

Allow attaching registered dataframe checks by using Config field names (#478)

Bugfixes

alias propagation works correctly on empty subclass (#446)
Add missing inplace arg to SchemaModel's validate (#450)
fix check_types decorator should return results from validate (#458)
Dataframe schemas in yaml do not require any field (#479)
coerce=True and pandas_dtype=None should be a noop (#476)

Doc Improvement

update documentation css to fit mobile (#447)
add copy button to docs (#448)
link documentation to github (#449)

Infrastructure Changes

add bugfixes and release branches to github actions eb38173
fix github action triggers 3191be9
update bug report template b0db5b0
bump cache 0ee703f
noxfile fixes 489695d
update pylint (#477)

28 Mar 02:28

cosmicBboy

v0.6.3

45aaa2c

0.6.3: Bugfixes, update docs

New Features

add new method SchemaModel.to_yaml to serialize SchemaModels to yaml #428

Bugfixes

preserve pandas extension types during validation (#443)
Fix to_yaml serialization dropping global checks (#428) 🎉 first contribution @antonl 🎉
fix empty data type not supported for serialization (#435)
fix empty SchemaModel (#434)
add doc about attributes excluded by SchemaModel (#436) @jeffzi
fix DataFrameSchema comparison with non-DataFrameSchema (#431) @jeffzi
schema serialization handles non-PandasDtype (#424)
pa.Object coerce should preserve object type (#423)

Documentation

Update documentation theme to use furo #444
Add favicon e3540f1

16 Feb 17:17

cosmicBboy

v0.6.2

f99b163

0.6.2: SchemaModel and synthesis bugfixes

New Feature

Add SchemaModel column name access through class attributes (#388) @jespercodes @jeffzi 🎉
Parametrized PandasExtensionType types (#389) @jeffzi 🎉
adding filter argument to strict parameter (#401) @ktroutman
feature/341: improve str and repr methods for schemas (#413)

Bugfixes

fix py3.6 optional + literal dtypes in SchemaModel (#379) @jeffzi 🎉
Fix minimally required packaging version (#380) contribution #1️⃣ @probberechts 🎉
prevent mypy Check getattr error for registered checks 920a98c
Compatibility with numpy 1.20 (#395) @jeffzi
dataframe strategies can generate regex columns (#402)
bugfix: df data synthesis with size=None, fix CI (#410)
bugfix: SeriesSchema raises SchemaErrors on lazy validation (#412)

Repo Improvements

improvements to local CI (#409) @jeffzi
feature/414: improve contributing docs and add to sphinx docs (#416)

07 Jan 01:03

cosmicBboy

v0.6.1

bfdb118

0.6.1: coercion and required column bugfixes

Bugfix Release

This release contains two bugfixes:

coerce nullable str column handles all na (#366)
non-required columns that are not in dataframe are not coerced (#368)
