Releases: unionai-oss/pandera
Release v0.18.1: Granular control of validation on pandas dfs.
✨ Highlights ✨
Granular control of pandas validation #1490
Validation can now be restricted to schema-level checks, data-level checks, or both, via the PANDERA_VALIDATION_DEPTH environment variable. Schema-level (or metadata) validation covers things like column names and column data types, while data-level validation covers checks that operate on actual data values.
export PANDERA_VALIDATION_DEPTH=SCHEMA_AND_DATA # run both schema- and data-level checks (default)
export PANDERA_VALIDATION_DEPTH=SCHEMA_ONLY # only do schema-level checks
export PANDERA_VALIDATION_DEPTH=DATA_ONLY # only do data-level checks
Efficient Hypothesis strategies #1503
Pandas data synthesis strategies now use comparison operator functions for more efficient data synthesis. This change also raises the minimum hypothesis version to 6.92.7.
What's Changed
- Fix copy-pasted docstring in PySpark accessor test by @deepyaman in #1448
- Mypy precommit by @cosmicBboy in #1468
- @check_types now properly passes in *args **kwargs and checks their types by @ecthompson99 in #1336
- Bump starlette from 0.27.0 to 0.36.2 in /dev by @dependabot in #1484
- Bump fastapi from 0.103.0 to 0.109.1 by @dependabot in #1482
- Bump actions/cache from 3 to 4 by @dependabot in #1478
- Bump codecov/codecov-action from 3 to 4 by @dependabot in #1477
- Bump jinja2 from 3.1.2 to 3.1.3 by @dependabot in #1459
- fix: pin multimethod dep version (#1485) by @schatimo in #1486
- Fix issue where str dtype in a multiindex dataframe schema results in invalid example by @gsugar87 in #1050
- Bump python-multipart from 0.0.6 to 0.0.7 by @dependabot in #1496
- Bump python-multipart from 0.0.6 to 0.0.7 in /dev by @dependabot in #1495
- Bump python-multipart from 0.0.6 to 0.0.7 in /ci by @dependabot in #1494
- Bump jinja2 from 3.1.2 to 3.1.3 in /ci by @dependabot in #1457
- Bump starlette from 0.27.0 to 0.36.2 in /dev by @dependabot in #1489
- Bugfix/1463 Pandas 2.2.0 FutureWarning resolution by using assignment instead of … by @derinwalters in #1464
- Bump jinja2 from 3.1.2 to 3.1.3 in /dev by @dependabot in #1458
- add pandas 2.2.0 to tests, use uv for pip compile by @cosmicBboy in #1502
- Efficient Hypothesis strategies by @Zac-HD in #1503
- remove headers in requirements files by @cosmicBboy in #1512
- Granular validations on pandas dfs by @kykyi in #1490
New Contributors
- @deepyaman made their first contribution in #1448
- @ecthompson99 made their first contribution in #1336
- @schatimo made their first contribution in #1486
- @gsugar87 made their first contribution in #1050
- @Zac-HD made their first contribution in #1503
Full Changelog: v0.18.0...v0.18.1
Release v0.18.0: Pandas schemas support global configuration
✨ Highlight ✨
Pandera now supports the PANDERA_VALIDATION_ENABLED configuration environment variable. Setting
export PANDERA_VALIDATION_ENABLED=False
globally deactivates validation.
What's Changed
- Bump urllib3 from 2.0.4 to 2.0.7 by @dependabot in #1383
- Bump urllib3 from 2.0.5 to 2.0.7 in /dev by @dependabot in #1382
- Bump urllib3 from 2.0.4 to 2.0.7 in /ci by @dependabot in #1381
- Bugfix/1278 add_missing_columns assorted bugfixes by @derinwalters in #1372
- Fix lack of support for new TimestampNTZType in Spark 3.4 datatypes by @filipeo2-mck in #1385
- Current pip-compile usage does not have --no-emit-index-url by @filipeo2-mck in #1390
- Avoid throwing exception on Union types by @mjgp2 in #1378
- Fix optional fields in PySpark SQL by @filipeo2-mck in #1387
- Add support for unique validation in PySpark by @filipeo2-mck in #1396
- Enhancement to support GeoDataFrame, Geometry coercion, and CRS (Feature/1108) by @derinwalters in #1392
- fix issue for optional fields by @coobas in #1258
- Fix validating pyspark dataframes with regex columns by @lexanth in #1397
- Bump pyarrow from 13.0.0 to 14.0.1 by @dependabot in #1417
- Bump pyarrow from 13.0.0 to 14.0.1 in /dev by @dependabot in #1416
- Bump pyarrow from 13.0.0 to 14.0.1 in /ci by @dependabot in #1415
- [BUGFIX] [PYSPARK] Avoid running nullable checks if nullable=True by @filipeo2-mck in #1403
- Add Date type to pandera.all by @diederikperdok in #1419
- Fix disabling validation for PySpark DataFrame Schemas by @maxispeicher in #1407
- Bump actions/checkout from 3 to 4 by @dependabot in #1361
- [PySpark] Improve validation performance by enabling cache()/unpersist() toggles by @filipeo2-mck in #1414
- Bump urllib3 from 2.0.5 to 2.0.7 by @dependabot in #1420
- Generate localized timestamps in multiindex examples by @rob-sil in #1426
- feature: support string column validation for pandas 2.1.3 by @karlma821 in #1425
- Add support for PANDERA_VALIDATION_ENABLED for pandas and Configuration docs by @noklam in #1354
- update total download badge and fix contributing instructions by @cosmicBboy in #1436
- update cache dataframe config args, fix tests by @cosmicBboy in #1437
- Bump jupyter-server from 2.7.3 to 2.11.2 in /dev by @dependabot in #1440
- Bump cryptography from 41.0.4 to 41.0.6 by @dependabot in #1435
- Bump jupyter-server from 2.7.2 to 2.11.2 by @dependabot in #1441
New Contributors
- @filipeo2-mck made their first contribution in #1385
- @mjgp2 made their first contribution in #1378
- @coobas made their first contribution in #1258
- @lexanth made their first contribution in #1397
- @diederikperdok made their first contribution in #1419
- @maxispeicher made their first contribution in #1407
- @rob-sil made their first contribution in #1426
- @karlma821 made their first contribution in #1425
- @noklam made their first contribution in #1354
Full Changelog: v0.17.2...v0.18.0
v0.18.0b0: Beta release
beta release v0.18.0b0
Release v0.17.2: Improve PydanticModel performance
What's Changed
- improve pydantic model efficiency by @cosmicBboy in #1358
Full Changelog: v0.17.1...v0.17.2
Release v0.17.1: Python generic types bugfix
What's Changed
- bugfix: empty list/dicts and None values should be handled by @cosmicBboy in #1347
- add unit tests by @cosmicBboy in #1351
Full Changelog: v0.17.0...v0.17.1
Release v0.17.0: Add support for pydantic v2
⭐️ Highlight
This release adds support for pydantic v2. Pydantic < v2 should be supported for the foreseeable future.
What's Changed
- fix: docstrong for to_script by @tmcclintock in #1266
- fix multimethod bug in pyspark by @cosmicBboy in #1260
- Fix CI tests by @cosmicBboy in #1303
- Very minor reworking of error message by @nathanjmcdougall in #1304
- Fix typo in docs by @nathanjmcdougall in #1300
- Update drop_invalid_rows.rst by @cosmicBboy in #1277
- CONTRIBUTING.md Typo fixes and Markdown conventions by @nathanjmcdougall in #1290
- Move black config to pyproject.toml by @nathanjmcdougall in #1292
- Bugfix: Update drop invalid logic to handle multi-index dfs by @kykyi in #1320
- Use mirrors-mypy for pre-commit by @nathanjmcdougall in #1291
- Generalize mypy ignore to pass linter CI by @nathanjmcdougall in #1321
- raise_warning -> new SchemaWarning not UserWarning by @nathanjmcdougall in #1298
- Doc for coercion behaviour for whole schema by @nathanjmcdougall in #1289
- support pydantic v2 by @cosmicBboy in #1253
- unpin mypy from requirements, xfail on unstable tests by @cosmicBboy in #1338
- add ci requirements using pip, drop python 3.7 support by @cosmicBboy in #1340
New Contributors
- @tmcclintock made their first contribution in #1266
Full Changelog: v0.16.1...v0.17.0
Beta release v0.17.0b0: Add support for pydantic v2
What's Changed
- fix: docstrong for to_script by @tmcclintock in #1266
- fix multimethod bug in pyspark by @cosmicBboy in #1260
- Fix CI tests by @cosmicBboy in #1303
- Very minor reworking of error message by @nathanjmcdougall in #1304
- Fix typo in docs by @nathanjmcdougall in #1300
- Update drop_invalid_rows.rst by @cosmicBboy in #1277
- CONTRIBUTING.md Typo fixes and Markdown conventions by @nathanjmcdougall in #1290
- Move black config to pyproject.toml by @nathanjmcdougall in #1292
- Bugfix: Update drop invalid logic to handle multi-index dfs by @kykyi in #1320
- Use mirrors-mypy for pre-commit by @nathanjmcdougall in #1291
- Generalize mypy ignore to pass linter CI by @nathanjmcdougall in #1321
- raise_warning -> new SchemaWarning not UserWarning by @nathanjmcdougall in #1298
- Doc for coercion behaviour for whole schema by @nathanjmcdougall in #1289
- support pydantic v2 by @cosmicBboy in #1253
- unpin mypy from requirements, xfail on unstable tests by @cosmicBboy in #1338
- add ci requirements using pip by @cosmicBboy in #1340
New Contributors
- @tmcclintock made their first contribution in #1266
Full Changelog: v0.16.1...v0.17.0b0
v0.16.1: Bugfix pyspark dependency
What's Changed
- Use pandera-dev in envrc to match the environment.yml by @thomasjpfan in #1264
- remove pyspark dep from common types by @cosmicBboy in #1268
New Contributors
- @thomasjpfan made their first contribution in #1264
Full Changelog: v0.16.0...v0.16.1
v0.16.0: Support Pyspark SQL dataframes
What's Changed
- Use custom check strategies by @honno in #1203
- Bugfix: check for presence of default attribute before calling by @kykyi in #1191
- Remove outdated warning by @tpvasconcelos in #1190
- Static type hint error on class pandera DataFrame by @manel-ab in #1207
- Relax python_requires constraint by @danhje in #1209
- fix typo in docs by @lindenwells in #1201
- Make hypothesis dependency optional (#1215) by @leifwar in #1216
- Update extensions.rst by @nathanjmcdougall in #1219
- Test col-level checks in test_definied_check_strategy by @honno in #1224
- Add unique_values_eq argument to pa.Field by @karajan1001 in #1230
- Add a Dependabot config to update GitHub workflow actions by @kurtmckee in #1223
- Bump actions/checkout from 2 to 3 by @dependabot in #1234
- Enhancement: drop invalid rows on validate with new param by @kykyi in #1189
- Bump actions/cache from 2 to 3 by @dependabot in #1233
- Bump actions/setup-python from 1 to 4 by @dependabot in #1232
- Bugfix: Ensure defaults are correctly applied by @kykyi in #1240
- Bug fix while ordering optional keys from schema in static method from_records from pandera df by @manel-ab in #1238
- Add add_missing_columns DataFrame schema config per enhancement #687 by @derinwalters in #1186
- Support pyspark sql dataframe validation by @cosmicBboy in #1243
- fix issue with non-required regex-matched columns by @cosmicBboy in #1251
- Pin pydantic < v2 by @cosmicBboy in #1256
New Contributors
- @honno made their first contribution in #1203
- @manel-ab made their first contribution in #1207
- @danhje made their first contribution in #1209
- @lindenwells made their first contribution in #1201
- @leifwar made their first contribution in #1216
- @nathanjmcdougall made their first contribution in #1219
- @karajan1001 made their first contribution in #1230
- @kurtmckee made their first contribution in #1223
- @dependabot made their first contribution in #1234
- @derinwalters made their first contribution in #1186
- @NeerajMalhotra-QB made their first contribution in #1243
- @jaskaransinghsidana made their first contribution in #1243
Full Changelog: v0.15.2...v0.16.0
Beta release: v0.16.0b1 - Docs updates
Full Changelog: v0.16.0b0...v0.16.0b1