feat: Release kedro-datasets version 3.0.0 (#644)
* bump up kedro-datasets version to 3.0.0

Signed-off-by: lrcouto <[email protected]>

* Reformatted release notes

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed typo

Signed-off-by: Elena Khaustova <[email protected]>

---------

Signed-off-by: lrcouto <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Co-authored-by: Elena Khaustova <[email protected]>
Co-authored-by: Elena Khaustova <[email protected]>
3 people authored Apr 10, 2024
1 parent 90efa7d commit 80ba790
Showing 2 changed files with 64 additions and 10 deletions.
72 changes: 63 additions & 9 deletions kedro-datasets/RELEASE.md
@@ -1,13 +1,26 @@
# Upcoming Release
## Major features and improvements

## Bug fixes and other changes

## Community contributions

# Release 3.0.0
## Major features and improvements

* Added the following new datasets:

| Type | Description | Location |
|-------------------------|-----------------------------------------------------------|-------------------------|
| `netcdf.NetCDFDataset` | A dataset for loading and saving `*.nc` files. | `kedro_datasets.netcdf` |
| `ibis.TableDataset` | A dataset for loading and saving using Ibis's backends. | `kedro_datasets.ibis` |

* Added support for Python 3.12.
* Normalised optional dependency names for datasets to follow [PEP 685](https://peps.python.org/pep-0685/). The `.` characters have been replaced with `-` in the optional dependency names. Note that this might be a breaking change for some users. For example, users should now install optional dependencies for `pandas.ParquetDataset` from `kedro-datasets` like this:
```bash
pip install kedro-datasets[pandas-parquetdataset]
```
* Removed `setup.py` and moved to `pyproject.toml` completely for `kedro-datasets`.
* Added `NetCDFDataset` for loading and saving `*.nc` files.
* Added dataset to load/save with Ibis.
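
The renaming of optional dependencies described above follows the name normalisation rule that PEP 685 adopts from PEP 503. A minimal sketch of that rule is shown below; the function name `normalize_extra` is an illustrative assumption and is not part of `kedro-datasets`.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalise an extra name: collapse runs of '-', '_' and '.' into a
    single '-' and lowercase the result (the PEP 503 rule used by PEP 685)."""
    return re.sub(r"[-_.]+", "-", name).lower()


# The old dotted spelling maps onto the new hyphenated extra name.
print(normalize_extra("pandas.ParquetDataset"))
```

Under this rule the old dotted spelling and the new hyphenated spelling normalise to the same string, which is why the hyphenated form is the one to use in `pip install` commands.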

## Bug fixes and other changes
* If using MSSQL, `load_args:params` will be typecast as a tuple.
@@ -23,7 +36,13 @@ Many thanks to the following Kedroids for contributing PRs to this release:

# Release 2.1.0
## Major features and improvements
* Added `MatlabDataset` which uses `scipy` to save and load `.mat` files.

* Added the following new datasets:

| Type | Description | Location |
|------------------------|-------------------------------------------------------------|-------------------------|
| `matlab.MatlabDataset` | A dataset which uses `scipy` to save and load `.mat` files. | `kedro_datasets.matlab` |

* Extended preview feature for matplotlib, plotly and tracking datasets.
* Allowed additional parameters for sqlalchemy engine when using sql datasets.

@@ -38,8 +57,15 @@ Many thanks to the following Kedroids for contributing PRs to this release:

# Release 2.0.0
## Major features and improvements

* Added the following new datasets:

| Type | Description | Location |
|--------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|------------------------------|
| `huggingface.HFDataset` | A dataset to load Hugging Face datasets using the [datasets](https://pypi.org/project/datasets) library. | `kedro_datasets.huggingface` |
| `huggingface.HFTransformerPipelineDataset` | A dataset to load pretrained Hugging Face transformers using the [transformers](https://pypi.org/project/transformers) library. | `kedro_datasets.huggingface` |

* Removed Dataset classes ending with "DataSet"; use the "Dataset" spelling instead.
* Added Hugging Face datasets `huggingface.HFDataset` and `huggingface.HFTransformerPipelineDataset`.
* Removed support for Python 3.7 and 3.8.
* Added [databricks-connect>=13.0](https://docs.databricks.com/en/dev-tools/databricks-connect-ref.html) support for Spark- and Databricks-based datasets.
* Bumped `s3fs` to latest calendar-versioned release.
@@ -59,8 +85,14 @@ Many thanks to the following Kedroids for contributing PRs to this release:

# Release 1.8.0
## Major features and improvements

* Added the following new datasets:

| Type | Description | Location |
|------------------------------|------------------------------------------------------------------------|-------------------------|
| `polars.LazyPolarsDataset` | A `LazyPolarsDataset` using [polars](https://www.pola.rs/)'s Lazy API. | `kedro_datasets.polars` |

* Moved `PartitionedDataSet` and `IncrementalDataSet` from the core Kedro repo to `kedro-datasets` and renamed to `PartitionedDataset` and `IncrementalDataset`.
* Added `polars.LazyPolarsDataset`, a `GenericDataSet` using [polars](https://www.pola.rs/)'s Lazy API.
* Renamed `polars.GenericDataSet` to `polars.EagerPolarsDataset` to better reflect the difference between the two dataset classes.
* Added a deprecation warning when using `polars.GenericDataSet` or `polars.GenericDataset`, stating that these have been renamed to `polars.EagerPolarsDataset`.
* Delayed backend connection for `pandas.SQLTableDataset`, `pandas.SQLQueryDataset`, and `snowflake.SnowparkTableDataset`. In practice, this means that a dataset's connection details aren't used (or validated) until the dataset is accessed. On the plus side, the cost of connection is incurred only if and when the dataset is actually used.
@@ -85,7 +117,12 @@ Many thanks to the following Kedroids for contributing PRs to this release:
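
The delayed-connection behaviour described above can be illustrated with a small sketch. This is not the actual implementation of the SQL or Snowpark datasets; `LazyConnectionDataset`, `connect_calls`, and the dictionary standing in for a connection are all hypothetical names used for illustration only.

```python
from functools import cached_property


class LazyConnectionDataset:
    """Hypothetical dataset that defers connecting until first use."""

    def __init__(self, connection_string: str) -> None:
        # Only store the details here; nothing is validated or opened yet.
        self._connection_string = connection_string
        self.connect_calls = 0

    @cached_property
    def _connection(self) -> dict:
        # Stand-in for an expensive backend connection, created once on
        # first access and cached for later calls.
        self.connect_calls += 1
        return {"dsn": self._connection_string}

    def load(self) -> str:
        # The first call to load() triggers the connection.
        return f"loaded via {self._connection['dsn']}"


dataset = LazyConnectionDataset("sqlite://")
print(dataset.connect_calls)  # no connection has been made at this point
dataset.load()
print(dataset.connect_calls)  # the connection was created on first use
```

A consequence of this pattern is that invalid connection details surface at first access rather than at construction time.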

# Release 1.7.0:
## Major features and improvements
* Added `polars.GenericDataSet`, a `GenericDataSet` backed by [polars](https://www.pola.rs/), a lightning fast dataframe package built entirely using Rust.

* Added the following new datasets:

| Type | Description | Location |
|---------------------------|----------------------------------------------------------------------------------------------------------------------------|-------------------------|
| `polars.GenericDataSet` | A `GenericDataSet` backed by [polars](https://www.pola.rs/), a lightning fast dataframe package built entirely using Rust. | `kedro_datasets.polars` |

## Bug fixes and other changes
* Fixed broken links in docstrings.
@@ -122,10 +159,16 @@ Many thanks to the following Kedroids for contributing PRs to this release:
# Release 1.5.0

## Major features and improvements

* Added the following new datasets:

| Type | Description | Location |
| -------------------------- |--------------------------------------|-------------------------|
| `pandas.DeltaTableDataSet` | A dataset to work with delta tables. | `kedro_datasets.pandas` |

* Implemented lazy loading of dataset subpackages and classes.
* Suppose that SQLAlchemy, a Python SQL toolkit, is installed in your Python environment. With this change, the SQLAlchemy library will not be loaded (for `pandas.SQLQueryDataSet` or `pandas.SQLTableDataSet`) if you load a different pandas dataset (e.g. `pandas.CSVDataSet`).
* Added automatic inference of file format for `pillow.ImageDataSet` to be passed to `save()`.
* Added `pandas.DeltaTableDataSet`.
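
The idea behind lazy loading, as described above, is that a dependency is imported only when something from it is first used. The sketch below shows one generic way to get that effect; `LazyModule` is a hypothetical helper and is not claimed to be the mechanism `kedro-datasets` uses. The standard-library `json` module stands in for a heavy optional dependency such as SQLAlchemy.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports a module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module = None  # nothing has been imported through the proxy yet

    def __getattr__(self, attr: str):
        # Called only for attributes not found on the proxy itself,
        # i.e. anything other than _name and _module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


lazy_json = LazyModule("json")
print(lazy_json._module is None)  # the proxy has not imported anything yet
print(lazy_json.dumps({"a": 1}))  # first use triggers the import
```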

## Bug fixes and other changes
* Improved error messages for missing dataset dependencies.
@@ -151,21 +194,32 @@ Many thanks to the following Kedroids for contributing PRs to this release:
# Release 1.4.0:

## Major features and improvements
* Added `SparkStreamingDataSet`.

* Added the following new datasets:

| Type | Description | Location |
|-------------------------------|-----------------------------------------------------|------------------------|
| `spark.SparkStreamingDataSet` | A dataset to work with PySpark Streaming DataFrame. | `kedro_datasets.spark` |

## Bug fixes and other changes
* Fixed problematic docstrings of `APIDataSet`.

# Release 1.3.0:

## Major features and improvements

* Added the following new datasets:

| Type | Description | Location |
|----------------------------------|---------------------------------------------------------|-----------------------------|
| `databricks.ManagedTableDataSet` | A dataset to access managed delta tables in Databricks. | `kedro_datasets.databricks` |

* Added pandas 2.0 support.
* Added SQLAlchemy 2.0 support (and dropped support for versions below 1.4).
* Added a save method to `APIDataSet`.
* Reduced constructor arguments for `APIDataSet` by replacing most arguments with a single constructor argument `load_args`. This makes it more consistent with other Kedro DataSets and the underlying `requests` API, and automatically enables the full configuration domain: stream, certificates, proxies, and more.
* Relaxed Kedro version pin to `>=0.16`.
* Added `metadata` attribute to all existing datasets. This is ignored by Kedro, but may be consumed by users or external plugins.
* Added `ManagedTableDataSet` for managed delta tables on Databricks.
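
The `load_args` change to `APIDataSet` described above amounts to passing request options through as a single mapping instead of one constructor argument per option. The sketch below only illustrates that general pattern; `build_request_kwargs` is a hypothetical helper, not the `APIDataSet` implementation, and the option names in the example are assumptions chosen to resemble common HTTP client settings.

```python
from typing import Any, Optional


def build_request_kwargs(
    url: str,
    method: str = "GET",
    load_args: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge a single options mapping into the keyword arguments for a request."""
    kwargs: dict[str, Any] = {"url": url, "method": method}
    # Every extra option travels through load_args, so new options need
    # no change to the function signature.
    kwargs.update(load_args or {})
    return kwargs


print(build_request_kwargs("https://example.com", load_args={"timeout": 5, "stream": True}))
```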

## Bug fixes and other changes
* Relaxed `delta-spark` upper bound to allow compatibility with Spark 3.1.x and 3.2.x.
2 changes: 1 addition & 1 deletion kedro-datasets/kedro_datasets/__init__.py
@@ -1,7 +1,7 @@
"""``kedro_datasets`` is where you can find all of Kedro's data connectors."""

__all__ = ["KedroDeprecationWarning"]
__version__ = "2.1.0"
__version__ = "3.0.0"

import sys
import warnings