
merge main and address PR comments
ravi-kumar-pilla committed Dec 16, 2024
2 parents a338694 + 2df011c commit 8d44ca5
Showing 11 changed files with 5,829 additions and 7 deletions.
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -8,6 +8,7 @@

- [ ] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
- [ ] Updated the documentation to reflect the code changes
- [ ] Updated `jsonschema/kedro-catalog-X.XX.json` if necessary
- [ ] Added a description of this change in the relevant `RELEASE.md` file
- [ ] Added tests to cover my changes
- [ ] Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)
2 changes: 1 addition & 1 deletion kedro-datasets/CONTRIBUTING.md
@@ -27,7 +27,7 @@ If you have new ideas for Kedro-Datasets then please open a [GitHub issue](https

If you're unsure where to begin contributing to Kedro-Datasets, please start by looking through the `good first issue` and `help wanted` on [GitHub](https://github.com/kedro-org/kedro-plugins/issues).
If you want to contribute a new dataset, read the [tutorial to create and contribute a custom dataset](https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html) in the Kedro documentation.
- Make sure to add the new dataset to `kedro_datasets.rst` so that it shows up in the API documentation and to `static/jsonschema/kedro-catalog-X.json` for IDE validation.
+ Make sure to add the new dataset to `kedro_datasets.rst` so that it shows up in the API documentation and to `kedro-datasets/static/jsonschema/kedro-catalog-X.json` for IDE validation.

Below is a guide to help you understand the process of contributing a new dataset, whether it falls under the category of core or experimental datasets.

5 changes: 5 additions & 0 deletions kedro-datasets/RELEASE.md
@@ -2,6 +2,7 @@

## Major features and improvements

- Supported passing `database` to `ibis.TableDataset` for load and save operations.
- Added functionality to save pandas DataFrames directly to Snowflake, facilitating seamless `.csv` ingestion.
- Added Python 3.9, 3.10 and 3.11 support for `snowflake.SnowflakeTableDataset`.
- Enabled connection sharing between `ibis.FileDataset` and `ibis.TableDataset` instances, thereby allowing nodes to save data loaded by one to the other (as long as they share the same connection configuration).
@@ -17,6 +18,8 @@
- Implemented Snowflake's [local testing framework](https://docs.snowflake.com/en/developer-guide/snowpark/python/testing-locally) for testing purposes.
- Improved the dependency management for Spark-based datasets by refactoring the Spark and Databricks utility functions used across the datasets.
- Added deprecation warning for `tracking.MetricsDataset` and `tracking.JSONDataset`.
- Moved `kedro-catalog` JSON schemas from Kedro core to `kedro-datasets`.
- Removed file handling using Ibis's backends from `ibis.TableDataset`. `ibis.FileDataset` will handle loading and saving files using Ibis's backends.
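Per the `ibis.TableDataset` docstring in this commit, the new `database` option accepts either a dotted string path or a tuple of strings for multi-level table hierarchies. A hedged sketch of what a configuration might look like, written as a plain Python dict; the connection values are made up for illustration:

```python
# Hypothetical catalog entry for ibis.TableDataset using the new `database`
# option; backend and connection details are illustrative, not prescriptive.
catalog_entry = {
    "type": "ibis.TableDataset",
    "table_name": "orders",
    # A dotted string path referencing a multi-level hierarchy ...
    "database": "catalog.analytics",
    "connection": {"backend": "duckdb", "database": ":memory:"},
}

# ... or, when constructing the dataset programmatically, a tuple of strings:
database = ("catalog", "analytics")
```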

## Breaking Changes

@@ -28,6 +31,8 @@ Many thanks to the following Kedroids for contributing PRs to this release:

- [Thomas d'Hooghe](https://github.com/tdhooghe)
- [Minura Punchihewa](https://github.com/MinuraPunchihewa)
- [Mark Druffel](https://github.com/mark-druffel)
- [Chris Schopp](https://github.com/chrisschopp)

# Release 5.1.0

24 changes: 20 additions & 4 deletions kedro-datasets/kedro_datasets/ibis/table_dataset.py
@@ -1,5 +1,4 @@
"""Provide data loading and saving functionality for Ibis's backends."""

from __future__ import annotations

from copy import deepcopy
@@ -76,6 +75,7 @@ def __init__( # noqa: PLR0913
self,
*,
table_name: str,
database: str | None = None,
connection: dict[str, Any] | None = None,
load_args: dict[str, Any] | None = None,
save_args: dict[str, Any] | None = None,
@@ -99,6 +99,12 @@
Args:
table_name: The name of the table or view to read or create.
database: The name of the database to read the table or view
from or create the table or view in. If not passed, then
the current database is used. Provide a tuple of strings
(e.g. `("catalog", "database")`) or a dotted string path
(e.g. `"catalog.database"`) to reference a table or view
in a multi-level table hierarchy.
connection: Configuration for connecting to an Ibis backend.
If not provided, connect to DuckDB in in-memory mode.
load_args: Additional arguments passed to the Ibis backend's
@@ -113,17 +119,22 @@
"""

self._table_name = table_name
self._database = database
self._connection_config = connection or self.DEFAULT_CONNECTION_CONFIG
self.metadata = metadata

# Set load and save arguments, overwriting defaults if provided.
self._load_args = deepcopy(self.DEFAULT_LOAD_ARGS)
if load_args is not None:
self._load_args.update(load_args)
if database is not None:
self._load_args["database"] = database

self._save_args = deepcopy(self.DEFAULT_SAVE_ARGS)
if save_args is not None:
self._save_args.update(save_args)
if database is not None:
self._save_args["database"] = database

self._materialized = self._save_args.pop("materialized")
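The constructor logic above layers defaults, user-supplied args, and then a `database` override into both `load_args` and `save_args`. A stdlib-only sketch of that merge as a standalone helper; the `DEFAULT_*` values here are illustrative assumptions, not the dataset's actual defaults:

```python
from copy import deepcopy

# Illustrative defaults only; the real class defines its own.
DEFAULT_LOAD_ARGS = {}
DEFAULT_SAVE_ARGS = {"materialized": "view"}


def build_args(database=None, load_args=None, save_args=None):
    """Mimic how the constructor folds `database` into load/save args."""
    merged_load = deepcopy(DEFAULT_LOAD_ARGS)
    if load_args is not None:
        merged_load.update(load_args)
    if database is not None:
        merged_load["database"] = database

    merged_save = deepcopy(DEFAULT_SAVE_ARGS)
    if save_args is not None:
        merged_save.update(save_args)
    if database is not None:
        merged_save["database"] = database

    # `materialized` is popped out of save_args, as in the diff above.
    materialized = merged_save.pop("materialized")
    return merged_load, merged_save, materialized
```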

@@ -140,18 +151,23 @@ def connection(self) -> BaseBackend:
return self._connection

def load(self) -> ir.Table:
- return self.connection.table(self._table_name)
+ return self.connection.table(self._table_name, **self._load_args)

def save(self, data: ir.Table) -> None:
writer = getattr(self.connection, f"create_{self._materialized}")
writer(self._table_name, data, **self._save_args)
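The `save` method above picks `create_table` or `create_view` by name via `getattr`, forwarding `save_args` (now including `database`). A minimal sketch with a hypothetical stub backend standing in for a real Ibis backend:

```python
class StubBackend:
    """Hypothetical stand-in for an Ibis backend (illustration only)."""

    def __init__(self):
        self.calls = []

    def create_table(self, name, data, **kwargs):
        self.calls.append(("table", name, kwargs))

    def create_view(self, name, data, **kwargs):
        self.calls.append(("view", name, kwargs))


def save(backend, materialized, table_name, data, **save_args):
    # Dispatch to create_table / create_view by name, as the method above does.
    writer = getattr(backend, f"create_{materialized}")
    writer(table_name, data, **save_args)


backend = StubBackend()
save(backend, "view", "orders", None, database="analytics")
```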

def _describe(self) -> dict[str, Any]:
load_args = deepcopy(self._load_args)
save_args = deepcopy(self._save_args)
load_args.pop("database", None)
save_args.pop("database", None)
return {
"table_name": self._table_name,
"database": self._database,
"backend": self._connection_config["backend"],
"load_args": self._load_args,
"save_args": self._save_args,
"load_args": load_args,
"save_args": save_args,
"materialized": self._materialized,
}
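The `_describe` change above reports `database` once as its own key and strips it from copies of the arg dicts so it is not shown twice. A stdlib-only sketch of the same idea as a standalone function, not the actual method:

```python
from copy import deepcopy


def describe(table_name, database, backend, load_args, save_args, materialized):
    # Work on copies so the dataset's real load/save args are untouched.
    load_args = deepcopy(load_args)
    save_args = deepcopy(save_args)
    load_args.pop("database", None)
    save_args.pop("database", None)
    return {
        "table_name": table_name,
        "database": database,
        "backend": backend,
        "load_args": load_args,
        "save_args": save_args,
        "materialized": materialized,
    }
```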

1 change: 0 additions & 1 deletion kedro-datasets/kedro_datasets/pandas/sql_dataset.py
@@ -1,5 +1,4 @@
"""``SQLDataset`` to load and save data to a SQL backend."""

from __future__ import annotations

import copy
