
Releases: kedro-org/kedro

0.17.3

21 Apr 15:12
ae9f15c

Release 0.17.3

Major features and improvements

  • Kedro plugins can now override built-in CLI commands.
  • Added a before_command_run hook for plugins to add extra behaviour before Kedro CLI commands run.
  • pipelines from pipeline_registry.py and register_pipeline hooks are now loaded lazily when they are first accessed, not on startup:
from kedro.framework.project import pipelines

print(pipelines["__default__"])  # pipeline loading is only triggered here
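
The lazy-loading behaviour described above can be illustrated with a small standalone sketch. This is not Kedro's implementation: the LazyRegistry class and the stand-in register_pipelines function are hypothetical, and plain strings take the place of real pipeline objects.

```python
from collections.abc import Mapping


class LazyRegistry(Mapping):
    """Hypothetical read-only mapping that defers loading until first access."""

    def __init__(self, loader):
        self._loader = loader  # callable returning a dict of pipelines
        self._data = None      # nothing is loaded at construction time

    def _load(self):
        # Run the loader once and cache its result for later lookups.
        if self._data is None:
            self._data = dict(self._loader())
        return self._data

    def __getitem__(self, key):
        return self._load()[key]

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


calls = []


def register_pipelines():
    # Stand-in for a project's pipeline registry; records each invocation.
    calls.append("loaded")
    return {"__default__": "default-pipeline"}


pipelines = LazyRegistry(register_pipelines)
assert calls == []               # creating the registry loads nothing
print(pipelines["__default__"])  # first access triggers the load
print(len(calls))                # the loader has run exactly once
```

Caching the loaded dictionary means repeated lookups do not re-run the registry function.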

Bug fixes and other changes

  • TemplatedConfigLoader now correctly inserts default values when no globals are supplied.
  • Fixed a bug where the KEDRO_ENV environment variable had no effect on instantiating the context variable in an IPython session or a Jupyter notebook.
  • Plugins with empty CLI groups are no longer displayed in the Kedro CLI help screen.
  • Duplicate commands will no longer appear twice in the Kedro CLI help screen.
  • CLI commands from sources with the same name will show under one list in the help screen.
  • The setup of a Kedro project, including adding src to path and configuring settings, is now handled via the bootstrap_project method.
  • configure_project is now invoked if a package_name is supplied to KedroSession.create. This was added for backward compatibility, to support workflows that create a session manually. It will be removed in 0.18.0.
  • Stopped swallowing all ModuleNotFoundError exceptions when register_pipelines is not found, so that a more helpful error message appears when a dependency is missing, e.g. Issue #722.
  • When kedro new is invoked using a configuration yaml file, output_dir is no longer a required key; by default the current working directory will be used.
  • When kedro new is invoked using a configuration yaml file, the appropriate prompts.yml file is now used for validating the provided configuration. Previously, validation was always performed against the kedro project template prompts.yml file.
  • When a relative path to a starter template is provided, kedro new now generates user prompts to obtain configuration rather than supplying empty configuration.
  • Fixed error when using starters on Windows with Python 3.7 (Issue #722).
  • Fixed decoding error of config files that contain accented characters by opening them for reading in UTF-8.
  • Fixed an issue where the after_dataset_loaded hook would finish running before the dataset was actually loaded when using the --async flag.
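
The UTF-8 decoding fix listed above comes down to passing an explicit encoding when opening a file, instead of relying on the platform default. A minimal sketch, assuming a hypothetical helper read_config_text and a throwaway file (neither is part of Kedro's API):

```python
import tempfile
from pathlib import Path


def read_config_text(path):
    """Read a config file as UTF-8 regardless of the platform's default encoding."""
    with open(path, encoding="utf-8") as config_file:
        return config_file.read()


# Write a small config containing accented characters, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "parameters.yml"
    config_path.write_text("city: Málaga\ndish: crème brûlée\n", encoding="utf-8")
    config_text = read_config_text(config_path)

print(config_text)
```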

Upcoming deprecations for Kedro 0.18.0

  • kedro.versioning.journal.Journal will be removed.
  • The following properties on kedro.framework.context.KedroContext will be removed:
    • io in favour of KedroContext.catalog
    • pipeline (equivalent to pipelines["__default__"])
    • pipelines in favour of kedro.framework.project.pipelines

0.17.2

15 Mar 18:10
eda6762

Release 0.17.2

Major features and improvements

  • Added support for compress_pickle backend to PickleDataSet.
  • Enabled loading pipelines without creating a KedroContext instance:
from kedro.framework.project import pipelines

print(pipelines)
  • Projects generated with kedro>=0.17.2:
    • should define pipelines in pipeline_registry.py rather than hooks.py.
    • when run as a package, will behave the same as kedro run

Bug fixes and other changes

  • If settings.py is not importable, the errors will be surfaced earlier in the process, rather than at runtime.

Minor breaking changes to the API

  • kedro pipeline list and kedro pipeline describe no longer accept redundant --env parameter.
  • from kedro.framework.cli.cli import cli no longer includes the new and starter commands.

Upcoming deprecations for Kedro 0.18.0

  • kedro.framework.context.KedroContext.run will be removed in release 0.18.0.

Thanks for supporting contributions

Sasaki Takeru

0.17.1

04 Mar 14:31
535570a

Release 0.17.1

Major features and improvements

  • Added env and extra_params to reload_kedro() line magic.
  • Extended the pipeline() API to allow strings and sets of strings as inputs and outputs, to specify when a dataset name remains the same (not namespaced).
  • Added the ability to add custom prompts with regexp validator for starters by repurposing default_config.yml as prompts.yml.
  • Added the env and extra_params arguments to register_config_loader hook.
  • Refactored the way settings are loaded. You will now be able to run:
from kedro.framework.project import settings

print(settings.CONF_ROOT)

Bug fixes and other changes

  • The version of a packaged modular pipeline now defaults to the version of the project package.
  • Added fix to prevent new lines being added to pandas CSV datasets.
  • Fixed issue with loading a versioned SparkDataSet in the interactive workflow.
  • Kedro CLI now checks pyproject.toml for a tool.kedro section before treating the project as a Kedro project.
  • Fixed DataCatalog.shallow_copy() so that it now also copies layers.
  • kedro pipeline pull now uses pip download for protocols that are not supported by fsspec.
  • Cleaned up documentation to fix broken links and rewrite permanently redirected ones.
  • Added a jsonschema schema definition for the Kedro 0.17 catalog.
  • kedro install now waits on Windows until all the requirements are installed.
  • Exposed --to-outputs option in the CLI, throughout the codebase, and as part of hooks specifications.
  • Fixed a bug where ParquetDataSet wasn't creating parent directories on the fly.
  • Updated documentation.

Breaking changes to the API

  • This release has broken the kedro ipython and kedro jupyter workflows. To fix this, follow the instructions in the migration guide below.

Note: If you're using the ipython extension instead, you will not encounter this problem.

Migration guide

You will have to update the file <your_project>/.ipython/profile_default/startup/00-kedro-init.py in order to make kedro ipython and/or kedro jupyter work. Add the following line before the KedroSession is created:

configure_project(metadata.package_name)  # to add

session = KedroSession.create(metadata.package_name, path)

Make sure that the associated import is provided in the same place as others in the file:

from kedro.framework.project import configure_project  # to add
from kedro.framework.session import KedroSession

Thanks for supporting contributions

Mariana Silva,
Kiyohito Kunii,
noklam,
Ivan Doroshenko,
Zain Patel,
Deepyaman Datta,
Sam Hiscox,
Pascal Brokmeier

0.17.0

17 Dec 13:29
fb88cc2

Release 0.17.0

Major features and improvements

  • In a significant change, we have introduced KedroSession which is responsible for managing the lifecycle of a Kedro run.
  • Created a new Kedro Starter: kedro new --starter=mini-kedro. It is possible to use the DataCatalog as a standalone component in a Jupyter notebook and transition into the rest of the Kedro framework.
  • Added DatasetSpecs with Hooks to run before and after datasets are loaded from/saved to the catalog.
  • Added a command: kedro catalog create. For a registered pipeline, it creates a <conf_root>/<env>/catalog/<pipeline_name>.yml configuration file with MemoryDataSet datasets for each dataset that is missing from DataCatalog.
  • Added settings.py and pyproject.toml (to replace .kedro.yml) for project configuration, in line with Python best practice.
  • ProjectContext is no longer needed, unless for very complex customisations. KedroContext, ProjectHooks and settings.py together implement sensible default behaviour. As a result context_path is also now an optional key in pyproject.toml.
  • Removed ProjectContext from src/<package_name>/run.py.
  • TemplatedConfigLoader now supports Jinja2 template syntax alongside its original syntax.
  • Made registration Hooks mandatory, as the only way to customise the ConfigLoader or the DataCatalog used in a project. If no such Hook is provided in src/<package_name>/hooks.py, a KedroContextError is raised. There are sensible defaults defined in any project generated with Kedro >= 0.16.5.
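
The idea behind kedro catalog create, as described above, can be sketched in a few lines: find the datasets a pipeline uses that the catalog does not declare, and emit a MemoryDataSet entry for each. The function name and dataset names below are hypothetical, and the sketch ignores details the real command handles (such as writing the file to disk).

```python
def render_missing_catalog_entries(pipeline_datasets, catalog_datasets):
    """Return YAML text declaring a MemoryDataSet for each dataset not yet in the catalog."""
    missing = sorted(set(pipeline_datasets) - set(catalog_datasets))
    return "".join(f"{name}:\n  type: MemoryDataSet\n" for name in missing)


yaml_text = render_missing_catalog_entries(
    pipeline_datasets=["companies", "model_input", "regressor"],
    catalog_datasets=["companies"],
)
print(yaml_text)
```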

Bug fixes and other changes

  • ParallelRunner no longer results in a run failure, when triggered from a notebook, if the run is started using KedroSession (session.run()).
  • before_node_run can now overwrite node inputs by returning a dictionary with the corresponding updates.
  • Added minimal, black-compatible flake8 configuration to the project template.
  • Moved isort and pytest configuration from <project_root>/setup.cfg to <project_root>/pyproject.toml.
  • Extra parameters are no longer incorrectly passed from KedroSession to KedroContext.
  • Relaxed pyspark requirements to allow for installation of pyspark 3.0.
  • Added a --fs-args option to the kedro pipeline pull command to specify configuration options for the fsspec filesystem arguments used when pulling modular pipelines from non-PyPI locations.
  • Bumped the maximum supported fsspec version to 0.9.
  • Bumped maximum supported s3fs version to 0.5 (S3FileSystem interface has changed since 0.4.1 version).
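
To illustrate how a before_node_run hook can overwrite node inputs by returning a dictionary, here is a simplified standalone sketch. The hook signature is deliberately reduced (Kedro's real hook receives more arguments), and run_node and train_model are hypothetical stand-ins for the runner and a node function.

```python
def before_node_run(node_name, inputs):
    """Hypothetical hook: returns a dict of replacement inputs, or None to change nothing."""
    if node_name == "train_model":
        return {"learning_rate": 0.01}
    return None


def run_node(node_name, func, inputs):
    """Call the hook, merge any returned updates into the inputs, then run the node."""
    effective_inputs = dict(inputs)  # copy so the caller's dict is left untouched
    updates = before_node_run(node_name, effective_inputs)
    if updates:
        effective_inputs.update(updates)
    return func(**effective_inputs)


def train_model(data, learning_rate):
    return f"trained on {len(data)} rows with lr={learning_rate}"


result = run_node("train_model", train_model, {"data": [1, 2, 3], "learning_rate": 0.1})
print(result)
```

Only the keys present in the returned dictionary are replaced; all other inputs reach the node unchanged.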

Deprecations

  • In Kedro 0.17.0 we have deleted the deprecated kedro.cli and kedro.context modules in favour of kedro.framework.cli and kedro.framework.context respectively.

Other breaking changes to the API

  • kedro.io.DataCatalog.exists() returns False when the dataset does not exist, as opposed to raising an exception.
  • The pipeline-specific catalog.yml file is no longer automatically created for modular pipelines when running kedro pipeline create. Use kedro catalog create to replace this functionality.
  • Removed include_examples prompt from kedro new. To generate boilerplate example code, you should use a Kedro starter.
  • Changed the --verbose flag from a global command to a project-specific command flag (e.g. kedro --verbose new becomes kedro new --verbose).
  • Dropped support of the dataset_credentials key in credentials in PartitionedDataSet.
  • get_source_dir() was removed from kedro/framework/cli/utils.py.
  • Dropped support of get_config, create_catalog, create_pipeline, template_version, project_name and project_path keys by get_project_context() function (kedro/framework/cli/cli.py).
  • kedro new --starter now defaults to fetching the starter template matching the installed Kedro version.
  • Renamed kedro_cli.py to cli.py and moved it inside the Python package (src/<package_name>/), for a better packaging and deployment experience.
  • Removed .kedro.yml from the project template and replaced it with pyproject.toml.
  • Removed KEDRO_CONFIGS constant (previously residing in kedro.framework.context.context).
  • Modified kedro pipeline create CLI command to add a boilerplate parameter config file in conf/<env>/parameters/<pipeline_name>.yml instead of conf/<env>/pipelines/<pipeline_name>/parameters.yml. CLI commands kedro pipeline delete / package / pull were updated accordingly.
  • Removed get_static_project_data from kedro.framework.context.
  • Removed KedroContext.static_data.
  • The KedroContext constructor now takes package_name as first argument.
  • Replaced context property on KedroSession with load_context() method.
  • Renamed _push_session and _pop_session in kedro.framework.session.session to _activate_session and _deactivate_session respectively.
  • Custom context class is set via CONTEXT_CLASS variable in src/<your_project>/settings.py.
  • Removed KedroContext.hooks attribute. Instead, hooks should be registered in src/<your_project>/settings.py under the HOOKS key.
  • Restricted names given to nodes to match the regex pattern [\w\.-]+$.
  • Removed KedroContext._create_config_loader() and KedroContext._create_data_catalog(). They have been replaced by registration hooks, namely register_config_loader() and register_catalog() (see also upcoming deprecations).
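
The node-name restriction listed above can be checked with Python's re module. The sketch below assumes the pattern is applied with re.match, which anchors at the start of the string; the helper name is_valid_node_name is hypothetical.

```python
import re

# Pattern quoted in the release notes: word characters, dots and hyphens.
NODE_NAME_PATTERN = re.compile(r"[\w\.-]+$")


def is_valid_node_name(name):
    """Return True if the name matches the node-name pattern."""
    # match() anchors at the start; the trailing $ anchors at the end.
    return NODE_NAME_PATTERN.match(name) is not None


for candidate in ["split_data.v2-final", "split data", "a,b"]:
    print(candidate, is_valid_node_name(candidate))
```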

Upcoming deprecations for Kedro 0.18.0

  • kedro.framework.context.load_context will be removed in release 0.18.0.
  • kedro.framework.cli.get_project_context will be removed in release 0.18.0.
  • We've added a DeprecationWarning to the decorator API for both node and pipeline. These will be removed in release 0.18.0. Use Hooks to extend a node's behaviour instead.
  • We've added a DeprecationWarning to the Transformers API when adding a transformer to the catalog. These will be removed in release 0.18.0. Use Hooks to customise the load and save methods.

Thanks for supporting contributions

Deepyaman Datta, Zach Schuster

Migration guide from Kedro 0.16.* to 0.17.*

Reminder: Our documentation on how to upgrade Kedro covers a few key things to remember when updating any Kedro version.

The Kedro 0.17.0 release contains some breaking changes. If you update Kedro to 0.17.0 and then try to work with projects created against earlier versions of Kedro, you may encounter some issues when trying to run kedro commands in the terminal for that project. Here's a short guide to getting your projects running against the new version of Kedro.

Note: As always, if you hit any problems, please check out our documentation.

To get an existing Kedro project to work after you upgrade to Kedro 0.17.0, we recommend that you create a new project against Kedro 0.17.0 and move the code from your existing project into it. Let's go through the changes, but first, note that if you create a new Kedro project with Kedro 0.17.0 you will not be asked whether you want to include the boilerplate code for the Iris dataset example. We've removed this option (you should now use a Kedro starter if you want to create a project that is pre-populated with code).

To create a new, blank Kedro 0.17.0 project to drop your existing code into, you can create one, as always, with kedro new. We also recommend creating a new virtual environment for your new project, or you might run into conflicts with existing dependencies.

  • Update pyproject.toml: Copy the following three keys from the .kedro.yml of your existing Kedro project into the pyproject.toml file of your new Kedro 0.17.0 project:
[tool.kedro]
package_name = "<package_name>"
project_name = "<project_name>"
project_version = "0.17.0"

Check your source directory. If you defined a different source directory (source_dir), make sure you also move that to pyproject.toml.

  • Copy files from your existing project:

    • Copy subfolders of project/src/project_name/pipelines from existing to new project
    • Copy subfolders of project/src/test/pipelines from existing to new project
    • Copy the requirements your project needs into requirements.txt and/or requirements.in.
    • Copy your project configuration from the conf folder. Take note of the new locations needed for modular pipeline configuration (move it from conf/<env>/pipeline_name/catalog.yml to conf/<env>/catalog/pipeline_name.yml and likewise for parameters.yml).
    • Copy from the data/ folder of your existing project, if needed, into the same location in your new project.
    • Copy any Hooks from src/<package_name>/hooks.py.
  • Update your new project's README and docs as necessary.

  • Update settings.py: For example, if you specified additional Hook implementations in hooks, or listed plugins under disable_hooks_by_plugin in your .kedro.yml, you will need to move them to settings.py accordingly:

from <package_name>.hooks import MyCustomHooks, ProjectHooks

HOOKS = (ProjectHooks(), MyCustomHooks())

DISABLE_HOOKS_FOR_PLUGINS = ("my_plugin1",)
  • **Mig...

0.16.6

23 Oct 10:42
029d40a

Major features and improvements

  • Added documentation with a focus on single machine and distributed environment deployment; the series includes Docker, Argo, Prefect, Kubeflow, AWS Batch, AWS Sagemaker and extends our section on Databricks
  • Added kedro-starter-spaceflights alias for generating a project: kedro new --starter spaceflights.

Bug fixes and other changes

  • Fixed TypeError when converting dict inputs to a node made from a wrapped partial function.
  • PartitionedDataSet improvements:
    • Supported passing arguments to the underlying filesystem.
  • Improved handling of non-ASCII word characters in dataset names.
    • For example, a dataset named jalapeño will be accessible as DataCatalog.datasets.jalapeño rather than DataCatalog.datasets.jalape__o.
  • Fixed kedro install for an Anaconda environment defined in environment.yml.
  • Fixed backwards compatibility with templates generated with Kedro versions older than 0.16.5. You no longer need to update .kedro.yml to use kedro lint and kedro jupyter notebook convert.
  • Improved documentation.
  • Added documentation using MinIO with Kedro.
  • Improved error messages for incorrect parameters passed into a node.
  • Fixed issue with saving a TensorFlowModelDataset in the HDF5 format with versioning enabled.
  • Added missing run_result argument in after_pipeline_run Hooks spec.
  • Fixed a bug in IPython script that was causing context hooks to be registered twice. To apply this fix to a project generated with an older Kedro version, apply the same changes made in this PR to your 00-kedro-init.py file.
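
The change in dataset-name handling noted above (jalapeño versus jalape__o) can be reproduced with two regular expressions. This is a sketch under the assumption that attribute names are derived by substituting runs of disallowed characters with a double underscore; both function names are hypothetical.

```python
import re


def ascii_only_attribute_name(dataset_name):
    """Replace every run of characters outside 0-9, a-z, A-Z and underscore with '__'."""
    return re.sub(r"[^0-9a-zA-Z_]+", "__", dataset_name)


def unicode_aware_attribute_name(dataset_name):
    """Replace only runs of non-word characters; Unicode letters count as word characters."""
    return re.sub(r"\W+", "__", dataset_name)


print(ascii_only_attribute_name("jalapeño"))     # jalape__o
print(unicode_aware_attribute_name("jalapeño"))  # jalapeño
print(unicode_aware_attribute_name("cars@csv"))  # cars__csv
```

For str patterns in Python 3, \W excludes Unicode letters by default, which is why ñ survives in the second function.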

Thanks for supporting contributions

Deepyaman Datta, Bhavya Merchant, Lovkush Agarwal, Varun Krishna S, Sebastian Bertoli, noklam, Daniel Petti, Waylon Walker

0.16.5

09 Sep 11:21
f9100f8

Major features and improvements

  • Added the following new datasets.
Type | Description | Location
email.EmailMessageDataSet | Manage email messages using the Python standard library | kedro.extras.datasets.email
  • Added support for pyproject.toml to configure Kedro. pyproject.toml is used if .kedro.yml doesn't exist (Kedro configuration should be under [tool.kedro] section).
  • Projects created with this version will have no pipeline.py, having been replaced by hooks.py.
  • Added a set of registration hooks, as the new way of registering library components with a Kedro project:
    • register_pipelines(), to replace _get_pipelines()
    • register_config_loader(), to replace _create_config_loader()
    • register_catalog(), to replace _create_catalog()
      These can be defined in src/<package-name>/hooks.py and added to .kedro.yml (or pyproject.toml). The order of execution is: plugin hooks, .kedro.yml hooks, hooks in ProjectContext.hooks.
  • Added ability to disable auto-registered Hooks using .kedro.yml (or pyproject.toml) configuration file.

Bug fixes and other changes

  • Added option to run asynchronously via the Kedro CLI.
  • Absorbed .isort.cfg settings into setup.cfg.
  • project_name, project_version and package_name now have to be defined in .kedro.yml for projects generated using Kedro 0.16.5+.
  • Packaging a modular pipeline raises an error if the pipeline directory is empty or non-existent.

Thanks for supporting contributions

Deepyaman Datta, Bas Nijholt, Sebastian Bertoli

0.16.4

30 Jul 10:22
e7cf14d

Release 0.16.4

Major features and improvements

  • Enabled auto-discovery of hooks implementations coming from installed plugins.

Bug fixes and other changes

  • Fixed a bug for using ParallelRunner on Windows.
  • Modified GBQTableDataSet to load customised results using customised queries from Google Big Query tables.
  • Documentation improvements.

Thanks for supporting contributions

Ajay Bisht, Vijay Sajjanar, Deepyaman Datta, Sebastian Bertoli, Shahil Mawjee, Louis Guitton, Emanuel Ferm

0.16.3

13 Jul 11:37
7152b41

Release 0.16.3

0.16.2

15 Jun 14:36
a4fe8d1

Major features and improvements

  • Added the following new datasets.
Type | Description | Location
pandas.AppendableExcelDataSet | Works with Excel file opened in append mode | kedro.extras.datasets.pandas
tensorflow.TensorFlowModelDataset | Works with TensorFlow models using TensorFlow 2.X | kedro.extras.datasets.tensorflow
holoviews.HoloviewsWriter | Works with Holoviews objects (saves as image file) | kedro.extras.datasets.holoviews
  • kedro install will now compile project dependencies (by running kedro build-reqs behind the scenes) before the installation if the src/requirements.in file doesn't exist.
  • Added only_nodes_with_namespace in Pipeline class to filter only nodes with a specified namespace.
  • Added the kedro pipeline delete command to help delete unwanted or unused pipelines (it won't remove references to the pipeline in your create_pipelines() code).
  • Added the kedro pipeline package command to help package up a modular pipeline. It will bundle up the pipeline source code, tests, and parameters configuration into a .whl file.

Bug fixes and other changes

  • Improvement in DataCatalog:
    • Introduced regex filtering to the DataCatalog.list() method.
    • Non-alphanumeric characters (except underscore) in dataset name are replaced with __ in DataCatalog.datasets, for ease of access to transcoded datasets.
  • Improvement in Datasets:
    • Improved initialization speed of spark.SparkHiveDataSet.
    • Improved S3 cache in spark.SparkDataSet.
    • Added support of options for building pyarrow table in pandas.ParquetDataSet.
  • Improvement in kedro build-reqs CLI command:
    • kedro build-reqs is now called with -q option and will no longer print out compiled requirements to the console for security reasons.
    • All unrecognized CLI options in kedro build-reqs command are now passed to pip-compile call (e.g. kedro build-reqs --generate-hashes).
  • Improvement in kedro jupyter CLI command:
    • Improved error message when running kedro jupyter notebook, kedro jupyter lab or kedro ipython with Jupyter/IPython dependencies not being installed.
    • Fixed the %run_viz line magic for showing kedro viz inside a Jupyter notebook. To apply the fix to an existing Kedro project, please see the migration guide.
    • Fixed the bug in IPython startup script (issue 298).
  • Documentation improvements:
    • Updated community-generated content in FAQ.
    • Added find-kedro and kedro-static-viz to the list of community plugins.
    • Added the missing pillow.ImageDataSet entry to the documentation.
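
As an illustration of the regex filtering added to DataCatalog.list(), the sketch below filters a list of dataset names with a regular expression. It is a standalone approximation, not the catalog's code: the list_datasets function, its regex_search parameter name and the dataset names are assumptions made for the example.

```python
import re


def list_datasets(dataset_names, regex_search=None):
    """Return all names, or only those in which the pattern is found."""
    if regex_search is None:
        return list(dataset_names)
    pattern = re.compile(regex_search)
    # search() finds the pattern anywhere in the name; use ^ or $ to anchor it.
    return [name for name in dataset_names if pattern.search(name)]


names = ["raw_cars", "raw_boats", "model_input", "cars@csv"]
print(list_datasets(names, "^raw"))
print(list_datasets(names, "cars"))
```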

Breaking changes to the API

Migration guide from Kedro 0.16.1 to 0.16.2

Guide to apply the fix for %run_viz line magic in existing project

Although this release ships a fix for projects generated with kedro==0.16.2, if your existing project was generated with kedro>=0.16.0,<=0.16.1 you will still need to change it after upgrading for the fix to take effect. Specifically, replace the content of your project's IPython init script, located at .ipython/profile_default/startup/00-kedro-init.py, with the content of this file. You will also need kedro-viz>=3.3.1.

Thanks for supporting contributions

Miguel Rodriguez Gutierrez, Joel Schwarzmann, w0rdsm1th, Deepyaman Datta, Tam-Sanh Nguyen, Marcus Gawronsky

0.16.1

21 May 12:46
d291a21

Bug fixes and other changes

  • Fixed deprecation warnings from kedro.cli and kedro.context when running kedro jupyter notebook.
  • Fixed a bug where catalog and context were not available in Jupyter Lab and Notebook.
  • Fixed a bug where kedro build-reqs would fail if you didn't have your project dependencies installed.