0.17.0
Release 0.17.0
Major features and improvements
- In a significant change, we have introduced `KedroSession` which is responsible for managing the lifecycle of a Kedro run.
- Created a new Kedro Starter: `kedro new --starter=mini-kedro`. It is possible to use the DataCatalog as a standalone component in a Jupyter notebook and transition into the rest of the Kedro framework.
- Added `DatasetSpecs` with Hooks to run before and after datasets are loaded from/saved to the catalog.
- Added a command: `kedro catalog create`. For a registered pipeline, it creates a `<conf_root>/<env>/catalog/<pipeline_name>.yml` configuration file with `MemoryDataSet` datasets for each dataset that is missing from `DataCatalog`.
- Added `settings.py` and `pyproject.toml` (to replace `.kedro.yml`) for project configuration, in line with Python best practice.
- `ProjectContext` is no longer needed, unless for very complex customisations. `KedroContext`, `ProjectHooks` and `settings.py` together implement sensible default behaviour. As a result `context_path` is also now an optional key in `pyproject.toml`.
- Removed `ProjectContext` from `src/<package_name>/run.py`.
- `TemplatedConfigLoader` now supports Jinja2 template syntax alongside its original syntax.
- Made registration Hooks mandatory, as the only way to customise the `ConfigLoader` or the `DataCatalog` used in a project. If no such Hook is provided in `src/<package_name>/hooks.py`, a `KedroContextError` is raised. There are sensible defaults defined in any project generated with Kedro >= 0.16.5.
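
To make the behaviour described for `kedro catalog create` more concrete, here is a minimal sketch of the selection logic only: every dataset used by a pipeline but absent from the catalog gets a `MemoryDataSet` entry. The function name `missing_dataset_entries` and the sample dataset names are hypothetical, and this is not Kedro's implementation; the real command also writes the result to the YAML file named above.

```python
from typing import Dict, Iterable


def missing_dataset_entries(
    pipeline_datasets: Iterable[str], catalog_datasets: Iterable[str]
) -> Dict[str, Dict[str, str]]:
    """Build a MemoryDataSet entry for each pipeline dataset not in the catalog.

    Hypothetical helper for illustration; it does not call Kedro.
    """
    registered = set(catalog_datasets)
    return {
        name: {"type": "MemoryDataSet"}
        for name in sorted(set(pipeline_datasets))
        if name not in registered
    }


# Sample (made-up) dataset names: only "raw_data" is already registered.
entries = missing_dataset_entries(
    pipeline_datasets=["raw_data", "features", "model"],
    catalog_datasets=["raw_data"],
)
print(entries)
```

Datasets that are already registered are left untouched, so the generated file only ever contains placeholders for the gaps.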
Bug fixes and other changes
- `ParallelRunner` no longer results in a run failure, when triggered from a notebook, if the run is started using `KedroSession` (`session.run()`).
- `before_node_run` can now overwrite node inputs by returning a dictionary with the corresponding updates.
- Added minimal, black-compatible flake8 configuration to the project template.
- Moved `isort` and `pytest` configuration from `<project_root>/setup.cfg` to `<project_root>/pyproject.toml`.
- Extra parameters are no longer incorrectly passed from `KedroSession` to `KedroContext`.
- Relaxed `pyspark` requirements to allow for installation of `pyspark` 3.0.
- Added a `--fs-args` option to the `kedro pipeline pull` command to specify configuration options for the `fsspec` filesystem arguments used when pulling modular pipelines from non-PyPI locations.
- Bumped maximum required `fsspec` version to 0.9.
- Bumped maximum supported `s3fs` version to 0.5 (`S3FileSystem` interface has changed since 0.4.1 version).
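
The `before_node_run` change above can be pictured with a small sketch that runs without Kedro installed. The class `OverrideInputsHooks`, the input names, and the helper `apply_updates` are all hypothetical; in a real project the method would be decorated with `@hook_impl` from `kedro.framework.hooks`, and the merging would be done by Kedro itself rather than by a helper like this one.

```python
from typing import Any, Dict, Optional


class OverrideInputsHooks:
    """Hypothetical Hooks class.

    In a Kedro project, `before_node_run` would carry the `@hook_impl`
    decorator; it is omitted here so the sketch has no dependencies.
    """

    def before_node_run(
        self, node: Any, inputs: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
        # Return only the inputs to overwrite; None means "no changes".
        if "learning_rate" in inputs:
            return {"learning_rate": 0.01}
        return None


def apply_updates(
    inputs: Dict[str, Any], updates: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Stand-in (an assumption) for the step that merges Hook updates into inputs."""
    merged = dict(inputs)
    merged.update(updates or {})
    return merged


hooks = OverrideInputsHooks()
original = {"learning_rate": 0.1, "epochs": 5}
updated = apply_updates(original, hooks.before_node_run(node=None, inputs=original))
print(updated)
```

Only the keys present in the returned dictionary are replaced; every other input reaches the node unchanged.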
Deprecations
- In Kedro 0.17.0 we have deleted the deprecated `kedro.cli` and `kedro.context` modules in favour of `kedro.framework.cli` and `kedro.framework.context` respectively.
Other breaking changes to the API
- `kedro.io.DataCatalog.exists()` returns `False` when the dataset does not exist, as opposed to raising an exception.
- The pipeline-specific `catalog.yml` file is no longer automatically created for modular pipelines when running `kedro pipeline create`. Use `kedro catalog create` to replace this functionality.
- Removed `include_examples` prompt from `kedro new`. To generate boilerplate example code, you should use a Kedro starter.
- Changed the `--verbose` flag from a global command to a project-specific command flag (e.g. `kedro --verbose new` becomes `kedro new --verbose`).
- Dropped support of the `dataset_credentials` key in credentials in `PartitionedDataSet`.
- `get_source_dir()` was removed from `kedro/framework/cli/utils.py`.
- Dropped support of `get_config`, `create_catalog`, `create_pipeline`, `template_version`, `project_name` and `project_path` keys by `get_project_context()` function (`kedro/framework/cli/cli.py`).
- `kedro new --starter` now defaults to fetching the starter template matching the installed Kedro version.
- Renamed `kedro_cli.py` to `cli.py` and moved it inside the Python package (`src/<package_name>/`), for a better packaging and deployment experience.
- Removed `.kedro.yml` from the project template and replaced it with `pyproject.toml`.
- Removed `KEDRO_CONFIGS` constant (previously residing in `kedro.framework.context.context`).
- Modified `kedro pipeline create` CLI command to add a boilerplate parameter config file in `conf/<env>/parameters/<pipeline_name>.yml` instead of `conf/<env>/pipelines/<pipeline_name>/parameters.yml`. CLI commands `kedro pipeline delete` / `package` / `pull` were updated accordingly.
- Removed `get_static_project_data` from `kedro.framework.context`.
- Removed `KedroContext.static_data`.
- The `KedroContext` constructor now takes `package_name` as first argument.
- Replaced `context` property on `KedroSession` with `load_context()` method.
- Renamed `_push_session` and `_pop_session` in `kedro.framework.session.session` to `_activate_session` and `_deactivate_session` respectively.
- Custom context class is set via `CONTEXT_CLASS` variable in `src/<your_project>/settings.py`.
- Removed `KedroContext.hooks` attribute. Instead, hooks should be registered in `src/<your_project>/settings.py` under the `HOOKS` key.
- Restricted names given to nodes to match the regex pattern `[\w\.-]+$`.
- Removed `KedroContext._create_config_loader()` and `KedroContext._create_data_catalog()`. They have been replaced by registration hooks, namely `register_config_loader()` and `register_catalog()` (see also upcoming deprecations).
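
The node name restriction above is easy to check ahead of an upgrade. The sketch below applies the quoted pattern with Python's `re` module; the helper names `is_valid_node_name` and `sanitise_node_name` are hypothetical, and replacing disallowed characters with underscores is just one possible renaming policy, not something Kedro does for you.

```python
import re

# Pattern quoted in the release notes; re.match anchors it at the start
# of the string and the trailing `$` anchors it at the end.
NODE_NAME_PATTERN = re.compile(r"[\w\.-]+$")


def is_valid_node_name(name: str) -> bool:
    """Return True if `name` only uses word characters, dots and hyphens."""
    return NODE_NAME_PATTERN.match(name) is not None


def sanitise_node_name(name: str) -> str:
    """Replace every disallowed character with an underscore (one possible policy)."""
    return re.sub(r"[^\w\.-]", "_", name)


for candidate in ["split_data.v1-a", "split data (train)"]:
    status = "ok" if is_valid_node_name(candidate) else "rename to " + sanitise_node_name(candidate)
    print(f"{candidate!r}: {status}")
```

Note that an empty name is rejected by the pattern and cannot be repaired by substitution, so such nodes still need a name chosen by hand.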
Upcoming deprecations for Kedro 0.18.0
- `kedro.framework.context.load_context` will be removed in release 0.18.0.
- `kedro.framework.cli.get_project_context` will be removed in release 0.18.0.
- We've added a `DeprecationWarning` to the decorator API for both `node` and `pipeline`. These will be removed in release 0.18.0. Use Hooks to extend a node's behaviour instead.
- We've added a `DeprecationWarning` to the Transformers API when adding a transformer to the catalog. These will be removed in release 0.18.0. Use Hooks to customise the `load` and `save` methods.
Thanks for supporting contributions
Deepyaman Datta, Zach Schuster
Migration guide from Kedro 0.16.* to 0.17.*
Reminder: Our documentation on how to upgrade Kedro covers a few key things to remember when updating any Kedro version.
The Kedro 0.17.0 release contains some breaking changes. If you update Kedro to 0.17.0 and then try to work with projects created against earlier versions of Kedro, you may encounter some issues when trying to run `kedro` commands in the terminal for that project. Here's a short guide to getting your projects running against the new version of Kedro.
Note: As always, if you hit any problems, please check out our documentation.
To get an existing Kedro project to work after you upgrade to Kedro 0.17.0, we recommend that you create a new project against Kedro 0.17.0 and move the code from your existing project into it. Let's go through the changes, but first, note that if you create a new Kedro project with Kedro 0.17.0 you will not be asked whether you want to include the boilerplate code for the Iris dataset example. We've removed this option (you should now use a Kedro starter if you want to create a project that is pre-populated with code).
To create a new, blank Kedro 0.17.0 project to drop your existing code into, you can create one, as always, with `kedro new`. We also recommend creating a new virtual environment for your new project, or you might run into conflicts with existing dependencies.
- Update `pyproject.toml`: Copy the following three keys from the `.kedro.yml` of your existing Kedro project into the `pyproject.toml` file of your new Kedro 0.17.0 project:

  ```toml
  [tool.kedro]
  package_name = "<package_name>"
  project_name = "<project_name>"
  project_version = "0.17.0"
  ```

  Check your source directory. If you defined a different source directory (`source_dir`), make sure you also move that to `pyproject.toml`.
- Copy files from your existing project:
  - Copy subfolders of `project/src/project_name/pipelines` from existing to new project.
  - Copy subfolders of `project/src/test/pipelines` from existing to new project.
  - Copy the requirements your project needs into `requirements.txt` and/or `requirements.in`.
  - Copy your project configuration from the `conf` folder. Take note of the new locations needed for modular pipeline configuration (move it from `conf/<env>/pipeline_name/catalog.yml` to `conf/<env>/catalog/pipeline_name.yml` and likewise for `parameters.yml`).
  - Copy from the `data/` folder of your existing project, if needed, into the same location in your new project.
  - Copy any Hooks from `src/<package_name>/hooks.py`.
- Update your new project's README and docs as necessary.
- Update `settings.py`: For example, if you specified additional Hook implementations in `hooks`, or listed plugins under `disable_hooks_by_plugin` in your `.kedro.yml`, you will need to move them to `settings.py` accordingly:

  ```python
  from <package_name>.hooks import MyCustomHooks, ProjectHooks

  HOOKS = (ProjectHooks(), MyCustomHooks())

  DISABLE_HOOKS_FOR_PLUGINS = ("my_plugin1",)
  ```
- Migration for `node` names. From 0.17.0 the only allowed characters for node names are letters, digits, hyphens, underscores and/or full stops. If you have previously defined node names that have special characters, spaces or other characters that are no longer permitted, you will need to rename those nodes.
- Copy changes to `kedro_cli.py`. If you previously customised the `kedro run` command or added more CLI commands to your `kedro_cli.py`, you should move them into `<project_root>/src/<package_name>/cli.py`. Note, however, that the new way to run a Kedro pipeline is via a `KedroSession`, rather than using the `KedroContext`:

  ```python
  from kedro.framework.session import KedroSession

  # Replace ... with your project's package name.
  with KedroSession.create(package_name=...) as session:
      session.run()
  ```
- Copy changes made to `ConfigLoader`. If you have defined a custom class, such as `TemplatedConfigLoader`, by overriding `ProjectContext._create_config_loader`, you should move the contents of the function into `src/<package_name>/hooks.py`, under `register_config_loader`.
- Copy changes made to `DataCatalog`. Likewise, if you have `DataCatalog` defined with `ProjectContext._create_catalog`, you should copy-paste the contents into `register_catalog`.
- Optional: If you have plugins such as Kedro-Viz installed, it's likely that Kedro 0.17.0 won't work with their older versions, so please either upgrade to the plugin's newest version or follow their migration guides.
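
The relocation of modular pipeline configuration mentioned in the file-copying step can be sketched as a pure path mapping. The function name `new_config_path` and the sample paths are hypothetical; the sketch assumes the two old layouts mentioned in these notes (with and without a `pipelines/` level) and only computes the new location, leaving the actual file move to you.

```python
from pathlib import PurePosixPath


def new_config_path(old_path: str) -> str:
    """Map conf/<env>/[pipelines/]<pipeline_name>/<kind>.yml
    to conf/<env>/<kind>/<pipeline_name>.yml.

    Hypothetical helper; `kind` is "catalog" or "parameters".
    """
    path = PurePosixPath(old_path)
    kind = path.stem
    if kind not in ("catalog", "parameters"):
        raise ValueError(f"Unexpected config file: {old_path}")
    pipeline_name = path.parent.name
    env_dir = path.parent.parent
    # Assumption: a directory literally named "pipelines" is the old
    # nesting level, not an environment name, so it is skipped.
    if env_dir.name == "pipelines":
        env_dir = env_dir.parent
    return str(env_dir / kind / f"{pipeline_name}.yml")


for old in [
    "conf/base/data_science/catalog.yml",
    "conf/base/pipelines/data_science/parameters.yml",
]:
    print(old, "->", new_config_path(old))
```

Keeping the mapping separate from the file operations makes it easy to review the planned moves before touching the `conf` folder.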