Releases: kedro-org/kedro
0.17.3
Release 0.17.3
Major features and improvements
- Kedro plugins can now override built-in CLI commands.
- Added a `before_command_run` hook for plugins to add extra behaviour before Kedro CLI commands run.
- `pipelines` from `pipeline_registry.py` and `register_pipeline` hooks are now loaded lazily when they are first accessed, not on startup:
from kedro.framework.project import pipelines
print(pipelines["__default__"]) # pipeline loading is only triggered here
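To illustrate what lazy loading means here, the following is a minimal standalone sketch and not Kedro's actual implementation. `LazyRegistry` and the stand-in `register_pipelines` function are hypothetical names chosen for this example; the point is only that the loader runs on first access rather than at construction.

```python
from collections.abc import Mapping


class LazyRegistry(Mapping):
    """Read-only mapping that calls `loader` once, on first access (hypothetical helper)."""

    def __init__(self, loader):
        self._loader = loader   # zero-argument callable returning a dict
        self._content = None    # nothing is loaded at construction time

    def _load(self):
        if self._content is None:
            self._content = dict(self._loader())
        return self._content

    def __getitem__(self, key):
        return self._load()[key]

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


calls = []


def register_pipelines():
    # Stand-in for an expensive import of pipeline code.
    calls.append("loaded")
    return {"__default__": ["node_a", "node_b"]}


pipelines = LazyRegistry(register_pipelines)
calls_before_access = list(calls)    # still empty: creating the registry loads nothing
default = pipelines["__default__"]   # loading is triggered here
```

A practical consequence of this design is that an import error inside pipeline code surfaces at the first access, not when the registry object is created.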
Bug fixes and other changes
- `TemplatedConfigLoader` now correctly inserts default values when no globals are supplied.
- Fixed a bug where the `KEDRO_ENV` environment variable had no effect on instantiating the `context` variable in an IPython session or a Jupyter notebook.
- Plugins with empty CLI groups are no longer displayed in the Kedro CLI help screen.
- Duplicate commands will no longer appear twice in the Kedro CLI help screen.
- CLI commands from sources with the same name will show under one list in the help screen.
- The setup of a Kedro project, including adding src to path and configuring settings, is now handled via the `bootstrap_project` method.
- `configure_project` is invoked if a `package_name` is supplied to `KedroSession.create`. This is added for backward compatibility, to support a workflow that creates a `Session` manually. It will be removed in `0.18.0`.
- Stopped swallowing up all `ModuleNotFoundError` if `register_pipelines` is not found, so that a more helpful error message will appear when a dependency is missing, e.g. Issue #722.
- When `kedro new` is invoked using a configuration yaml file, `output_dir` is no longer a required key; by default the current working directory will be used.
- When `kedro new` is invoked using a configuration yaml file, the appropriate `prompts.yml` file is now used for validating the provided configuration. Previously, validation was always performed against the kedro project template `prompts.yml` file.
- When a relative path to a starter template is provided, `kedro new` now generates user prompts to obtain configuration rather than supplying empty configuration.
- Fixed error when using starters on Windows with Python 3.7 (Issue #722).
- Fixed decoding error of config files that contain accented characters by opening them for reading in UTF-8.
- Fixed an issue where `after_dataset_loaded` run would finish before a dataset is actually loaded when using `--async` flag.
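The UTF-8 fix listed above comes down to passing an explicit encoding when opening a file, so that the result does not depend on the platform's default encoding. The sketch below uses only the standard library; `read_config_text` and the file name are hypothetical and are not part of Kedro's API.

```python
import tempfile
from pathlib import Path


def read_config_text(path):
    """Read a config file as UTF-8 regardless of the platform default encoding."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "parameters.yml"
    # Write raw UTF-8 bytes containing an accented character.
    config_path.write_bytes("author: Jos\u00e9\n".encode("utf-8"))
    text = read_config_text(config_path)
```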
Upcoming deprecations for Kedro 0.18.0
- `kedro.versioning.journal.Journal` will be removed.
- The following properties on `kedro.framework.context.KedroContext` will be removed:
  - `io` in favour of `KedroContext.catalog`
  - `pipeline` (equivalent to `pipelines["__default__"]`)
  - `pipelines` in favour of `kedro.framework.project.pipelines`
0.17.2
Release 0.17.2
Major features and improvements
- Added support for `compress_pickle` backend to `PickleDataSet`.
- Enabled loading pipelines without creating a `KedroContext` instance:
from kedro.framework.project import pipelines
print(pipelines)
- Projects generated with kedro>=0.17.2:
  - should define pipelines in `pipeline_registry.py` rather than `hooks.py`.
  - when run as a package, will behave the same as `kedro run`
Bug fixes and other changes
- If `settings.py` is not importable, the errors will be surfaced earlier in the process, rather than at runtime.
Minor breaking changes to the API
- `kedro pipeline list` and `kedro pipeline describe` no longer accept redundant `--env` parameter.
- `from kedro.framework.cli.cli import cli` no longer includes the `new` and `starter` commands.
Upcoming deprecations for Kedro 0.18.0
- `kedro.framework.context.KedroContext.run` will be removed in release 0.18.0.
Thanks for supporting contributions
0.17.1
Release 0.17.1
Major features and improvements
- Added `env` and `extra_params` to `reload_kedro()` line magic.
- Extended the `pipeline()` API to allow strings and sets of strings as `inputs` and `outputs`, to specify when a dataset name remains the same (not namespaced).
- Added the ability to add custom prompts with regexp validator for starters by repurposing `default_config.yml` as `prompts.yml`.
- Added the `env` and `extra_params` arguments to `register_config_loader` hook.
- Refactored the way `settings` are loaded. You will now be able to run:
from kedro.framework.project import settings
print(settings.CONF_ROOT)
Bug fixes and other changes
- The version of a packaged modular pipeline now defaults to the version of the project package.
- Added fix to prevent new lines being added to pandas CSV datasets.
- Fixed issue with loading a versioned `SparkDataSet` in the interactive workflow.
- Kedro CLI now checks `pyproject.toml` for a `tool.kedro` section before treating the project as a Kedro project.
- Fixed `DataCatalog::shallow_copy` so that it now copies layers.
- `kedro pipeline pull` now uses `pip download` for protocols that are not supported by `fsspec`.
- Cleaned up documentation to fix broken links and rewrite permanently redirected ones.
- Added a `jsonschema` schema definition for the Kedro 0.17 catalog.
- `kedro install` now waits on Windows until all the requirements are installed.
- Exposed `--to-outputs` option in the CLI, throughout the codebase, and as part of hooks specifications.
- Fixed a bug where `ParquetDataSet` wasn't creating parent directories on the fly.
- Updated documentation.
Breaking changes to the API
- This release has broken the `kedro ipython` and `kedro jupyter` workflows. To fix this, follow the instructions in the migration guide below.
Note: If you're using the `ipython` extension instead, you will not encounter this problem.
Migration guide
You will have to update the file `<your_project>/.ipython/profile_default/startup/00-kedro-init.py` in order to make `kedro ipython` and/or `kedro jupyter` work. Add the following line before the `KedroSession` is created:
configure_project(metadata.package_name) # to add
session = KedroSession.create(metadata.package_name, path)
Make sure that the associated import is provided in the same place as others in the file:
from kedro.framework.project import configure_project # to add
from kedro.framework.session import KedroSession
Thanks for supporting contributions
Mariana Silva, Kiyohito Kunii, noklam, Ivan Doroshenko, Zain Patel, Deepyaman Datta, Sam Hiscox, Pascal Brokmeier
0.17.0
Release 0.17.0
Major features and improvements
- In a significant change, we have introduced `KedroSession` which is responsible for managing the lifecycle of a Kedro run.
- Created a new Kedro Starter: `kedro new --starter=mini-kedro`. It is possible to use the DataCatalog as a standalone component in a Jupyter notebook and transition into the rest of the Kedro framework.
- Added `DatasetSpecs` with Hooks to run before and after datasets are loaded from/saved to the catalog.
- Added a command: `kedro catalog create`. For a registered pipeline, it creates a `<conf_root>/<env>/catalog/<pipeline_name>.yml` configuration file with `MemoryDataSet` datasets for each dataset that is missing from `DataCatalog`.
- Added `settings.py` and `pyproject.toml` (to replace `.kedro.yml`) for project configuration, in line with Python best practice.
- `ProjectContext` is no longer needed, unless for very complex customisations. `KedroContext`, `ProjectHooks` and `settings.py` together implement sensible default behaviour. As a result `context_path` is also now an optional key in `pyproject.toml`.
- Removed `ProjectContext` from `src/<package_name>/run.py`.
- `TemplatedConfigLoader` now supports Jinja2 template syntax alongside its original syntax.
- Made registration Hooks mandatory, as the only way to customise the `ConfigLoader` or the `DataCatalog` used in a project. If no such Hook is provided in `src/<package_name>/hooks.py`, a `KedroContextError` is raised. There are sensible defaults defined in any project generated with Kedro >= 0.16.5.
Bug fixes and other changes
- `ParallelRunner` no longer results in a run failure, when triggered from a notebook, if the run is started using `KedroSession` (`session.run()`).
- `before_node_run` can now overwrite node inputs by returning a dictionary with the corresponding updates.
- Added minimal, black-compatible flake8 configuration to the project template.
- Moved `isort` and `pytest` configuration from `<project_root>/setup.cfg` to `<project_root>/pyproject.toml`.
- Extra parameters are no longer incorrectly passed from `KedroSession` to `KedroContext`.
- Relaxed `pyspark` requirements to allow for installation of `pyspark` 3.0.
- Added a `--fs-args` option to the `kedro pipeline pull` command to specify configuration options for the `fsspec` filesystem arguments used when pulling modular pipelines from non-PyPI locations.
- Bumped maximum required `fsspec` version to 0.9.
- Bumped maximum supported `s3fs` version to 0.5 (`S3FileSystem` interface has changed since 0.4.1 version).
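The `before_node_run` change listed above means that a dictionary returned by the hook is merged over the node's inputs. The following standalone sketch shows that merge semantics only. `apply_hook_updates` is a hypothetical helper, and the `before_node_run` stand-in takes a simplified argument list, not the real hook signature.

```python
def apply_hook_updates(inputs, hook_responses):
    """Merge dictionaries returned by hooks into a copy of the node inputs."""
    merged = dict(inputs)
    for response in hook_responses:
        if response is None:
            # A hook that returns nothing changes nothing.
            continue
        if not isinstance(response, dict):
            raise TypeError(
                f"expected a dict of input updates, got {type(response).__name__}"
            )
        merged.update(response)
    return merged


def before_node_run(inputs):
    # Hypothetical hook: replace one input and leave the others alone.
    return {"threshold": 0.9}


node_inputs = {"data": [1, 2, 3], "threshold": 0.5}
updated = apply_hook_updates(node_inputs, [before_node_run(node_inputs), None])
```

Only the keys present in the returned dictionary are replaced; the original `node_inputs` mapping is left unmodified in this sketch.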
Deprecations
- In Kedro 0.17.0 we have deleted the deprecated `kedro.cli` and `kedro.context` modules in favour of `kedro.framework.cli` and `kedro.framework.context` respectively.
Other breaking changes to the API
- `kedro.io.DataCatalog.exists()` returns `False` when the dataset does not exist, as opposed to raising an exception.
- The pipeline-specific `catalog.yml` file is no longer automatically created for modular pipelines when running `kedro pipeline create`. Use `kedro catalog create` to replace this functionality.
- Removed `include_examples` prompt from `kedro new`. To generate boilerplate example code, you should use a Kedro starter.
- Changed the `--verbose` flag from a global command to a project-specific command flag (e.g. `kedro --verbose new` becomes `kedro new --verbose`).
- Dropped support of the `dataset_credentials` key in credentials in `PartitionedDataSet`.
- `get_source_dir()` was removed from `kedro/framework/cli/utils.py`.
- Dropped support of `get_config`, `create_catalog`, `create_pipeline`, `template_version`, `project_name` and `project_path` keys by `get_project_context()` function (`kedro/framework/cli/cli.py`).
- `kedro new --starter` now defaults to fetching the starter template matching the installed Kedro version.
- Renamed `kedro_cli.py` to `cli.py` and moved it inside the Python package (`src/<package_name>/`), for a better packaging and deployment experience.
- Removed `.kedro.yml` from the project template and replaced it with `pyproject.toml`.
- Removed `KEDRO_CONFIGS` constant (previously residing in `kedro.framework.context.context`).
- Modified `kedro pipeline create` CLI command to add a boilerplate parameter config file in `conf/<env>/parameters/<pipeline_name>.yml` instead of `conf/<env>/pipelines/<pipeline_name>/parameters.yml`. CLI commands `kedro pipeline delete` / `package` / `pull` were updated accordingly.
- Removed `get_static_project_data` from `kedro.framework.context`.
- Removed `KedroContext.static_data`.
- The `KedroContext` constructor now takes `package_name` as first argument.
- Replaced `context` property on `KedroSession` with `load_context()` method.
- Renamed `_push_session` and `_pop_session` in `kedro.framework.session.session` to `_activate_session` and `_deactivate_session` respectively.
- Custom context class is set via `CONTEXT_CLASS` variable in `src/<your_project>/settings.py`.
- Removed `KedroContext.hooks` attribute. Instead, hooks should be registered in `src/<your_project>/settings.py` under the `HOOKS` key.
- Restricted names given to nodes to match the regex pattern `[\w\.-]+$`.
- Removed `KedroContext._create_config_loader()` and `KedroContext._create_data_catalog()`. They have been replaced by registration hooks, namely `register_config_loader()` and `register_catalog()` (see also upcoming deprecations).
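To check existing node names against the new restriction before upgrading, the regex pattern quoted in the list above can be applied directly. This is a standalone sketch: `is_valid_node_name` is a hypothetical helper, and using `re.match` (anchored at the start of the string) is an assumption about how the pattern is meant to be applied.

```python
import re

# Pattern quoted in the release notes: word characters, dots and hyphens only.
NODE_NAME_PATTERN = re.compile(r"[\w\.-]+$")


def is_valid_node_name(name):
    # re.match anchors at the start; the trailing `$` anchors at the end.
    return NODE_NAME_PATTERN.match(name) is not None


candidates = ["split_data", "preprocess.v2-final", "my node", "a,b"]
invalid = [name for name in candidates if not is_valid_node_name(name)]
```

Under these assumptions, names containing spaces or commas are the ones that would need renaming.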
Upcoming deprecations for Kedro 0.18.0
- `kedro.framework.context.load_context` will be removed in release 0.18.0.
- `kedro.framework.cli.get_project_context` will be removed in release 0.18.0.
- We've added a `DeprecationWarning` to the decorator API for both `node` and `pipeline`. These will be removed in release 0.18.0. Use Hooks to extend a node's behaviour instead.
- We've added a `DeprecationWarning` to the Transformers API when adding a transformer to the catalog. These will be removed in release 0.18.0. Use Hooks to customise the `load` and `save` methods.
Thanks for supporting contributions
Deepyaman Datta, Zach Schuster
Migration guide from Kedro 0.16.* to 0.17.*
Reminder: Our documentation on how to upgrade Kedro covers a few key things to remember when updating any Kedro version.
The Kedro 0.17.0 release contains some breaking changes. If you update Kedro to 0.17.0 and then try to work with projects created against earlier versions of Kedro, you may encounter some issues when trying to run `kedro` commands in the terminal for that project. Here's a short guide to getting your projects running against the new version of Kedro.
Note: As always, if you hit any problems, please check out our documentation:
To get an existing Kedro project to work after you upgrade to Kedro 0.17.0, we recommend that you create a new project against Kedro 0.17.0 and move the code from your existing project into it. Let's go through the changes, but first, note that if you create a new Kedro project with Kedro 0.17.0 you will not be asked whether you want to include the boilerplate code for the Iris dataset example. We've removed this option (you should now use a Kedro starter if you want to create a project that is pre-populated with code).
To create a new, blank Kedro 0.17.0 project to drop your existing code into, you can create one, as always, with `kedro new`. We also recommend creating a new virtual environment for your new project, or you might run into conflicts with existing dependencies.
- Update `pyproject.toml`: Copy the following three keys from the `.kedro.yml` of your existing Kedro project into the `pyproject.toml` file of your new Kedro 0.17.0 project:
[tool.kedro]
package_name = "<package_name>"
project_name = "<project_name>"
project_version = "0.17.0"
Check your source directory. If you defined a different source directory (`source_dir`), make sure you also move that to `pyproject.toml`.
- Copy files from your existing project:
  - Copy subfolders of `project/src/project_name/pipelines` from existing to new project
  - Copy subfolders of `project/src/test/pipelines` from existing to new project
  - Copy the requirements your project needs into `requirements.txt` and/or `requirements.in`.
  - Copy your project configuration from the `conf` folder. Take note of the new locations needed for modular pipeline configuration (move it from `conf/<env>/pipeline_name/catalog.yml` to `conf/<env>/catalog/pipeline_name.yml` and likewise for `parameters.yml`).
  - Copy from the `data/` folder of your existing project, if needed, into the same location in your new project.
  - Copy any Hooks from `src/<package_name>/hooks.py`.
- Update your new project's README and docs as necessary.
- Update `settings.py`: For example, if you specified additional Hook implementations in `hooks`, or listed plugins under `disable_hooks_by_plugin` in your `.kedro.yml`, you will need to move them to `settings.py` accordingly:
from <package_name>.hooks import MyCustomHooks, ProjectHooks
HOOKS = (ProjectHooks(), MyCustomHooks())
DISABLE_HOOKS_FOR_PLUGINS = ("my_plugin1",)
- **Mig...
0.16.6
Major features and improvements
- Added documentation with a focus on single machine and distributed environment deployment; the series includes Docker, Argo, Prefect, Kubeflow, AWS Batch, AWS Sagemaker and extends our section on Databricks.
- Added kedro-starter-spaceflights alias for generating a project: `kedro new --starter spaceflights`.
Bug fixes and other changes
- Fixed `TypeError` when converting dict inputs to a node made from a wrapped `partial` function.
- `PartitionedDataSet` improvements:
  - Supported passing arguments to the underlying filesystem.
- Improved handling of non-ASCII word characters in dataset names.
  - For example, a dataset named `jalapeño` will be accessible as `DataCatalog.datasets.jalapeño` rather than `DataCatalog.datasets.jalape__o`.
- Fixed `kedro install` for an Anaconda environment defined in `environment.yml`.
- Fixed backwards compatibility with templates generated with older Kedro versions <0.16.5. No longer need to update `.kedro.yml` to use `kedro lint` and `kedro jupyter notebook convert`.
- Improved documentation.
- Added documentation using MinIO with Kedro.
- Improved error messages for incorrect parameters passed into a node.
- Fixed issue with saving a `TensorFlowModelDataset` in the HDF5 format with versioning enabled.
- Added missing `run_result` argument in `after_pipeline_run` Hooks spec.
- Fixed a bug in IPython script that was causing context hooks to be registered twice. To apply this fix to a project generated with an older Kedro version, apply the same changes made in this PR to your `00-kedro-init.py` file.
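The `jalapeño` example above can be reproduced with two regular expressions. This is a sketch of the behaviour described in these notes, not Kedro's source: both function names are hypothetical, and the exact patterns are assumptions chosen to match the documented results.

```python
import re


def ascii_only_attribute_name(dataset_name):
    # Assumed older behaviour: anything outside [0-9a-zA-Z_] becomes "__".
    return re.sub(r"[^0-9a-zA-Z_]+", "__", dataset_name)


def unicode_aware_attribute_name(dataset_name):
    # Assumed newer behaviour: only non-word characters become "__".
    # For str patterns in Python 3, \w also matches letters such as "ñ".
    return re.sub(r"\W+", "__", dataset_name)


name = "jalape\u00f1o"
old_style = ascii_only_attribute_name(name)
new_style = unicode_aware_attribute_name(name)
transcoded = unicode_aware_attribute_name("cars@csv")
```

Separators such as `@` in transcoded dataset names are still replaced by `__` under both patterns; only the treatment of non-ASCII letters differs.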
Thanks for supporting contributions
Deepyaman Datta, Bhavya Merchant, Lovkush Agarwal, Varun Krishna S, Sebastian Bertoli, noklam, Daniel Petti, Waylon Walker
0.16.5
Major features and improvements
- Added the following new datasets.
| Type | Description | Location |
| --- | --- | --- |
| `email.EmailMessageDataSet` | Manage email messages using the Python standard library | `kedro.extras.datasets.email` |
- Added support for `pyproject.toml` to configure Kedro. `pyproject.toml` is used if `.kedro.yml` doesn't exist (Kedro configuration should be under `[tool.kedro]` section).
- Projects created with this version will have no `pipeline.py`, having been replaced by `hooks.py`.
- Added a set of registration hooks, as the new way of registering library components with a Kedro project:
  - `register_pipelines()`, to replace `_get_pipelines()`
  - `register_config_loader()`, to replace `_create_config_loader()`
  - `register_catalog()`, to replace `_create_catalog()`
  These can be defined in `src/<package-name>/hooks.py` and added to `.kedro.yml` (or `pyproject.toml`). The order of execution is: plugin hooks, `.kedro.yml` hooks, hooks in `ProjectContext.hooks`.
- Added ability to disable auto-registered Hooks using `.kedro.yml` (or `pyproject.toml`) configuration file.
Bug fixes and other changes
- Added option to run asynchronously via the Kedro CLI.
- Absorbed `.isort.cfg` settings into `setup.cfg`.
- `project_name`, `project_version` and `package_name` now have to be defined in `.kedro.yml` for projects generated using Kedro 0.16.5+.
- Packaging a modular pipeline raises an error if the pipeline directory is empty or non-existent.
Thanks for supporting contributions
0.16.4
Release 0.16.4
Major features and improvements
- Enabled auto-discovery of hooks implementations coming from installed plugins.
Bug fixes and other changes
- Fixed a bug for using `ParallelRunner` on Windows.
- Modified `GBQTableDataSet` to load customised results using customised queries from Google Big Query tables.
- Documentation improvements.
Thanks for supporting contributions
Ajay Bisht, Vijay Sajjanar, Deepyaman Datta, Sebastian Bertoli, Shahil Mawjee, Louis Guitton, Emanuel Ferm
0.16.3
0.16.2
Major features and improvements
- Added the following new datasets.
| Type | Description | Location |
| --- | --- | --- |
| `pandas.AppendableExcelDataSet` | Works with Excel file opened in append mode | `kedro.extras.datasets.pandas` |
| `tensorflow.TensorFlowModelDataset` | Works with TensorFlow models using TensorFlow 2.X | `kedro.extras.datasets.tensorflow` |
| `holoviews.HoloviewsWriter` | Works with Holoviews objects (saves as image file) | `kedro.extras.datasets.holoviews` |
- `kedro install` will now compile project dependencies (by running `kedro build-reqs` behind the scenes) before the installation if the `src/requirements.in` file doesn't exist.
- Added `only_nodes_with_namespace` in `Pipeline` class to filter only nodes with a specified namespace.
- Added the `kedro pipeline delete` command to help delete unwanted or unused pipelines (it won't remove references to the pipeline in your `create_pipelines()` code).
- Added the `kedro pipeline package` command to help package up a modular pipeline. It will bundle up the pipeline source code, tests, and parameters configuration into a .whl file.
Bug fixes and other changes
- Improvement in `DataCatalog`:
  - Introduced regex filtering to the `DataCatalog.list()` method.
  - Non-alphanumeric characters (except underscore) in dataset name are replaced with `__` in `DataCatalog.datasets`, for ease of access to transcoded datasets.
- Improvement in Datasets:
  - Improved initialization speed of `spark.SparkHiveDataSet`.
  - Improved S3 cache in `spark.SparkDataSet`.
  - Added support of options for building `pyarrow` table in `pandas.ParquetDataSet`.
- Improvement in `kedro build-reqs` CLI command:
  - `kedro build-reqs` is now called with `-q` option and will no longer print out compiled requirements to the console for security reasons.
  - All unrecognized CLI options in `kedro build-reqs` command are now passed to pip-compile call (e.g. `kedro build-reqs --generate-hashes`).
- Improvement in `kedro jupyter` CLI command:
  - Improved error message when running `kedro jupyter notebook`, `kedro jupyter lab` or `kedro ipython` with Jupyter/IPython dependencies not being installed.
  - Fixed `%run_viz` line magic for showing kedro viz inside a Jupyter notebook. For the fix to be applied on existing Kedro project, please see the migration guide.
  - Fixed the bug in IPython startup script (issue 298).
- Documentation improvements:
  - Updated community-generated content in FAQ.
  - Added find-kedro and kedro-static-viz to the list of community plugins.
  - Add missing `pillow.ImageDataSet` entry to the documentation.
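As a rough picture of what regex filtering over dataset names looks like, the sketch below filters a plain list of names. It does not call Kedro: `filter_dataset_names` and its `regex_search` parameter are hypothetical names for this example, and matching is case-sensitive here by choice.

```python
import re


def filter_dataset_names(names, regex_search=None):
    """Return the names that contain a match for `regex_search` (all names if None)."""
    if regex_search is None:
        return list(names)
    pattern = re.compile(regex_search)
    return [name for name in names if pattern.search(name)]


names = ["raw_cars", "raw_planes", "model_input"]
raw_only = filter_dataset_names(names, "^raw_")
inputs_only = filter_dataset_names(names, "input$")
everything = filter_dataset_names(names)
```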
Breaking changes to the API
Migration guide from Kedro 0.16.1 to 0.16.2
Guide to apply the fix for `%run_viz` line magic in existing project
Even though this release ships a fix for project generated with `kedro==0.16.2`, after upgrading, you will still need to make a change in your existing project if it was generated with `kedro>=0.16.0,<=0.16.1` for the fix to take effect. Specifically, please change the content of your project's IPython init script located at `.ipython/profile_default/startup/00-kedro-init.py` with the content of this file. You will also need `kedro-viz>=3.3.1`.
Thanks for supporting contributions
Miguel Rodriguez Gutierrez, Joel Schwarzmann, w0rdsm1th, Deepyaman Datta, Tam-Sanh Nguyen, Marcus Gawronsky
0.16.1
Bug fixes and other changes
- Fixed deprecation warnings from `kedro.cli` and `kedro.context` when running `kedro jupyter notebook`.
- Fixed a bug where `catalog` and `context` were not available in Jupyter Lab and Notebook.
- Fixed a bug where `kedro build-reqs` would fail if you didn't have your project dependencies installed.