# Releases: kedro-org/kedro

## 0.18.5

NOTE: This version of Kedro introduced a bug that causes the Kedro-Viz console to fail to show experiment tracking correctly. We recommend that you don't use it; use Kedro version 0.18.6 instead.
### Major features and improvements
- Added new `OmegaConfigLoader` which uses `OmegaConf` for loading and merging configuration.
- Added the `--conf-source` option to `kedro run`, allowing users to specify a source for project configuration for the run.
- Added `omegaconf` syntax as an option for `--params`. Keys and values can now be separated by colons or equals signs.
- Added support for generator functions as nodes, i.e. using `yield` instead of `return` (see the sketch after this list).
  - Enables chunk-wise processing in nodes with generator functions.
  - Saves node outputs after every `yield` before proceeding with the next chunk.
- Fixed incorrect parsing of Azure Data Lake Storage Gen2 URIs used in datasets.
- Added support for loading credentials from environment variables using `OmegaConfigLoader`.
- Added a new `--namespace` flag to `kedro run` to enable filtering by node namespace.
- Added a new argument `node` for all four dataset hooks.
- Added the `kedro run` flags `--nodes`, `--tags`, and `--load-versions` to replace `--node`, `--tag`, and `--load-version`.
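A minimal sketch of a generator-function node under the new behaviour, assuming the output dataset can handle being saved once per chunk (e.g. an appending dataset); the function, parameter and dataset names below are illustrative, not from the release notes. Each `yield` is saved to the node's output dataset before the next chunk is processed.

```python
from typing import Iterator

import pandas as pd

from kedro.pipeline import node, pipeline


def clean_in_chunks(filepath: str) -> Iterator[pd.DataFrame]:
    # Read a large CSV chunk by chunk; each yielded chunk is saved to the
    # output dataset before the next one is processed.
    for chunk in pd.read_csv(filepath, chunksize=10_000):
        yield chunk.dropna()


chunked = pipeline(
    [node(clean_in_chunks, inputs="params:raw_csv_path", outputs="clean_rows")]
)
```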
### Bug fixes and other changes
- Commas surrounded by square brackets (only possible for nodes with default names) will no longer split the arguments to `kedro run` options which take a list of nodes as inputs (`--from-nodes` and `--to-nodes`).
- Fixed a bug where the `micropkg` manifest section in `pyproject.toml` wasn't recognised as allowed configuration.
- Fixed a bug causing `load_ipython_extension` not to register the `%reload_kedro` line magic when called in a directory that does not contain a Kedro project.
- Added `anyconfig`'s `ac_context` parameter to `kedro.config.commons` module functions for more flexible `ConfigLoader` customizations.
- Changed references to the `kedro.pipeline.Pipeline` object throughout the test suite to the `kedro.modular_pipeline.pipeline` factory.
- Fixed a bug causing the `after_dataset_saved` hook to be called for only one output dataset when multiple are saved in a single node and async saving is in use.
- The log level for "Credentials not found in your Kedro project config" was changed from `WARNING` to `DEBUG`.
- Added safe extraction of tar files in `micropkg pull` to fix the vulnerability caused by CVE-2007-4559.
- Documentation improvements:
  - Fixed a bug in table font size
  - Updated API docs links for datasets
  - Improved CLI docs for `kedro run`
  - Revised documentation for visualisation to build plots and for experiment tracking
  - Added an example for loading external credentials to the Hooks documentation
### Breaking changes to the API

### Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:
### Upcoming deprecations for Kedro 0.19.0

- `project_version` will be deprecated in `pyproject.toml`; please use `kedro_init_version` instead.
- Deprecated the `kedro run` flags `--node`, `--tag`, and `--load-version` in favour of `--nodes`, `--tags`, and `--load-versions`.
## 0.18.4

### Major features and improvements
- Make Kedro instantiate datasets from `kedro_datasets` with higher priority than `kedro.extras.datasets`. `kedro_datasets` is the namespace for the new `kedro-datasets` Python package.
- The config loader objects now implement `UserDict`, and the configuration is accessed through `conf_loader['catalog']`.
- You can configure config file patterns through `settings.py` without creating a custom config loader (see the sketch after this list).
- Added the following new datasets:
| Type | Description | Location |
| --- | --- | --- |
| `svmlight.SVMLightDataSet` | Work with svmlight/libsvm files using the scikit-learn library | `kedro.extras.datasets.svmlight` |
| `video.VideoDataSet` | Read and write video files from a filesystem | `kedro.extras.datasets.video` |
| `video.video_dataset.SequenceVideo` | Create a video object from an iterable sequence to use with `VideoDataSet` | `kedro.extras.datasets.video` |
| `video.video_dataset.GeneratorVideo` | Create a video object from a generator to use with `VideoDataSet` | `kedro.extras.datasets.video` |
- Implemented support for a functional definition of schema in `dask.ParquetDataSet` to work with the `dask.to_parquet` API.
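A minimal sketch of configuring config file patterns from `settings.py`, assuming the `config_patterns` keyword accepted by the built-in config loaders via `CONFIG_LOADER_ARGS`; the extra `spark` pattern is illustrative.

```python
# src/<package_name>/settings.py
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        # Hypothetical extra pattern: expose spark*.yml files as conf_loader["spark"].
        "spark": ["spark*", "spark*/**"],
        # Default-style pattern for parameters, shown for comparison.
        "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    }
}
```

With the new `UserDict` interface, the merged configuration is then read with dictionary syntax, e.g. `conf_loader["spark"]` inside a hook or the context.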
### Bug fixes and other changes
- Fixed `kedro micropkg pull` for packages on PyPI.
- Fixed `format` in `save_args` for `SparkHiveDataSet`; previously it didn't allow you to save in delta format.
- Fixed save errors in `TensorFlowModelDataset` when used without versioning; previously, it wouldn't overwrite an existing model.
- Added support for `tf.device` in `TensorFlowModelDataset`.
- Updated the error message for `VersionNotFoundError` to handle insufficient permission issues for cloud storage.
- Updated Experiment Tracking docs with working examples.
- Updated `MatplotlibWriter`, `TextDataSet`, `plotly.PlotlyDataSet` and `plotly.JSONDataSet` docs with working examples.
- Modified the implementation of the Kedro IPython extension to use `local_ns` rather than a global variable.
- Refactored `ShelveStore` into its own module to ensure multiprocessing works with it.
- `kedro.extras.datasets.pandas.SQLQueryDataSet` now takes an optional argument `execution_options` (see the sketch after this list).
- Removed the `attrs` upper bound to support newer versions of Airflow.
- Bumped the upper bound for the `setuptools` dependency to <=61.5.1.
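A minimal sketch of the new `execution_options` argument, which is passed through to the underlying SQLAlchemy connection; the query and connection string are illustrative.

```python
from kedro.extras.datasets.pandas import SQLQueryDataSet

dataset = SQLQueryDataSet(
    sql="SELECT * FROM shuttles",  # illustrative query
    credentials={"con": "sqlite:///data/example.db"},
    # Forwarded to SQLAlchemy's Connection.execution_options().
    execution_options={"stream_results": True},
)
df = dataset.load()
```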
### Minor breaking changes to the API

### Upcoming deprecations for Kedro 0.19.0

- `kedro test` and `kedro lint` will be deprecated.
### Documentation

- Revised the Introduction to shorten it.
- Revised the Get Started section to remove unnecessary information and clarify the learning path.
- Updated the spaceflights tutorial to simplify the later stages and clarify what the reader needs to do in each phase.
- Moved some pages that covered advanced materials into more appropriate sections.
- Moved visualisation into its own section.
- Fixed a bug that degraded user experience: the table of contents is now sticky when you navigate between pages.
- Added redirects where needed on ReadTheDocs for legacy links and bookmarks.
### Contributions from the Kedroid community

We are grateful to the following for submitting PRs that contributed to this release: jstammers, FlorianGD, yash6318, carlaprv, dinotuku, williamcaicedo, avan-sh, Kastakin, amaralbf, BSGalvan, levimjoseph, daniel-falk, clotildeguinard, avsolatorio, and picklejuicedev for comments and input to documentation changes.
## 0.18.3

### Major features and improvements
- Implemented autodiscovery of project pipelines. A pipeline created with `kedro pipeline create <pipeline_name>` can now be accessed immediately without needing to explicitly register it in `src/<package_name>/pipeline_registry.py`, either individually by name (e.g. `kedro run --pipeline=<pipeline_name>`) or as part of the combined default pipeline (e.g. `kedro run`). By default, the simplified `register_pipelines()` function in `pipeline_registry.py` looks like:

  ```python
  from typing import Dict

  from kedro.framework.project import find_pipelines
  from kedro.pipeline import Pipeline


  def register_pipelines() -> Dict[str, Pipeline]:
      """Register the project's pipelines.

      Returns:
          A mapping from pipeline names to ``Pipeline`` objects.
      """
      pipelines = find_pipelines()
      pipelines["__default__"] = sum(pipelines.values())
      return pipelines
  ```
- The Kedro IPython extension should now be loaded with `%load_ext kedro.ipython`.
- The line magic `%reload_kedro` now accepts keyword arguments, e.g. `%reload_kedro --env=prod`.
- Improved the resume pipeline suggestion for `SequentialRunner`; it now backtracks to the closest persisted inputs to resume.
### Bug fixes and other changes
- Changed the default value of rich logging's `show_locals` to `False`, to make sure credentials and other sensitive data aren't shown in logs.
- Rich traceback handling is disabled on Databricks so that exceptions now halt execution as expected. This is a workaround for a bug in `rich`.
- When using `kedro run -n [some_node]`, if `some_node` is missing a namespace the resulting error message will suggest the correct node name.
- Updated documentation for `rich` logging.
- Updated the Prefect deployment documentation to allow for reruns with saved versioned datasets.
- The Kedro IPython extension now surfaces errors when it cannot load a Kedro project.
- Relaxed the `delta-spark` upper bound to allow compatibility with Spark 3.1.x and 3.2.x.
- Added `gdrive` to the list of cloud protocols, enabling Google Drive paths for datasets.
- Added an SVG logo resource for the IPython kernel.
### Upcoming deprecations for Kedro 0.19.0

- The Kedro IPython extension will no longer be available as `%load_ext kedro.extras.extensions.ipython`; use `%load_ext kedro.ipython` instead.
- `kedro jupyter convert`, `kedro build-docs`, `kedro build-reqs` and `kedro activate-nbstripout` will be deprecated.
## 0.18.2

### Major features and improvements
- Added `abfss` to the list of cloud protocols, enabling `abfss` paths.
- Kedro now uses the Rich library to format terminal logs and tracebacks.
- The file `conf/base/logging.yml` is now optional. See our documentation for details.
- Introduced a `kedro.starters` entry point. This enables plugins to create custom starter aliases used by `kedro starter list` and `kedro new` (see the sketch after this list).
- Reduced the `kedro new` prompts to just one question asking for the project name.
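A minimal sketch of a plugin exposing a custom starter alias through the new `kedro.starters` entry point; the plugin, module and starter names are illustrative, and `KedroStarterSpec` is assumed to be importable from `kedro.framework.cli.starters`.

```python
# my_kedro_plugin/plugin.py -- hypothetical plugin module
from kedro.framework.cli.starters import KedroStarterSpec

starters = [
    KedroStarterSpec(
        alias="my-starter",  # shows up in `kedro starter list` and `kedro new`
        template_path="git+https://github.com/example/my-kedro-starter.git",
    )
]
```

```python
# setup.py of the hypothetical plugin, registering the entry point
from setuptools import setup

setup(
    name="my-kedro-plugin",
    packages=["my_kedro_plugin"],
    entry_points={"kedro.starters": ["starters = my_kedro_plugin.plugin:starters"]},
)
```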
### Bug fixes and other changes
- Bumped the `pyyaml` upper bound to make Kedro compatible with the pyodide stack.
- Updated the project template's Sphinx configuration to use `myst_parser` instead of `recommonmark`.
- Reduced the number of log lines by changing the logging level from `INFO` to `DEBUG` for low-priority messages.
- Kedro's framework-side logging configuration no longer performs file-based logging. Hence superfluous `info.log`/`errors.log` files are no longer created in your project root, and running Kedro on read-only file systems such as Databricks Repos is now possible.
- The `root` logger is now set to the Python default level of `WARNING` rather than `INFO`. Kedro's logger is still set to emit `INFO` level messages.
- `SequentialRunner` now has consistent execution order across multiple runs with sorted nodes.
- Bumped the upper bound for the Flake8 dependency to <5.0.
- `kedro jupyter notebook/lab` no longer reuses a Jupyter kernel.
- Required `cookiecutter>=2.1.1` to address a known command injection vulnerability.
- The session store no longer fails if a username cannot be found with `getpass.getuser`.
- Added generic typing for `AbstractDataSet` and `AbstractVersionedDataSet`, as well as typing to all datasets (see the sketch after this list).
- Rendered the deployment guide flowchart as a Mermaid diagram, and added Dask.
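A minimal sketch of a custom dataset using the new generic typing, parameterised by the type returned by `_load` and the type accepted by `_save`; the class and file names are illustrative.

```python
import pandas as pd

from kedro.io import AbstractDataSet


class LocalCSVDataSet(AbstractDataSet[pd.DataFrame, pd.DataFrame]):
    """Illustrative dataset: loads and saves a pandas DataFrame as CSV."""

    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> pd.DataFrame:
        return pd.read_csv(self._filepath)

    def _save(self, data: pd.DataFrame) -> None:
        data.to_csv(self._filepath, index=False)

    def _describe(self) -> dict:
        return {"filepath": self._filepath}
```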
### Minor breaking changes to the API

- The module `kedro.config.default_logger` no longer exists; default logging configuration is now set automatically through `kedro.framework.project.LOGGING`. Unless you explicitly import `kedro.config.default_logger`, you do not need to make any changes.
### Upcoming deprecations for Kedro 0.19.0

- `kedro.extras.ColorHandler` will be removed in 0.19.0.
## 0.18.1

### Major features and improvements
- Added a new hook `after_context_created` that passes the `KedroContext` instance as `context` (see the sketch after this list).
- Added a new CLI hook `after_command_run`.
- Added more detail to the YAML `ParserError` exception error message.
- Added an option to `SparkDataSet` to specify a `schema` load argument that allows for supplying a user-defined schema, as opposed to relying on Spark's schema inference.
- The Kedro package no longer contains a built version of the Kedro documentation, significantly reducing the package size.
### Bug fixes and other changes

- Removed a fatal error from being logged when a Kedro session is created in a directory without git.
- Fixed `CONFIG_LOADER_CLASS` validation so that `TemplatedConfigLoader` can be specified in `settings.py`. Any `CONFIG_LOADER_CLASS` must be a subclass of `AbstractConfigLoader`.
- Added the runner name to the `run_params` dictionary used in pipeline hooks.
- Updated the Databricks documentation to include how to get it working with the IPython extension and Kedro-Viz.
- Updated sections on visualisation, namespacing, and experiment tracking in the spaceflights tutorial to correspond to the complete spaceflights starter.
- Fixed `Jinja2` syntax loading with `TemplatedConfigLoader` using `globals.yml`.
- Removed the global `_active_session`, `_activate_session` and `_deactivate_session`. Plugins that need to access objects such as the config loader should now do so through `context` in the new `after_context_created` hook.
- `config_loader` is available as a public read-only attribute of `KedroContext`.
- Made the `hook_manager` argument optional for `runner.run`.
- `kedro docs` now opens an online version of the Kedro documentation instead of a locally built version.
### Upcoming deprecations for Kedro 0.19.0

- `kedro docs` will be removed in 0.19.0.
## 0.18.0

### TL;DR ✨
Kedro 0.18.0 strives to reduce the complexity of the project template and get us closer to a stable release of the framework. We've introduced the full micro-packaging workflow 📦, which allows you to import packages, utility functions and existing pipelines into your Kedro project. Integration with IPython and Jupyter has been streamlined in preparation for enhancements to Kedro's interactive workflow. Additionally, the release comes with long-awaited Python 3.9 and 3.10 support 🐍.
### Major features and improvements

#### Framework
- Added `kedro.config.abstract_config.AbstractConfigLoader` as an abstract base class for all `ConfigLoader` implementations. `ConfigLoader` and `TemplatedConfigLoader` now inherit directly from this base class.
- Streamlined the `ConfigLoader.get` and `TemplatedConfigLoader.get` API and delegated the actual `get` method functional implementation to the `kedro.config.common` module.
- The `hook_manager` is no longer a global singleton. The `hook_manager` lifecycle is now managed by the `KedroSession`, and a new `hook_manager` will be created every time a `session` is instantiated.
- Added support for specifying parameters mapping in `pipeline()` without the `params:` prefix (see the sketch after this list).
- Added a new API `Pipeline.filter()` (previously in `KedroContext._filter_pipeline()`) to filter parts of a pipeline.
- Added `username` to the Session store for logging during Experiment Tracking.
- A packaged Kedro project can now be imported and run from another Python project as follows:

  ```python
  from my_package.__main__ import main

  main(["--pipeline", "my_pipeline"])  # or just main() if no parameters are needed for the run
  ```
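A minimal sketch of the prefix-free parameters mapping when building a modular pipeline instance; the pipeline, function and parameter names are illustrative.

```python
from kedro.pipeline import node, pipeline


def train(data, learning_rate):
    ...


base = pipeline([node(train, ["features", "params:learning_rate"], "model")])

# Previously the mapping needed the prefix on both sides
# (e.g. {"params:learning_rate": "params:lr_override"}); now it can be dropped:
training = pipeline(
    base,
    parameters={"learning_rate": "lr_override"},
    namespace="training",
)
```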
#### Project template
- Removed `cli.py` from the Kedro project template. By default, all CLI commands, including `kedro run`, are now defined on the Kedro framework side. You can still define custom CLI commands by creating your own `cli.py`.
- Removed `hooks.py` from the Kedro project template. Registration hooks have been removed in favour of `settings.py` configuration, but you can still define execution timeline hooks by creating your own `hooks.py`.
- Removed the `.ipython` directory from the Kedro project template. The IPython/Jupyter workflow no longer uses IPython profiles; it now uses an IPython extension.
- The default `kedro` run configuration environment names can now be set in `settings.py` using the `CONFIG_LOADER_ARGS` variable. The relevant keyword arguments to supply are `base_env` and `default_run_env`, which are set to `base` and `local` respectively by default, as sketched below.
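A minimal sketch of overriding the run environment names in `settings.py`, using the keyword arguments named above; the `cloud` environment name is illustrative.

```python
# src/<package_name>/settings.py
CONFIG_LOADER_ARGS = {
    "base_env": "base",          # the default
    "default_run_env": "cloud",  # hypothetical environment replacing `local`
}
```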
#### DataSets
- Added the following new datasets:

| Type | Description | Location |
| --- | --- | --- |
| `pandas.XMLDataSet` | Read XML into a Pandas DataFrame; write a Pandas DataFrame to XML | `kedro.extras.datasets.pandas` |
| `networkx.GraphMLDataSet` | Work with NetworkX using GraphML files | `kedro.extras.datasets.networkx` |
| `networkx.GMLDataSet` | Work with NetworkX using Graph Modelling Language files | `kedro.extras.datasets.networkx` |
| `redis.PickleDataSet` | Loads/saves data from/to a Redis database | `kedro.extras.datasets.redis` |
- Added `partitionBy` support and exposed `save_args` for `SparkHiveDataSet` (see the sketch after this list).
- Exposed `open_args_save` in `fs_args` for `pandas.ParquetDataSet`.
- Refactored the `load` and `save` operations for `pandas` datasets in order to leverage `pandas`' own API and delegate `fsspec` operations to them. This reduces the need to have our own `fsspec` wrappers.
- Merged `pandas.AppendableExcelDataSet` into `pandas.ExcelDataSet`.
- Added `save_args` to `feather.FeatherDataSet`.
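A minimal sketch of the newly exposed `save_args` with `partitionBy` on `SparkHiveDataSet`; the database, table and column names are illustrative.

```python
from kedro.extras.datasets.spark import SparkHiveDataSet

hive_events = SparkHiveDataSet(
    database="analytics",
    table="events",
    write_mode="append",
    save_args={"partitionBy": ["event_date"]},  # forwarded to the Spark writer
)
```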
#### Jupyter and IPython integration

- The only recommended way to work with Kedro in Jupyter or IPython is now the Kedro IPython extension. Managed Jupyter instances should load this via `%load_ext kedro.extras.extensions.ipython` and use the line magic `%reload_kedro`.
- `kedro ipython` launches an IPython session that preloads the Kedro IPython extension.
- `kedro jupyter notebook/lab` creates a custom Jupyter kernel that preloads the Kedro IPython extension and launches a notebook with that kernel selected. There is no longer a need to specify `--all-kernels` to show all available kernels.
#### Dependencies

- Bumped the minimum version of `pandas` to 1.3. Any `storage_options` should continue to be specified under `fs_args` and/or `credentials`.
- Added support for Python 3.9 and 3.10; dropped support for Python 3.6.
- Updated the `black` dependency in the project template to a non pre-release version.
#### Other

- Documented distribution of Kedro pipelines with Dask.

### Breaking changes to the API

#### Framework
- Removed `RegistrationSpecs` and its associated `register_config_loader` and `register_catalog` hook specifications in favour of `CONFIG_LOADER_CLASS`/`CONFIG_LOADER_ARGS` and `DATA_CATALOG_CLASS` in `settings.py`.
- Removed the deprecated functions `load_context` and `get_project_context`.
- Removed the deprecated `CONF_SOURCE`, `package_name`, `pipeline`, `pipelines`, `config_loader` and `io` attributes from `KedroContext`, as well as the deprecated `KedroContext.run` method.
- Added the `PluginManager` `hook_manager` argument to `KedroContext` and the `Runner.run()` method, which will be provided by the `KedroSession`.
- Removed the public method `get_hook_manager()` and replaced its functionality with `_create_hook_manager()`.
- Enforced that only one run can be successfully executed as part of a `KedroSession`. `run_id` has been renamed to `session_id` as a result.
#### Configuration loaders

- The `settings.py` setting `CONF_ROOT` has been renamed to `CONF_SOURCE`. The default value of `conf` remains unchanged.
- The `ConfigLoader` and `TemplatedConfigLoader` argument `conf_root` has been renamed to `conf_source`.
- `extra_params` has been renamed to `runtime_params` in `kedro.config.config.ConfigLoader` and `kedro.config.templated_config.TemplatedConfigLoader`.
- The environment defaulting behaviour has been removed from `KedroContext` and is now implemented in a `ConfigLoader` class (or equivalent) with the `base_env` and `default_run_env` attributes.
#### DataSets

- `pandas.ExcelDataSet` now uses the `openpyxl` engine instead of `xlrd`.
- `pandas.ParquetDataSet` now calls `pd.to_parquet()` upon saving. Note that the argument `partition_cols` is not supported.
- The `spark.SparkHiveDataSet` API has been updated to reflect `spark.SparkDataSet`. The `write_mode=insert` option has also been replaced with `write_mode=append` as per the Spark style guide. This change addresses Issue 725 and Issue 745. Additionally, `upsert` mode now leverages `checkpoint` functionality and requires a valid `checkpointDir` to be set for the current `SparkContext`.
- `yaml.YAMLDataSet` can no longer save a `pandas.DataFrame` directly, but it can save a dictionary. Use `pandas.DataFrame.to_dict()` to convert your `pandas.DataFrame` to a dictionary before you attempt to save it to YAML.
- Removed `open_args_load` and `open_args_save` from the following datasets:
  - `pandas.CSVDataSet`
  - `pandas.ExcelDataSet`
  - `pandas.FeatherDataSet`
  - `pandas.JSONDataSet`
  - `pandas.ParquetDataSet`
- `storage_options` are now dropped if they are specified under `load_args` or `save_args` for the following datasets:
  - `pandas.CSVDataSet`
  - `pandas.ExcelDataSet`
  - `pandas.FeatherDataSet`
  - `pandas.JSONDataSet`
  - `pandas.ParquetDataSet`
- Renamed `lambda_data_set`, `memory_data_set`, and `partitioned_data_set` to `lambda_dataset`, `memory_dataset`, and `partitioned_dataset`, respectively, in `kedro.io`.
- The dataset `networkx.NetworkXDataSet` has been renamed to `networkx.JSONDataSet`.
#### CLI

- Removed `kedro install` in favour of `pip install -r src/requirements.txt` to install project dependencies.
- Removed the `--parallel` flag from `kedro run` in favour of `--runner=ParallelRunner`. The `-p` flag is now an alias for `--pipeline`.
- `kedro pipeline package` has been replaced by `kedro micropkg package` and, in addition to the `--alias` flag used to rename the package, now accepts a module name and path to the pipeline or utility module to package, relative to `src/<package_name>/`. The `--version` CLI option has been removed in favour of setting a `__version__` variable in the micro-package's `__init__.py` file.
- `kedro pipeline pull` has been replaced by `kedro micropkg pull` and now also supports `--destination` to provide a location for pulling the package.
- Removed `kedro pipeline list` and `kedro pipeline describe` in favour of `kedro registry list` and `kedro registry describe`.
- `kedro package` and `kedro micropkg package` now save `egg` and `whl` or `tar` files in the `<project_root>/dist` folder (previously `<project_root>/src/dist`).
- Changed the behaviour of `kedro build-reqs` to compile requirements from `requirements.txt` instead of `requirements.in` and save them to `requirements.lock` instead of `requirements.txt`.
- `kedro jupyter notebook/lab` no longer accept the `--all-kernels` or `--idle-timeout` flags. `--all-kernels` is now the default behaviour.
- `KedroSession.run` now raises `ValueError` rather than `KedroContextError` when the pipeline contains no nodes. The same `ValueError` is raised when there are no matching tags.
- `KedroSession.run` now raises `ValueError` rather than `KedroContextError` w...
## 0.17.7

### Major features and improvements
- `pipeline` now accepts `tags` and a collection of `Node`s and/or `Pipeline`s rather than just a single `Pipeline` object. `pipeline` should be used in preference to `Pipeline` when creating a Kedro pipeline (see the sketch after this list).
- `pandas.SQLTableDataSet` and `pandas.SQLQueryDataSet` now only open one connection per database, at instantiation time (therefore at catalog creation time), rather than one per load/save operation.
- Added a new command group, `micropkg`, to replace `kedro pipeline pull` and `kedro pipeline package` with `kedro micropkg pull` and `kedro micropkg package` for Kedro 0.18.0. `kedro micropkg package` saves packages to `project/dist` while `kedro pipeline package` saves packages to `project/src/dist`.
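A minimal sketch of the more flexible `pipeline` factory, combining a `Pipeline` and a bare `Node` in one call and tagging the result; the functions and dataset names are illustrative.

```python
from kedro.pipeline import node, pipeline


def clean(df):
    return df.dropna()


def enrich(df):
    return df.assign(flag=True)


base = pipeline([node(clean, "raw", "clean_rows")])

# `pipeline` now accepts a mix of Nodes and Pipelines, plus `tags`:
combined = pipeline(
    [base, node(enrich, "clean_rows", "features")],
    tags="preprocessing",
)
```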
### Bug fixes and other changes
- Added tutorial documentation for experiment tracking.
- Added Plotly dataset documentation.
- Added the upper limit `pandas<1.4` to maintain compatibility with `xlrd~=1.0`.
- Bumped the `Pillow` minimum version requirement to 9.0 (Python 3.7+ only) following CVE-2022-22817.
- Fixed `PickleDataSet` to be copyable and hence work with the parallel runner.
- Upgraded `pip-tools`, which is used by `kedro build-reqs`, to 6.5 (Python 3.7+ only). This `pip-tools` version is compatible with `pip>=21.2`, including the most recent releases of `pip`. Python 3.6 users should continue to use `pip-tools` 6.4 and `pip<22`.
- Added `astro-iris` as an alias for `astro-airlow-iris`, so that old tutorials can still be followed.
- Added details about Kedro's Technical Steering Committee and governance model.
Upcoming deprecations for Kedro 0.18.0
kedro pipeline pull
andkedro pipeline package
will be deprecated. Please usekedro micropkg
instead.
## 0.17.6

### Major features and improvements
- Added a `pipelines` global variable to the IPython extension, allowing you to access the project's pipelines in `kedro ipython` or `kedro jupyter notebook`.
- Enabled overriding nested parameters with `params` in the CLI, i.e. `kedro run --params="model.model_tuning.booster:gbtree"` updates parameters to `{"model": {"model_tuning": {"booster": "gbtree"}}}`.
- Added an option to `pandas.SQLQueryDataSet` to specify a `filepath` with a SQL query, in addition to the current method of supplying the query itself in the `sql` argument.
- Extended `ExcelDataSet` to support saving Excel files with multiple sheets (see the sketch after this list).
- Added the following new datasets:
| Type | Description | Location |
| --- | --- | --- |
| `plotly.JSONDataSet` | Works with plotly graph object Figures (saves as json file) | `kedro.extras.datasets.plotly` |
| `pandas.GenericDataSet` | Provides a 'best effort' facility to read/write any format provided by the `pandas` library | `kedro.extras.datasets.pandas` |
| `pandas.GBQQueryDataSet` | Loads data from a Google BigQuery table using a provided SQL query | `kedro.extras.datasets.pandas` |
| `spark.DeltaTableDataSet` | Dataset designed to handle Delta Lake Tables and their CRUD-style operations, including `update`, `merge` and `delete` | `kedro.extras.datasets.spark` |
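A minimal sketch of saving a multi-sheet Excel file, assuming `ExcelDataSet` writes each key of a dictionary of DataFrames to its own sheet; the file path and sheet names are illustrative.

```python
import pandas as pd

from kedro.extras.datasets.pandas import ExcelDataSet

report = ExcelDataSet(filepath="data/02_intermediate/report.xlsx")
report.save(
    {
        "sales": pd.DataFrame({"q1": [10, 20]}),  # written to sheet "sales"
        "costs": pd.DataFrame({"q1": [7, 12]}),   # written to sheet "costs"
    }
)
```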
### Bug fixes and other changes
- Fixed an issue where `kedro new --config config.yml` was ignoring the config file when `prompts.yml` didn't exist.
- Added documentation for `kedro viz --autoreload`.
- Added support for arbitrary backends (via importable module paths) that satisfy the `pickle` interface to `PickleDataSet`.
- Added support for `sum` syntax for connecting pipeline objects.
- Upgraded `pip-tools`, which is used by `kedro build-reqs`, to 6.4. This `pip-tools` version requires `pip>=21.2` while adding support for `pip>=21.3`. To upgrade `pip`, please refer to their documentation.
- Relaxed the bounds on the `plotly` requirement for `plotly.PlotlyDataSet` and the `pyarrow` requirement for `pandas.ParquetDataSet`.
- `kedro pipeline package <pipeline>` now raises an error if the `<pipeline>` argument doesn't look like a valid Python module path (e.g. has `/` instead of `.`).
- Added a new `overwrite` argument to `PartitionedDataSet` and `MatplotlibWriter` to enable deletion of existing partitions and plots on dataset `save`.
- `kedro pipeline pull` now works when the project requirements contain entries such as `-r`, `--extra-index-url` and local wheel files (Issue #913).
- Fixed slow startup caused by catalog processing, by reducing the exponential growth of extra processing during `_FrozenDatasets` creation.
- Removed `.coveragerc` from the Kedro project template. `coverage` settings are now given in `pyproject.toml`.
- Fixed a bug where packaging or pulling a modular pipeline with the same name as the project's package name would throw an error (or silently pass without including the pipeline source code in the wheel file).
- Removed an unintentional dependency on `git`.
- Fixed an issue where nested pipeline configuration was not included in the packaged pipeline.
- Deprecated the "Thanks for supporting contributions" section of release notes to simplify the contribution process; Kedro 0.17.6 is the last release that includes this. This process has been replaced with the automatic GitHub feature.
- Fixed a bug where the version on the tracking datasets didn't match the session id and the versions of regular versioned datasets.
- Fixed an issue where datasets in `load_versions` that are not found in the data catalog would silently pass.
- Altered the string representation of nodes so that node inputs/outputs order is preserved rather than being alphabetically sorted.
Upcoming deprecations for Kedro 0.18.0
kedro.extras.decorators
andkedro.pipeline.decorators
are being deprecated in favour of Hooks.kedro.extras.transformers
andkedro.io.transformers
are being deprecated in favour of Hooks.- The
--parallel
flag onkedro run
is being removed in favour of--runner=ParallelRunner
. The-p
flag will change to be an alias for--pipeline
. kedro.io.DataCatalogWithDefault
is being deprecated, to be removed entirely in 0.18.0.
### Thanks for supporting contributions

Deepyaman Datta, Brites, Manish Swami, Avaneesh Yembadi, Zain Patel, Simon Brugman, Kiyo Kunii, Benjamin Levy, Louis de Charsonville, Simon Picard
## 0.17.5

### Major features and improvements
- Added a new CLI group, `registry`, with the associated commands `kedro registry list` and `kedro registry describe`, to replace `kedro pipeline list` and `kedro pipeline describe`.
- Added support for dependency management at a modular pipeline level. When a pipeline with `requirements.txt` is packaged, its dependencies are embedded in the modular pipeline wheel file. Upon pulling the pipeline, Kedro will append the dependencies to the project's `requirements.in`. More information is available in our documentation.
- Added support for bulk packaging/pulling modular pipelines using `kedro pipeline package/pull --all` and `pyproject.toml`.
- Removed `cli.py` from the Kedro project template. By default all CLI commands, including `kedro run`, are now defined on the Kedro framework side. These can be overridden in turn by a plugin or a `cli.py` file in your project. A packaged Kedro project will respect the same hierarchy when executed with `python -m my_package`.
- Removed `.ipython/profile_default/startup/` from the Kedro project template in favour of `.ipython/profile_default/ipython_config.py` and the `kedro.extras.extensions.ipython` extension.
- Added support for the `dill` backend to `PickleDataSet`.
- Imports are now refactored at `kedro pipeline package` and `kedro pipeline pull` time, so that aliasing a modular pipeline doesn't break it.
- Added the following new datasets to support basic Experiment Tracking (a usage sketch follows the table):
| Type | Description | Location |
| --- | --- | --- |
| `tracking.MetricsDataSet` | Dataset to track numeric metrics for experiment tracking | `kedro.extras.datasets.tracking` |
| `tracking.JSONDataSet` | Dataset to track data for experiment tracking | `kedro.extras.datasets.tracking` |
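A minimal sketch of tracking metrics with the new dataset; the file path and metric names are illustrative. In a project these datasets are typically declared in `catalog.yml` (versioned) and saved as node outputs; the standalone Python API is shown here only for brevity.

```python
from kedro.extras.datasets.tracking import MetricsDataSet

metrics = MetricsDataSet(filepath="data/09_tracking/metrics.json")
metrics.save({"accuracy": 0.94, "f1": 0.91})  # tracking datasets are save-only
```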
### Bug fixes and other changes
- Bumped the minimum required `fsspec` version to 2021.04.
- Fixed the `kedro install` and `kedro build-reqs` flows when uninstalled dependencies are present in a project's `settings.py`, `context.py` or `hooks.py` (Issue #829).
- Pinned `dynaconf` to `<3.1.6` because the method signature for `_validate_items`, which is used in Kedro, changed.
### Minor breaking changes to the API

### Upcoming deprecations for Kedro 0.18.0

- `kedro pipeline list` and `kedro pipeline describe` are being deprecated in favour of the new commands `kedro registry list` and `kedro registry describe`.
- `kedro install` is being deprecated in favour of using `pip install -r src/requirements.txt` to install project dependencies.
### Thanks for supporting contributions
## 0.17.4

### Major features and improvements
- Added the following new datasets:

| Type | Description | Location |
| --- | --- | --- |
| `plotly.PlotlyDataSet` | Works with plotly graph object Figures (saves as json file) | `kedro.extras.datasets.plotly` |
### Bug fixes and other changes
- Defined our set of Kedro Principles! Have a read through our docs.
- `ConfigLoader.get()` now raises a `BadConfigException`, with a more helpful error message, if a configuration file cannot be loaded (for instance due to wrong syntax or poor formatting).
- `run_id` now defaults to `save_version` when `after_catalog_created` is called, similarly to what happens during a `kedro run`.
- Fixed a bug where `kedro ipython` and `kedro jupyter notebook` didn't work if the `PYTHONPATH` was already set.
- Updated the IPython extension to allow passing `env` and `extra_params` to `reload_kedro`, similar to how the IPython script works.
- `kedro info` now outputs whether a plugin has any `hooks` or `cli_hooks` implemented.
- `PartitionedDataSet` now supports lazily materializing data on save (see the sketch at the end of this list).
- `kedro pipeline describe` now defaults to the `__default__` pipeline when no pipeline name is provided, and also shows the namespace the nodes belong to.
- Fixed an issue where `spark.SparkDataSet` with versioning enabled would throw a `VersionNotFoundError` when using `databricks-connect` from a remote machine and saving to the `dbfs` filesystem.
- Added `EmailMessageDataSet` to the doctree.
- When node inputs do not pass validation, the error message is now shown as the most recent exception in the traceback (Issue #761).
- `kedro pipeline package` now only packages the parameter file that exactly matches the pipeline name specified and the parameter files in a directory with the pipeline name.
- Extended support to newer versions of third-party dependencies (Issue #735).
- Ensured consistent references to `model input` tables in accordance with our Data Engineering convention.
- Changed behaviour so that `kedro pipeline package` takes the pipeline package version, rather than the Kedro package version. If the pipeline package version is not present, then the package version is used.
- Launched GitHub Discussions and the Kedro Discord Server.
- Improved the error message when versioning is enabled for a dataset previously saved as non-versioned (Issue #625).
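A minimal sketch of the lazy-save behaviour: values in the saved dictionary may be callables, each invoked only when its partition is written. The paths and helper names are illustrative.

```python
import pandas as pd

from kedro.io import PartitionedDataSet

partitions = PartitionedDataSet(
    path="data/07_model_output/parts",
    dataset="pandas.CSVDataSet",
)


def make_partition(n: int):
    # Returns a callable so the DataFrame is only built at save time.
    return lambda: pd.DataFrame({"value": range(n)})


partitions.save({f"part_{i}": make_partition(i) for i in range(3)})
```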