
Releases: kedro-org/kedro

0.18.5

20 Feb 17:47
393d9d2

Release 0.18.5

NOTE: This version of Kedro introduced a bug that prevented the Kedro-Viz console from showing experiment tracking correctly. We recommend that you don't use it and use Kedro version 0.18.6 instead.

Major features and improvements

  • Added new OmegaConfigLoader which uses OmegaConf for loading and merging configuration.
  • Added the --conf-source option to kedro run, allowing users to specify a source for project configuration for the run.
  • Added OmegaConf syntax as an option for --params. Keys and values can now be separated by colons or equals signs.
  • Added support for generator functions as nodes, i.e. using yield instead of return.
    • Enable chunk-wise processing in nodes with generator functions.
    • Save node outputs after every yield before proceeding with the next chunk.
  • Fixed incorrect parsing of Azure Data Lake Storage Gen2 URIs used in datasets.
  • Added support for loading credentials from environment variables using OmegaConfigLoader.
  • Added new --namespace flag to kedro run to enable filtering by node namespace.
  • Added a new argument node for all four dataset hooks.
  • Added the kedro run flags --nodes, --tags, and --load-versions to replace --node, --tag, and --load-version.
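The generator-node support above means a node function can `yield` chunks instead of returning once. The sketch below is plain Python and only simulates the documented behaviour (save after every yield, before the next chunk is produced); `process_in_chunks` and `run_generator_node` are hypothetical names. In a real project you would wrap the function with Kedro's `node()` and let the runner do the saving.

```python
from typing import Callable, Iterable, Iterator, List


def process_in_chunks(rows: Iterable[int], chunk_size: int = 2) -> Iterator[List[int]]:
    """Hypothetical node function: doubles values and yields one chunk at a time."""
    chunk: List[int] = []
    for row in rows:
        chunk.append(row * 2)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # start a fresh list so saved chunks stay independent
    if chunk:  # flush the final, possibly partial, chunk
        yield chunk


def run_generator_node(func: Callable[..., Iterator], save: Callable[[object], None], *args) -> None:
    """Simulated runner: save each yielded output before pulling the next chunk."""
    for output in func(*args):
        save(output)


saved: List[List[int]] = []
run_generator_node(process_in_chunks, saved.append, [1, 2, 3, 4, 5])
print(saved)  # [[2, 4], [6, 8], [10]]
```

Only one chunk needs to be in memory at a time, which is the point of chunk-wise processing.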

Bug fixes and other changes

  • Commas surrounded by square brackets (only possible for nodes with default names) will no longer split the arguments to kedro run options which take a list of nodes as inputs (--from-nodes and --to-nodes).
  • Fixed bug where micropkg manifest section in pyproject.toml isn't recognised as allowed configuration.
  • Fixed bug causing load_ipython_extension not to register the %reload_kedro line magic when called in a directory that does not contain a Kedro project.
  • Added anyconfig's ac_context parameter to kedro.config.commons module functions for more flexible ConfigLoader customizations.
  • Changed references to the kedro.pipeline.Pipeline object throughout the test suite to use the kedro.modular_pipeline.pipeline factory.
  • Fixed bug causing the after_dataset_saved hook only to be called for one output dataset when multiple are saved in a single node and async saving is in use.
  • Log level for "Credentials not found in your Kedro project config" was changed from WARNING to DEBUG.
  • Added safe extraction of tar files in micropkg pull to fix vulnerability caused by CVE-2007-4559.
  • Documentation improvements
    • Bug fix in table font size
    • Updated API docs links for datasets
    • Improved CLI docs for kedro run
    • Revised documentation for visualisation to build plots and for experiment tracking
    • Added example for loading external credentials to the Hooks documentation
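The first bug fix above concerns default node names, which can contain commas inside square brackets (for example a name of the form `func([a,b]) -> [c]`). A minimal sketch of bracket-aware splitting; `split_node_names` is a hypothetical helper, not Kedro's actual CLI parser.

```python
from typing import List


def split_node_names(value: str) -> List[str]:
    """Split a comma-separated list of node names, ignoring commas inside [...]."""
    names: List[str] = []
    current: List[str] = []
    depth = 0  # number of currently unclosed '['
    for char in value:
        if char == "[":
            depth += 1
        elif char == "]":
            depth = max(depth - 1, 0)
        if char == "," and depth == 0:
            names.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    if current:
        names.append("".join(current).strip())
    return [name for name in names if name]  # drop empty entries


print(split_node_names("func([a,b]) -> [c],other_node"))
# ['func([a,b]) -> [c]', 'other_node']
```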

Breaking changes to the API

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

Upcoming deprecations for Kedro 0.19.0

  • project_version in pyproject.toml will be deprecated; please use kedro_init_version instead.
  • Deprecated kedro run flags --node, --tag, and --load-version in favour of --nodes, --tags, and --load-versions.

0.18.4

05 Dec 16:36
a6e91be

Major features and improvements

  • Kedro now instantiates datasets from kedro_datasets with higher priority than kedro.extras.datasets. kedro_datasets is the namespace for the new kedro-datasets Python package.
  • The config loader objects now implement UserDict and the configuration is accessed through conf_loader['catalog'].
  • You can configure config file patterns through settings.py without creating a custom config loader.
  • Added the following new datasets:
| Type | Description | Location |
| --- | --- | --- |
| svmlight.SVMLightDataSet | Work with svmlight/libsvm files using scikit-learn library | kedro.extras.datasets.svmlight |
| video.VideoDataSet | Read and write video files from a filesystem | kedro.extras.datasets.video |
| video.video_dataset.SequenceVideo | Create a video object from an iterable sequence to use with VideoDataSet | kedro.extras.datasets.video |
| video.video_dataset.GeneratorVideo | Create a video object from a generator to use with VideoDataSet | kedro.extras.datasets.video |
  • Implemented support for a functional definition of schema in dask.ParquetDataSet to work with the dask.to_parquet API.
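Because the config loaders now implement `UserDict`, configuration is read with item access such as `conf_loader['catalog']`. The toy class below only illustrates that access pattern; `ToyConfigLoader` and its in-memory `sources` mapping are hypothetical stand-ins for Kedro's file-based loading and config patterns.

```python
from collections import UserDict
from typing import Any, Dict


class ToyConfigLoader(UserDict):
    """Hypothetical loader: resolves a config key lazily on first access."""

    def __init__(self, sources: Dict[str, Dict[str, Any]]):
        super().__init__()
        self._sources = sources  # stand-in for files matched by config patterns

    def __getitem__(self, key: str) -> Dict[str, Any]:
        if key not in self.data:
            if key not in self._sources:
                raise KeyError(key)
            self.data[key] = dict(self._sources[key])  # "load" once, then cache
        return self.data[key]


conf_loader = ToyConfigLoader({"catalog": {"companies": {"type": "pandas.CSVDataSet"}}})
print(conf_loader["catalog"])  # {'companies': {'type': 'pandas.CSVDataSet'}}
```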

Bug fixes and other changes

  • Fixed kedro micropkg pull for packages on PyPI.
  • Fixed format in save_args for SparkHiveDataSet; previously it didn't allow you to save in Delta format.
  • Fixed save errors in TensorFlowModelDataset when used without versioning; previously, it wouldn't overwrite an existing model.
  • Added support for tf.device in TensorFlowModelDataset.
  • Updated error message for VersionNotFoundError to handle insufficient permission issues for cloud storage.
  • Updated Experiment Tracking docs with working examples.
  • Updated MatplotlibWriter Dataset, TextDataset, plotly.PlotlyDataSet and plotly.JSONDataSet docs with working examples.
  • Modified implementation of the Kedro IPython extension to use local_ns rather than a global variable.
  • Refactored ShelveStore to its own module to ensure multiprocessing works with it.
  • kedro.extras.datasets.pandas.SQLQueryDataSet now takes optional argument execution_options.
  • Removed attrs upper bound to support newer versions of Airflow.
  • Bumped the lower bound for the setuptools dependency to <=61.5.1.

Minor breaking changes to the API

Upcoming deprecations for Kedro 0.19.0

  • kedro test and kedro lint will be deprecated.

Documentation

  • Revised the Introduction to shorten it
  • Revised the Get Started section to remove unnecessary information and clarify the learning path
  • Updated the spaceflights tutorial to simplify the later stages and clarify what the reader needed to do in each phase
  • Moved some pages that covered advanced materials into more appropriate sections
  • Moved visualisation into its own section
  • Fixed a bug that degraded user experience: the table of contents is now sticky when you navigate between pages
  • Added redirects where needed on ReadTheDocs for legacy links and bookmarks

Contributions from the Kedroid community

We are grateful to the following for submitting PRs that contributed to this release: jstammers, FlorianGD, yash6318, carlaprv, dinotuku, williamcaicedo, avan-sh, Kastakin, amaralbf, BSGalvan, levimjoseph, daniel-falk, clotildeguinard, avsolatorio, and picklejuicedev for comments and input to documentation changes

0.18.3

20 Sep 14:33
4db0ae3

Release 0.18.3

Major features and improvements

  • Implemented autodiscovery of project pipelines. A pipeline created with kedro pipeline create <pipeline_name> can now be accessed immediately without needing to explicitly register it in src/<package_name>/pipeline_registry.py, either individually by name (e.g. kedro run --pipeline=<pipeline_name>) or as part of the combined default pipeline (e.g. kedro run). By default, the simplified register_pipelines() function in pipeline_registry.py looks like:

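    from typing import Dict

    from kedro.framework.project import find_pipelines
    from kedro.pipeline import Pipeline


    def register_pipelines() -> Dict[str, Pipeline]:
        """Register the project's pipelines.
    
        Returns:
            A mapping from pipeline names to ``Pipeline`` objects.
        """
        # Discover every pipeline created under src/<package_name>/pipelines
        pipelines = find_pipelines()
        pipelines["__default__"] = sum(pipelines.values())
        return pipelines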
  • The Kedro IPython extension should now be loaded with %load_ext kedro.ipython.

  • The line magic %reload_kedro now accepts keyword arguments, e.g. %reload_kedro --env=prod.

  • Improved the resume-pipeline suggestion for SequentialRunner: it now backtracks to the closest persisted inputs to resume from.
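The improved resume suggestion can be pictured as an upstream search: starting from the nodes that did not finish, walk back through the producers of non-persisted (in-memory) inputs until reaching nodes whose inputs are all persisted. This is a simplified sketch under that assumption; the plain dictionaries and `find_resume_nodes` are hypothetical and are not Kedro's internal API. It also assumes every non-persisted input has a producing node.

```python
from collections import deque
from typing import Dict, List, Set


def find_resume_nodes(
    inputs: Dict[str, List[str]],  # node name -> its input dataset names
    producers: Dict[str, str],     # dataset name -> node that outputs it
    persisted: Set[str],           # datasets that survive a failed run
    unfinished: List[str],         # nodes that did not complete
) -> Set[str]:
    """Return the closest upstream nodes whose inputs are all persisted."""
    resume: Set[str] = set()
    visited: Set[str] = set()
    queue = deque(unfinished)
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        missing = [ds for ds in inputs[node] if ds not in persisted]
        if not missing:
            resume.add(node)  # every input is on disk: safe to restart here
            continue
        for ds in missing:  # backtrack to whichever node produces the lost input
            queue.append(producers[ds])
    return resume


inputs = {"A": ["raw"], "B": ["mem1"], "C": ["mem2", "params"]}
producers = {"mem1": "A", "mem2": "B"}
persisted = {"raw", "params"}
print(find_resume_nodes(inputs, producers, persisted, ["C"]))  # {'A'}
```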

Bug fixes and other changes

  • Changed the default value of show_locals in rich logging to False, to make sure credentials and other sensitive data aren't shown in logs.
  • Rich traceback handling is disabled on Databricks so that exceptions now halt execution as expected. This is a workaround for a bug in rich.
  • When using kedro run -n [some_node], if some_node is missing a namespace the resulting error message will suggest the correct node name.
  • Updated documentation for rich logging.
  • Updated Prefect deployment documentation to allow for reruns with saved versioned datasets.
  • The Kedro IPython extension now surfaces errors when it cannot load a Kedro project.
  • Relaxed delta-spark upper bound to allow compatibility with Spark 3.1.x and 3.2.x.
  • Added gdrive to list of cloud protocols, enabling Google Drive paths for datasets.
  • Added an SVG logo resource for the IPython kernel.

Upcoming deprecations for Kedro 0.19.0

  • The Kedro IPython extension will no longer be available as %load_ext kedro.extras.extensions.ipython; use %load_ext kedro.ipython instead.
  • kedro jupyter convert, kedro build-docs, kedro build-reqs and kedro activate-nbstripout will be deprecated.

0.18.2

08 Jul 15:55
3772ab9

Release 0.18.2

Major features and improvements

  • Added abfss to list of cloud protocols, enabling abfss paths.
  • Kedro now uses the Rich library to format terminal logs and tracebacks.
  • The file conf/base/logging.yml is now optional. See our documentation for details.
  • Introduced a kedro.starters entry point. This enables plugins to create custom starter aliases used by kedro starter list and kedro new.
  • Reduced the kedro new prompts to just one question asking for the project name.

Bug fixes and other changes

  • Bumped pyyaml upper bound to make Kedro compatible with the pyodide stack.
  • Updated project template's Sphinx configuration to use myst_parser instead of recommonmark.
  • Reduced number of log lines by changing the logging level from INFO to DEBUG for low priority messages.
  • Kedro's framework-side logging configuration no longer performs file-based logging. Hence superfluous info.log/errors.log files are no longer created in your project root, and running Kedro on read-only file systems such as Databricks Repos is now possible.
  • The root logger is now set to the Python default level of WARNING rather than INFO. Kedro's logger is still set to emit INFO level messages.
  • SequentialRunner now has consistent execution order across multiple runs with sorted nodes.
  • Bumped the upper bound for the Flake8 dependency to <5.0.
  • kedro jupyter notebook/lab no longer reuses a Jupyter kernel.
  • Required cookiecutter>=2.1.1 to address a known command injection vulnerability.
  • The session store no longer fails if a username cannot be found with getpass.getuser.
  • Added generic typing for AbstractDataSet and AbstractVersionedDataSet as well as typing to all datasets.
  • Rendered the deployment guide flowchart as a Mermaid diagram, and added Dask.
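The logging changes above (root logger at WARNING, Kedro's own logger still emitting INFO) follow the standard Python `logging` hierarchy. A minimal sketch using only the standard library; it is not Kedro's shipped logging configuration, and the logger names are used purely to illustrate inheritance.

```python
import logging

# Root logger at the Python default level; the package logger is more verbose.
logging.getLogger().setLevel(logging.WARNING)
logging.getLogger("kedro").setLevel(logging.INFO)

# Loggers with no level of their own inherit the nearest level set above them
# in the dotted hierarchy.
kedro_child = logging.getLogger("kedro.runner")
other_library = logging.getLogger("some_other_library")

print(logging.getLevelName(kedro_child.getEffectiveLevel()))    # INFO
print(logging.getLevelName(other_library.getEffectiveLevel()))  # WARNING
```

This is why third-party INFO messages disappear from the console while Kedro's own INFO messages remain visible.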

Minor breaking changes to the API

  • The module kedro.config.default_logger no longer exists; default logging configuration is now set automatically through kedro.framework.project.LOGGING. Unless you explicitly import kedro.config.default_logger you do not need to make any changes.

Upcoming deprecations for Kedro 0.19.0

  • kedro.extras.ColorHandler will be removed in 0.19.0.

0.18.1

09 May 21:13
686dba5

Major features and improvements

  • Added a new hook after_context_created that passes the KedroContext instance as context.
  • Added a new CLI hook after_command_run.
  • Added more detail to YAML ParserError exception error message.
  • Added option to SparkDataSet to specify a schema load argument that allows for supplying a user-defined schema as opposed to relying on the schema inference of Spark.
  • The Kedro package no longer contains a built version of the Kedro documentation, significantly reducing the package size.

Bug fixes and other changes

  • A fatal error is no longer logged when a Kedro session is created in a directory without git.
  • Fixed CONFIG_LOADER_CLASS validation so that TemplatedConfigLoader can be specified in settings.py. Any CONFIG_LOADER_CLASS must be a subclass of AbstractConfigLoader.
  • Added runner name to the run_params dictionary used in pipeline hooks.
  • Updated Databricks documentation to include how to get it working with IPython extension and Kedro-Viz.
  • Updated sections on visualisation, namespacing, and experiment tracking in the spaceflights tutorial to correspond to the complete spaceflights starter.
  • Fixed Jinja2 syntax loading with TemplatedConfigLoader using globals.yml.
  • Removed global _active_session, _activate_session and _deactivate_session. Plugins that need to access objects such as the config loader should now do so through context in the new after_context_created hook.
  • config_loader is available as a public read-only attribute of KedroContext.
  • Made hook_manager argument optional for runner.run.
  • kedro docs now opens an online version of the Kedro documentation instead of a locally built version.
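The CONFIG_LOADER_CLASS fix means any class set in settings.py must subclass AbstractConfigLoader. The check below is a hypothetical sketch of that kind of validation: `AbstractConfigLoader` here is a local stand-in rather than an import from Kedro, `validate_config_loader_class` is an illustrative name, and raising ValueError is an assumption.

```python
from abc import ABC


class AbstractConfigLoader(ABC):
    """Local stand-in for Kedro's abstract config loader base class."""


class MyTemplatedLoader(AbstractConfigLoader):
    """Hypothetical user-defined loader that should pass validation."""


def validate_config_loader_class(value: object) -> type:
    """Accept only classes derived from the abstract base."""
    if not (isinstance(value, type) and issubclass(value, AbstractConfigLoader)):
        raise ValueError(
            f"CONFIG_LOADER_CLASS must subclass AbstractConfigLoader, got {value!r}"
        )
    return value


validate_config_loader_class(MyTemplatedLoader)  # passes
```

Checking `isinstance(value, type)` first avoids a TypeError from `issubclass` when the setting holds an instance instead of a class.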

Upcoming deprecations for Kedro 0.19.0

  • kedro docs will be removed in 0.19.0.

0.18.0

31 Mar 16:06
35e4ac5

Release 0.18.0

TL;DR ✨

Kedro 0.18.0 strives to reduce the complexity of the project template and get us closer to a stable release of the framework. We've introduced the full micro-packaging workflow 📦, which allows you to import packages, utility functions and existing pipelines into your Kedro project. Integration with IPython and Jupyter has been streamlined in preparation for enhancements to Kedro's interactive workflow. Additionally, the release comes with long-awaited Python 3.9 and 3.10 support 🐍.

Major features and improvements

Framework

  • Added kedro.config.abstract_config.AbstractConfigLoader as an abstract base class for all ConfigLoader implementations. ConfigLoader and TemplatedConfigLoader now inherit directly from this base class.
  • Streamlined the ConfigLoader.get and TemplatedConfigLoader.get API and delegated the actual get method functional implementation to the kedro.config.common module.
  • The hook_manager is no longer a global singleton. The hook_manager lifecycle is now managed by the KedroSession, and a new hook_manager will be created every time a session is instantiated.
  • Added support for specifying parameters mapping in pipeline() without the params: prefix.
  • Added new API Pipeline.filter() (previously in KedroContext._filter_pipeline()) to filter parts of a pipeline.
  • Added username to Session store for logging during Experiment Tracking.
  • A packaged Kedro project can now be imported and run from another Python project as follows:
# my_package is the name of your packaged Kedro project
from my_package.__main__ import main

main(
    ["--pipeline", "my_pipeline"]
)  # or just main() if no parameters are needed for the run

Project template

  • Removed cli.py from the Kedro project template. By default, all CLI commands, including kedro run, are now defined on the Kedro framework side. You can still define custom CLI commands by creating your own cli.py.
  • Removed hooks.py from the Kedro project template. Registration hooks have been removed in favour of settings.py configuration, but you can still define execution timeline hooks by creating your own hooks.py.
  • Removed .ipython directory from the Kedro project template. The IPython/Jupyter workflow no longer uses IPython profiles; it now uses an IPython extension.
  • The default kedro run configuration environment names can now be set in settings.py using the CONFIG_LOADER_ARGS variable. The relevant keyword arguments to supply are base_env and default_run_env, which are set to base and local respectively by default.
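Per the last item above, the run environments are configured through CONFIG_LOADER_ARGS in settings.py. A minimal fragment that simply restates the documented defaults (base and local); changing either value is how you would point kedro run at different environments.

```python
# settings.py (fragment): these values restate the documented defaults
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
}
```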

DataSets

  • Added the following new datasets:
| Type | Description | Location |
| --- | --- | --- |
| pandas.XMLDataSet | Read XML into Pandas DataFrame. Write Pandas DataFrame to XML | kedro.extras.datasets.pandas |
| networkx.GraphMLDataSet | Work with NetworkX using GraphML files | kedro.extras.datasets.networkx |
| networkx.GMLDataSet | Work with NetworkX using Graph Modelling Language files | kedro.extras.datasets.networkx |
| redis.PickleDataSet | loads/saves data from/to a Redis database | kedro.extras.datasets.redis |
  • Added partitionBy support and exposed save_args for SparkHiveDataSet.
  • Exposed open_args_save in fs_args for pandas.ParquetDataSet.
  • Refactored the load and save operations for pandas datasets in order to leverage pandas' own API and delegate fsspec operations to it. This reduces the need to have our own fsspec wrappers.
  • Merged pandas.AppendableExcelDataSet into pandas.ExcelDataSet.
  • Added save_args to feather.FeatherDataSet.

Jupyter and IPython integration

  • The only recommended way to work with Kedro in Jupyter or IPython is now the Kedro IPython extension. Managed Jupyter instances should load this via %load_ext kedro.extras.extensions.ipython and use the line magic %reload_kedro.
  • kedro ipython launches an IPython session that preloads the Kedro IPython extension.
  • kedro jupyter notebook/lab creates a custom Jupyter kernel that preloads the Kedro IPython extension and launches a notebook with that kernel selected. There is no longer a need to specify --all-kernels to show all available kernels.

Dependencies

  • Bumped the minimum version of pandas to 1.3. Any storage_options should continue to be specified under fs_args and/or credentials.
  • Added support for Python 3.9 and 3.10, dropped support for Python 3.6.
  • Updated black dependency in the project template to a non pre-release version.

Other

  • Documented distribution of Kedro pipelines with Dask.

Breaking changes to the API

Framework

  • Removed RegistrationSpecs and its associated register_config_loader and register_catalog hook specifications in favour of CONFIG_LOADER_CLASS/CONFIG_LOADER_ARGS and DATA_CATALOG_CLASS in settings.py.
  • Removed deprecated functions load_context and get_project_context.
  • Removed deprecated CONF_SOURCE, package_name, pipeline, pipelines, config_loader and io attributes from KedroContext as well as the deprecated KedroContext.run method.
  • Added the PluginManager hook_manager argument to KedroContext and the Runner.run() method, which will be provided by the KedroSession.
  • Removed the public method get_hook_manager() and replaced its functionality with _create_hook_manager().
  • Enforced that only one run can be successfully executed as part of a KedroSession. run_id has been renamed to session_id as a result.

Configuration loaders

  • The settings.py setting CONF_ROOT has been renamed to CONF_SOURCE. Default value of conf remains unchanged.
  • ConfigLoader and TemplatedConfigLoader argument conf_root has been renamed to conf_source.
  • extra_params has been renamed to runtime_params in kedro.config.config.ConfigLoader and kedro.config.templated_config.TemplatedConfigLoader.
  • The environment defaulting behaviour has been removed from KedroContext and is now implemented in a ConfigLoader class (or equivalent) with the base_env and default_run_env attributes.

DataSets

  • pandas.ExcelDataSet now uses openpyxl engine instead of xlrd.
  • pandas.ParquetDataSet now calls pd.to_parquet() upon saving. Note that the argument partition_cols is not supported.
  • spark.SparkHiveDataSet API has been updated to reflect spark.SparkDataSet. The write_mode=insert option has also been replaced with write_mode=append, as per the Spark style guide. This change addresses Issue 725 and Issue 745. Additionally, upsert mode now leverages checkpoint functionality and requires a valid checkpointDir to be set for the current SparkContext.
  • yaml.YAMLDataSet can no longer save a pandas.DataFrame directly, but it can save a dictionary. Use pandas.DataFrame.to_dict() to convert your pandas.DataFrame to a dictionary before you attempt to save it to YAML.
  • Removed open_args_load and open_args_save from the following datasets:
    • pandas.CSVDataSet
    • pandas.ExcelDataSet
    • pandas.FeatherDataSet
    • pandas.JSONDataSet
    • pandas.ParquetDataSet
  • storage_options are now dropped if they are specified under load_args or save_args for the following datasets:
    • pandas.CSVDataSet
    • pandas.ExcelDataSet
    • pandas.FeatherDataSet
    • pandas.JSONDataSet
    • pandas.ParquetDataSet
  • Renamed lambda_data_set, memory_data_set, and partitioned_data_set to lambda_dataset, memory_dataset, and partitioned_dataset, respectively, in kedro.io.
  • The dataset networkx.NetworkXDataSet has been renamed to networkx.JSONDataSet.
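For the yaml.YAMLDataSet change, the conversion step is plain pandas. A sketch, assuming pandas is installed; which `orient` you pick is a project choice, and `orient="list"` is shown only as one layout that tends to read well in a YAML file.

```python
import pandas as pd

df = pd.DataFrame({"model": ["a", "b"], "score": [90, 80]})

# Default orient="dict": {column -> {index -> value}}
as_dict = df.to_dict()

# orient="list": {column -> [values]}, a flatter shape for YAML
as_lists = df.to_dict(orient="list")

print(as_dict)
print(as_lists)
```

Either dictionary can then be passed to the dataset's save method instead of the DataFrame itself.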

CLI

  • Removed kedro install in favour of pip install -r src/requirements.txt to install project dependencies.
  • Removed --parallel flag from kedro run in favour of --runner=ParallelRunner. The -p flag is now an alias for --pipeline.
  • kedro pipeline package has been replaced by kedro micropkg package and, in addition to the --alias flag used to rename the package, now accepts a module name and path to the pipeline or utility module to package, relative to src/<package_name>/. The --version CLI option has been removed in favour of setting a __version__ variable in the micro-package's __init__.py file.
  • kedro pipeline pull has been replaced by kedro micropkg pull and now also supports --destination to provide a location for pulling the package.
  • Removed kedro pipeline list and kedro pipeline describe in favour of kedro registry list and kedro registry describe.
  • kedro package and kedro micropkg package now save egg and whl or tar files in the <project_root>/dist folder (previously <project_root>/src/dist).
  • Changed the behaviour of kedro build-reqs to compile requirements from requirements.txt instead of requirements.in and save them to requirements.lock instead of requirements.txt.
  • kedro jupyter notebook/lab no longer accept --all-kernels or --idle-timeout flags. --all-kernels is now the default behaviour.
  • KedroSession.run now raises ValueError rather than KedroContextError when the pipeline contains no nodes. The same ValueError is raised when there are no matching tags.
  • KedroSession.run now raises ValueError rather than KedroContextError w...

0.17.7

22 Feb 15:59
59bcb50

Release 0.17.7

Major features and improvements

  • pipeline now accepts tags and a collection of Nodes and/or Pipelines rather than just a single Pipeline object. pipeline should be used in preference to Pipeline when creating a Kedro pipeline.
  • pandas.SQLTableDataSet and pandas.SQLQueryDataSet now only open one connection per database, at instantiation time (therefore at catalog creation time), rather than one per load/save operation.
  • Added new command group, micropkg, to replace kedro pipeline pull and kedro pipeline package with kedro micropkg pull and kedro micropkg package for Kedro 0.18.0. kedro micropkg package saves packages to project/dist while kedro pipeline package saves packages to project/src/dist.
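The SQL dataset change (one connection per database, opened at instantiation) amounts to a class-level cache keyed by connection string. A sketch that uses the standard library's sqlite3 in place of a real database driver; `CachedSQLDataSet` is a hypothetical class, not Kedro's implementation.

```python
import sqlite3
from typing import Dict, List


class CachedSQLDataSet:
    """Hypothetical dataset that shares one connection per connection string."""

    # Shared by every instance: connection string -> open connection.
    _connections: Dict[str, sqlite3.Connection] = {}

    def __init__(self, sql: str, con: str):
        self._sql = sql
        self._con = con
        # Connect eagerly (i.e. at catalog-creation time) and only once per `con`.
        if con not in self._connections:
            self._connections[con] = sqlite3.connect(con)

    @property
    def connection(self) -> sqlite3.Connection:
        return self._connections[self._con]

    def load(self) -> List[tuple]:
        # Every load reuses the cached connection instead of opening a new one.
        return self.connection.execute(self._sql).fetchall()


a = CachedSQLDataSet("SELECT 1", ":memory:")
b = CachedSQLDataSet("SELECT 2", ":memory:")
print(a.connection is b.connection)  # True
print(a.load(), b.load())  # [(1,)] [(2,)]
```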

Bug fixes and other changes

  • Added tutorial documentation for experiment tracking.
  • Added Plotly dataset documentation.
  • Added the upper limit pandas<1.4 to maintain compatibility with xlrd~=1.0.
  • Bumped the Pillow minimum version requirement to 9.0 (Python 3.7+ only) following CVE-2022-22817.
  • Fixed PickleDataSet to be copyable and hence work with the parallel runner.
  • Upgraded pip-tools, which is used by kedro build-reqs, to 6.5 (Python 3.7+ only). This pip-tools version is compatible with pip>=21.2, including the most recent releases of pip. Python 3.6 users should continue to use pip-tools 6.4 and pip<22.
  • Added astro-iris as an alias for astro-airflow-iris, so that old tutorials can still be followed.
  • Added details about Kedro's Technical Steering Committee and governance model.

Upcoming deprecations for Kedro 0.18.0

  • kedro pipeline pull and kedro pipeline package will be deprecated. Please use kedro micropkg instead.

0.17.6

09 Dec 15:59
319a917

Release 0.17.6

Major features and improvements

  • Added pipelines global variable to IPython extension, allowing you to access the project's pipelines in kedro ipython or kedro jupyter notebook.
  • Enabled overriding nested parameters with params in CLI, i.e. kedro run --params="model.model_tuning.booster:gbtree" updates parameters to {"model": {"model_tuning": {"booster": "gbtree"}}}.
  • Added option to pandas.SQLQueryDataSet to specify a filepath with a SQL query, in addition to the current method of supplying the query itself in the sql argument.
  • Extended ExcelDataSet to support saving Excel files with multiple sheets.
  • Added the following new datasets:
| Type | Description | Location |
| --- | --- | --- |
| plotly.JSONDataSet | Works with plotly graph object Figures (saves as json file) | kedro.extras.datasets.plotly |
| pandas.GenericDataSet | Provides a 'best effort' facility to read / write any format provided by the pandas library | kedro.extras.datasets.pandas |
| pandas.GBQQueryDataSet | Loads data from a Google Bigquery table using provided SQL query | kedro.extras.datasets.pandas |
| spark.DeltaTableDataSet | Dataset designed to handle Delta Lake Tables and their CRUD-style operations, including update, merge and delete | kedro.extras.datasets.spark |
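The nested-parameter override in the first item above maps a dotted key onto nested dictionaries. A minimal sketch of that mapping; `parse_params` is a hypothetical helper, and it keeps every value as a string for simplicity, whereas the real CLI may convert types differently. It also accepts `=`, which the 0.18.5 notes list as an alternative separator.

```python
import re
from typing import Any, Dict


def parse_params(value: str) -> Dict[str, Any]:
    """Turn 'a.b:1,c=2' into nested dicts (values kept as strings)."""
    result: Dict[str, Any] = {}
    for item in (part.strip() for part in value.split(",")):
        if not item:
            continue
        pieces = re.split(r"[:=]", item, maxsplit=1)  # split on the first ':' or '='
        if len(pieces) != 2:
            raise ValueError(f"Invalid parameter {item!r}: expected key:value or key=value")
        key, val = pieces
        *parents, leaf = key.strip().split(".")
        node = result
        for name in parents:  # walk or create the intermediate dictionaries
            node = node.setdefault(name, {})
        node[leaf] = val.strip()
    return result


print(parse_params("model.model_tuning.booster:gbtree"))
# {'model': {'model_tuning': {'booster': 'gbtree'}}}
```

Splitting on commas first means this sketch cannot handle values that themselves contain commas.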

Bug fixes and other changes

  • Fixed an issue where kedro new --config config.yml was ignoring the config file when prompts.yml didn't exist.
  • Added documentation for kedro viz --autoreload.
  • Added support for arbitrary backends (via importable module paths) that satisfy the pickle interface to PickleDataSet.
  • Added support for sum syntax for connecting pipeline objects.
  • Upgraded pip-tools, which is used by kedro build-reqs, to 6.4. This pip-tools version requires pip>=21.2 while adding support for pip>=21.3. To upgrade pip, please refer to their documentation.
  • Relaxed the bounds on the plotly requirement for plotly.PlotlyDataSet and the pyarrow requirement for pandas.ParquetDataSet.
  • kedro pipeline package <pipeline> now raises an error if the <pipeline> argument doesn't look like a valid Python module path (e.g. has / instead of .).
  • Added new overwrite argument to PartitionedDataSet and MatplotlibWriter to enable deletion of existing partitions and plots on dataset save.
  • kedro pipeline pull now works when the project requirements contain entries such as -r, --extra-index-url and local wheel files (Issue #913).
  • Fixed slow startup caused by catalog processing, by reducing the exponential growth of extra processing when creating _FrozenDatasets.
  • Removed .coveragerc from the Kedro project template. coverage settings are now given in pyproject.toml.
  • Fixed a bug where packaging or pulling a modular pipeline with the same name as the project's package name would throw an error (or silently pass without including the pipeline source code in the wheel file).
  • Removed unintentional dependency on git.
  • Fixed an issue where nested pipeline configuration was not included in the packaged pipeline.
  • Deprecated the "Thanks for supporting contributions" section of release notes to simplify the contribution process; Kedro 0.17.6 is the last release that includes this. This process has been replaced with the automatic GitHub feature.
  • Fixed a bug where the version on the tracking datasets didn't match the session id and the versions of regular versioned datasets.
  • Fixed an issue where datasets in load_versions that are not found in the data catalog would silently pass.
  • Altered the string representation of nodes so that node inputs/outputs order is preserved rather than being alphabetically sorted.
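Support for sum syntax works because Python's built-in `sum()` starts from `0`, so the pipeline type has to accept `0 + pipeline` through `__radd__`. A toy illustration of that mechanism; `ToyPipeline` is hypothetical and only tracks node names, whereas a real Pipeline holds node objects.

```python
from typing import Iterable, Tuple


class ToyPipeline:
    """Hypothetical stand-in that records node names in order."""

    def __init__(self, nodes: Iterable[str]):
        self.nodes: Tuple[str, ...] = tuple(nodes)

    def __add__(self, other: object) -> "ToyPipeline":
        if not isinstance(other, ToyPipeline):
            return NotImplemented
        # Union of nodes, keeping first-seen order and dropping duplicates.
        return ToyPipeline(dict.fromkeys(self.nodes + other.nodes))

    def __radd__(self, other: object) -> "ToyPipeline":
        # sum() begins with 0, so `0 + pipeline` must give back the pipeline.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented


pipelines = {"ingest": ToyPipeline(["load"]), "train": ToyPipeline(["split", "fit"])}
combined = sum(pipelines.values())
print(combined.nodes)  # ('load', 'split', 'fit')
```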

Upcoming deprecations for Kedro 0.18.0

  • kedro.extras.decorators and kedro.pipeline.decorators are being deprecated in favour of Hooks.
  • kedro.extras.transformers and kedro.io.transformers are being deprecated in favour of Hooks.
  • The --parallel flag on kedro run is being removed in favour of --runner=ParallelRunner. The -p flag will change to be an alias for --pipeline.
  • kedro.io.DataCatalogWithDefault is being deprecated, to be removed entirely in 0.18.0.

Thanks for supporting contributions

Deepyaman Datta,
Brites,
Manish Swami,
Avaneesh Yembadi,
Zain Patel,
Simon Brugman,
Kiyo Kunii,
Benjamin Levy,
Louis de Charsonville,
Simon Picard

0.17.5

14 Sep 15:11
80c0d3a

Release 0.17.5

Major features and improvements

  • Added new CLI group registry, with the associated commands kedro registry list and kedro registry describe, to replace kedro pipeline list and kedro pipeline describe.
  • Added support for dependency management at a modular pipeline level. When a pipeline with requirements.txt is packaged, its dependencies are embedded in the modular pipeline wheel file. Upon pulling the pipeline, Kedro will append dependencies to the project's requirements.in. More information is available in our documentation.
  • Added support for bulk packaging/pulling modular pipelines using kedro pipeline package/pull --all and pyproject.toml.
  • Removed cli.py from the Kedro project template. By default all CLI commands, including kedro run, are now defined on the Kedro framework side. These can be overridden in turn by a plugin or a cli.py file in your project. A packaged Kedro project will respect the same hierarchy when executed with python -m my_package.
  • Removed .ipython/profile_default/startup/ from the Kedro project template in favour of .ipython/profile_default/ipython_config.py and the kedro.extras.extensions.ipython.
  • Added support for dill backend to PickleDataSet.
  • Imports are now refactored at kedro pipeline package and kedro pipeline pull time, so that aliasing a modular pipeline doesn't break it.
  • Added the following new datasets to support basic Experiment Tracking:
| Type | Description | Location |
| --- | --- | --- |
| tracking.MetricsDataSet | Dataset to track numeric metrics for experiment tracking | kedro.extras.datasets.tracking |
| tracking.JSONDataSet | Dataset to track data for experiment tracking | kedro.extras.datasets.tracking |


Bug fixes and other changes

  • Bumped minimum required fsspec version to 2021.04.
  • Fixed the kedro install and kedro build-reqs flows when uninstalled dependencies are present in a project's settings.py, context.py or hooks.py (Issue #829).
  • Pinned dynaconf to <3.1.6 because the signature of the _validate_items method, which Kedro uses, changed.

Minor breaking changes to the API

Upcoming deprecations for Kedro 0.18.0

  • kedro pipeline list and kedro pipeline describe are being deprecated in favour of new commands kedro registry list and kedro registry describe.
  • kedro install is being deprecated in favour of using pip install -r src/requirements.txt to install project dependencies.

Thanks for supporting contributions

Moussa Taifi,
Deepyaman Datta

0.17.4

16 Jun 09:09
392491b

Release 0.17.4

Major features and improvements

  • Added the following new datasets:
| Type | Description | Location |
| --- | --- | --- |
| plotly.PlotlyDataSet | Works with plotly graph object Figures (saves as json file) | kedro.extras.datasets.plotly |

Bug fixes and other changes

  • Defined our set of Kedro Principles! Have a read through our docs.
  • ConfigLoader.get() now raises a BadConfigException, with a more helpful error message, if a configuration file cannot be loaded (for instance due to wrong syntax or poor formatting).
  • run_id now defaults to save_version when after_catalog_created is called, similarly to what happens during a kedro run.
  • Fixed a bug where kedro ipython and kedro jupyter notebook didn't work if the PYTHONPATH was already set.
  • Updated the IPython extension to allow passing env and extra_params to reload_kedro, similar to how the IPython script works.
  • kedro info now outputs if a plugin has any hooks or cli_hooks implemented.
  • PartitionedDataSet now supports lazily materializing data on save.
  • kedro pipeline describe now defaults to the __default__ pipeline when no pipeline name is provided and also shows the namespace the nodes belong to.
  • Fixed an issue where spark.SparkDataSet with enabled versioning would throw a VersionNotFoundError when using databricks-connect from a remote machine and saving to dbfs filesystem.
  • EmailMessageDataSet added to doctree.
  • When node inputs do not pass validation, the error message is now shown as the most recent exception in the traceback (Issue #761).
  • kedro pipeline package now only packages the parameter file that exactly matches the pipeline name specified and the parameter files in a directory with the pipeline name.
  • Extended support to newer versions of third-party dependencies (Issue #735).
  • Ensured consistent references to model input tables in accordance with our Data Engineering convention.
  • Changed behaviour where kedro pipeline package takes the pipeline package version, rather than the kedro package version. If the pipeline package version is not present, then the package version is used.
  • Launched GitHub Discussions and Kedro Discord Server
  • Improved error message when versioning is enabled for a dataset previously saved as non-versioned (Issue #625).
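Lazy saving in PartitionedDataSet means a node can return a dictionary whose values are callables, each invoked only when its partition is written. The sketch below imitates that with an in-memory store; `save_partitions` and `make_partition` are hypothetical helpers, not the dataset's code.

```python
from typing import Any, Callable, Dict, List


def save_partitions(partitions: Dict[str, Any], store: Dict[str, Any]) -> None:
    """Write each partition, materializing callables only at save time."""
    for partition_id, value in sorted(partitions.items()):  # sorted for determinism
        if callable(value):
            value = value()  # the data is created here, one partition at a time
        store[partition_id] = value


def make_partition(n: int) -> Callable[[], List[int]]:
    # Returning a closure defers the (potentially expensive) work until save.
    return lambda: list(range(n))


store: Dict[str, Any] = {}
save_partitions({"part_b": make_partition(3), "part_a": [0]}, store)
print(store)  # {'part_a': [0], 'part_b': [0, 1, 2]}
```

Plain values and callables can be mixed, so only the partitions that are expensive to build need to be deferred.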