CHANGELOG + Kedro update + docs
marrrcin committed Aug 10, 2023
1 parent 4c892f9 commit a4668ae
Showing 7 changed files with 277 additions and 27 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,10 @@
# Changelog

## [Unreleased]
- [🚀 New dataset] Added support for `AzureMLAssetDataSet` based on Azure ML SDK v2 (fsspec) by [@tomasvanpottelbergh](https://github.com/tomasvanpottelbergh) & [@froessler](https://github.com/fdroessler)
- [📝 Docs] Updated datasets docs with sections
- Bumped minimal required Kedro version to `0.18.11`
- [⚠️ Deprecation warning] - starting from `0.4.0` the plugin is not compatible with ARM macOS versions due to internal Azure dependencies (v1 SDKs). V1 SDK-based datasets will be removed in the future.

## [0.4.1] - 2023-05-04

7 changes: 6 additions & 1 deletion docs/conf.py
@@ -60,7 +60,12 @@
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

autodoc_mock_imports = ["azureml", "pandas", "backoff", "cloudpickle"]
autodoc_mock_imports = [
    "azureml",
    "pandas",
    "backoff",
    "cloudpickle",
]

# -- Options for HTML output -------------------------------------------------

36 changes: 31 additions & 5 deletions docs/source/05_data_assets.rst
@@ -1,11 +1,15 @@
Azure Data Assets
=================

``kedro-azureml`` adds support for two new datasets that can be used in the Kedro catalog, the ``AzureMLFileDataSet``
and the ``AzureMLPandasDataSet`` which translate to `File/Folder dataset`_ and `Tabular dataset`_ respectively in
``kedro-azureml`` adds support for two new datasets that can be used in the Kedro catalog. Right now we support both Azure ML v1 SDK (direct Python) and Azure ML v2 SDK (fsspec-based) APIs.

**For v2 API (fsspec-based)** - use ``AzureMLAssetDataSet``, which enables the use of Azure ML v2 SDK Folder/File datasets in both remote and local runs.

**For v1 API** (deprecated ⚠️) use the ``AzureMLFileDataSet`` and the ``AzureMLPandasDataSet`` which translate to `File/Folder dataset`_ and `Tabular dataset`_ respectively in
Azure Machine Learning. Both fully support the Azure versioning mechanism and can be used in the same way as any
other dataset in Kedro.


Apart from these, ``kedro-azureml`` also adds the ``AzureMLPipelineDataSet`` which is used to pass data between
pipeline nodes when the pipeline is run on Azure ML and the *pipeline data passing* feature is enabled.
By default, data is then saved and loaded using the ``PickleDataSet`` as underlying dataset.
@@ -24,15 +28,37 @@ For details on usage, see the :ref:`API Reference` below
API Reference
-------------

.. autoclass:: kedro_azureml.datasets.AzureMLPandasDataSet
Pipeline data passing
^^^^^^^^^^^^^^^^^^^^^

⚠️ Cannot be used when run locally.

.. autoclass:: kedro_azureml.datasets.AzureMLPipelineDataSet
    :members:

-----------------

.. autoclass:: kedro_azureml.datasets.AzureMLFileDataSet

V2 SDK
^^^^^^^^^^^^^
Use the dataset below when you're using Azure ML SDK v2 (fsspec-based).

✅ Can be used for both remote and local runs.

.. autoclass:: kedro_azureml.datasets.asset_dataset.AzureMLAssetDataSet
    :members:

V1 SDK
^^^^^^^^^^^^^
Use the datasets below when you're using Azure ML SDK v1 (direct Python).

⚠️ Deprecated - will be removed in a future version of ``kedro-azureml``.

.. autoclass:: kedro_azureml.datasets.AzureMLPandasDataSet
    :members:

-----------------

.. autoclass:: kedro_azureml.datasets.AzureMLPipelineDataSet
.. autoclass:: kedro_azureml.datasets.AzureMLFileDataSet
    :members:

9 changes: 9 additions & 0 deletions kedro_azureml/datasets/file_dataset.py
@@ -1,4 +1,5 @@
import typing as t
import warnings
from dataclasses import dataclass

from azureml.core import Dataset, Datastore, Workspace
@@ -161,6 +162,14 @@ def __init__(
        make sure to not pass `path` argument, as it will be built from `azureml_datastore` argument.
        """
        # validate that `path` is not part of kwargs, as we are building the `path` from `azureml_datastore` argument.
        warnings.warn(
            "Dataset AzureMLFileDataSet is deprecated and will"
            " be removed in the upcoming release of kedro-azureml due to incompatibility"
            " of Azure ML SDK v1 with ARM macOS\n"
            "Please use AzureMLAssetDataSet instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if "path" in kwargs:
            raise ValueError(
                f"`path` is not a valid argument for {self.__class__.__name__}"
9 changes: 9 additions & 0 deletions kedro_azureml/datasets/pandas_dataset.py
@@ -1,4 +1,5 @@
import typing as t
import warnings

import pandas as pd
from azureml.core import Dataset, Datastore, Workspace
@@ -74,6 +75,14 @@ def __init__(
            workspace: AzureML Workspace. If not specified, will attempt to load the workspace automatically.
            workspace_args: Additional arguments to pass to `utils.get_workspace()`.
        """
        warnings.warn(
            "Dataset AzureMLPandasDataSet is deprecated and will"
            " be removed in the upcoming release of kedro-azureml due to incompatibility"
            " of Azure ML SDK v1 with ARM macOS\n"
            "Please use AzureMLAssetDataSet instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self._azureml_dataset = azureml_dataset
        self._azureml_dataset_save_args = azureml_dataset_save_args or dict()
        self._azureml_dataset_load_args = azureml_dataset_load_args or dict()
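
Both v1 constructors now emit a `DeprecationWarning` with `stacklevel=2`, which attributes the warning to the code that instantiates the dataset rather than to the dataset module itself. The following is a minimal sketch of that behaviour using only the standard library. `LegacyDataSet`, `capture_deprecation` and `fails_on_deprecated` are hypothetical names introduced for illustration and are not part of `kedro-azureml`; the real datasets additionally need an Azure ML workspace.

```python
import warnings


class LegacyDataSet:
    """Hypothetical stand-in for a deprecated v1 dataset (not the real class)."""

    def __init__(self) -> None:
        # Same pattern as in the commit: warn at construction time and
        # point the warning at the caller via stacklevel=2.
        warnings.warn(
            "Dataset LegacyDataSet is deprecated;"
            " please use AzureMLAssetDataSet instead",
            DeprecationWarning,
            stacklevel=2,
        )


def capture_deprecation() -> list:
    """Instantiate the dataset and return the warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is often hidden by default filters,
        # so request every warning explicitly inside this context.
        warnings.simplefilter("always")
        LegacyDataSet()
    return caught


def fails_on_deprecated() -> bool:
    """Return True if the deprecation can be escalated to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            LegacyDataSet()
        except DeprecationWarning:
            return True
    return False
```

Escalating the warning to an error, as in `fails_on_deprecated`, is one way a test suite could flag remaining uses of the v1 datasets before they are removed.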