
Merge pull request #64 from getindata/release-0.5.0
Release 0.5.0
marrrcin authored Aug 11, 2023
2 parents a040b3c + 9b84f03 commit 58a26ad
Showing 35 changed files with 2,843 additions and 1,389 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.4.1
current_version = 0.5.0

[bumpversion:file:pyproject.toml]

2 changes: 1 addition & 1 deletion .copier-answers.yml
@@ -7,7 +7,7 @@ description: Kedro plugin with Azure ML Pipelines support
docs_url: https://kedro-azureml.readthedocs.io/
full_name: Kedro Azure ML Pipelines plugin
github_url: https://github.com/getindata/kedro-azureml
initial_version: 0.4.1
initial_version: 0.5.0
keywords:
- kedro
- mlops
2 changes: 2 additions & 0 deletions .github/workflows/tests_and_publish.yml
@@ -134,6 +134,7 @@ jobs:
find "../dist" -name "*.tar.gz" | xargs -I@ cp @ kedro-azureml.tar.gz
echo -e "\n./kedro-azureml.tar.gz\n" >> src/requirements.txt
echo -e "kedro-docker\n" >> src/requirements.txt
echo -e "openpyxl\n" >> src/requirements.txt # temp fix for kedro-datasets issues with optional packages
sed -i '/kedro-telemetry/d' src/requirements.txt
echo $(cat src/requirements.txt)
pip install -r src/requirements.txt
@@ -150,6 +151,7 @@
cp ../tests/conf/${{ matrix.e2e_config }}/azureml.yml conf/base/azureml.yml
sed -i 's/{container_registry}/${{ secrets.REGISTRY_LOGIN_SERVER }}/g' conf/base/azureml.yml
sed -i 's/{image_tag}/${{ matrix.e2e_config }}/g' conf/base/azureml.yml
cat conf/base/azureml.yml
- name: Login via Azure CLI
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,13 @@

## [Unreleased]

## [0.5.0] - 2023-08-11

- [🚀 New dataset] Added support for `AzureMLAssetDataSet` based on Azure ML SDK v2 (fsspec) by [@tomasvanpottelbergh](https://github.com/tomasvanpottelbergh) & [@froessler](https://github.com/fdroessler)
- [📝 Docs] Updated the datasets docs with new sections
- Bumped minimal required Kedro version to `0.18.11`
- [⚠️ Deprecation warning] Starting from `0.4.0`, the plugin is not compatible with ARM macOS versions due to internal Azure dependencies (v1 SDKs). V1 SDK-based datasets will be removed in the future

## [0.4.1] - 2023-05-04

- [📝 Docs] Revamp the quickstart guide in documentation
@@ -62,7 +69,9 @@

- Initial plugin release

[Unreleased]: https://github.com/getindata/kedro-azureml/compare/0.4.1...HEAD
[Unreleased]: https://github.com/getindata/kedro-azureml/compare/0.5.0...HEAD

[0.5.0]: https://github.com/getindata/kedro-azureml/compare/0.4.1...0.5.0

[0.4.1]: https://github.com/getindata/kedro-azureml/compare/0.4.0...0.4.1

7 changes: 6 additions & 1 deletion docs/conf.py
@@ -60,7 +60,12 @@
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

autodoc_mock_imports = ["azureml", "pandas", "backoff", "cloudpickle"]
autodoc_mock_imports = [
"azureml",
"pandas",
"backoff",
"cloudpickle",
]

# -- Options for HTML output -------------------------------------------------

36 changes: 31 additions & 5 deletions docs/source/05_data_assets.rst
@@ -1,11 +1,15 @@
Azure Data Assets
=================

``kedro-azureml`` adds support for two new datasets that can be used in the Kedro catalog, the ``AzureMLFileDataSet``
and the ``AzureMLPandasDataSet`` which translate to `File/Folder dataset`_ and `Tabular dataset`_ respectively in
``kedro-azureml`` adds support for new datasets that can be used in the Kedro catalog. Both the Azure ML v1 SDK (direct Python) and the Azure ML v2 SDK (fsspec-based) APIs are currently supported.

**For the v2 API (fsspec-based)**, use ``AzureMLAssetDataSet``, which lets you use Azure ML v2 SDK Folder/File datasets in both remote and local runs.

**For the v1 API** (deprecated ⚠️), use the ``AzureMLFileDataSet`` and the ``AzureMLPandasDataSet``, which translate to `File/Folder dataset`_ and `Tabular dataset`_ respectively in
Azure Machine Learning. Both fully support the Azure versioning mechanism and can be used in the same way as any
other dataset in Kedro.


Apart from these, ``kedro-azureml`` also adds the ``AzureMLPipelineDataSet``, which is used to pass data between
pipeline nodes when the pipeline is run on Azure ML and the *pipeline data passing* feature is enabled.
By default, data is then saved and loaded using the ``PickleDataSet`` as the underlying dataset.
@@ -24,15 +28,37 @@ For details on usage, see the :ref:`API Reference` below
API Reference
-------------

.. autoclass:: kedro_azureml.datasets.AzureMLPandasDataSet
Pipeline data passing
^^^^^^^^^^^^^^^^^^^^^

⚠️ Cannot be used when run locally.

.. autoclass:: kedro_azureml.datasets.AzureMLPipelineDataSet
:members:

-----------------

.. autoclass:: kedro_azureml.datasets.AzureMLFileDataSet

V2 SDK
^^^^^^^^^^^^^
Use the dataset below when you're using Azure ML SDK v2 (fsspec-based).

✅ Can be used for both remote and local runs.

.. autoclass:: kedro_azureml.datasets.asset_dataset.AzureMLAssetDataSet
:members:

V1 SDK
^^^^^^^^^^^^^
Use the datasets below when you're using Azure ML SDK v1 (direct Python).

⚠️ Deprecated: these datasets will be removed in a future version of ``kedro-azureml``.

.. autoclass:: kedro_azureml.datasets.AzureMLPandasDataSet
:members:

-----------------

.. autoclass:: kedro_azureml.datasets.AzureMLPipelineDataSet
.. autoclass:: kedro_azureml.datasets.AzureMLFileDataSet
:members:
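
For orientation, here is a minimal sketch of what using the new v2 dataset from Python could look like. The constructor arguments shown (`azureml_dataset` and the underlying `dataset` definition) are assumptions for illustration only; the authoritative signature lives in `kedro_azureml/datasets/asset_dataset.py`.

from kedro_azureml.datasets import AzureMLAssetDataSet

# Illustrative sketch only: argument names are assumptions, check the
# AzureMLAssetDataSet class in kedro_azureml/datasets/asset_dataset.py.
model_input = AzureMLAssetDataSet(
    azureml_dataset="my_registered_data_asset",  # name of the Azure ML data asset
    dataset={
        "type": "pandas.ParquetDataSet",  # underlying Kedro dataset used to read the file
        "filepath": "data.parquet",
    },
)
df = model_input.load()  # usable in both local and remote runs, as noted above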

2 changes: 1 addition & 1 deletion kedro_azureml/__init__.py
@@ -1,4 +1,4 @@
__version__ = "0.4.1"
__version__ = "0.5.0"

import warnings

Empty file added kedro_azureml/auth/__init__.py
Empty file.
74 changes: 74 additions & 0 deletions kedro_azureml/auth/utils.py
@@ -0,0 +1,74 @@
import os
from functools import cached_property

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azureml.core import Datastore, Run, Workspace
from azureml.exceptions import UserErrorException


def get_azureml_credentials():
    try:
        # On an AzureML compute instance, the managed identity takes precedence,
        # but it does not have enough permissions.
        # So, if we are on an AzureML compute instance, we disable the managed identity.
        is_azureml_managed_identity = "MSI_ENDPOINT" in os.environ
        credential = DefaultAzureCredential(
            exclude_managed_identity_credential=is_azureml_managed_identity
        )
        # Check that the given credential can obtain a token successfully.
        credential.get_token("https://management.azure.com/.default")
    except Exception:
        # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work.
        credential = InteractiveBrowserCredential()
    return credential


def get_workspace(*args, **kwargs) -> Workspace:
    """
    Get an AzureML workspace.
    Args:
        *args: Positional arguments to pass to the Workspace constructor.
        **kwargs: Keyword arguments to pass to the Workspace constructor.
    """
    if args or kwargs:
        workspace = Workspace(*args, **kwargs)
    else:
        try:
            # if running on an AzureML compute instance
            workspace = Workspace.from_config()
        except UserErrorException:
            try:
                # if running on an AzureML compute cluster
                workspace = Run.get_context().experiment.workspace
            except AttributeError as e:
                raise UserErrorException(
                    "Could not connect to AzureML workspace."
                ) from e
    return workspace


class AzureMLDataStoreMixin:
    def __init__(self, workspace_args, azureml_datastore=None, workspace=None):
        self._workspace_instance = workspace
        self._azureml_datastore_name = azureml_datastore
        self._workspace_args = workspace_args or dict()

    @cached_property
    def _workspace(self) -> Workspace:
        return self._workspace_instance or get_workspace(**self._workspace_args)

    @cached_property
    def _azureml_datastore(self) -> str:
        return (
            self._azureml_datastore_name or self._workspace.get_default_datastore().name
        )

    @cached_property
    def _datastore_container_name(self) -> str:
        ds = Datastore.get(self._workspace, self._azureml_datastore)
        return ds.container_name

    @cached_property
    def _azureml_path(self):
        return f"abfs://{self._datastore_container_name}/"
42 changes: 37 additions & 5 deletions kedro_azureml/cli.py
Expand Up @@ -2,9 +2,11 @@
import logging
import os
from pathlib import Path
from typing import List, Optional, Tuple
from typing import Dict, List, Optional, Tuple

import click
from kedro.framework.cli.project import LOAD_VERSION_HELP
from kedro.framework.cli.utils import _split_load_versions
from kedro.framework.startup import ProjectMetadata

from kedro_azureml.cli_functions import (
@@ -206,6 +208,14 @@ def init(
multiple=True,
help="Environment variables to be injected in the steps, format: KEY=VALUE",
)
@click.option(
"--load-versions",
"-lv",
type=str,
default="",
help=LOAD_VERSION_HELP,
callback=_split_load_versions,
)
@click.pass_obj
@click.pass_context
def run(
@@ -218,6 +228,7 @@
params: str,
wait_for_completion: bool,
env_var: Tuple[str],
load_versions: Dict[str, str],
):
"""Runs the specified pipeline in Azure ML Pipelines; Additional parameters can be passed from command line.
Can be used with --wait-for-completion param to block the caller until the pipeline finishes in Azure ML.
@@ -236,7 +247,9 @@

mgr: KedroContextManager
extra_env = parse_extra_env_params(env_var)
with get_context_and_pipeline(ctx, image, pipeline, params, aml_env, extra_env) as (
with get_context_and_pipeline(
ctx, image, pipeline, params, aml_env, extra_env, load_versions
) as (
mgr,
az_pipeline,
):
@@ -302,6 +315,20 @@ def run(
default="pipeline.yaml",
help="Pipeline YAML definition file.",
)
@click.option(
"--env-var",
type=str,
multiple=True,
help="Environment variables to be injected in the steps, format: KEY=VALUE",
)
@click.option(
"--load-versions",
"-lv",
type=str,
default="",
help=LOAD_VERSION_HELP,
callback=_split_load_versions,
)
@click.pass_obj
def compile(
ctx: CliContext,
@@ -310,10 +337,15 @@
pipeline: str,
params: list,
output: str,
env_var: Tuple[str],
load_versions: Dict[str, str],
):
"""Compiles the pipeline into YAML format"""
params = json.dumps(p) if (p := parse_extra_params(params)) else ""
with get_context_and_pipeline(ctx, image, pipeline, params, aml_env) as (
extra_env = parse_extra_env_params(env_var)
with get_context_and_pipeline(
ctx, image, pipeline, params, aml_env, extra_env, load_versions
) as (
_,
az_pipeline,
):
@@ -342,14 +374,14 @@ def compile(
@click.option(
"--az-input",
"azure_inputs",
type=(str, click.Path(exists=True, file_okay=False, dir_okay=True)),
type=(str, click.Path(exists=True, file_okay=True, dir_okay=True)),
multiple=True,
help="Name and path of Azure ML Pipeline input",
)
@click.option(
"--az-output",
"azure_outputs",
type=(str, click.Path(exists=True, file_okay=False, dir_okay=True)),
type=(str, click.Path(exists=True, file_okay=True, dir_okay=True)),
multiple=True,
help="Name and path of Azure ML Pipeline output",
)
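
For reference, the new `--load-versions` / `-lv` option reuses Kedro's standard `LOAD_VERSION_HELP` and `_split_load_versions` helpers, so it accepts the usual `dataset_name:load_version` pairs, and `--env-var` can be repeated. An illustrative invocation (the dataset name, timestamp and variable are placeholders):

    kedro azureml run --env-var MY_FLAG=1 --load-versions companies:2023-08-11T12.00.00.000Z
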
3 changes: 3 additions & 0 deletions kedro_azureml/cli_functions.py
@@ -23,6 +23,7 @@ def get_context_and_pipeline(
params: str,
aml_env: Optional[str] = None,
extra_env: Dict[str, str] = {},
load_versions: Dict[str, str] = {},
):
with KedroContextManager(
ctx.metadata.package_name, ctx.env, parse_extra_params(params, True)
@@ -50,11 +51,13 @@
ctx.env,
mgr.plugin_config,
mgr.context.params,
mgr.context.catalog,
aml_env,
docker_image,
params,
storage_account_key,
extra_env,
load_versions,
)
az_pipeline = generator.generate()
yield mgr, az_pipeline
17 changes: 2 additions & 15 deletions kedro_azureml/client.py
@@ -1,15 +1,14 @@
import json
import logging
import os
from contextlib import contextmanager
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Callable, Optional

from azure.ai.ml import MLClient
from azure.ai.ml.entities import Job
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from kedro_azureml.auth.utils import get_azureml_credentials
from kedro_azureml.config import AzureMLConfig

logger = logging.getLogger(__name__)
@@ -23,19 +22,7 @@ def _get_azureml_client(subscription_id: Optional[str], config: AzureMLConfig):
"workspace_name": config.workspace_name,
}

try:
# On a AzureML compute instance, the managed identity will take precedence,
# while it does not have enough permissions.
# So, if we are on an AzureML compute instance, we disable the managed identity.
is_azureml_managed_identity = "MSI_ENDPOINT" in os.environ
credential = DefaultAzureCredential(
exclude_managed_identity_credential=is_azureml_managed_identity
)
# Check if given credential can get token successfully.
credential.get_token("https://management.azure.com/.default")
except Exception:
# Fall back to InteractiveBrowserCredential in case DefaultAzureCredential not work
credential = InteractiveBrowserCredential()
credential = get_azureml_credentials()

with TemporaryDirectory() as tmp_dir:
config_path = Path(tmp_dir) / "config.json"
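
The refactor above only moves the credential fallback into the shared `get_azureml_credentials` helper; the client itself still goes through a temporary `config.json`, as shown in the hunk. As a rough sketch (not the plugin's exact code path), the helper can also be combined with `MLClient` directly; the subscription, resource group, workspace and job names below are placeholders:

from azure.ai.ml import MLClient

from kedro_azureml.auth.utils import get_azureml_credentials

# DefaultAzureCredential (managed identity excluded on AzureML compute
# instances), falling back to InteractiveBrowserCredential.
credential = get_azureml_credentials()

client = MLClient(
    credential=credential,
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
job = client.jobs.get("<job-name>")  # e.g. inspect a previously submitted pipeline job
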
2 changes: 2 additions & 0 deletions kedro_azureml/datasets/__init__.py
@@ -1,3 +1,4 @@
from kedro_azureml.datasets.asset_dataset import AzureMLAssetDataSet
from kedro_azureml.datasets.file_dataset import AzureMLFileDataSet
from kedro_azureml.datasets.pandas_dataset import AzureMLPandasDataSet
from kedro_azureml.datasets.pipeline_dataset import AzureMLPipelineDataSet
@@ -8,6 +9,7 @@

__all__ = [
"AzureMLFileDataSet",
"AzureMLAssetDataSet",
"AzureMLPipelineDataSet",
"AzureMLPandasDataSet",
"KedroAzureRunnerDataset",