Visualize pipeline objects in notebook #2241

Open
wants to merge 57 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 56 commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
- dc31929 initial draft (ravi-kumar-pilla, Jan 15, 2025)
- d3448dc adding window config for jupyter users (ravi-kumar-pilla, Jan 22, 2025)
- 7755e11 working draft (ravi-kumar-pilla, Jan 23, 2025)
- 7c264dd working final draft (ravi-kumar-pilla, Jan 28, 2025)
- bf08766 working final draft (ravi-kumar-pilla, Jan 28, 2025)
- 4483342 clean window pollution (ravi-kumar-pilla, Jan 29, 2025)
- fd532ee working draft with 2 approaches (ravi-kumar-pilla, Jan 29, 2025)
- 563182f initial bundle draft (ravi-kumar-pilla, Jan 29, 2025)
- e8f7249 update webpack (ravi-kumar-pilla, Jan 29, 2025)
- 8b66fec testing webpack (ravi-kumar-pilla, Jan 30, 2025)
- 72bcc28 ignore babel for umd (ravi-kumar-pilla, Jan 30, 2025)
- 32632d0 testing with published bundle (ravi-kumar-pilla, Jan 30, 2025)
- d3f9c21 tested bundle (ravi-kumar-pilla, Jan 30, 2025)
- c148a55 merge bundle PR (ravi-kumar-pilla, Jan 30, 2025)
- f7e10a1 optimization code added (ravi-kumar-pilla, Jan 31, 2025)
- 5031722 Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe… (ravi-kumar-pilla, Jan 31, 2025)
- 06fc82d add optimization to prod bundle (ravi-kumar-pilla, Jan 31, 2025)
- 8298408 Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe… (ravi-kumar-pilla, Feb 5, 2025)
- 6e5511b add umd to repo (ravi-kumar-pilla, Feb 5, 2025)
- b49109b v10.3.0 (ravi-kumar-pilla, Feb 5, 2025)
- fd379f7 push umd bundle (ravi-kumar-pilla, Feb 5, 2025)
- d962a9b remove additional commits (ravi-kumar-pilla, Feb 5, 2025)
- 7ad4be9 remove additional commits (ravi-kumar-pilla, Feb 5, 2025)
- 47e2b4b add release note (ravi-kumar-pilla, Feb 5, 2025)
- 199a34c merge main (ravi-kumar-pilla, Feb 5, 2025)
- 45dd808 add umd bundle (ravi-kumar-pilla, Feb 5, 2025)
- 4f69d7e testing esm module (ravi-kumar-pilla, Feb 5, 2025)
- 27cfd9d add esm ref (ravi-kumar-pilla, Feb 5, 2025)
- 0485f45 add esm (ravi-kumar-pilla, Feb 5, 2025)
- 508beaa test with esm (ravi-kumar-pilla, Feb 6, 2025)
- fac1b3c add esm draft (ravi-kumar-pilla, Feb 6, 2025)
- ffe1657 add esm ref (ravi-kumar-pilla, Feb 6, 2025)
- 08f74e8 clean bundle config (ravi-kumar-pilla, Feb 6, 2025)
- 4cf8635 fix lint and format checks (ravi-kumar-pilla, Feb 6, 2025)
- be2d9d8 temp remove gql checks (ravi-kumar-pilla, Feb 6, 2025)
- 450a695 fix lint (ravi-kumar-pilla, Feb 6, 2025)
- 6f4fcc3 fix lint (ravi-kumar-pilla, Feb 7, 2025)
- cdc2d7a fix tests (ravi-kumar-pilla, Feb 7, 2025)
- 847bc95 fix doc test (ravi-kumar-pilla, Feb 7, 2025)
- 6ff45ed merge main (ravi-kumar-pilla, Feb 10, 2025)
- 1115c34 add granularity to notebook visualizer (ravi-kumar-pilla, Feb 10, 2025)
- 05ce8de structured notebook visualizer (ravi-kumar-pilla, Feb 11, 2025)
- fb73d92 Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe… (ravi-kumar-pilla, Feb 11, 2025)
- 28a228f updated js link (ravi-kumar-pilla, Feb 11, 2025)
- 884752c fix lint (ravi-kumar-pilla, Feb 11, 2025)
- 67cfa5f restore global navigation (ravi-kumar-pilla, Feb 11, 2025)
- bb29abf add default globalNavigation (ravi-kumar-pilla, Feb 11, 2025)
- f956451 fix cache deprecation (ravi-kumar-pilla, Feb 11, 2025)
- 14c6c7a fix based on comments (ravi-kumar-pilla, Feb 11, 2025)
- 8d79192 address PR comments (ravi-kumar-pilla, Feb 12, 2025)
- b494af5 remove unused import (ravi-kumar-pilla, Feb 12, 2025)
- 8678052 remove test notebook (ravi-kumar-pilla, Feb 12, 2025)
- 84f1e07 fix lint (ravi-kumar-pilla, Feb 12, 2025)
- e7b5239 address PR comments2 (ravi-kumar-pilla, Feb 13, 2025)
- 414df2e change generate_html (ravi-kumar-pilla, Feb 14, 2025)
- 2f09e76 fix broken doc links (ravi-kumar-pilla, Feb 14, 2025)
- fdc523b merge main (ravi-kumar-pilla, Feb 21, 2025)
2 changes: 1 addition & 1 deletion .github/actions/install_node_dependencies/action.yml
@@ -26,7 +26,7 @@ runs:
shell: bash

- name: Cache Node.js packages
uses: actions/cache@v2
uses: actions/cache@v4
with:
path: "${{ steps.npm-cache-dir.outputs.dir }}"
key: "${{ runner.os }}-node-${{ hashFiles(format('{0}/package-lock.json', inputs.package-path)) }}"
5 changes: 1 addition & 4 deletions .github/workflows/lint.yml
@@ -37,9 +37,6 @@ jobs:

- name: Run security scan
run: make security-scan

- name: Verify GraphQL schema is up to date
run: make schema-check


- name: Run Python formatters and linters
run: make format-check lint-check
7 changes: 0 additions & 7 deletions Makefile
@@ -39,13 +39,6 @@ lint-check:
mypy --config-file=package/mypy.ini package/kedro_viz package/features
mypy --disable-error-code abstract --config-file=package/mypy.ini package/tests

schema-fix:
strawberry export-schema --app-dir=package kedro_viz.api.graphql.schema > src/apollo/schema.graphql
graphqlviz src/apollo/schema.graphql | dot -Tpng -o .github/img/schema.graphql.png

schema-check:
strawberry export-schema --app-dir=package kedro_viz.api.graphql.schema | diff src/apollo/schema.graphql -

secret-scan:
trufflehog --max_depth 1 --exclude_path trufflehog-ignore.txt .

2 changes: 2 additions & 0 deletions RELEASE.md
@@ -9,6 +9,8 @@

## Major features and improvements

- Visualize pipeline objects in notebook. (#2241)

Check warning on line 12 in RELEASE.md (GitHub Actions / vale): [Kedro-viz.ukspelling] In general, use UK English spelling instead of 'Visualize'.

## Bug fixes and other changes

- Add ESM bundle for Kedro-Viz. (#2268)
18 changes: 9 additions & 9 deletions docs/source/migrate_experiment_tracking.md
@@ -41,14 +41,14 @@ Update the dataset configurations in your `catalog.yml` to transition to `kedro-

| Kedro-Viz Dataset Type | MLflow Dataset Type | Update Instructions |
|---------------------------------|----------------------------|---------------------------------------------------------|
| `tracking.MetricsDataset` | `MlflowMetricDataset` | Update type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.metrics.mlflow_metric_dataset.MlflowMetricDataset). |
| `tracking.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset) as `json.JSONDataset`. |
| `plotly.plotlyDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset) as `plotly.HTMLDataset`. |
| `plotly.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset) as `plotly.HTMLDataset`. |
| `matplotlib.MatplotlibWriter` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset). |
| `tracking.MetricsDataset` | `MlflowMetricDataset` | Update type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowmetricdataset). |
| `tracking.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `json.JSONDataset`. |
| `plotly.plotlyDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `plotly.HTMLDataset`. |
| `plotly.JSONDataset` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) as `plotly.HTMLDataset`. |
| `matplotlib.MatplotlibWriter` | `MlflowArtifactDataset` | Wrap within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset). |

### Metrics dataset
For `tracking.MetricsDataset`, update its type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.metrics.mlflow_metric_dataset.MlflowMetricDataset):
For `tracking.MetricsDataset`, update its type to [`MlflowMetricDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowmetricdataset):

Before:
```yaml
@@ -65,7 +65,7 @@ metrics:
```

### JSON dataset
For `tracking.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset) and configure it as `json.JSONDataset`:
For `tracking.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) and configure it as `json.JSONDataset`:

Before:
```yaml
@@ -85,7 +85,7 @@ companies_columns:
```

### Plotly dataset
For `plotly.plotlyDataset` and `plotly.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset) and configure it as `plotly.HTMLDataset` to render interactive plots in the MLflow UI:
For `plotly.plotlyDataset` and `plotly.JSONDataset`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset) and configure it as `plotly.HTMLDataset` to render interactive plots in the MLflow UI:

Before:
```yaml
@@ -104,7 +104,7 @@ plotly_json_data:
```

### Matplotlib writer
For `matplotlib.MatplotlibWriter`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/31_API/kedro_mlflow.io.html#kedro_mlflow.io.artifacts.mlflow_artifact_dataset.MlflowArtifactDataset):
For `matplotlib.MatplotlibWriter`, wrap it within [`MlflowArtifactDataset`](https://kedro-mlflow.readthedocs.io/en/stable/source/05_API/01_python_objects/01_Datasets.html#mlflowartifactdataset):

Before:
```yaml
8 changes: 8 additions & 0 deletions package/kedro_viz/data_access/managers.py
@@ -52,6 +52,10 @@ class DataAccessManager:
    """Centralised interface for the rest of the application to interact with data repositories."""

    def __init__(self):
        self._initialize_fields()

    def _initialize_fields(self):
        """Initialize or reset all instance variables."""
        self.catalog = CatalogRepository()
        self.nodes = GraphNodesRepository()
        self.registered_pipelines = RegisteredPipelinesRepository()
@@ -72,6 +76,10 @@ def __init__(self):
        self.tracking_datasets = TrackingDatasetsRepository()
        self.dataset_stats = {}

    def reset_fields(self):
        """Reset all instance variables."""
        self._initialize_fields()

    def set_db_session(self, db_session_class: sessionmaker):
        """Set db session on repositories that need it."""
        self.runs.set_db_session(db_session_class)
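The `DataAccessManager` change routes both construction and `reset_fields()` through a single `_initialize_fields()` helper, so a reset rebuilds exactly the state a fresh instance would have; `load_and_populate_data_for_notebook_users` calls `reset_fields()` so that each notebook cell starts from empty repositories. A minimal sketch of that pattern follows; `ToyDataAccessManager` is a hypothetical stand-in, and plain dicts replace the real repository classes.

```python
from typing import Any, Dict


class ToyDataAccessManager:
    """Hypothetical stand-in for DataAccessManager; dicts replace the repositories."""

    def __init__(self) -> None:
        self._initialize_fields()

    def _initialize_fields(self) -> None:
        """Create every piece of per-run state in one place."""
        self.catalog: Dict[str, Any] = {}
        self.nodes: Dict[str, Any] = {}
        self.dataset_stats: Dict[str, Any] = {}

    def reset_fields(self) -> None:
        """Discard all state by re-running the initialiser that __init__ uses."""
        self._initialize_fields()


manager = ToyDataAccessManager()
manager.nodes["split_data"] = object()  # state left over from an earlier cell
manager.reset_fields()
print(len(manager.nodes))  # 0
```

Keeping every attribute in one initialiser means a field added later cannot be forgotten by the reset path.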
4 changes: 4 additions & 0 deletions package/kedro_viz/integrations/notebook/__init__.py
@@ -0,0 +1,4 @@
"""`kedro_viz.integrations.notebook` provides an interface to integrate Kedro-Viz with notebooks."""

# alias to ease Notebook visualization import
from .visualizer import NotebookVisualizer
54 changes: 54 additions & 0 deletions package/kedro_viz/integrations/notebook/data_loader.py
@@ -0,0 +1,54 @@
"""`kedro_viz.integrations.notebook.data_loader` provides an interface to
load data from a notebook. It takes care of making sure viz can
load data from pipelines created in a range of Kedro versions.
"""

from typing import Dict, Optional, Tuple, Union, cast

from kedro.framework.session.store import BaseSessionStore
from kedro.io import DataCatalog
from kedro.pipeline import Pipeline

from kedro_viz.data_access import data_access_manager
from kedro_viz.server import populate_data


def load_data_for_notebook_users(
    notebook_pipeline: Union[Pipeline, Dict[str, Pipeline]],
    notebook_catalog: Optional[DataCatalog],
) -> Tuple[DataCatalog, Dict[str, Pipeline], BaseSessionStore, Dict]:
    """Load data from a notebook user's pipeline"""
    # Create a dummy data catalog with all datasets as memory datasets
    catalog = DataCatalog() if notebook_catalog is None else notebook_catalog
    session_store = None
    stats_dict: Dict = {}

    notebook_user_pipeline = notebook_pipeline

    # Create a default pipeline if a dictionary of pipelines is passed
    if isinstance(notebook_user_pipeline, dict):
        notebook_user_pipeline = {
            "__default__": notebook_user_pipeline["__default__"]
            if "__default__" in notebook_user_pipeline
            else cast(Pipeline, sum(notebook_user_pipeline.values()))
        }
    else:
        notebook_user_pipeline = {"__default__": notebook_user_pipeline}

    return catalog, notebook_user_pipeline, session_store, stats_dict  # type: ignore[return-value]


def load_and_populate_data_for_notebook_users(
    notebook_pipeline: Union[Pipeline, Dict[str, Pipeline]],
    notebook_catalog: Optional[DataCatalog],
):
    """Loads pipeline data and populates Kedro Viz Repositories for a notebook user"""
    catalog, pipelines, session_store, stats_dict = load_data_for_notebook_users(
        notebook_pipeline, notebook_catalog
    )

    # make each cell independent
    data_access_manager.reset_fields()

    # Creates data repositories which are used by Kedro Viz Backend APIs
    populate_data(data_access_manager, catalog, pipelines, session_store, stats_dict)
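`load_data_for_notebook_users` always hands `populate_data` a dictionary with a single `"__default__"` key: a lone pipeline is wrapped, a dictionary keeps its own `"__default__"` entry when it has one, and otherwise all of its pipelines are summed into one. The sketch below reproduces that branching with a hypothetical `ToyPipeline` (just a set of node names) in place of `kedro.pipeline.Pipeline`. Because `sum()` starts from the integer `0`, the toy class defines `__radd__` so that `0 + pipeline` returns the pipeline unchanged.

```python
from typing import Dict, FrozenSet, Iterable, Union


class ToyPipeline:
    """Hypothetical stand-in for kedro.pipeline.Pipeline: a set of node names."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes: FrozenSet[str] = frozenset(nodes)

    def __add__(self, other: object) -> "ToyPipeline":
        if not isinstance(other, ToyPipeline):
            return NotImplemented
        return ToyPipeline(self.nodes | other.nodes)

    def __radd__(self, other: object) -> "ToyPipeline":
        # sum() begins with 0, so 0 + pipeline must yield the pipeline itself
        if other == 0:
            return self
        return self.__add__(other)


def normalise_pipelines(
    pipeline: Union[ToyPipeline, Dict[str, ToyPipeline]],
) -> Dict[str, ToyPipeline]:
    """Mirror the branching in load_data_for_notebook_users."""
    if isinstance(pipeline, dict):
        if "__default__" in pipeline:
            return {"__default__": pipeline["__default__"]}
        # No default registered: merge every pipeline into one
        return {"__default__": sum(pipeline.values())}
    return {"__default__": pipeline}


ingest = ToyPipeline(["load", "clean"])
train = ToyPipeline(["fit"])
merged = normalise_pipelines({"ingest": ingest, "train": train})
print(sorted(merged["__default__"].nodes))  # ['clean', 'fit', 'load']
```

An empty dictionary would make `sum()` return the integer `0` rather than a pipeline; the loader in this diff does not special-case that input.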
177 changes: 177 additions & 0 deletions package/kedro_viz/integrations/notebook/visualizer.py
@@ -0,0 +1,177 @@
import json
import logging
import uuid
from contextlib import contextmanager
from typing import Any, Dict, Optional, Union

from IPython.display import HTML, display
from kedro.io.data_catalog import DataCatalog
from kedro.pipeline import Pipeline

from kedro_viz.api.rest.responses.pipelines import get_kedro_project_json_data
from kedro_viz.integrations.notebook.data_loader import (
    load_and_populate_data_for_notebook_users,
)
from kedro_viz.utils import Spinner, merge_dicts

DEFAULT_VIZ_OPTIONS = {
    "display": {
        "expandPipelinesBtn": False,
        "globalNavigation": False,
        "exportBtn": False,
        "labelBtn": False,
        "layerBtn": False,
        "metadataPanel": False,
        "miniMap": False,
        "sidebar": False,
        "zoomToolbar": False,
    },
    "expandAllPipelines": False,
    "behaviour": {
        "reFocus": False,
    },
    "theme": "dark",
    "width": "100%",
    "height": "600px",
}

DEFAULT_JS_URL = (
    "https://cdn.jsdelivr.net/gh/kedro-org/kedro-viz@main/esm/kedro-viz.production.mjs"
)


class NotebookVisualizer:
    """Represent a Kedro-Viz visualization instance in a notebook"""

    def __init__(
        self,
        pipeline: Union[Pipeline, Dict[str, Pipeline]],
        catalog: Optional[DataCatalog] = None,
        options: Optional[Dict[str, Any]] = None,
        js_url: Optional[str] = None,
    ):
        """
        Initialize NotebookVisualizer.

        Args:
            pipeline: Kedro pipeline(s) to visualize.
            catalog: Kedro data catalog.
            options: Visualization options.
                (Ref: https://github.com/kedro-org/kedro-viz/blob/main/README.npm.md#configure-kedro-viz-with-options)
            js_url: Optional URL for the Kedro-Viz JS bundle.

        Returns:
            A new ``NotebookVisualizer`` instance.
        """
        self.pipeline = pipeline
        self.catalog = catalog
        self.options = (
            DEFAULT_VIZ_OPTIONS
            if options is None
            else merge_dicts(DEFAULT_VIZ_OPTIONS, options)
        )
        # Force `globalNavigation` to always be False as it
        # breaks visualizer due to security concerns
        self.options.setdefault("display", {})["globalNavigation"] = False  # type: ignore

        self.js_url = js_url or DEFAULT_JS_URL

    def _load_viz_data(self) -> Optional[Any]:
        """Load pipeline and catalog data for visualization."""
        load_and_populate_data_for_notebook_users(self.pipeline, self.catalog)
        return get_kedro_project_json_data()

    def generate_html(self) -> str:
        """Generate HTML markup for Kedro-Viz as a string."""
        unique_id = uuid.uuid4().hex[:8]  # To isolate container for each cell execution
        json_data_str = json.dumps(self._load_viz_data())
        options_str = json.dumps(self.options)

        html_content = (
            r"""<!DOCTYPE html>
            <html lang='en'>
            <head>
                <meta charset='UTF-8'>
                <meta name='viewport' content='width=device-width, initial-scale=1.0'>
                <title>Kedro-Viz</title>
            </head>
            <body>
                <div id=kedro-viz-"""
            + unique_id
            + """ style='height: 600px'></div>
                <script type="module">
                    import { KedroViz, React, createRoot } from '"""
            + self.js_url
            + """';
                    const viz_container = document.getElementById('kedro-viz-"""
            + unique_id
            + """');

                    if (createRoot && viz_container) {
                        const viz_root = createRoot(viz_container);
                        viz_root.render(
                            React.createElement(KedroViz, {
                                data: """
            + json_data_str
            + """,
                                options: """
            + options_str
            + """
                            })
                        );
                    }
                </script>
            </body>
            </html>"""
        )

        return html_content

    @staticmethod
    def _wrap_in_iframe(
        html_content: str,
        width: str = str(DEFAULT_VIZ_OPTIONS.get("width", "")),
        height: str = str(DEFAULT_VIZ_OPTIONS.get("height", "")),
    ) -> str:
        """Wrap the HTML content in an iframe.

        Args:
            html_content: The HTML markup template as a string for visualization
            width: iframe width
            height: iframe height

        Returns:
            A string containing html markup embedded in an iframe
        """
        sanitized_content = html_content.replace("&", "&amp;").replace('"', "&quot;")
        return f"""<iframe srcdoc="{sanitized_content}" style="width:{width}; height:{height}; border:none;" sandbox="allow-scripts"></iframe>"""

    @staticmethod
    @contextmanager
    def _suppress_logs():
        logger = logging.getLogger()
        previous_level = logger.level
        logger.setLevel(logging.CRITICAL)  # Suppress logs
        try:
            yield
        finally:
            logger.setLevel(previous_level)  # Restore the original level

    def show(self) -> None:
        """Display Kedro-Viz in a notebook."""
        with self._suppress_logs():
            spinner = Spinner("Starting Kedro-Viz...")
            try:
                spinner.start()

                html_content = self.generate_html()
                iframe_content = self._wrap_in_iframe(
                    html_content,
                    str(self.options.get("width", "100%")),
                    str(self.options.get("height", "600px")),
                )
                spinner.stop()
                display(HTML(iframe_content))
            except Exception as exc:  # noqa: BLE001
                spinner.stop()
                display(HTML(f"<strong>Error: {str(exc)}</strong>"))
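`show()` renders the page from `generate_html()` inside an `<iframe>` through its `srcdoc` attribute, with `sandbox="allow-scripts"` so the embedded module script runs isolated from the notebook page. Because the browser decodes character references in `srcdoc` before it parses the embedded document, the whole page has to be attribute-escaped, ampersands included. The sketch below, a minimal illustration rather than the PR's code, uses the standard library's `html.escape(..., quote=True)` for that step and checks the round trip with `html.parser.HTMLParser`. It also escapes `</` in the serialised JSON so that a data string can never close the surrounding `<script>` element, an extra hardening step not present in the source. `build_page`, `wrap_in_iframe` and `SrcdocReader` are hypothetical names.

```python
import html
import json
import uuid
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional, Tuple


def build_page(data: Dict[str, Any], options: Dict[str, Any]) -> str:
    """Hypothetical helper: a minimal page that hands data/options to a module script."""
    container_id = "kedro-viz-" + uuid.uuid4().hex[:8]  # unique per cell execution
    # "</" inside a JSON string would end the <script> element early; "<\/" is an
    # equivalent JSON escape that the HTML tokenizer does not read as a closing tag.
    data_js = json.dumps(data).replace("</", "<\\/")
    options_js = json.dumps(options).replace("</", "<\\/")
    return (
        "<!DOCTYPE html><html><body>"
        f"<div id='{container_id}'></div>"
        "<script type='module'>"
        f"const data = {data_js}; const options = {options_js};"
        "</script></body></html>"
    )


def wrap_in_iframe(page: str, width: str = "100%", height: str = "600px") -> str:
    """Embed a complete HTML document in a sandboxed iframe through srcdoc."""
    # quote=True escapes &, <, >, " and ' (ampersands first), so the page survives
    # as a double-quoted attribute value and decodes back to the original text.
    escaped = html.escape(page, quote=True)
    return (
        f'<iframe srcdoc="{escaped}" '
        f'style="width:{width}; height:{height}; border:none;" '
        'sandbox="allow-scripts"></iframe>'
    )


class SrcdocReader(HTMLParser):
    """Collect the decoded srcdoc value, as an HTML parser sees it."""

    def __init__(self) -> None:
        super().__init__()
        self.srcdoc: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "iframe":
            self.srcdoc = dict(attrs).get("srcdoc")


page = build_page({"label": 'R&D </script> "quoted"'}, {"theme": "dark"})
reader = SrcdocReader()
reader.feed(wrap_in_iframe(page))
reader.close()
print(reader.srcdoc == page)  # True
```

In a notebook, the class in this PR is driven at a higher level: `NotebookVisualizer(pipeline, catalog=..., options=...)` builds the page with `generate_html()`, and `.show()` wraps it in the iframe and passes it to IPython's `display(HTML(...))`.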
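`NotebookVisualizer.__init__` layers user `options` over `DEFAULT_VIZ_OPTIONS` with `merge_dicts` from `kedro_viz.utils`, whose exact semantics are not shown in this diff, and then forces `display.globalNavigation` to `False`. When `options` is `None`, the module-level `DEFAULT_VIZ_OPTIONS` object itself is assigned to `self.options` before the `setdefault` call. The sketch below assumes a recursive merge in which user values win, and it deep-copies the defaults first so that neither the merge nor the forced override can mutate the shared dictionary. `DEFAULTS`, `deep_merge` and `resolve_options` are hypothetical names, and `DEFAULTS` is a trimmed copy of the real defaults.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical, trimmed copy of DEFAULT_VIZ_OPTIONS
DEFAULTS: Dict[str, Any] = {
    "display": {"globalNavigation": False, "sidebar": False, "miniMap": False},
    "theme": "dark",
    "width": "100%",
    "height": "600px",
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, other values replace base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_options(options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Layer user options over the defaults, then pin globalNavigation off."""
    resolved = deep_merge(DEFAULTS, options or {})
    resolved.setdefault("display", {})["globalNavigation"] = False
    return resolved


opts = resolve_options(
    {"display": {"sidebar": True, "globalNavigation": True}, "height": "800px"}
)
print(opts["display"])  # {'globalNavigation': False, 'sidebar': True, 'miniMap': False}
```

Copying on every call costs little for a dictionary this small and keeps each visualizer instance independent of the others.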