[1.9] Update references to AutoMaterializePolicy in docs (#25520)
## Summary & Motivation

Removes / updates any references to AutoMaterializePolicy in the docs

## How I Tested These Changes

## Changelog

> Insert changelog entry or delete this section.
OwenKephart authored Oct 24, 2024
1 parent bb2fe45 commit 6b07905
Showing 17 changed files with 12 additions and 264 deletions.
@@ -319,7 +319,6 @@ When `blocking` is enabled, downstream assets will wait to execute until the che
This feature has the following limitations:

- **`blocking` is currently only supported with <PyObject object="asset_check" decorator />.** [For checks defined in the same operation as assets](#defining-checks-and-assets-together), you can explicitly raise an exception to block downstream execution.
- **Assets with an <PyObject object="AutoMaterializePolicy" /> currently do not respect blocking asset checks** and will execute even if a blocking check on an upstream asset failed.
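
For orientation, a blocking check in the form that remains supported might look like this minimal sketch; the `orders` asset, the check name, and the row-count logic are illustrative assumptions rather than anything from this commit:

```python
import dagster as dg


@dg.asset
def orders(): ...


# Hypothetical blocking check: if it fails, downstream assets that depend on
# `orders` will not execute.
@dg.asset_check(asset=orders, blocking=True)
def orders_has_rows() -> dg.AssetCheckResult:
    row_count = 10  # placeholder for a real count query
    return dg.AssetCheckResult(passed=row_count > 0, metadata={"row_count": row_count})
```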

---

2 changes: 1 addition & 1 deletion docs/content/concepts/assets/asset-observations.mdx
@@ -119,7 +119,7 @@ observed to have a newer data version than the data version it had when a
downstream asset was materialized, then the downstream asset will be given a
label in the Dagster UI that indicates that upstream data has changed.

<PyObject object="AutoMaterializePolicy" pluralize /> can be used to automatically
<PyObject object="AutomationCondition" pluralize /> can be used to automatically
materialize downstream assets when this occurs.

The <PyObject object="observable_source_asset" /> decorator provides a convenient way to define source assets with observation functions. The below observable source asset takes a file hash and returns it as the data version. Every time you run the observation function, a new observation will be generated with this hash set as its data version.
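
A minimal sketch of that pattern, pairing an observed file hash with a downstream asset that reacts to it; the file path and asset names are illustrative assumptions, not taken from this commit:

```python
import hashlib

from dagster import (
    AutomationCondition,
    DataVersion,
    asset,
    observable_source_asset,
)


@observable_source_asset
def source_file():
    # Hash the file's contents and report the digest as the data version; a new
    # digest on a later observation signals that the upstream data has changed.
    with open("/tmp/source_file.csv", "rb") as f:  # hypothetical path
        digest = hashlib.sha256(f.read()).hexdigest()
    return DataVersion(digest)


# Downstream asset that is automatically materialized when the observed data
# version of `source_file` changes.
@asset(deps=[source_file], automation_condition=AutomationCondition.eager())
def derived_table(): ...
```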
50 changes: 6 additions & 44 deletions docs/content/guides/dagster/managing-ml.mdx
@@ -16,7 +16,7 @@ Machine learning models are highly dependent on data at a point in time and must

## Prerequisites

-Before proceeding, it is recommended to review [Building machine learning pipelines with Dagster ](/guides/dagster/ml-pipeline) which provides background on using Dagster's assets for machine learning.
+Before proceeding, it is recommended to review [Building machine learning pipelines with Dagster](/guides/dagster/ml-pipeline) which provides background on using Dagster's assets for machine learning.

---

@@ -28,23 +28,21 @@ You might have thought about your data sources, feature sets, and the best model

Whether you have a large or small model, Dagster can help automate data refreshes and model training based on your business needs.

-Auto-materializing assets can be used to update a machine learning model when the upstream data is updated. This can be done by setting the `AutoMaterializePolicy` to `eager`, which means that our machine learning model asset will be refreshed anytime our data asset is updated.
+Declarative Automation can be used to update a machine learning model when the upstream data is updated. This can be done by setting the `AutomationCondition` to `eager`, which means that our machine learning model asset will be refreshed anytime our data asset is updated.

```python file=/guides/dagster/managing_ml/managing_ml_code.py startafter=eager_materilization_start endbefore=eager_materilization_end
-from dagster import AutoMaterializePolicy, asset
+from dagster import AutomationCondition, asset


@asset
def my_data(): ...


-@asset(
-    auto_materialize_policy=AutoMaterializePolicy.eager(),
-)
+@asset(automation_condition=AutomationCondition.eager())
def my_ml_model(my_data): ...
```

-Some machine learning models might more be cumbersome to retrain; it also might be less important to update them as soon as new data arrives. For this, a lazy auto-materialization policy which can be used in two different ways. The first, by using it with a `freshness_policy` as shown below. In this case, `my_ml_model` will only be auto-materialized once a week.
+Some machine learning models might be more cumbersome to retrain; it also might be less important to update them as soon as new data arrives. For this, the `on_cron` condition may be used, which will cause the asset to be updated on a given cron schedule, but only after all of its upstream dependencies have been updated.

```python file=/guides/dagster/managing_ml/managing_ml_code.py startafter=lazy_materlization_start endbefore=lazy_materlization_end
from dagster import AutoMaterializePolicy, asset, FreshnessPolicy
@@ -54,46 +52,10 @@ from dagster import AutoMaterializePolicy, asset, FreshnessPolicy
def my_other_data(): ...


-@asset(
-    auto_materialize_policy=AutoMaterializePolicy.lazy(),
-    freshness_policy=FreshnessPolicy(maximum_lag_minutes=7 * 24 * 60),
-)
+@asset(automation_condition=AutomationCondition.on_cron("0 9 * * *"))
def my_other_ml_model(my_other_data): ...
```

This can be useful if you know that you want your machine learning model retrained at least once a week. While Dagster allows you to refresh a machine learning model as often as you like, best practice is to re-train as seldomly as possible. Model retraining can be costly to compute and having a minimal number of model versions can reduce the complexity of reproducing results at a later point in time. In this case, the model is updated once a week for `predictions`, ensuring that `my_ml_model` is retrained before it is used.

```python file=/guides/dagster/managing_ml/managing_ml_code.py startafter=without_policy_start endbefore=without_policy_end
from dagster import AutoMaterializePolicy, FreshnessPolicy, asset


@asset
def some_data(): ...


@asset(auto_materialize_policy=AutoMaterializePolicy.lazy())
def some_ml_model(some_data): ...


@asset(
auto_materialize_policy=AutoMaterializePolicy.lazy(),
freshness_policy=FreshnessPolicy(maximum_lag_minutes=7 * 24 * 60),
)
def predictions(some_ml_model): ...
```

A more traditional schedule can also be used to update machine learning assets, causing them to be re-materialized or retrained on the latest data. For example, setting up a [cron schedule on a daily basis](/concepts/automation/schedules).

This can be useful if you have data that is also being scheduled on a cron schedule and want to add your machine model jobs to run on a schedule as well.

```python file=/guides/dagster/managing_ml/managing_ml_code.py startafter=basic_schedule_start endbefore=basic_schedule_end
from dagster import AssetSelection, define_asset_job, ScheduleDefinition

ml_asset_job = define_asset_job("ml_asset_job", AssetSelection.groups("ml_asset_group"))

basic_schedule = ScheduleDefinition(job=ml_asset_job, cron_schedule="0 9 * * *")
```

### Monitoring

Integrating your machine learning models into Dagster allows you to see when the model and its data dependencies were refreshed, or when a refresh process has failed. By using Dagster to monitor performance changes and process failures on your ML model, it becomes possible to set up remediation paths, such as automated model retraining, that can help resolve issues like model drift.
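
As one hedged illustration of such a remediation path (not part of this commit), a run failure sensor could surface failed ML refreshes so an alerting or retraining workflow can act on them:

```python
from dagster import RunFailureSensorContext, run_failure_sensor


@run_failure_sensor
def ml_refresh_failure_sensor(context: RunFailureSensorContext):
    # Hypothetical remediation hook: log the failed run so an alerting or
    # retraining workflow can pick it up.
    context.log.warning(
        f"Run {context.dagster_run.run_id} failed: {context.failure_event.message}"
    )
```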
@@ -176,7 +176,7 @@ def missing_dimension_check(duckdb: DuckDBResource) -> dg.AssetCheckResult:
compute_kind="duckdb",
group_name="analysis",
deps=[joined_data],
-    auto_materialize_policy=dg.AutoMaterializePolicy.eager(),
+    automation_condition=dg.AutomationCondition.eager(),
)
def monthly_sales_performance(
context: dg.AssetExecutionContext, duckdb: DuckDBResource
@@ -237,7 +237,7 @@ def monthly_sales_performance(
partitions_def=product_category_partition,
group_name="analysis",
compute_kind="duckdb",
-    auto_materialize_policy=dg.AutoMaterializePolicy.eager(),
+    automation_condition=dg.AutomationCondition.eager(),
)
def product_performance(context: dg.AssetExecutionContext, duckdb: DuckDBResource):
product_category_str = context.partition_key

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

@@ -2,16 +2,14 @@

## eager_materilization_start

-from dagster import AutoMaterializePolicy, asset
+from dagster import AutomationCondition, asset


@asset
def my_data(): ...


-@asset(
-    auto_materialize_policy=AutoMaterializePolicy.eager(),
-)
+@asset(automation_condition=AutomationCondition.eager())
def my_ml_model(my_data): ...


@@ -26,47 +24,13 @@ def my_ml_model(my_data): ...
def my_other_data(): ...


-@asset(
-    auto_materialize_policy=AutoMaterializePolicy.lazy(),
-    freshness_policy=FreshnessPolicy(maximum_lag_minutes=7 * 24 * 60),
-)
+@asset(automation_condition=AutomationCondition.on_cron("0 9 * * *"))
def my_other_ml_model(my_other_data): ...


## lazy_materlization_end


## without_policy_start
from dagster import AutoMaterializePolicy, FreshnessPolicy, asset


@asset
def some_data(): ...


@asset(auto_materialize_policy=AutoMaterializePolicy.lazy())
def some_ml_model(some_data): ...


@asset(
auto_materialize_policy=AutoMaterializePolicy.lazy(),
freshness_policy=FreshnessPolicy(maximum_lag_minutes=7 * 24 * 60),
)
def predictions(some_ml_model): ...


## without_policy_end

## basic_schedule_start

from dagster import AssetSelection, define_asset_job, ScheduleDefinition

ml_asset_job = define_asset_job("ml_asset_job", AssetSelection.groups("ml_asset_group"))

basic_schedule = ScheduleDefinition(job=ml_asset_job, cron_schedule="0 9 * * *")

## basic_schedule_end

## conditional_monitoring_start

from sklearn import linear_model

This file was deleted.

2 comments on commit 6b07905

@github-actions
Deploy preview for dagster-docs ready!

✅ Preview
https://dagster-docs-pbtglgpzq-elementl.vercel.app
https://master.dagster.dagster-docs.io

Built with commit 6b07905.
This pull request is being automatically deployed with vercel-action

@github-actions github-actions bot commented on 6b07905 Oct 24, 2024

Deploy preview for dagster-docs-beta ready!

✅ Preview
https://dagster-docs-beta-64qpqh2hv-elementl.vercel.app

Built with commit 6b07905.
This pull request is being automatically deployed with vercel-action
