As we have been using freshness-based automation since at least 1.2 (reconciliation sensors), I strongly support this part, since automation is one of the two primary benefits of freshness. The second is freshness-based observability, which has been a big part of the value of Dagster's UI for our stakeholders. Losing the ability to see freshness status at a glance, or to schedule based on it, would be a big downgrade.

Mainly, FreshnessPolicy allows a declarative mapping from a data contract, i.e. some promise about asset freshness written in code and kept in version control, to the asset's scheduling and the details on its asset page. So apart from automation, the bigger aspect is solving this translation from data contract to scheduling and asset overview, which was quite natural with the freshness policy. (Think of FreshnessPolicy definitions on a grab bag of dbt and Python assets from multiple code locations being translated so that the assets effortlessly schedule themselves, while providing an overview of asset freshness and status at a glance. That is what FreshnessPolicy and the automation based on it have easily allowed. This functionality would be excellent to carry forward or improve.)

Freshness-based checks could already be built on top of the FreshnessPolicy, even before the explicit introduction of freshness checks (or even asset checks, to be fair). The "check" seems like a side functionality compared to the above uses of automated scheduling and observability.

I would request that the big-picture view of freshness as a promise made by the asset be maintained in the new implementation. It should ideally be easy to assign this information and use it for scheduling and in the UI, as this is the key metric for most assets when building up observability. Checks on data quality are secondary to whether the data is there at all.

(Completely agree that FreshnessPolicy was opaque and hard to grasp. Glad to see a rework, but worried about tossing it out wholesale without keeping some of the core functionality.)
-
As of 1.7, the `FreshnessPolicy` API has been deprecated in favor of two new asset check factories which check the freshness of passed-in assets. The following will serve as a guide to perform the necessary migration to the new APIs, as well as answer questions about why we’re making this change and what it entails.

A freshness policy on a time-window partitioned asset.

If you have a freshness policy specified on a time-window partitioned asset, you should add a freshness check defined by the `build_time_partition_freshness_checks` API. This API functions differently than the original `FreshnessPolicy` API in a few ways.
First, a time-window partitioned asset with a freshness policy was considered fresh if ALL completed partitions before the deadline have been materialized, whereas for a time partition freshness check, the asset is considered fresh if the most recent partition before the deadline has arrived, irrespective of earlier partitions.
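To illustrate the first difference, here is a minimal stdlib-only sketch (not Dagster code) that compares the two notions of "fresh" for a daily-partitioned asset. The helper names `completed_partitions`, `fresh_under_policy`, and `fresh_under_check` are hypothetical, and the sketch assumes daily partitions keyed by date, each ending at midnight, with a deadline that falls some time after midnight on `deadline_day`.

```python
from datetime import date, timedelta


def completed_partitions(first_day: date, deadline_day: date) -> list[date]:
    """Daily partitions (keyed by day) that ended before the deadline on deadline_day."""
    return [first_day + timedelta(days=i) for i in range((deadline_day - first_day).days)]


def fresh_under_policy(materialized: set[date], first_day: date, deadline_day: date) -> bool:
    # Old FreshnessPolicy behaviour as described above: ALL completed
    # partitions before the deadline must have been materialized.
    return all(day in materialized for day in completed_partitions(first_day, deadline_day))


def fresh_under_check(materialized: set[date], first_day: date, deadline_day: date) -> bool:
    # Time partition freshness check as described above: only the most
    # recent completed partition matters, irrespective of earlier ones.
    completed = completed_partitions(first_day, deadline_day)
    return not completed or completed[-1] in materialized


first_day = date(2024, 1, 1)
deadline_day = date(2024, 1, 5)  # deadline is some time on Jan 5
# The Jan 3 partition was never materialized; all other completed ones were.
materialized = {date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)}

policy_fresh = fresh_under_policy(materialized, first_day, deadline_day)
check_fresh = fresh_under_check(materialized, first_day, deadline_day)
print(policy_fresh, check_fresh)
```

With a gap at an earlier partition, the policy-style evaluation reports the asset as stale, while the check-style evaluation reports it as fresh because the latest completed partition (Jan 4) has arrived.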
Second, the freshness policy API has two different parameters: a `cron_schedule`, which defines when we should start checking for the existence of an asset, and `maximum_lag_minutes`, which defines the latest partition we should be searching for. For example, a `cron_schedule` of `"0 9 * * *"` and a `maximum_lag_minutes` of 540 would mean that we expect all partitions ending by midnight (9am minus 540 minutes, i.e. the end of the previous day) to be fresh. In contrast, the freshness check derives which partition to search for from the cron schedule directly, instead of requiring a lag parameter to also be set. So, given only a cron schedule of `"0 9 * * *"`, the check will infer that the partition we expect is the one ending at midnight at the end of yesterday, since it’s the most recent partition to finish before 9am today.

A `FreshnessPolicy` that expects a time-partitioned asset to be fresh by 9am daily can therefore be replaced by an asset plus a freshness check built from the same cron schedule.
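To make the deadline arithmetic in the 9am example concrete, the following stdlib-only sketch (not Dagster code) computes the partition cutoff both ways. The helper names `cutoff_from_lag` and `cutoff_from_deadline` are hypothetical, and the sketch assumes daily partitions that end at midnight and a single tick of the `"0 9 * * *"` schedule on an arbitrary example date.

```python
from datetime import datetime, timedelta


def cutoff_from_lag(deadline: datetime, maximum_lag_minutes: int) -> datetime:
    # FreshnessPolicy style: the cutoff is the deadline minus the lag.
    return deadline - timedelta(minutes=maximum_lag_minutes)


def cutoff_from_deadline(deadline: datetime) -> datetime:
    # Freshness check style (assuming daily partitions ending at midnight):
    # the latest partition boundary at or before the deadline.
    return deadline.replace(hour=0, minute=0, second=0, microsecond=0)


# One tick of the "0 9 * * *" schedule: 9am on an example day.
deadline = datetime(2024, 1, 5, 9, 0)

lag_cutoff = cutoff_from_lag(deadline, maximum_lag_minutes=540)
cron_cutoff = cutoff_from_deadline(deadline)

# The expected partition is the daily window that ends at the cutoff,
# i.e. yesterday's partition.
expected_partition_key = (cron_cutoff - timedelta(days=1)).date().isoformat()
print(lag_cutoff, cron_cutoff, expected_partition_key)
```

For this example both approaches land on the same midnight boundary, which is why the lag parameter is redundant once the partition is derived from the cron schedule.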
A freshness policy on a non-time-partitioned asset.

We define a non-time-partitioned asset as one that either has no partitions, or has a partitions definition that does not derive from `TimeWindowPartitionsDefinition`. There are two different parameterizations possible for a non-time-partitioned asset.

The first is to define both a `cron_schedule` and a `maximum_lag_minutes` parameter. This is directly analogous to having a freshness check defined by the `build_last_update_freshness_checks` API, but the `lower_bound_delta` parameter replaces `maximum_lag_minutes`, and `deadline_cron` replaces `cron_schedule`.

The second is to define only `maximum_lag_minutes`, which is directly analogous to setting only `lower_bound_delta` on the freshness check.

F.A.Q.