Releases: dagster-io/dagster
1.4.5 / 0.20.5 (libraries)
New
- `@graph_asset` now takes a `config` parameter equivalent to the parameter on `@graph`.
- Added an optional `dynamic_partitions_store` argument to `DynamicPartitionsDefinition` so that multi-partition runs work properly with dynamic partitions (Thanks @elzzz!).
- [dagster-graphql] Added `partitionsByAssets` to `backfillParams` for ranged partition backfill (Thanks @ruizh22!).
- [dagster-dbt] Support for `dbt-core==1.6` has been added.
- [dagster-dbt] `DbtCliResource` now supports configuring `profiles_dir`.
- [dagster-k8s] Allow specifying `restart_policy` on `k8s_job_op` (Thanks @Taadas!).
- [dagster-snowflake] Added `authenticator` to `SnowflakePandasIOManager`, which allows specifying the authentication mechanism to use (Thanks @pengw0048!).
Bugfixes
- In some situations, multiple materializations of the same asset could be kicked off when using a lazy `AutoMaterializePolicy` with assets that had at least one source asset parent and at least one non-source asset parent. This has been fixed.
- After applying an eager `AutoMaterializePolicy` to a time-partitioned asset downstream of an unpartitioned asset, the latest partition would only ever be materialized a single time, rather than updating in response to any parent updates. This has been fixed.
- Fixed an issue where creating a `StaticPartitionsDefinition` containing many thousands of partitions could take a significant amount of time.
- The run coordinator daemon now uses a fresh request context on each iteration, fixing an issue where stale grpc server references could be used in certain high volume conditions.
- Automatically generated data versions for partitioned assets now correctly reflect the data versions of upstream partitions. Previously, they were computed using the data versions from the most recent materializations of upstream assets regardless of partition.
- [dagster-airbyte] Previously, attempting to load assets from an Airbyte instance in which some of the tables had hyphens in their name would result in an error. This has been fixed.
- [dagster-dbt] Previously, attempting to load assets from a dbt project in which some of the models had hyphens in their name would result in an error. This has been fixed.
- [dagstermill] Fixed a bug where known state for executing dagstermill ops was not correctly passed in (Thanks @motuzov!).
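To illustrate the partitioned data version fix above, here is a minimal sketch of the idea. It is not Dagster's implementation: the `compute_data_version` helper, the hashing scheme, and the sample versions are assumptions made for illustration. Each downstream partition's version is derived only from the data version of its own upstream partition, not from whichever upstream partition was materialized most recently.

```python
import hashlib


def compute_data_version(code_version: str, upstream_versions: list[str]) -> str:
    """Hash the code version together with the upstream data versions."""
    digest = hashlib.sha256()
    for part in [code_version, *sorted(upstream_versions)]:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


# Hypothetical upstream data versions, keyed by partition.
upstream = {"2023-08-01": "v1", "2023-08-02": "v7"}

# Each downstream partition hashes only the matching upstream partition.
per_partition = {
    key: compute_data_version("code-1", [version])
    for key, version in upstream.items()
}
```

With this scheme, rematerializing the upstream `2023-08-02` partition changes only the downstream `2023-08-02` data version.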
Documentation
- Added the starter project’s template for Dagster University.
- Fixed an incorrect method name in DagsterDbtTranslator Docs (Thanks @akan72!).
Dagster Cloud
- When importing a dbt project on the Dagster Cloud setup page, an `Unexpected exception` error would be raised when scaffolding a pull request on a repository with no `profiles.yml`. This behavior has been updated to raise a more descriptive error message on the repo selection page.
- The running multiple agents guide has been revamped to discuss running agent replicas and zero-downtime deployment of the agent.
- The `agentReplicas` config setting on the helm chart has been renamed to `isolatedAgents`. In order to use this config setting, your user code dagster version needs to be `1.4.3` or greater.
1.4.4 / 0.20.4 (libraries)
New
- [ui] When viewing a run for auto-materialized assets, show a tag with information about the assets that were materialized.
- [ui] In the Auto-materialize History view, when one or more of an asset’s parents have been updated, the set of updated parents will be viewable.
- [ui] Link to the auto-materialized history for an asset from the asset DAG view.
- [ui] For runs that were the result of auto-observation, show a tag for this in the Runs list view.
- Added warnings for storage incompatibility with the experimental global op concurrency.
Bugfixes
- [dagster-dbt] Fixed an issue where `dagster-dbt project scaffold` didn’t create a project directory with all the scaffolded files.
- Fixed an issue which could cause errors when using the `SpecificPartitionsPartitionMapping` with auto-materialization.
Breaking Change
- Previously, it was possible to set `max_materializations_per_minute` on an `AutoMaterializePolicy` to a non-positive number. This will now result in an error.
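The new validation can be pictured with a small standalone check. This is a sketch only: the helper name is hypothetical, and treating `None` as "no limit" is an assumption rather than something stated in these notes.

```python
from typing import Optional


def check_max_materializations_per_minute(value: Optional[int]) -> Optional[int]:
    """Reject non-positive limits; None is treated here as 'no limit' (assumption)."""
    if value is not None and value <= 0:
        raise ValueError(
            f"max_materializations_per_minute must be positive, got {value}"
        )
    return value
```

Code that previously passed `0` or a negative number to disable or loosen the limit will need to pass a positive value instead.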
Community Contributions
- Fix for loading multipartitions paths in `upath_io_manager` from @harrylojames; thank you!
- Docs typo fix from @C0DK; thank you!
Documentation
- Revamped the dagster-dbt tutorial to take advantage of `dagster project scaffold` and the new dagster-dbt APIs.
1.4.3 / 0.20.3 (libraries)
New
- [dagster-dbt] When invoking `dagster-dbt project scaffold` on a dbt project directory, if a `profiles.yml` exists in the root of the directory, its contents are used to add dbt adapter packages to the scaffolded `setup.py`.
- The default sentinel value for the multiprocessing executor’s `max_concurrent` field has been changed from `0` to `None` to more clearly signal its intent. A value of `0` is still interpreted as the sentinel value which dynamically allocates `max_concurrent` based on detected CPU count.
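The sentinel handling for `max_concurrent` described above can be sketched as follows. This is an illustration under stated assumptions, not the executor's actual code: `resolve_max_concurrent` is a hypothetical helper, and `os.cpu_count()` stands in for whatever CPU detection the executor uses.

```python
import os
from typing import Optional


def resolve_max_concurrent(max_concurrent: Optional[int]) -> int:
    """Treat None (new default) and 0 (legacy sentinel) as 'use the CPU count'."""
    if max_concurrent is None or max_concurrent == 0:
        return os.cpu_count() or 1  # os.cpu_count() may return None
    return max_concurrent
```

Existing configuration that sets `max_concurrent: 0` therefore keeps its behavior, while new configuration can omit the field entirely.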
Bugfixes
- IO managers defined on jobs will now be properly merged with resources defined in `Definitions`, so that jobs are able to override the IO manager used.
- [dagster-fivetran] Fixed an issue where `EnvVars` in a `FivetranResource` would not be evaluated when loading assets from the Fivetran instance.
- [dagster-airbyte] Fixed an issue where `EnvVars` in an `AirbyteResource` would not be evaluated when loading assets from the Airbyte resource.
Documentation
- [dagster-dbt] Added API docs for `DbtCliResource`, `DbtCliInvocation`, `@dbt_assets`, `DagsterDbtTranslator`, `dagster-dbt project scaffold`
- [dagster-dbt] Expanded references for new APIs:
  - Added documentation to customize asset definition attributes for dbt assets
  - Added documentation to define upstream and downstream dependencies to dbt assets
  - Added documentation to define schedules for dbt assets
1.4.2 / 0.20.2 (libraries)
Bugfixes
- Fixes a bug in `dagster-dbt` that was preventing it from correctly materializing subselections of dbt assets.
1.4.1 / 0.20.1 (libraries)
Bugfixes
- Fixes a bug in `dagster-dbt` that was preventing it from efficiently loading dbt projects from a manifest.
1.4.0 / 0.20.0 (libraries) "Material Girl"
Major Changes since 1.3.0 (core) / 0.19.0 (libraries)
Core
- Auto-materialize history – We’ve added a UI that tracks why assets were or were not materialized according to their `AutoMaterializePolicy`. It’s located under `Assets` → Select an asset with an `AutoMaterializePolicy` → `Auto-materialize history` tab.
- Auto-materialize performance – We’ve made significant performance improvements to the Asset Daemon, allowing it to keep up with asset graphs containing thousands of assets and assets with a large history of previously-materialized partitions.
- Asset backfill cancellation – Asset backfills can now be canceled, bringing them to parity with job backfills. When an asset backfill is requested for cancellation, the daemon cancels runs until all runs are terminated, then marks the backfill as “canceled”.
- non_argument_deps → deps – We’ve deprecated the `non_argument_deps` parameter of `@asset` and `@multi_asset` in favor of a new `deps` parameter. The new parameter makes it clear that this is a first-class way of defining dependencies, makes code more concise, and accepts `AssetsDefinition` and `SourceAsset` objects, in addition to the `str`s and `AssetKey`s that the previous parameter accepted.
- Group-level asset status UI – The new Assets Overview dashboard, located underneath the Activity tab of the Overview page, shows the status of all the assets in your deployment, rolled up by group.
- Op concurrency (experimental) – We’ve added a feature that allows limiting the number of concurrently executing ops across runs. [docs]
- `DynamicPartitionsDefinition` and `SensorResult` are no longer marked experimental.
- Automatically observe source assets, without defining jobs (experimental) – The `@observable_source_asset` decorator now accepts an `auto_observe_interval_minutes` parameter. If the asset daemon is turned on, then the observation function will automatically be run at this interval. Downstream assets with eager auto-materialize policies will automatically run if the observation function indicates that the source asset has changed. [docs]
- Dagit → Dagster UI – To reduce the number of Dagster-specific terms that new users need to learn, “Dagit” has been renamed to “the Dagster UI”. The `dagit` package is deprecated in favor of the `dagster-webserver` package.
- Default config in the Launchpad – When you open the launchpad to kick off a job or asset materialization, Dagster will now automatically populate the default values for each field.
dagster-dbt
- The new `@dbt_assets` decorator allows much more control over how Dagster runs your dbt project. [docs]
- The new `dagster-dbt project scaffold` command line interface makes it easy to create files and directories for a Dagster project that wraps an existing dbt project.
- Improved APIs for defining asset dependencies – The new `get_asset_key_for_model` and `get_asset_key_for_source` utilities make it easy to specify dependencies between upstream dbt assets and downstream non-dbt assets. And you can now more easily specify dependencies between dbt models and upstream non-dbt assets by specifying Dagster asset keys in the dbt metadata for dbt sources.
Since 1.3.14 (core) / 0.19.14 (libraries)
New
- The published Dagster Docker images now use Python 3.10, instead of 3.7.
- We’ve deprecated the `non_argument_deps` parameter of `@asset` and `@multi_asset` in favor of a new `deps` parameter. The new parameter makes it clear that this is a first-class way of defining dependencies, makes code more concise, and accepts `AssetsDefinition` and `SourceAsset` objects, in addition to the `str`s and `AssetKey`s that the previous parameter accepted.
- The `UPathIOManager` can now be extended to load multiple partitions asynchronously (Thanks Daniel Gafni!).
- By default, Dagster will now automatically load default config values into the launchpad. This behavior can be disabled in the user settings page.
- [dagster-k8s] The Helm chart now sets readiness probes on user code deployment servers by default. These can be disabled with `dagster-user-deployments.deployments.[...].readinessProbe.enabled=false`.
- [dagster-airbyte] In line with the deprecation of `non_argument_deps` in favor of `deps`, `build_airbyte_assets` now accepts a `deps` parameter.
- [dagstermill] In line with the deprecation of `non_argument_deps` in favor of `deps`, `define_dagstermill_asset` now accepts a `deps` parameter.
Bugfixes
- Duplicate partition keys passed to `StaticPartitionsDefinition` will now raise an error.
- Fixed a bug that caused lazy `AutoMaterializePolicy`s to not materialize missing assets.
- [ui] Fixed an issue where global search and large DAGs were broken when using `--path-prefix`.
- Schedule and sensor run submissions are now kept up to date with the current workspace, fixing an issue where a stale reference to a server would be used in some conditions.
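Since duplicate partition keys passed to `StaticPartitionsDefinition` now raise an error, it can help to check a key list before constructing the definition. The helper below is a hypothetical, standalone sketch and not part of Dagster.

```python
from collections import Counter
from typing import Sequence


def find_duplicate_partition_keys(partition_keys: Sequence[str]) -> list[str]:
    """Return keys that occur more than once, in first-seen order."""
    counts = Counter(partition_keys)
    return [key for key, count in counts.items() if count > 1]
```

An empty result means the list is safe to pass on; otherwise the returned keys are the ones to deduplicate.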
Breaking Changes
- Support for Python 3.7 has been dropped.
- `build_asset_reconciliation_sensor` (Experimental) has been removed. It was deprecated in 1.3 in favor of `AutoMaterializePolicy`.
- `asset_key(s)` properties on `AssetIn` and `AssetsDefinition` have been removed in favor of `key(s)`. These APIs were deprecated in 1.0.
- `root_input_manager` and `RootInputManagerDefinition` have been removed in favor of `input_manager` and `InputManagerDefinition`. These APIs were deprecated in 1.0.
- [dagster-pandas] The `event_metadata_fn` parameter on `create_dagster_pandas_dataframe_type` has been removed in favor of `metadata_fn`.
- [dagster-dbt] The library has been substantially revamped to support the new `@dbt_assets` and `DbtCliResource`. See the migration guide for details.
  - Group names for dbt assets are now taken from a dbt model's group. Before, group names were determined using the model's subdirectory path.
  - Support for `dbt-rpc` has been removed.
  - The class alias `DbtCloudResourceV2` has been removed.
  - `DbtCli` has been renamed to `DbtCliResource`. Previously, `DbtCliResource` was a class alias for `DbtCliClientResource`.
  - `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` now default to `use_build=True`.
  - The default assignment of groups to dbt models loaded from `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` has changed. Rather than assigning a group name using the model’s subdirectory, a group name will be assigned using the dbt model’s dbt group.
  - The argument `node_info_to_definition_metadata_fn` for `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` now overrides metadata instead of adding to it.
  - The arguments for `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` now must be specified using keyword arguments.
  - When using the new `DbtCliResource` with `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest`, stdout logs from the dbt process will now appear in the compute logs instead of the event logs.
Deprecations
- The `dagit` python package is deprecated and will be removed in 2.0 in favor of `dagster-webserver`. See the migration guide for details.
- The following fields containing “dagit” in the Dagster helm chart schema have been deprecated in favor of “dagsterWebserver” equivalents (see migration guide for details):
  - `dagit` → `dagsterWebserver`
  - `ingress.dagit` → `ingress.dagsterWebserver`
  - `ingress.readOnlyDagit` → `ingress.readOnlyDagsterWebserver`
- [Dagster Cloud ECS Agent] We've introduced performance improvements that rely on the AWS Resource Groups Tagging API. To enable, grant your agent's IAM policy permission to `tag:DescribeResources`. Without this policy, the ECS Agent will log a deprecation warning and fall back to its old behavior (listing all ECS services in the cluster and then listing each service's tags).
- `DbtCliClientResource`, `dbt_cli_resource` and `DbtCliOutput` are now being deprecated in favor of `DbtCliResource`.
- A number of arguments on `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` are now deprecated in favor of other options. See the migration guide for details.
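The Helm chart field renames listed above can be applied mechanically to an existing values file. The sketch below operates on an already-parsed values dictionary; `migrate_values` and the sample structure are hypothetical and only the three renames named in these notes are covered.

```python
import copy

# (parent path, deprecated key, replacement key), as listed in the notes above.
RENAMES = [
    ((), "dagit", "dagsterWebserver"),
    (("ingress",), "dagit", "dagsterWebserver"),
    (("ingress",), "readOnlyDagit", "readOnlyDagsterWebserver"),
]


def migrate_values(values: dict) -> dict:
    """Return a copy of a parsed values dict with deprecated keys renamed."""
    migrated = copy.deepcopy(values)
    for parent_path, old_key, new_key in RENAMES:
        parent = migrated
        for part in parent_path:
            parent = parent.get(part, {})
        if old_key in parent:
            parent[new_key] = parent.pop(old_key)
    return migrated
```

The input dictionary is left untouched, so the old and new values can be diffed before committing the change.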
Community Contributions
- Docs typo fix from @chodera, thank you!
- Run request docstring fix from @Jinior, thank you!
Documentation
- All public methods in the Dagster API now have docstrings.
- The entirety of the documentation has been updated to now refer to the “Dagster webserver” or “Dagster UI” where “Dagit” was previously used for both entities.
1.3.14 (core) / 0.19.14 (libraries)
New
- `DynamicPartitionsDefinition` and `SensorResult` are no longer marked experimental.
- `DagsterInstance` now has a `get_status_by_partition` method, which returns the status of each partition for a given asset. Thanks renzhe-brian!
- `DagsterInstance` now has a `get_latest_materialization_code_versions` method, which returns the code version of the latest materialization for each of the provided (non-partitioned) assets.
- The error message for when an asset illegally depends on itself is now more informative.
- Further performance improvements for the Asset Daemon.
- Performance improvements in the asset graph view for large asset graphs.
- Pandas 2.x is now supported in all dagster packages.
- `build_asset_context` has been added as an asset focused replacement for `build_op_context`.
- `build_op_context` now accepts a `partition_key_range` parameter.
- New `AssetSelection.upstream_source_assets` method allows selecting source assets upstream of the current selection.
- `AssetSelection.key_prefixes` and `AssetSelection.groups` now accept an optional `include_sources` parameter.
- The AutoMaterialize evaluations UI now provides more details about partitions and waiting on upstream assets.
- [dbt] The `DbtCli` resource is no longer marked experimental.
- [dbt] The `global_config` parameter of the `DbtCli` resource has been renamed to `global_config_flags`.
- [dbt] `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` now work with the `DbtCli` resource.
- [dbt] The `manifest` argument of the `@dbt_assets` decorator can now additionally accept a `Path` argument representing a path to the manifest file, or a dictionary argument representing the raw manifest blob.
- [dbt] When invoking `DbtCli.cli` from inside a `@dbt_assets`-decorated function, you no longer need to supply the manifest argument as long as you provide the context argument.
- [dbt] The `DbtManifest` object can now generate schedules using dbt selection syntax.

    dbt_manifest.build_schedule(
        job_name="materialize_dbt_models",
        cron_schedule="0 0 * * *",
        dbt_select="fqn:*"
    )

- [dbt] When invoking `DbtCli.cli` and the underlying command fails, an exception will now be raised. To suppress the exception, run `DbtCli.cli(..., raise_on_error=False)`.
- [ui] You can now alphabetically sort your partitions on the asset partitions page
- [ui] A button in the “Run is materializing this asset” and “Run failed to materialize this asset” banners provides direct access to the relevant run logs
Bugfixes
- Fixed a bug that caused asset metadata to not be available on the `OutputContext` when using `with_attributes` or `AssetsDefinition.from_graph`.
- Previously, if a partitioned asset at the root of the graph had more missing partitions than its AutoMaterializePolicy’s `max_materializations_per_minute` parameter, those older partitions would not be properly discarded from consideration on subsequent ticks. This has been fixed.
- Fixed a bug that caused AutoMaterializePolicy.lazy() to not materialize missing assets that were downstream of assets without an AutoMaterializePolicy.
- In rare cases, the AssetDaemon could hit an exception when using a combination of freshness policies and observable source assets. This has been fixed.
- Previously, string type annotations (most commonly via modules containing `from __future__ import annotations`) would cause errors in most cases when used with Dagster definitions. This has been fixed for the vast majority of cases.
- `AssetExecutionContext` has returned to being a type alias for `OpExecutionContext`.
- [ui] Date filtering on the runs page now takes your timezone into consideration
- [ui] Fixed a bug where selecting partitions in the launchpad dialog cleared out your configuration
- [ui] In the run Gantt chart, executed steps that follow skipped steps no longer render off the far right of the visualization.
- [ui] Cancelling a running backfill no longer makes canceled partitions un-selectable on the job partitions page and backfill modal, and cancellation is shown in gray instead of red.
Breaking Changes
- [experimental] The internal `time_window_partition_scope_minutes` parameter of the `AutoMaterializePolicy` class has been removed. Instead, `max_materializations_per_minute` should be used to limit the number of runs that may be kicked off for a partitioned asset.
Deprecations
- [dbt] `DbtCliResource` has been deprecated in favor of `DbtCli`.
- The python package `dagit` has been deprecated in favor of a new package `dagster-webserver`.
- `OpExecutionContext.asset_partition_key_range` has been deprecated in favor of `partition_key_range`.
Community Contributions
- The `databricks_pyspark_step_launcher` will no longer error when executing steps that target a single partition of a `DynamicPartitionsDefinition` (thanks @weberdavid!).
- Increased timeout on readinessProbe for example user code images, which prevents breakages in certain scenarios (thanks @leehuwuj)!
- Avoid creation of erroneous local directories by GCS IO manager (thanks @peterjclaw)!
- Fixed typo in intro docs (thanks @adeboyed)!
- Fix typo in bigquery docs (thanks @nigelainscoe)!
- Fix typing on run tag validation (thanks @yuvalgimmunai)!
- Allow passing repositoryCredentials arn as config to ecs run launcher (thanks @armandobelardo)!
Experimental
- The `@observable_source_asset` decorator now accepts an `auto_observe_interval_minutes` parameter. If the asset daemon is turned on, then the observation function will automatically be run at this interval.
- [dbt] `DbtCliTask` has been renamed to `DbtCliInvocation`.
- [dbt] The `get_asset_key_by_output_name` and `get_node_info_by_output_name` methods of `DbtManifest` have been renamed to `get_asset_key_for_output_name` and `get_node_info_for_output_name`, respectively.
- [ui] A new feature flag allows you to switch Asset DAG rendering to a tighter horizontal layout, which may be preferable in some scenarios
Documentation
- Many public methods that were missing in the API docs are now documented. Updated classes include `DagsterInstance`, `*MetadataValue`, `DagsterType`, and others.
- `dagster-pandera` now has an API docs page.
- Deprecated methods in the API docs now are marked with a special badge.
1.3.13 (core) / 0.19.13 (libraries)
Bugfixes
- Fixes a bug in `dagster project from-example` that was preventing it from downloading examples correctly.
1.3.12 (core) / 0.19.12 (libraries)
New
- The `--name` argument is now optional when running `dagster project from-example`.
- An asset key can now be directly specified via the asset decorator: `@asset(key=...)`.
- `AssetKey` now has a `with_prefix` method.
- Significant performance improvements when using `AutoMaterializePolicy`s with large numbers of partitions.
- `dagster instance migrate` now prints information about changes to the instance database schema.
- The `dagster-cloud-agent` helm chart now supports setting K8s labels on the agent deployment.
- [ui] Step compute logs are shown under “Last Materialization” in the asset sidebar.
- [ui] Truncated asset names now show a tooltip when hovered in the asset graph.
- [ui] The “Propagate changes” button has been removed and replaced with “Materialize Stale and Missing” (which was the “Propagate changes” predecessor).
Bugfixes
- [ui] Fixed an issue that prevented filtering by date on the job-specific runs tab.
- [ui] “F” key with modifiers (alt, ctrl, cmd, shift) no longer toggles the filter menu on pages that support filtering.
- [ui] Fix empty states on Runs table view for individual jobs, to provide links to materialize an asset or launch a run for the specific job, instead of linking to global pages.
- [ui] Previously, when a run was launched from the Launchpad editor while an editor hint popover was open, the popover remained on the page even after navigation. This has been fixed.
- [ui] Fixed an issue where clicking on the zoom controls on a DAG view would close the right detail panel for selected nodes.
- [ui] Fixed an issue with shift-selecting assets with multi-component asset keys.
- [ui] Fixed an issue with the truncation of the asset stale causes popover.
- When using a `TimeWindowPartitionMapping` with a `start_offset` or `end_offset` specified, requesting the downstream partitions of a given upstream partition would yield incorrect results. This has been fixed.
- When using `AutoMaterializePolicy`s with observable source assets, in rare cases, a second run could be launched in response to the same version being observed twice. This has been fixed.
- When passing in `hook_defs` to `define_asset_job`, if any of those hooks had required resource keys, a missing resource error would surface when the hook was executed. This has been fixed.
- Fixed a typo in a documentation URL in `dagster-duckdb-polars` tests. The URL now works correctly.
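The `TimeWindowPartitionMapping` fix above concerns the inverse lookup: given an upstream partition, which downstream partitions depend on it. The sketch below works that out for daily partitions on both sides. It is a standalone illustration with hypothetical helper names, assuming that a downstream partition `d` depends on the upstream partitions from `d + start_offset` through `d + end_offset`.

```python
from datetime import date, timedelta


def upstream_partitions(downstream: date, start_offset: int, end_offset: int) -> list[date]:
    """Upstream daily partitions that one downstream partition depends on."""
    return [
        downstream + timedelta(days=offset)
        for offset in range(start_offset, end_offset + 1)
    ]


def downstream_partitions(upstream: date, start_offset: int, end_offset: int) -> list[date]:
    """Inverse lookup: downstream daily partitions that depend on one upstream partition."""
    return [
        upstream - timedelta(days=offset)
        for offset in range(end_offset, start_offset - 1, -1)
    ]
```

With `start_offset=-1` and `end_offset=0`, a downstream day depends on the same day and the day before, so an upstream day feeds the same downstream day and the following one. The inverse subtracts the offsets rather than adding them, which is the step that is easy to get wrong.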
Experimental
- [dagster-dbt] Added methods to `DbtManifest` to fetch asset keys of sources and models: `DbtManifest.get_asset_key_for_model`, `DbtManifest.get_asset_key_for_source`. These methods are utilities for defining python assets as dependencies of dbt assets via `@asset(key=manifest.get_asset_key_for_model(...))`.
- [dagster-dbt] The use of the `state_path` parameter with `DbtManifestAssetSelection` has been deprecated, and will be removed in the next minor release.
- Added experimental support for limiting global op/asset concurrency across runs.
Dependencies
- Upper bound on the `grpcio` package (for `dagster`) has been removed.
Breaking Changes
- Legacy methods of `PartitionMapping` have been removed. Defining custom partition mappings has been unsupported since 1.1.7.
Community Contributions
- [dagster-airbyte] Added the ability to specify asset groups to `build_airbyte_assets`. Thanks @guy-rvvup!
Documentation
- For Dagster Cloud Serverless users, we’ve added our static IP addresses to the Serverless docs.
1.3.11 (core) / 0.19.11 (libraries)
New
- Assets with lazy auto-materialize policies are no longer auto-materialized if they are missing but don’t need to be materialized in order to help downstream assets meet their freshness policies.
- [ui] The descriptions of auto-materialize policies in the UI now include their skip conditions along with their materialization conditions.
- [dagster-dbt] Customized asset keys can now be specified for nodes in the dbt project, using `meta.dagster.asset_key`. This field takes in a list of strings that are used as the components of the generated `AssetKey`.

      version: 2

      models:
        - name: users
          config:
            meta:
              dagster:
                asset_key: ["my", "custom", "asset_key"]

- [dagster-dbt] Customized groups can now be specified for models in the dbt project, using `meta.dagster.group`. This field takes in a string that is used as the Dagster group for the generated software-defined asset corresponding to the dbt model.

      version: 2

      models:
        - name: users
          config:
            meta:
              dagster:
                group: "my_group"

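To make the shape of the two `meta.dagster` fields concrete, the sketch below reads them from a plain dictionary that mirrors the YAML fragments above (merged into one model for brevity). The helper functions are hypothetical, and the lookup path simply follows the YAML nesting shown here; how dagster-dbt reads these fields internally is not described in these notes.

```python
from typing import Optional

# A plain-dict equivalent of the two YAML fragments above (merged for brevity).
model = {
    "name": "users",
    "config": {
        "meta": {
            "dagster": {
                "asset_key": ["my", "custom", "asset_key"],
                "group": "my_group",
            }
        }
    },
}


def dagster_meta(node: dict) -> dict:
    """Return the meta.dagster mapping of a dbt node, or {} if it is absent."""
    return node.get("config", {}).get("meta", {}).get("dagster", {})


def custom_asset_key(node: dict) -> Optional[list[str]]:
    """Components of the customized asset key, or None if not configured."""
    return dagster_meta(node).get("asset_key")


def custom_group(node: dict) -> Optional[str]:
    """Customized group name, or None if not configured."""
    return dagster_meta(node).get("group")
```

A model without either field returns `None` from both helpers, which corresponds to falling back to the default asset key and group.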
Bugfixes
- Fixed an issue where the `dagster-msteams` and `dagster-mlflow` packages could be installed with incompatible versions of the `dagster` package due to a missing pin.
- Fixed an issue where the `dagster-daemon run` command sometimes kept code server subprocesses open longer than it needed to, making the process use more memory.
- Previously, when using `@observable_source_asset`s with AutoMaterializePolicies, it was possible for downstream assets to get “stuck”, not getting materialized when other upstream assets changed, or for multiple downstream materializations to be kicked off in response to the same version being observed multiple times. This has been fixed.
- Fixed a case where the materialization count for partitioned assets could be wrong.
- Fixed an error which arose when trying to request resources within run failure sensors.
- [dagster-wandb] Fixed handling for multi-dimensional partitions. Thanks @chrishiste
Experimental
- [dagster-dbt] Improvements to `@dbt_assets`
  - `project_dir` and `target_path` in `DbtCliTask` are converted from type `str` to type `pathlib.Path`.
  - In the case that dbt logs are not emitted as json, the log will still be redirected to be printed in the Dagster compute logs, under `stdout`.
Documentation
- Fixed a typo in dagster_aws S3 resources. Thanks @akan72
- Fixed a typo in link on the Dagster Instance page. Thanks @PeterJCLaw