
Releases: dagster-io/dagster

1.1.7 (core) / 0.17.7 (libraries)

15 Dec 23:31

New

  • Definitions is no longer marked as experimental and is the preferred API over @repository for new users of Dagster. Examples, tutorials, and documentation have largely been ported to this new API. No migration is needed. Please see GitHub discussion for more details.
  • The “Workspace” section of Dagit has been removed. All definitions for your code locations can be accessed via the “Deployment” section of the app. Just as in the old Workspace summary page, each code location will show counts of its available jobs, assets, schedules, and sensors. Additionally, the code locations page is now available at /locations.
  • Lagged / rolling window partition mappings: TimeWindowPartitionMapping now accepts start_offset and end_offset arguments that allow specifying that time partitions depend on earlier or later time partitions of upstream assets.
  • Asset partitions can now depend on earlier time partitions of the same asset. The asset reconciliation sensor will respect these dependencies when requesting runs.
  • dagit can now accept multiple arguments for the -m and -f flags. For each argument a new code location is loaded.
  • Schedules created by build_schedule_from_partitioned_job now execute more performantly - in constant time, rather than linear in the number of partitions.
  • The QueuedRunCoordinator now supports the dequeue_use_threads and dequeue_num_workers options, which enable concurrent run dequeue operations for greater throughput.
  • [dagster-dbt] load_assets_from_dbt_project, load_assets_from_dbt_manifest, and load_assets_from_dbt_cloud_job now support applying freshness policies to loaded nodes. To do so, apply dagster_freshness_policy config directly in your dbt project. For example, config(dagster_freshness_policy={"maximum_lag_minutes": 60}) would result in the corresponding asset being assigned a FreshnessPolicy(maximum_lag_minutes=60).
  • The DAGSTER_RUN_JOB_NAME environment variable is now set in containerized environments spun up by our run launchers and executor.
  • [dagster-airflow] make_dagster_repo_from_airflow_dags_path, make_dagster_job_from_airflow_dag, and make_dagster_repo_from_airflow_dag_bag have a new connections parameter, which allows configuring the Airflow connections used by migrated DAGs.
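
The start_offset and end_offset arguments described in the partition mapping note above can be pictured as shifting the two ends of the upstream time window. The following is a minimal plain-Python sketch of that arithmetic, assuming both assets are daily-partitioned with ISO-formatted keys and that offsets count whole partitions. The helper name upstream_daily_partitions is hypothetical; it is not part of Dagster and does not show how TimeWindowPartitionMapping is implemented.

```python
from datetime import date, timedelta
from typing import List


def upstream_daily_partitions(
    downstream_key: str, start_offset: int = 0, end_offset: int = 0
) -> List[str]:
    """Return the upstream daily partition keys for one downstream partition.

    Illustrative only: assumes daily partitions with ISO keys ("2022-07-04").
    """
    day = date.fromisoformat(downstream_key)
    first = day + timedelta(days=start_offset)  # shifted start of the upstream window
    last = day + timedelta(days=end_offset)  # shifted end of the upstream window
    span = (last - first).days + 1
    # An empty list is returned if the offsets produce an inverted window.
    return [(first + timedelta(days=i)).isoformat() for i in range(max(span, 0))]


# A two-day rolling window: each day depends on the previous day and on itself.
print(upstream_daily_partitions("2022-07-04", start_offset=-1))
```

With both offsets left at 0, the sketch reduces to the one-to-one mapping between identical time partitions.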

Bugfixes

  • Fixed a bug where the log property, used for sensor logging, was not available on the RunStatusSensorContext object provided to run status sensors.

  • Fixed a bug where the re-execute button on runs of asset jobs would incorrectly show a warning icon, indicating that the pipeline code may have changed since you last ran it.

  • Fixed an issue which would cause metadata supplied to graph-backed assets to not be viewable in the UI.

  • Fixed an issue where schedules often took up to 5 seconds to start after their tick time.

  • Fixed an issue where Dagster failed to load a dagster.yaml file that used an environment variable to specify the folder for sqlite storage.

  • Fixed an issue which would cause the k8s/docker executors to unnecessarily reload CacheableAssetsDefinitions (such as those created when using load_assets_from_dbt_cloud_job) on each step execution.

  • [dagster-airbyte] Fixed an issue where Python-defined Airbyte sources and destinations were occasionally recreated unnecessarily.

  • Fixed an issue with build_asset_reconciliation_sensor that would cause it to ignore in-progress runs in some cases.

  • Fixed a bug where GQL errors would be thrown in the asset explorer when a previously materialized asset had its dependencies changed.

  • [dagster-airbyte] Fixed an error when generating assets for normalization tables for connections with non-object streams.

  • [dagster-dbt] Fixed an error where dbt Cloud jobs with dbt run and dbt run-operation were incorrectly validated.

  • [dagster-airflow] use_ephemeral_airflow_db now works when running within a PEX deployment artifact.
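
The schedule timing fix above corresponds to the commit note "Check scheduler ticks right after each minute boundary instead of once every 5 seconds" in the All Changes list. The sketch below contrasts the two strategies in plain Python. Both function names are hypothetical and the code is only an illustration of the timing arithmetic, not the scheduler daemon's actual loop.

```python
import math


def detection_delay_fixed_poll(
    tick_ts: float, first_poll_ts: float, interval_s: float = 5.0
) -> float:
    """Seconds between a tick time and the first fixed-interval poll at or after it."""
    polls_needed = math.ceil((tick_ts - first_poll_ts) / interval_s)
    next_poll = first_poll_ts + max(polls_needed, 0) * interval_s
    return next_poll - tick_ts


def seconds_until_next_minute(now_ts: float) -> float:
    """How long to sleep so the next check lands on a minute boundary."""
    return 60.0 - (now_ts % 60.0)


# Polling every 5 seconds from t=4 notices the t=60 tick only at t=64.
print(detection_delay_fixed_poll(tick_ts=60.0, first_poll_ts=4.0))
# Sleeping until the boundary wakes the loop at the tick time itself.
print(seconds_until_next_minute(125.0))
```

With fixed polling, the delay depends on where the polling phase happens to fall relative to the minute, which is why it could approach the full interval.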

Documentation

  • New documentation for Code locations and how to define one using Definitions
  • Lots of updates throughout the docs to reflect the recommended usage of Definitions. Any content not ported to Definitions in this release is in the process of being updated.
  • New documentation for dagster-airflow on how to start writing dagster code from an airflow background.

All Changes

1.1.6...1.1.7

  • 858b9d2 - Non isolated runs docs (#10860) by @johannkm
  • 24bff5f - [dagit] Fix Gantt chart rendering of per-step resource init log messages (#10943) by @bengotow
  • cde76d9 - [dagit] Fix the “Assets” label on large asset runs (#10932) by @bengotow
  • 6cd734a - [dagit] Fix “Job In” label regression in Chrome v109 (#10934) by @bengotow
  • 523edb0 - [dagit] Pass repository tag when loading runs for Partitions page (#10948) by @bengotow
  • 49f7c4f - [dagit] Updated asset DAG styles, added additional compute tags (#10931) by @bengotow
  • fb574c0 - [docs] - add a guide for scheduling assets (#10949) by @slopp
  • 66959b7 - cap packaging requirement at 22.0 (#10968) by @smackesey
  • 5a45dd1 - Execution result typing (#10919) by @smackesey
  • 7d18ed2 - solid -> node method renames (#10920) by @smackesey
  • f19c6ea - [dagster-airflow] re-enable airflow 2.5.0 tests (#10966) by @Ramshackle-Jamathon
  • 0d80225 - [dagster-airbyte][docs] Use dagster-airbyte CLI alias in docs (#10955) by @benpankow
  • 162e791 - [dagster-slack] create slack_on_freshness_policy_sensor (#10960) by @OwenKephart
  • faf4f30 - [docs ] - fix image dimensions in hello-dagster materialize (#10977) by @slopp
  • 4babd9d - 1.1.6 Changelog (#10978) by @OwenKephart
  • 577e3eb - Automation: versioned docs for 1.1.6 by @elementl-devtools
  • 0b59fa9 - [convert-environment-variables-and-secrets-guide-stack-2] Convert env vars and secrets guide from repository to Definitions by @schrockn
  • 352d077 - unexperimentalize PartitionMapping (#10980) by @sryza
  • 231ddfa - remove validation in AssetGraph.get_child_partition_keys_of_parent an… (#10981) by @sryza
  • 305f1a2 - [convert-development-to-production-1] Move repository to __init__.py by @schrockn
  • d1054f9 - [convert-development-to-production-2] Changing development to production to use snowflake_pandas_io_manager by @schrockn
  • 7d5fabc - [convert-deployment-to-production-3] Convert @repository to Definitions by @schrockn
  • 196c086 - [convert-development-to-production-4] Use base object instead of resource by @schrockn
  • 435ddcd - [convert-development-to-production-6] Use pyproject.toml instead of workspace.yaml by @schrockn
  • 888660d - [convert-development-to-production-7] Convert guide to use Definitions by @schrockn
  • ad41a9f - [dagit] New Code Locations table (#10975) by @hellendag
  • 3008420 - Pin graphene to <3.2 by @schrockn
  • 5373a74 - fix missing metadata in dagit on graph-backed assets (#10988) by @OwenKephart
  • 7bbcb8a - [dagit] Export a few Code Location components for Cloud (#11008) by @hellendag
  • 39c84bb - Temporarily disable some Azure test suites (#11007) by @jmsanders
  • 41101f7 - [graphql] fix for graphene 3.2 (#11011) by @alangenfeld
  • a0e5c3d - [code-location-selector-stack] Code location sensor tests 1/N. Rename workspace_load_target function to create_workspace_load_target by @schrockn
  • 160aba2 - [code-location-selector-stack] Code location sensor tests 2/N Make instance_with_multiple_repos_with_sensors workspace_load_target parameterizable by @schrockn
  • d6ad971 - [code-location-selector-stack] Code location sensor tests 3/N. Refactor instance_with_multiple_repos_with_sensors to handle multiple code locations by @schrockn
  • 6e305c1 - [code-location-selector-stack] Code location sensor tests 4/N Actually add test to test cross code location selector by @schrockn
  • 5fc415a - [code-location-selector-stack Add CodeLocationSelector; Have run_status_sensor accept it by @schrockn
  • 3c2367f - Fix test_persistent by @schrockn
  • bf83710 - chore: auto-assign dependabot pull requests (#10953) by @rexledesma
  • a91525b - fix(dbt-cloud): parse command string to find materialization commands (#10989) by @rexledesma
  • 3be1547 - [code-location-selector-stack] Change typehint on make_slack_on_run_failure_sensor to accept CodeLocationSelector by @schrockn
  • c156ab3 - [docs] re-org snowflake integration guide (#10984) by @jamiedemaria
  • 33affcf - Check scheduler ticks right after each minute boundary instead of once every 5 seconds (#10886) by @gibsondan
  • 3b93947 - [docs] - [definitions] Update Configured API concept doc (#11020) by @erinkcochran87
  • 3cfc612 - Fix bug with re-execution snapshot ids (#10967) by @OwenKephart
  • 3fb76d9 - [bugfix] UPathIOManger load_input type checking (#11022) by @danielgafni
  • 546ece8 - Pathspec typing fix (#11036) by @smackesey
  • c729760 - [dagit] /code-locations -> /locations (#11024) by @hellendag
  • f606250 - Replace partition ranges with subsets (#10909) by @clairelin135
  • cec9f43 - [dagit] With multiple assets selected, backfill “missing” should include partially materialized partitions (#11027) by @bengotow
  • ae946c0 - [dagit] Add empty value string for invalid tag input in Runs filter (#11044) by @hellendag
  • 57e5def - [docs] - [definitions] Update Repository page for Definitions (#10986) by @erinkcochran87
  • 9300fc7 - [definitions-accessors] Add get_job_def to Definitions. by @schrockn
  • 1014add - 1/ definitions in create new project: update dagster project CLI (#10829) by @yuhan
  • 550ec43 - 2/ definitions in create new project: update create-new-project docs (#10830) by @yuhan
  • 7eb520d - [docs] - [definitions] - Update dbt tutorial to use Definitions (#10842) by @erinkcochran87
  • 891dc89 - `2.1/ definitions in in create new...

1.1.6 (core) / 0.17.6 (libraries)

08 Dec 21:32

New

  • [dagit] Throughout Dagit, when the default repository name __repository__ is used for a repo, only the code location name will be shown. This change also applies to URL paths.
  • [dagster-dbt] When attempting to generate software-defined assets from a dbt Cloud job, an error is now raised if none are created.
  • [dagster-dbt] Software-defined assets can now be generated for dbt Cloud jobs that execute multiple commands.

Bugfixes

  • Fixed a bug that caused load_asset_value to error with the default IO manager when a partition_key argument was provided.
  • Previously, trying to access context.partition_key or context.asset_partition_key_for_output when invoking an asset directly (e.g. in a unit test) would result in an error. This has been fixed.
  • Failure hooks now receive the original exception instead of RetryRequested when using a retry policy.
  • The LocationStateChange GraphQL subscription has been fixed (thanks @roeij!).
  • Fixed a bug where a sqlite3.ProgrammingError error was raised when creating an ephemeral DagsterInstance, most commonly when build_resources was called without passing in an instance parameter.
  • [dagstermill] Jupyter notebooks now correctly render in Dagit on Windows machines.
  • [dagster-duckdb-pyspark] New duckdb_pyspark_io_manager helper to automatically create a DuckDB I/O manager that can store and load PySpark DataFrames.
  • [dagster-mysql] Fixed a bug where versions of mysql < 8.0.31 would raise an error on some run queries.
  • [dagster-postgres] The connection URL param “options” is no longer overwritten in dagit.
  • [dagit] Dagit now allows backfills to be launched for asset jobs that have partitions and required config.
  • [dagit] Dagit no longer renders the "Job in repo@location" label incorrectly in Chrome v109.
  • [dagit] Dagit's run list now shows improved labels on asset group runs of more than three assets.
  • [dagit] Dagit's run gantt chart now renders per-step resource initialization markers correctly.
  • [dagit] In op and asset descriptions in Dagit, rendered markdown no longer includes extraneous escape slashes.
  • Assorted typos and omissions fixed in the docs — thanks @C0DK and @akan72!

Experimental

  • As an optional replacement of the workspace/repository concepts, a new Definitions entrypoint for tools and the UI has been added. A single Definitions object per code location may be instantiated, and accepts typed, named arguments, rather than the heterogeneous list of definitions returned from an @repository-decorated function. To learn more about this feature, and provide feedback, please refer to the GitHub Discussion.
  • [dagster-slack] A new make_slack_on_freshness_policy_status_change_sensor allows you to create a sensor to alert you when an asset is out of date with respect to its freshness policy (and when it’s back on time!)

Documentation

1.1.5 (core) / 0.17.5 (libraries)

02 Dec 15:46

Bugfixes

  • [dagit] Fixed an issue where the Partitions tab sometimes failed to load for asset jobs.

1.1.4 (core) / 0.17.4 (libraries)

02 Dec 15:45

Community Contributions

  • Fixed a typo in GCSComputeLogManager docstring (thanks reidab)!
  • [dagster-airbyte] job cancellation on run termination is now optional. (Thanks adam-bloom)!
  • [dagster-snowflake] Can now specify snowflake role in config to snowflake io manager (Thanks binhnefits)!
  • [dagster-aws] A new AWS systems manager resource (thanks zyd14)!
  • [dagstermill] Retry policy can now be set on dagstermill assets (thanks nickvazz)!
  • Corrected typo in docs on metadata (thanks C0DK)!

New

  • Added a job_name parameter to InputContext.
  • Fixed inconsistent io manager behavior when using execute_in_process on a GraphDefinition (it would use the fs_io_manager instead of the in-memory io manager).
  • Compute logs will now load in Dagit even when websocket connections are not supported.
  • [dagit] A handful of changes have been made to our URLs:
    • The /instance URL path prefix has been removed. E.g. /instance/runs can now be found at /runs.
    • The /workspace URL path prefix has been changed to /locations. E.g. the URL for job my_job in repository foo@bar can now be found at /locations/foo@bar/jobs/my_job.
  • [dagit] The “Workspace” navigation item in the top nav has been moved to be a tab under the “Deployment” section of the app, and is renamed to “Definitions”.
  • [dagstermill] Dagster events can now be yielded from asset notebooks using dagstermill.yield_event.
  • [dagstermill] Failed notebooks can be saved for inspection and debugging using the new save_on_notebook_failure parameter.
  • [dagster-airflow] Added a new option use_ephemeral_airflow_db, which creates a job-run-scoped Airflow database for Airflow DAGs running in Dagster.
  • [dagster-dbt] Materializing software-defined assets using dbt Cloud jobs now supports partitions.
  • [dagster-dbt] Materializing software-defined assets using dbt Cloud jobs now supports subsetting. Individual dbt Cloud models can be materialized, and the proper filters will be passed down to the dbt Cloud job.
  • [dagster-dbt] Software-defined assets from dbt Cloud jobs now support configurable group names.
  • [dagster-dbt] Software-defined assets from dbt Cloud jobs now support configurable AssetKeys.

Bugfixes

  • Fixed a regression starting in 1.0.16 for some compute log managers where an exception in the compute log manager setup/teardown would cause runs to fail.
  • The S3 / GCS / Azure compute log managers now sanitize the optional prefix argument to prevent badly constructed paths.
  • [dagit] The run filter typeahead no longer surfaces key-value pairs when searching for tag:. This resolves an issue where retrieving the available tags could cause significant performance problems. Tags can still be searched with freeform text, and by adding them via click on individual run rows.
  • [dagit] Fixed an issue in the Runs tab for job snapshots, where the query would fail and no runs were shown.
  • [dagit] Schedules defined with cron unions displayed “Invalid cron string” in Dagit. This has been resolved, and human-readable versions of all members of the union will now be shown.

Breaking Changes

  • You can no longer set an output’s asset key by overriding get_output_asset_key on the IOManager handling the output. Previously, this was experimental and undocumented.

Experimental

  • Sensor and schedule evaluation contexts now have an experimental log property, which logs events that can later be viewed in Dagit. To enable these log views in dagit, navigate to the user settings and enable the Experimental schedule/sensor logging view option. Log links will now be available for sensor/schedule ticks where logs were emitted. Note: this feature is not available for users using the NoOpComputeLogManager.

1.1.3 (core) / 0.17.3 (libraries)

23 Nov 22:38

Bugfixes

  • Fixed a bug with the asset reconciliation sensor that caused duplicate runs to be submitted in situations where an asset has a different partitioning than its parents.
  • Fixed a bug with the asset reconciliation sensor that caused it to error on time-partitioned assets.
  • [dagster-snowflake] Fixed a bug when materializing partitions with the Snowflake I/O manager where SQL BETWEEN was used to determine the section of the table to replace. Because BETWEEN is inclusive at both ends, it included values from the next partition, causing the I/O manager to erroneously delete those entries.
  • [dagster-duckdb] Fixed a bug when materializing partitions with the DuckDB I/O manager where SQL BETWEEN was used to determine the section of the table to replace. Because BETWEEN is inclusive at both ends, it included values from the next partition, causing the I/O manager to erroneously delete those entries.
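
The BETWEEN problem described in the two fixes above can be reproduced with any SQL engine. Here is a small sketch using Python's built-in sqlite3 module rather than Snowflake or DuckDB. The events table, its columns, and the helper names are made up for illustration, and the half-open comparison is shown as one common way to select a single partition, not necessarily the exact change made in the I/O managers.

```python
import sqlite3
from typing import List


def make_demo_table() -> sqlite3.Connection:
    """Create an in-memory table with one row per daily partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (day TEXT, value INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("2022-11-01", 1), ("2022-11-02", 2), ("2022-11-03", 3)],
    )
    return conn


def days_between(conn: sqlite3.Connection, start: str, end: str) -> List[str]:
    """Select with BETWEEN, which includes both endpoints."""
    rows = conn.execute(
        "SELECT day FROM events WHERE day BETWEEN ? AND ? ORDER BY day", (start, end)
    )
    return [row[0] for row in rows]


def days_half_open(conn: sqlite3.Connection, start: str, end: str) -> List[str]:
    """Select the half-open range [start, end), which excludes the next partition."""
    rows = conn.execute(
        "SELECT day FROM events WHERE day >= ? AND day < ? ORDER BY day", (start, end)
    )
    return [row[0] for row in rows]


conn = make_demo_table()
# The 2022-11-01 partition covers the window from 2022-11-01 up to, not including, 2022-11-02.
print(days_between(conn, "2022-11-01", "2022-11-02"))
print(days_half_open(conn, "2022-11-01", "2022-11-02"))
```

The first query also returns the 2022-11-02 row, which belongs to the next partition. A delete based on that predicate would remove it.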

1.1.2 (core) / 0.17.2 (libraries)

19 Nov 00:17

Bugfixes

  • In Dagit, assets that had been materialized prior to upgrading to 1.1.1 were showing as "Stale". This is now fixed.
  • Schedules that were constructed with a list of cron strings previously rendered with an error in Dagit. This is now fixed.
  • For users running dagit version >= 1.0.17 (or dagster-cloud) with dagster version < 1.0.17, errors could occur when hitting "Materialize All" and some other asset-related interactions. This has been fixed.

1.1.1 (core) / 0.17.1 (libraries) - Thank U, Next

19 Nov 00:15

Major Changes since 1.0.0 (core) / 0.16.0 (libraries)

Core

  • You can now create multi-dimensional partitions definitions for software-defined assets, through the MultiPartitionsDefinition API. In Dagit, you can filter and materialize certain partitions by providing ranges per-dimension, and view your materializations by dimension.
  • The new asset reconciliation sensor automatically materializes assets that have never been materialized or whose upstream assets have changed since the last time they were materialized. It works with partitioned assets too. You can construct it using build_asset_reconciliation_sensor.
  • You can now add a FreshnessPolicy to any of your software-defined assets, to specify how up-to-date you expect that asset to be. You can view the freshness status of each asset in Dagit, alert when assets are missing their targets using the @freshness_policy_sensor, and use the build_asset_reconciliation_sensor to make a sensor that automatically kicks off runs to materialize assets based on their freshness policies.
  • You can now version your asset ops and source assets to help you track which of your assets are stale. You can do this by assigning an op_version to software-defined assets or an observation_fn to SourceAssets. When a set of assets is versioned in this way, their “Upstream Changed” status will be based on whether upstream versions have changed, rather than on whether upstream assets have been re-materialized. You can launch runs that materialize only stale assets.
  • The new @multi_asset_sensor decorator enables defining custom sensors that trigger based on the materializations of multiple assets. The context object supplied to the decorated function has methods to fetch latest materializations by asset key, as well as built-in cursor management to mark specific materializations as “consumed”, so that they won’t be returned in future ticks. It can also fetch materializations by partition and mark individual partitions as consumed.
  • RepositoryDefinition now exposes a load_asset_value method, which accepts an asset key and invokes the asset’s I/O manager’s load_input function to load the asset as a Python object. This can be used in notebooks to do exploratory data analysis on assets.
  • With the new asset_selection parameter on @sensor and SensorDefinition, you can now define a sensor that directly targets a selection of assets, instead of targeting a job.
  • When running dagit or dagster-daemon locally, environment variables included in a .env file in the form KEY=value in the same folder as the command will be automatically included in the environment of any Dagster code that runs, allowing you to easily use environment variables during local development.
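
The KEY=value format mentioned in the .env note above is simple enough to sketch. The parser below is a hypothetical, simplified illustration of reading such a file into a dictionary. It is not Dagster's loader, and it ignores quoting, export prefixes, and variable expansion that a real loader may handle.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


# Hypothetical file contents; the variable names are examples only.
sample = "SNOWFLAKE_USER=analyst\n# local overrides\nAPI_URL=https://example.com/?a=b\n"
print(parse_env_text(sample))
```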

Dagit

  • The Asset Graph has been redesigned to make better use of color to communicate asset health. New status indicators make it easy to spot missing and stale assets (even on large graphs!) and the UI updates in real-time as displayed assets are materialized.
  • The Asset Details page has been redesigned and features a new side-by-side UI that makes it easier to inspect event metadata. A color-coded timeline on the partitions view allows you to drag-select a time range and inspect the metadata and status quickly. The new view also supports assets that have been partitioned across multiple dimensions.
  • The new Workspace page helps you quickly find and navigate between all your Dagster definitions. It’s also been re-architected to load significantly faster when you have thousands of definitions.
  • The Overview page is the new home for the live run timeline and helps you understand the status of all the jobs, schedules, sensors, and backfills across your entire deployment. The timeline is now grouped by repository and shows a run status rollup for each group.

Integrations

  • dagster-dbt now supports generating software-defined assets from your dbt Cloud jobs.
  • dagster-airbyte and dagster-fivetran now support automatically generating assets from your ETL connections using load_assets_from_airbyte_instance and load_assets_from_fivetran_instance.
  • New dagster-duckdb integration: build_duckdb_io_manager allows you to build an I/O manager that stores and loads Pandas and PySpark DataFrames in DuckDB.

Database migration

  • Optional database schema migration, which can be run via dagster instance migrate:
    • Improves Dagit performance by adding database indexes which should speed up the run view as well as a range of asset-based queries.
    • Enables multi-dimensional asset partitions and asset versioning.

Breaking Changes and Deprecations

  • define_dagstermill_solid, a legacy API, has been removed from dagstermill. Use define_dagstermill_op or define_dagstermill_asset instead to create an op or asset from a Jupyter notebook, respectively.
  • The internal ComputeLogManager API is marked as deprecated in favor of an updated interface: CapturedLogManager. It will be removed in 1.2.0. This should only affect dagster instances that have implemented a custom compute log manager.

Dependency Changes

  • dagster-graphql and dagit now use version 3 of graphene

Since 1.0.17

New

  • The new UPathIOManager base class is now a top-level Dagster export. This enables you to write a custom I/O manager that stores data in any filesystem supported by universal-pathlib and uses a different serialization format than pickle (Thanks Daniel Gafni!).
  • The default fs_io_manager now inherits from the UPathIOManager, which means that its base_dir can be a path on any filesystem supported by universal-pathlib (Thanks Daniel Gafni!).
  • build_asset_reconciliation_sensor now supports partitioned assets.
  • build_asset_reconciliation_sensor now launches runs to keep assets in line with their defined FreshnessPolicies.
  • The FreshnessPolicy object is now exported from the top level dagster package.
  • For assets with a FreshnessPolicy defined, their current freshness status will be rendered in the asset graph and asset details pages.
  • The AWS, GCS, and Azure compute log managers now take an additional config argument, upload_interval, which specifies the interval, in seconds, at which partial logs will be uploaded to the respective cloud storage. This can be used to display compute logs for long-running compute steps.
  • When running dagit or dagster-daemon locally, environment variables included in a .env file in the form KEY=value in the same folder as the command will be automatically included in the environment of any Dagster code that runs, allowing you to easily test environment variables during local development.
  • observable_source_asset decorator creates a SourceAsset with an associated observation_fn that should return a LogicalVersion, a new class that wraps a string expressing a version of an asset’s data value.
  • [dagit] The asset graph now shows branded compute_kind tags for dbt, Airbyte, Fivetran, Python and more.
  • [dagit] The asset details page now features a redesigned event viewer, and separate tabs for Partitions, Events, and Plots. This UI was previously behind a feature flag and is now generally available.
  • [dagit] The asset graph UI has been revamped and makes better use of color to communicate asset status, especially in the zoomed-out view.
  • [dagit] The asset catalog now shows freshness policies in the “Latest Run” column when they are defined on your assets.
  • [dagit] The UI for launching backfills in Dagit has been simplified. Rather than selecting detailed ranges, the new UI allows you to select a large “range of interest” and materialize only the partitions of certain statuses within that range.
  • [dagit] The partitions page of asset jobs has been updated to show per-asset status rather than per-op status, so that it shares the same terminology and color coding as other asset health views.
  • [dagster-k8s] Added an execute_k8s_job function that can be called within any op to run an image within a Kubernetes job. The implementation is similar to the built-in k8s_job_op, but allows additional customization. For example, you can incorporate the output of a previous op into the launched Kubernetes job by passing it into execute_k8s_job. See the dagster-k8s API docs for more information.
  • [dagster-databricks] Environment variables used by dagster cloud are now automatically set when submitting databricks jobs if they exist, thank you @zyd14!
  • [dagstermill] define_dagstermill_asset now supports RetryPolicy. Thanks @nickvazz!
  • [dagster-airbyte] When loading assets from an Airbyte instance using load_assets_from_airbyte_instance, users can now optionally customize asset names using connector_to_asset_key_fn.
  • [dagster-fivetran] When loading assets from a Fivetran instance using load_assets_from_fivetran_instance, users can now alter the IO manager using io_manager_key or connector_to_io_manager_key_fn, and customize asset names using connector_to_asset_key_fn.

Bugfixes

  • Fixed a bug where terminating runs from a backfill would fail without notice.
  • When executing a subset of ops within a job that specifies its config value directly on the job, Dagster no longer attempts to use that config value as the default. The default is still presented in the editable interface in dagit.
  • [dagit] The partition step run matrix now reflects historical step status instead of just the last run’s step status for a particular partition.

Documentation


1.0.17 (core) / 0.16.17 (libraries)

10 Nov 23:19

New

  • With the new asset_selection parameter on @sensor and SensorDefinition, you can now define a sensor that directly targets a selection of assets, instead of targeting a job.
  • materialize and materialize_to_memory now accept a raise_on_error argument, which allows you to determine whether to raise an Error if the run hits an error or just return as failed.
  • (experimental) Dagster now supports multi-dimensional asset partitions, through a new MultiPartitionsDefinition object. An optional schema migration enables support for this feature (run via dagster instance migrate). Users who are not using this feature do not need to run the migration.
  • You can now launch a run that targets a range of asset partitions, by supplying the "dagster/asset_partition_range_start" and "dagster/asset_partition_range_end" tags.
  • [dagit] Asset and op graphs in Dagit now show integration logos, making it easier to identify assets backed by notebooks, dbt, Airbyte, and more.
  • [dagit] A --db-pool-recycle CLI flag (and dbPoolRecycle Helm option) has been added to control how long the pooled connection dagit uses persists before being recycled. The default of 1 hour is now respected by postgres (mysql previously had a hard-coded 1-hour setting). Thanks @adam-bloom!
  • [dagster-airbyte] Introduced the ability to specify output IO managers when using load_assets_from_airbyte_instance and load_assets_from_airbyte_project.
  • [dagster-dbt] the dbt_cloud_resource resource configuration account_id can now be sourced from the environment. Thanks @sowusu-ba!
  • [dagster-duckdb] DuckDB integration improvements: PySpark DataFrames are now fully supported, “schema” can be specified via IO Manager config, and API documentation has been improved to include more examples.
  • [dagster-fivetran] Introduced experimental load_assets_from_fivetran_instance helper which automatically pulls assets from a Fivetran instance.
  • [dagster-k8s] Fixed an issue where setting the securityContext configuration of the Dagit pod in the Helm chart didn’t apply to one of its containers. Thanks @jblawatt!
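
The partition range tags named in the list above are ordinary string key-value pairs on a run. A minimal sketch of building them follows. The tag names come from the release note, while the helper partition_range_tags and the example keys are hypothetical. How the tags are attached to a run, and whether the end key is inclusive, is not covered by the note and is not assumed here.

```python
from typing import Dict

# Tag names as given in the release note.
RANGE_START_TAG = "dagster/asset_partition_range_start"
RANGE_END_TAG = "dagster/asset_partition_range_end"


def partition_range_tags(start_key: str, end_key: str) -> Dict[str, str]:
    """Build the run tags that describe a range of asset partitions."""
    return {RANGE_START_TAG: start_key, RANGE_END_TAG: end_key}


# Example partition keys for a daily-partitioned asset (made up for illustration).
print(partition_range_tags("2022-11-01", "2022-11-07"))
```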

Bugfixes

  • Fixed a bug that caused the asset_selection parameter of RunRequest to not be respected when used inside a schedule.
  • Fixed a bug with health checks during delayed Op retries with the k8s_executor and docker_executor.
  • [dagit] The asset graph now live-updates when assets fail to materialize due to op failures.
  • [dagit] The "Materialize" button now respects the backfill permission for multi-run materializations.
  • [dagit] Materializations without metadata are padded correctly in the run logs.
  • [dagster-aws] Fixed an issue where setting the value of task_definition field in the EcsRunLauncher to an environment variable stopped working.
  • [dagster-dbt] Added exposures in load_assets_from_dbt_manifest. This fixes an error where load_assets_from_dbt_manifest failed to load from a dbt manifest with exposures. Thanks @sowusu-ba!
  • [dagster-duckdb] In some examples, the duckdb config was incorrectly specified. This has been fixed.

Breaking Changes

  • The behavior of the experimental asset reconciliation sensor, which is accessible via build_asset_reconciliation_sensor has changed to be more focused on reconciliation. It now materializes assets that have never been materialized before and avoids materializing assets that are “Upstream changed”. The build_asset_reconciliation_sensor API no longer accepts wait_for_in_progress_runs and wait_for_all_upstream arguments.

Documentation

All Changes

1.0.16...1.0.17


1.0.16 (core) / 0.16.16 (libraries)

03 Nov 21:58

New

  • [dagit] The new Overview and Workspace pages have been enabled for all users, after being gated with a feature flag for the last several releases. These changes include design updates, virtualized tables, and more performant querying.
    • The top navigation has been updated to improve space allocation, with main nav links moved to the left.
    • “Overview” is the new Dagit home page and “factory floor” view, where you can find the run timeline, which now offers time-based pagination. The Overview section also contains pages with all of your jobs, schedules, sensors, and backfills. You can filter objects by name, and collapse or expand repository sections.
    • “Workspace” has been redesigned to offer a better summary of your repositories, and to use the same performant table views, querying, and filtering as in the Overview pages.
  • @asset and @multi_asset now accept a retry_policy argument. (Thanks Adam Bloom!)
  • When loading an input that depends on multiple partitions of an upstream asset, the fs_io_manager will now return a dictionary that maps partition keys to the stored values for those partitions. (Thanks andrewgryan!).
  • JobDefinition.execute_in_process now accepts a run_config argument even when the job is partitioned. If supplied, the run config will be used instead of any config provided by the job’s PartitionedConfig.
  • The run_request_for_partition method on jobs now accepts a run_config argument. If supplied, the run config will be used instead of any config provided by the job’s PartitionedConfig.
  • The new NotebookMetadataValue can be used to report the location of executed Jupyter notebooks, and Dagit will be able to render the notebook.
  • Resolving asset dependencies within a group now works with multi-assets, as long as all the assets within the multi-asset are in the same group. (Thanks @peay!)
  • UPathIOManager, a filesystem-agnostic IOManager base class, has been added - (Thanks @danielgafni!)
  • A threadpool option has been added for the scheduler daemon. This can be enabled via your dagster.yaml file; check out the docs.
  • The default LocalComputeLogManager now captures compute logs by process instead of by step. This means that for the in_process executor, where all steps are executed in the same process, the compute logs for all steps in a run are written to the same file.
  • [dagster-airflow] make_dagster_job_from_airflow_dag now supports Airflow 2. There is also a new mock_xcom parameter that mocks all calls made by operators to XCom.
  • [helm] volume and volumeMount sections have been added for the dagit and daemon sections of the helm chart.
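
As a rough illustration of the multi-partition input behavior noted above for fs_io_manager, the sketch below shows the shape a downstream function might receive: a dictionary that maps partition keys to stored values. The partition keys, the values, and the function name combine_partitions are hypothetical, and the dictionary is built by hand here rather than loaded by an IO manager.

```python
from typing import Dict, List


def combine_partitions(upstream: Dict[str, List[int]]) -> int:
    """Sum the stored values across every upstream partition that was loaded."""
    return sum(sum(values) for values in upstream.values())


# Hand-built stand-in (assumption) for the mapping of partition key -> stored value.
loaded = {
    "2022-11-01": [1, 2],
    "2022-11-02": [3],
    "2022-11-03": [4, 5],
}

total = combine_partitions(loaded)
print(total)
```

Code that previously assumed a single upstream value would need to iterate over the dictionary (for example with .items() or .values()) when an input spans several partitions.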

Bugfixes

  • For partitioned asset jobs whose config is a hardcoded dictionary (rather than a PartitionedConfig), previously run_request_for_partition would produce a run with no config. Now, the run has the hardcoded dictionary as its config.
  • Previously, asset inputs would be resolved to upstream assets in the same group that had the same name, even if the asset input already had a key prefix. Now, asset inputs are only resolved to upstream assets in the same group if the input path only has a single component.
  • Previously, asset inputs could get resolved to outputs of the same AssetsDefinition, through group-based asset dependency resolution, which would later error because of a circular dependency. This has been fixed.
  • Previously, the “Partition Status” and “Backfill Status” fields on the Backfill page in dagit were always incomplete and showed missing partitions. This has been fixed to accurately show the status of the backfill runs.
  • [dagit] When viewing the config dialog for a run with a very long config, scrolling was broken and the “copy” button was not visible. This has been fixed.
  • Executors now compress step worker arguments to avoid CLI length limits with large DAGs.
  • [dagster-msteams] Longer messages can now be used in Teams HeroCard - (Thanks @jayhale!)
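
The argument-compression fix above can be pictured with a small standard-library sketch. This is not Dagster's actual implementation; it only shows one common way to shrink a large argument payload before passing it on a command line: serialize to JSON, compress with zlib, and base64-encode. The function names pack_args and unpack_args and the payload are assumptions made for illustration.

```python
import base64
import json
import zlib
from typing import Any, Dict


def pack_args(args: Dict[str, Any]) -> str:
    """Serialize args to JSON, compress with zlib, and encode as ASCII-safe base64."""
    raw = json.dumps(args).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def unpack_args(packed: str) -> Dict[str, Any]:
    """Reverse pack_args: base64-decode, decompress, and parse the JSON."""
    raw = zlib.decompress(base64.b64decode(packed.encode("ascii")))
    return json.loads(raw.decode("utf-8"))


# Hypothetical payload: a large DAG with many similarly named step keys.
step_args = {"run_id": "example-run", "step_keys": [f"step_{i}" for i in range(500)]}

plain = json.dumps(step_args)
packed = pack_args(step_args)

# Compare the length of the plain JSON with the packed form.
print(len(plain), len(packed))
```

Repetitive payloads such as long lists of step keys tend to compress well, which is why this kind of packing helps stay under operating-system limits on command-line length.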

Documentation

  • API docs for InputContext have been improved - (Thanks @peay!)
  • [dagster-snowflake] Improved documentation for the Snowflake IO manager

All Changes

1.0.15...1.0.16

See All Contributors

1.0.15 (core) / 0.16.15 (libraries)

27 Oct 20:07

New

  • [dagit] The run timeline now shows all future schedule ticks for the visible time window, not just the next ten ticks.
  • [dagit] Asset graph views in Dagit refresh as materialization events arrive, making it easier to watch your assets update in real-time.
  • [dagster-airbyte] Added support for basic auth login to the Airbyte resource.
  • Configuring a Python Log Level will now also apply to system logs created by Dagster during a run.
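
For readers less familiar with how a log level acts as a threshold, here is a minimal sketch using only Python's standard logging module. It does not use Dagster's logger or configuration; the logger name, the ListHandler class, and the messages are hypothetical. It shows that once a level is set, every message below it is dropped regardless of whether it is a "user" or a "system" message.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collect formatted messages in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Hypothetical logger standing in for the logger used during a run.
logger = logging.getLogger("example_run_logger")
logger.setLevel(logging.WARNING)
logger.propagate = False  # keep output out of the root logger

handler = ListHandler()
logger.addHandler(handler)

logger.info("system: step started")        # below WARNING, dropped
logger.warning("user: row count is low")   # kept
logger.error("system: step failed")        # kept

captured = handler.messages
print(captured)
```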

Bugfixes

  • Fixed a bug that broke asset partition mappings when using the key_prefix with methods like load_assets_from_modules.
  • [dagster-dbt] When running dbt Cloud jobs with the dbt_cloud_run_op, the op would emit a failure if the targeted job did not create a run_results.json artifact, even if this was the expected behavior. This has been fixed.
  • Improved performance by adding database indexes which should speed up the run view as well as a range of asset-based queries. These migrations can be applied by running dagster instance migrate.
  • An issue that would cause schedule/sensor latency in the daemon during workspace refreshes has been resolved.
  • [dagit] Shift-clicking Materialize for partitioned assets now shows the asset launchpad, allowing you to launch execution of a partition with config.

Community Contributions

  • Fixed a bug where asset keys with - were not being properly sanitized in some situations. Thanks @peay!
  • [dagster-airbyte] A list of connection directories can now be specified in load_assets_from_airbyte_project. Thanks @adam-bloom!
  • [dagster-gcp] Dagster will now retry connecting to GCS if it gets a ServiceUnavailable error. Thanks @cavila-evoliq!
  • [dagster-postgres] Dagster now uses a SQLAlchemy engine instead of psycopg2 when subscribing to PostgreSQL events. Thanks @peay!
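
The retry-on-ServiceUnavailable behavior mentioned above follows a familiar pattern, sketched below with the standard library only. This is an assumption-laden illustration, not the dagster-gcp code: the ServiceUnavailable class here is a local stand-in for the error raised by a storage client, and call_with_retries, its parameters, and flaky_upload are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Local stand-in (assumption) for a transient error from a storage client."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ServiceUnavailable with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ServiceUnavailable:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky_upload() -> str:
    """Fail twice with a transient error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ServiceUnavailable("try again")
    return "ok"


result = call_with_retries(flaky_upload, max_attempts=5)
print(result, attempts["count"])
```

Only the transient error type is retried; any other exception propagates immediately, so genuine failures are not masked.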

Experimental

  • [dagster-dbt] Added a display_raw_sql flag to the dbt asset loading functions. If set to False, this will remove the raw sql blobs from the asset descriptions. For large dbt projects, this can significantly reduce the size of the generated workspace snapshots.
  • [dagit] A “New asset detail pages” feature flag available in Dagit’s settings allows you to preview some upcoming changes to the way historical materializations and partitions are viewed.

All Changes

1.0.14...1.0.15

See All Contributors