Releases: dagster-io/dagster
1.3.0 (core) / 0.19.0 (libraries) "Smooth Operator"
Major Changes since 1.2.0 (core) / 0.18.0 (libraries)
Core
- Auto-materialize policies replace the asset reconciliation sensor - We significantly renovated the APIs used for specifying which assets are scheduled declaratively. Compared to `build_asset_reconciliation_sensor`s, `AutoMaterializePolicy`s work across code locations, as well as allow you to customize the conditions under which each asset is auto-materialized. [docs]
- Asset backfill page - A new page in the UI for monitoring asset backfills shows the progress of each asset in the backfill.
- Clearer labels for tracking changes to data and code - Instead of the opaque “stale” indicator, Dagster’s UI now indicates whether code, upstream data, or dependencies have changed. When assets are in violation of their `FreshnessPolicy`s, Dagster’s UI now marks them as “overdue” instead of “late”.
- Auto-materialization and observable source assets - Assets downstream of an observable source asset now use the source asset observations to determine whether upstream data has changed and assets need to be materialized.
- Pythonic Config and Resources - The set of APIs introduced in 1.2 is no longer experimental [community memo]. Examples, integrations, and documentation have largely been ported to the new APIs. Existing resources and config APIs will continue to be supported for the foreseeable future. Check out the migration guide to learn how to incrementally adopt the new APIs.
Docs
- Improved run concurrency docs - You asked (in support), and we answered! This new guide is a one-stop-shop for understanding and implementing run concurrency, whether you’re on Dagster Cloud or deploying to your own infrastructure.
- Additions to the Intro to Assets tutorial - We’ve added two new sections to the assets tutorial, focused on scheduling and I/O. While we’re close to wrapping things up for the tutorial revamp, we still have a few topics to cover - stay tuned!
- New guide about building machine learning pipelines - Many of our users learn best by example - this guide is one way we’re expanding our library of examples. In this guide, we walk you through building a simple machine learning pipeline using Dagster.
- Re-organized Dagster Cloud docs - We overhauled how the Dagster Cloud docs are organized, bringing them more in line with the UI.
Since 1.2.7 (core) / 0.18.7 (libraries)
New
- Long-running runs can now be terminated after going over a set runtime. See the run termination docs to learn more.
- Adds a performance improvement to partition status caching for multi-partitioned assets containing a time dimension.
- [ui] Asset groups are now included in global search.
- [ui] Assets in the asset catalog have richer status information that matches what is displayed on the asset graph.
- [dagster-aws] New `AthenaClientResource`, `ECRPublicResource`, `RedshiftClientResource`, `S3Resource`, `S3FileManagerResource`, `ConfigurablePickledObjectS3IOManager`, and `SecretsManagerResource` follow the Pythonic resource system. The existing APIs remain supported.
- [dagster-datadog] New `DatadogResource` follows the Pythonic resource system. The existing `datadog_resource` remains supported.
- [dagster-ge] New `GEContextResource` follows the Pythonic resource system. The existing `ge_context_resource` remains supported.
- [dagster-github] New `GithubResource` follows the Pythonic resource system. The existing `github_resource` remains supported.
- [dagster-msteams] New `MSTeamsResource` follows the Pythonic resource system. The existing `msteams_resource` remains supported.
- [dagster-slack] New `SlackResource` follows the Pythonic resource system. The existing `slack_resource` remains supported.
Bugfixes
- Fixed an issue where using `pdb.set_trace` no longer worked when running Dagster locally using `dagster dev` or `dagit`.
- Fixed a regression where passing custom metadata on `@asset` or `Out` caused an error to be thrown.
- Fixed a regression where certain states of the asset graph would cause GQL errors.
- [ui] Fixed a bug where assets downstream of source assets would sometimes incorrectly display a “New data” (previously “stale”) tag for assets with materializations generated from ops (as opposed to SDA materializations).
- [ui] Fixed a bug where URLs for code locations named `pipelines` or `jobs` could lead to blank pages.
- [ui] When configuring a partition-mapped asset backfill, helpful context no longer appears nested within the “warnings” section.
- [ui] For observable source assets, the asset sidebar now shows a “latest observation” instead of a “latest materialization”.
Breaking Changes
- By default, resources defined on `Definitions` are now automatically bound to jobs. This will only result in a change in behavior if you a) have a job with no "io_manager" defined in its `resource_defs` and b) have supplied an `IOManager` with key "io_manager" to the `resources` argument of your `Definitions`. Prior to 1.3.0, this would result in the job using the default filesystem-based `IOManager` for the key "io_manager". In 1.3.0, this will result in the "io_manager" supplied to your `Definitions` being used instead. The `BindResourcesToJobs` wrapper, introduced in 1.2 to simulate this behavior, no longer has any effect.
- [dagster-celery-k8s] The default kubernetes namespace for run pods when using the Dagster Helm chart with the `CeleryK8sRunLauncher` is now the same namespace as the Helm chart, instead of the `default` namespace. To restore the previous behavior, you can set the `celeryK8sRunLauncher.jobNamespace` field to the string `default`.
- [dagster-snowflake-pandas] Due to a longstanding issue storing Pandas Timestamps in Snowflake tables, the `SnowflakePandasIOManager` has historically converted all timestamp data to strings before storing it in Snowflake. Now, it will instead ensure that timestamp data has a timezone, and if not, attach the UTC timezone. This allows the timestamp data to be stored as timestamps in Snowflake. If you have been storing timestamp data using the `SnowflakePandasIOManager`, you can set the `store_timestamps_as_strings=True` configuration to continue storing timestamps as strings. For more information, and instructions for migrating Snowflake tables to use timestamp types, see the Migration Guide.
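The first breaking change above comes down to which resource ends up behind the "io_manager" key of a job. The following is a minimal pure-Python model of that rule, not Dagster's implementation: `resolve_io_manager`, `DEFAULT_IO_MANAGER`, and the `auto_bind` flag are hypothetical names introduced for illustration, and resources are represented as plain strings.

```python
from typing import Dict

# Hypothetical stand-in for the default filesystem-based IOManager.
DEFAULT_IO_MANAGER = "default_filesystem_io_manager"


def resolve_io_manager(
    job_resources: Dict[str, str],
    definitions_resources: Dict[str, str],
    auto_bind: bool,
) -> str:
    """Return the resource a job would use for the "io_manager" key.

    auto_bind=False models the behavior before 1.3.0 and auto_bind=True the
    behavior from 1.3.0 on, as described in the breaking change above.
    """
    # A job that defines its own "io_manager" keeps it in both versions.
    if "io_manager" in job_resources:
        return job_resources["io_manager"]
    # From 1.3.0 on, the "io_manager" supplied to Definitions is bound to the job.
    if auto_bind and "io_manager" in definitions_resources:
        return definitions_resources["io_manager"]
    # Otherwise the job falls back to the default filesystem-based IOManager.
    return DEFAULT_IO_MANAGER


definitions_resources = {"io_manager": "my_s3_io_manager"}

before = resolve_io_manager({}, definitions_resources, auto_bind=False)
after = resolve_io_manager({}, definitions_resources, auto_bind=True)
unchanged = resolve_io_manager(
    {"io_manager": "job_level_io_manager"}, definitions_resources, auto_bind=True
)
print(before, after, unchanged)
```

Only the middle case differs between the two versions, which matches conditions a) and b) in the note: the job has no "io_manager" of its own, and one is supplied to `Definitions`.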
Changes to experimental APIs
- Pythonic Resources and Config
  - Enabled passing `RunConfig` to many APIs which previously would only accept a config dictionary.
  - Enabled passing raw Python objects as resources to many APIs which previously would only accept `ResourceDefinition`.
  - Added the ability to pass `execution` config when constructing a `RunConfig` object.
  - Introduced clearer error messages when trying to mutate state on a Pythonic config or resource object.
  - Improved direct invocation experience for assets, ops, schedules and sensors using Pythonic config and resources. Config and resources can now be passed directly as args or kwargs.
- The `minutes_late` and `previous_minutes_late` properties on the experimental `FreshnessPolicySensorContext` have been renamed to `minutes_overdue` and `previous_minutes_overdue`, respectively.
Removal of deprecated APIs
- [previously deprecated, 0.15.0] `metadata_entries` arguments to event constructors have been removed. While `MetadataEntry` still exists and will only be removed in 2.0, it is no longer passable to any Dagster public API; users should always pass a dictionary of metadata values instead.
Experimental
- Adds a performance improvement to the multi-asset sensor context’s `latest_materialization_records_by_key` function.
Documentation
- The Google BigQuery tutorial and reference pages have been updated to use the new `BigQueryPandasIOManager` and `BigQueryPySparkIOManager`.
- The Snowflake tutorial and reference pages have been updated to use the new `SnowflakePandasIOManager` and `SnowflakePySparkIOManager`.
Dagster Cloud
- Previously, when deprovisioning an agent, code location servers were cleaned up in serial. Now, they’re cleaned up in parallel.
1.2.7 (core) / 0.18.7 (libraries)
New
- Resource access (via both `required_resource_keys` and Pythonic resources) is now supported in observable source assets.
- [ui] The asset graph now shows how many partitions of each asset are currently materializing, and blue bands appear on the partition health bar.
- [ui] Added a new page to monitor an asset backfill.
- [ui] Performance improvement for Runs page for runs that materialize large numbers of assets.
- [ui] Performance improvements for Run timeline and left navigation for users with large numbers of jobs or assets.
- [ui] In the run timeline, consolidate “Ad hoc materializations” rows into a single row.
- [dagster-dbt] Python 3.10 is now supported.
- [dagster-aws] The `EcsRunLauncher` now allows you to customize volumes and mount points for the launched ECS task. See the API docs for more information.
- [dagster-duckdb, dagster-duckdb-pandas, dagster-duckdb-pyspark] New `DuckDBPandasIOManager` and `DuckDBPySparkIOManager` follow the Pythonic resource system. The existing `duckdb_pandas_io_manager` and `duckdb_pyspark_io_manager` remain supported.
- [dagster-gcp, dagster-gcp-pandas, dagster-gcp-pyspark] New `BigQueryPandasIOManager` and `BigQueryPySparkIOManager` follow the Pythonic resource system. The existing `bigquery_pandas_io_manager` and `bigquery_pyspark_io_manager` remain supported.
- [dagster-gcp] The BigQuery resource now accepts authentication credentials as configuration. If you pass GCP authentication credentials to `gcp_credentials`, a temporary file to store the credentials will be created and the `GOOGLE_APPLICATION_CREDENTIALS` environment variable will be set to the temporary file. When the BigQuery resource is garbage collected, the environment variable will be unset and the temporary file deleted.
- [dagster-snowflake, dagster-snowflake-pandas, dagster-snowflake-pyspark] New `SnowflakePandasIOManager` and `SnowflakePySparkIOManager` follow the Pythonic resource system. The existing `snowflake_pandas_io_manager` and `snowflake_pyspark_io_manager` remain supported.
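The credential handling described for the BigQuery resource above can be pictured with the standard library alone. This is a sketch of the general pattern (write the credentials to a temporary file, point `GOOGLE_APPLICATION_CREDENTIALS` at it, clean up afterwards), not the dagster-gcp implementation. The `temporary_gcp_credentials` helper is hypothetical, and it cleans up when the `with` block exits rather than on garbage collection.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


@contextmanager
def temporary_gcp_credentials(credentials_json: str) -> Iterator[str]:
    """Expose credentials through a temporary file for the duration of a block."""
    previous = os.environ.get(ENV_VAR)
    # delete=False so the file can be reopened by path on every platform.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        handle.write(credentials_json)
        path = handle.name
    os.environ[ENV_VAR] = path
    try:
        yield path
    finally:
        # Assumption: restore any earlier value instead of always unsetting.
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous
        os.remove(path)


with temporary_gcp_credentials('{"type": "service_account"}') as creds_path:
    path_during = os.environ[ENV_VAR]
    file_existed = os.path.exists(creds_path)

file_removed = not os.path.exists(creds_path)
```

Inside the block, Google client libraries that read `GOOGLE_APPLICATION_CREDENTIALS` would pick up the temporary file; afterwards neither the file nor the temporary setting remains.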
Bugfixes
- Fixed an issue where `dagster dev` would periodically emit a harmless but annoying warning every few minutes about a gRPC server being shut down.
- Fixed a schedule evaluation error that occurred when schedules returned a `RunRequest(partition_key=...)` object.
- Fixed a bug that caused errors in the asset reconciliation sensor when the event log includes asset materializations with partitions that aren’t part of the asset’s `PartitionsDefinition`.
- Fixed a bug that caused errors in the asset reconciliation sensor when a partitioned asset is removed.
- Fixed an issue where `run_request_for_partition` would incorrectly raise an error for a job with a `DynamicPartitionsDefinition` that was defined with a function.
- Fixed an issue where defining a partitioned job with unpartitioned assets via `define_asset_job` would raise an error.
- Fixed a bug where source asset observations could not be launched from dagit when the asset graph contained partitioned assets.
- Fixed a bug that caused `__ASSET_JOB has no op named ...` errors when using automatic run retries.
- [ui] The asset partition health bar now correctly renders partial failed partitions of multi-dimensional assets in a striped red color.
- [ui] Fixed an issue where steps that were skipped due to an upstream dependency failure were incorrectly listed as “Preparing” in the right-hand column of the runs timeline.
- [ui] Fixed markdown base64 image embeds.
- [ui] Guard against localStorage quota errors when storing launchpad config tabs.
- [dagster-aws] Fixed an issue where the `EcsRunLauncher` would fail to launch runs if the `use_current_ecs_task_config` field was set to `False` but no `task_definition` field was set.
- [dagster-k8s] Fixed an issue introduced in 1.2.6 where older versions of the kubernetes Python package were unable to import the package.
Community Contributions
- The `EcsRunLauncher` now allows you to set a capacity provider strategy and customize the ephemeral storage used for launched ECS tasks. See the docs for details. Thanks AranVinkItility!
- Fixed an issue where freshness policies were not being correctly applied to assets with key prefixes defined via `AssetsDefinition.from_op`. Thanks @tghanken for the fix!
- Added the `minimum_interval_seconds` parameter to enable customizing the evaluation interval on the slack run failure sensor, thanks @ldnicolasmay!
- Fixed a docs example and updated references, thanks @NicolasPA!
Experimental
- The `Resource` annotation for Pythonic resource inputs has been renamed to `ResourceParam` in preparation for the release of the feature in 1.3.
- When invoking ops and assets that request resources via parameters directly, resources can now be specified as arguments.
- Improved various error messages related to Pythonic config and resources.
- If the Resources Dagit feature flag is enabled, resources will now show up in the overview page and search.
Documentation
- Learn how to limit concurrency in your data pipelines with our new guide!
- Need some help managing a run queue? Check out the new customizing run queue priority guide.
- New tutorial section that adds I/O managers to the tutorial project.
1.2.6 (core) / 0.18.6 (libraries)
Bugfixes
- Fixed a GraphQL resolution error which occurred when retrieving metadata for step failures in the event log.
1.2.5 (core) / 0.18.5 (libraries)
New
- `materialize` and `materialize_to_memory` now both accept a `selection` argument that allows specifying a subset of assets to materialize.
- `MultiPartitionsDefinition` is no longer marked experimental.
- Context methods to access time window partition information now work for `MultiPartitionsDefinition`s with a time dimension.
- Improved the performance of the asset reconciliation sensor when a non-partitioned asset depends on a partitioned asset.
- `load_assets_from_package_module` and similar methods now accept a `freshness_policy`, which will be applied to all loaded assets.
- When the asset reconciliation sensor is scheduling based on freshness policies, and there are observable source assets, the observed versions now inform the data time of the assets.
- `build_sensor_context` and `build_multi_asset_sensor_context` can now take a `Definitions` object in place of a `RepositoryDefinition`.
- [UI] Performance improvement for loading asset partition statuses.
- [dagster-aws] `s3_resource` now accepts `use_ssl` and `verify` configurations.
Bugfixes
- Fixed a bug that caused an error to be raised when passing a multi-asset into the `selection` argument on `define_asset_job`.
- Fixed a GraphQL error that displays on Dagit load when an asset’s partitions definition is changed from a single-dimensional partitions definition to a `MultiPartitionsDefinition`.
- Fixed a bug that caused backfills to fail when spanning assets that live in different code locations.
- Fixed an error that displays when a code location with a `MultiPartitionMapping` (experimental) is loaded.
- Fixed a bug that caused errors with invalid `TimeWindowPartitionMapping`s to not be bubbled up to the UI.
- Fixed an issue where the scheduler would sometimes incorrectly handle spring Daylight Saving Time transitions for schedules running at 2AM in a timezone other than UTC.
- Fixed an issue introduced in the 1.2.4 release where running `pdb` stopped working when using `dagster dev`.
- Fixed an issue where it was possible to create `AssetMaterialization` objects with a null `AssetKey`.
- Previously, if you had a `TimeWindowPartitionsDefinition` with a non-standard cron schedule, and also provided a `minute_of_hour` or similar argument in `build_schedule_from_partitioned_job`, Dagster would silently create the wrong cron expression. It now raises an error.
- The asset reconciliation sensor no longer fails when the event log contains materializations with partitions that aren’t contained in the asset’s `PartitionsDefinition`. These partitions are now ignored.
- Fixed a regression that prevented materializing dynamically partitioned assets from the UI (thanks @planvin!)
- [UI] On the asset graph, the asset health displayed in the sidebar for the selected asset updates as materializations and failures occur.
- [UI] The asset partitions page has been adjusted to make materialization and observation event metadata more clear.
- [UI] Large table schema metadata entries now display within a modal rather than taking up considerable space on the page.
- [UI] Launching a backfill of a partitioned asset with unpartitioned assets immediately upstream no longer shows the “missing partitions” warning.
- [dagster-airflow] Fixed a bug in the `PersistentAirflowDatabase` where versions of Airflow from 2.0.0 until 2.3.0 would not use the correct connection environment variable name.
- [dagster-census] Fixed a bug with the `poll_sync_run` function of `dagster-census` that prevented polling from working correctly (thanks @ldnicolasmay!)
Deprecations
- The `run_request_for_partition` method on `JobDefinition` and `UnresolvedAssetJobDefinition` is now deprecated and will be removed in 2.0.0. Instead, directly instantiate a run request with a partition key via `RunRequest(partition_key=...)`.
Documentation
- Added a missing link to next tutorial section (Thanks Mike Kutzma!)
1.2.4 (core) / 0.18.4 (libraries)
New
- Further performance improvements to the asset reconciliation sensor.
- Performance improvements to asset backfills with large numbers of partitions.
- New `AssetsDefinition.to_source_assets` method to convert a set of assets to `SourceAsset` objects.
- (experimental) Added partition mapping that defines dependency relationships between different `MultiPartitionsDefinition`s.
- [dagster-mlflow] Removed the `mlflow` pin from the `dagster-mlflow` package.
- [ui] Syntax highlighting now supported in rendered markdown code blocks (from metadata).
Bugfixes
- When using `build_asset_reconciliation_sensor`, in some cases duplicate runs could be produced for the same partition of an asset. This has been fixed.
- When using Pythonic configuration for resources, aliased field names would cause an error. This has been fixed.
- Fixed an issue where `context.asset_partitions_time_window_for_output` threw an error when an asset was directly invoked with `build_op_context`.
- [dagster-dbt] In some cases, use of ephemeral dbt models could cause the dagster representation of the dbt dependency graph to become incorrect. This has been fixed.
- [celery-k8s] Fixed a bug that caused JSON deserialization errors when an Op or Asset emitted JSON that doesn't represent a `DagsterEvent`.
- Fixed an issue where launching a large backfill while running `dagster dev` would sometimes fail with a connection error after running for a few minutes.
- Fixed an issue where `dagster dev` would sometimes hang when running Dagster code that attempted to read in input via stdin.
- Fixed an issue where runs that take a long time to import code would sometimes continue running even after they were stopped by run monitoring for taking too long to start.
- Fixed an issue where `AssetSelection.groups()` would simultaneously select both source and regular assets and consequently raise an error.
- Fixed an issue where `BindResourcesToJobs` would raise errors encapsulating jobs which had config specified at definition-time.
- Fixed Pythonic config objects erroring when omitting optional values rather than specifying `None`.
- Fixed Pythonic config and resources not supporting Enum values.
- `DagsterInstance.local_temp` and `DagsterInstance.ephemeral` now use object instance scoped local artifact storage temporary directories instead of a shared process scoped one, removing a class of thread safety errors that could manifest on initialization.
- Improved direct invocation behavior for ops and assets which specify resource dependencies as parameters, for instance:

      from dagster import ConfigurableResource, op

      class MyResource(ConfigurableResource):
          pass

      @op
      def my_op(x: int, y: int, my_resource: MyResource) -> int:
          return x + y

      # The resource is passed directly as a keyword argument.
      my_op(4, 5, my_resource=MyResource())

- [dagster-azure] Fixed an issue with an AttributeError being thrown when using the async `DefaultAzureCredential` (thanks @mpicard)
- [ui] Fixed an issue introduced in 1.2.3 in which no log levels were selected by default when viewing Run logs, which made it appear as if there were no logs at all.
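For the `dagster dev` hang on stdin reads listed above, the commit list for this release mentions setting stdin to DEVNULL when opening Dagster subprocesses. The snippet below is a generic standard-library illustration of that idea, not the code from the commit; `run_child_without_stdin` is a hypothetical helper. A child process whose stdin is the null device sees end-of-file immediately instead of blocking on input.

```python
import subprocess
import sys


def run_child_without_stdin(code: str) -> str:
    """Run a Python snippet in a child process whose stdin is the null device."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdin=subprocess.DEVNULL,  # reads in the child return EOF right away
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return result.stdout.strip()


# The child tries to read all of stdin; with DEVNULL it gets an empty string.
child_output = run_child_without_stdin("import sys; print(repr(sys.stdin.read()))")
print(child_output)
```

If the child inherited an interactive stdin instead, the same `sys.stdin.read()` call would wait for input, which is the kind of hang the fix addresses.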
Deprecations
- The `environment_vars` argument to `ScheduleDefinition` is deprecated (the argument is currently non-functional; environment variables no longer need to be whitelisted for schedules).
Community Contributions
- Typos fixed in `CHANGES.md` (thanks @fridiculous)
- Links to telemetry docs fixed (thanks @Abbe98)
- `--path-prefix` can now be supplied via Helm chart (thanks @mpicard)
Documentation
- New machine learning pipeline with Dagster guide
- New example of multi-asset conditional materialization
- New tutorial section about scheduling
- New images on the Dagster README
All Changes
See All Contributors
- f361ef7 - [refactor] delete Materialization (#13030) by @smackesey
- 7268f46 - Add in progress subsets to the partition cache (#13045) by @johannkm
- 0032e2c - Add multipartitioned assets with dynamic dimension to toys (#13061) by @clairelin135
- 89c4ed1 - add docs example for multi-asset conditional materialization (#13054) by @sryza
- 2bd9a12 - Add docs for source asset observation jobs/schedules (#13062) by @smackesey
- 971010b - Revert "Add in progress subsets to the partition cache (#13045)" by @johannkm
- 40a569a - [asset-reconciliation][bug] Fix issue where overly-aggressive runs would be kicked off. (#13069) by @OwenKephart
- 2edf1ee - tweaks to cross-repo-assets toy (#12973) by @sryza
- 2addaa5 - Show unauthorized error graphql error message (#13064) by @salazarm
- ece6c4d - [dagster-io/ui] Make Suggest component a bit more flexible (#13056) by @hellendag
- d8e98d9 - Fix disabled state for launchpad button submenu (#13078) by @salazarm
- 296dabb - Re-enable in progress subsets in the partition cache (#13082) by @johannkm
- dce7bc6 - Add dynamic partitions name resolver to dimension type (#13070) by @clairelin135
- 60b5e20 - asset sensor test docs (#13065) by @prha
- 9cafa85 - Fix issue where backfill fails when gRPC server is replaced mid-backfill (#13085) by @gibsondan
- d86e3a5 - Use instance from sensor/schedule context to instantiate resources, delay until accessed (#13041) by @benpankow
- e20af6b - Add materializing subset to asset gql (#13046) by @johannkm
- 90a92bf - feat(helm): add path-prefix to dagit command (#13080) by @mpicard
- 160f3ec - Use dynamic partition definition name for dimension of multipartition definition (#13090) by @salazarm
- e86a59c - [typing/static] Fix @repository decorator typing (#12295) by @smackesey
- 61cc090 - [instance] make local artifact directory scheme thread safe (#13043) by @alangenfeld
- d1e75d0 - 1.2.3 changelog (#13094) by @jamiedemaria
- 5fd2bb6 - Ensure pyright venvs use statically legible editable installs (#13089) by @smackesey
- e780948 - [docs] - Remove finished code from dbt tutorial template (#13091) by @erinkcochran87
- 961a9f8 - [ui] Upgrade react-markdown (#13092) by @hellendag
- 1c60da6 - Fix submitting backfills synchronously from graphql (#13093) by @gibsondan
- dfbabb4 - Test get and set serialized_in_progress_partition_subset (#13063) by @johannkm
- 6dc4e0c - ExitStack.pop_all -> close (#13050) by @alangenfeld
- 299cd81 - Automation: versioned docs for 1.2.3 by @elementl-devtools
- 804e113 - Fraser/rework readme (#12565) by @frasermarlow
- ed8bc7b - [ui] Use DefaultLogLevels when there is no level state stored (#13109) by @hellendag
- 5646656 - Set stdin to DEVNULL when opening dagster subprocesses (#13099) by @gibsondan
- 6ff1cd9 - add a vercel github action to build docs/storybook previews (#13052) by @prha
- d5db43f - [refactor] Remove `frozen{list,dict,tags}` classes (#12293) by @smackesey
- bd4408b - Docs for setting up Gitlab CI, branch deployment guide (#12998) by @prha
- bb0601a - Add assets def to op context (#13088) by @clairelin135
- b73a36e - make AssetsDefinition.to_source_assets public (#13073) by @sryza
- 0b745c3 - [docs] New ML pipeline guide PR (#13100) by @odette-elementl
- 0d2c1a7 - [freshness-refactor][3/n] Update methods on the CachingDataTimeResolver to work with scalar data time (#12906) by @OwenKephart
- f18cede - Fixing refs to images in the README (#13126) by @tacastillo
- d7e906e - restrict vercel builds based on paths (#13129) by @prha
- f707541 - fix missing snapshots (#13134) by @OwenKephart
- 7f9d8f4 - Telemetry for dynamic partitions (#12605) by @clairelin135
- a5c572b - fix missing snapshots (again) (#13136) by @OwenKephart
- fd195cc - [freshness-refactor][4/n] Simplify scheduling algorithm (#13019) by @OwenKephart
- 3d6f822 - Deprecate environment_vars argument to `ScheduleDefinition`, `@schedule` (#13044) by @smackesey
- 157f80e - [refactor] delete hourly/daily/weekly/monthly schedule decorators, PartitionScheduleDefinition, build_schedule_from_partition (#13006) by @smackesey
- 1d0a09e - [refactor] simplify dependency dict typing (#12521) by @smackesey
- f3af63b - celery-k8s executor: handle stdout that's valid json but not a dagster event (#13143) by @johannkm
- 3e41d51 - fix typo in CHANGES.md (#13140) by @fridiculous
- 009573b - Fix branch deployment docs (#13131) by @dpeng817
- b6bd87a - Update multiple agents docs (#13135) by @dpeng817
- 1e5d825 - add toy for eager asset reconciliation (#13066) by @sryza
- cddf96a - trying again by moving the images to the same directory as the readme (#13127) by @tacastillo
- 4f0dddd - [tech][templates] moving the scaffold project's asset loader outside of the defs (#13103) by @tacastillo
- bb470bc - [docs][tutorial-revamp] Adding a section for scheduling to the tutorial (#13101) by @tacastillo
- 3720e26 - replacing corrupt .png image (#13157) by @frasermarlow
- e6d28a6 - [dagster-dbt] Fix bug when calculating transitive dependencies (#13128) by @OwenKephart
- 38059c4 - deploy storybook to prod when landing pushes on master (#13159) by @prha
- 8d8d30a - [dagster-azure] fix: AttributeError: 'coroutine' object has no attribute 'token' (#13110) by @mpicard
- 42131b0 - Add support for more asset tags (#13153) by @braunjj
- c3fcacd - [caching-refactor] Remove use of get_and_update_asset_status_cache. (#13151) by @OwenKephart
- 7ba99aa - [asset-reconciliation][perf] Cache common properties on the TimeWindowPartitionsDefinition (#12981) by @OwenKephart
- 06e6ba5 - MultiPartitionMapping (#12950) by @clairelin135
- 602eeb7 - [asset-reconciliation] Fix issue with duplicate runs for partitions with in-progress materializations (#13130) by @OwenKephart
- 07f6569 - `u...
1.2.3 (core) / 0.18.3 (libraries)
- Jobs defined via `define_asset_job` now auto-infer their partitions definitions if not explicitly defined.
- Observable source assets can now be run as part of a job via `define_asset_job`. This allows putting them on a schedule/sensor.
- Added an `instance` property to the `HookContext` object that is passed into Op Hook functions, which can be used to access the current `DagsterInstance` object for the hook.
- (experimental) Dynamic partitions definitions can now exist as dimensions of multi-partitions definitions.
- [dagster-pandas] New `create_table_schema_metadata_from_dataframe` function to generate a `TableSchemaMetadataValue` from a Pandas DataFrame. Thanks @AndyBys!
- [dagster-airflow] New option for setting `dag_run` configuration on the integration’s database resources.
- [ui] The asset partitions page now links to the most recent failed or in-progress run for the selected partition.
- [ui] Asset descriptions have been moved to the top in the asset sidebar.
- [ui] Log filter switches have been consolidated into a single control, and selected log levels will be persisted locally so that the same selections are used by default when viewing a run.
- [ui] You can now customize the hour formatting in timestamp display: 12-hour, 24-hour, or automatic (based on your browser locale). This option can be found in User Settings.
Bugfixes
- In certain situations a few of the first partitions displayed as “unpartitioned” in the health bar despite being materialized. This has now been fixed, but users may need to run `dagster asset wipe-partitions-status-cache` to see the partitions displayed.
- Starting in `1.1.18`, users with a gRPC server that could not access the Dagster instance on user code deployments would see an error when launching backfills as the instance could not instantiate. This has been fixed.
- Previously, incorrect partition status counts would display for static partitions definitions with duplicate keys. This has been fixed.
- In some situations, having SourceAssets could prevent the `build_asset_reconciliation_sensor` from kicking off runs of downstream assets. This has been fixed.
- The `build_asset_reconciliation_sensor` is now much more performant in cases where unpartitioned assets are upstream or downstream of static-partitioned assets with a large number of partitions.
- [dagster-airflow] Fixed an issue where the persistent Airflow DB resource required the user to set the correct Airflow database URI environment variable.
- [dagster-celery-k8s] Fixed an issue where run monitoring failed when setting the `jobNamespace` field in the Dagster Helm chart when using the `CeleryK8sRunLauncher`.
- [ui] Filtering on the asset partitions page no longer results in keys being presented out of order in the left sidebar in some scenarios.
- [ui] Launching an asset backfill outside an asset job page now supports partition mapping, even if your selection shares a partition space.
- [ui] In the run timeline, date/time display at the top of the timeline was sometimes broken for users not using the `en-US` browser locale. This has been fixed.
1.2.2 (core) / 0.18.2 (libraries)
New
- Dagster is now tested on Python 3.11.
- Users can now opt in to have resources provided to `Definitions` bind to their jobs. Opt in by wrapping your job definitions in `BindResourcesToJobs`. This will become the default behavior in the future.

      from dagster import BindResourcesToJobs, Definitions, job, op

      @op(required_resource_keys={"foo"})
      def my_op(context):
          print(context.resources.foo)

      @job
      def my_job():
          my_op()

      # foo_resource is assumed to be a resource definition defined elsewhere.
      defs = Definitions(
          jobs=BindResourcesToJobs([my_job]),
          resources={"foo": foo_resource},
      )

- Added `dagster asset list` and `dagster asset materialize` commands to Dagster’s command line interface, for listing and materializing software-defined assets.
- `build_schedule_from_partitioned_job` now accepts jobs partitioned with a `MultiPartitionsDefinition` that have a time-partitioned dimension.
- Added `SpecificPartitionsPartitionMapping`, which allows an asset, or all partitions of an asset, to depend on a specific subset of the partitions in an upstream asset.
- `load_asset_value` now supports `SourceAsset`s.
- [ui] Ctrl+K has been added as a keyboard shortcut to open global search.
- [ui] In the run logs table, the timestamp column has been moved to the far left, which will hopefully allow for better visual alignment with op names and tags.
- [dagster-dbt] A new `node_info_to_definition_metadata_fn` argument to `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` allows custom metadata to be attached to the asset definitions generated from these methods.
- [dagster-celery-k8s] The Kubernetes namespace in which runs using the `CeleryK8sRunLauncher` are launched can now be configured by setting the `jobNamespace` field in the Dagster Helm chart under `celeryK8sRunLauncherConfig`.
- [dagster-gcp] The BigQuery I/O manager now accepts `timeout` configuration. Currently, this configuration will only be applied when working with Pandas DataFrames, and will set the number of seconds to wait for a request before using a retry.
- [dagster-gcp] [dagster-snowflake] [dagster-duckdb] The BigQuery, Snowflake, and DuckDB I/O managers now support self-dependent assets. When a partitioned asset depends on a prior partition of itself, the I/O managers will now load that partition as a DataFrame. For the first partition in the dependency sequence, an empty DataFrame will be returned.
- [dagster-k8s] `k8s_job_op` now supports running Kubernetes jobs with more than one pod (Thanks @Taadas).
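The self-dependent asset behavior described above for the BigQuery, Snowflake, and DuckDB I/O managers can be modeled without any of those libraries. In this rough sketch a list of dicts stands in for a DataFrame, `load_previous_partition` is a hypothetical helper, and the "prior partition" is assumed to be the immediately preceding key. The point is only that the first partition receives an empty input rather than an error.

```python
from typing import Dict, List

Row = Dict[str, int]


def load_previous_partition(
    stored: Dict[str, List[Row]],
    partition_keys: List[str],
    current_key: str,
) -> List[Row]:
    """Return the rows of the partition preceding current_key.

    An empty list plays the role of the empty DataFrame that is returned
    for the first partition in the dependency sequence.
    """
    index = partition_keys.index(current_key)
    if index == 0:
        return []
    return stored.get(partition_keys[index - 1], [])


partition_keys = ["2023-04-01", "2023-04-02", "2023-04-03"]
stored: Dict[str, List[Row]] = {}

# Materialize partitions in order; each one adds 1 to the previous running total.
for key in partition_keys:
    previous_rows = load_previous_partition(stored, partition_keys, key)
    previous_total = previous_rows[0]["running_total"] if previous_rows else 0
    stored[key] = [{"running_total": previous_total + 1}]

print(stored["2023-04-03"])
```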
Bugfixes
- Fixed a bug that caused backfill tags that users set in the UI to not be included on the backfill runs when launching an asset backfill.
- Fixed a bug that prevented resume from failure re-execution for jobs that contained assets and dynamic graphs.
- Fixed an issue where the asset reconciliation sensor would issue run requests for assets that were targeted by an active asset backfill, resulting in duplicate runs.
- Fixed an issue where the asset reconciliation sensor could issue runs more frequently than necessary for assets with FreshnessPolicies having intervals longer than 12 hours.
- Fixed an issue where AssetValueLoader.load_asset_value() didn’t load transitive resource dependencies correctly.
- Fixed an issue where constructing a RunConfig object with optional config arguments would lead to an error.
- Fixed the type annotation on ScheduleEvaluationContext.scheduled_execution_time to not be Optional.
- Fixed the type annotation on OpExecutionContext.partition_time_window (thanks @elben10).
- InputContext.upstream_output.log is no longer None when loading a source asset.
- Pydantic type constraints are now supported by the Pythonic config API.
- An input resolution bug that occurred in certain conditions when composing graphs with same named ops has been fixed.
- Invoking an op with collisions between positional args and keyword args now throws an exception.
- async def ops are now invoked with asyncio.run.
- TimeWindowPartitionsDefinition now throws an error at definition time when passed an invalid cron schedule instead of at runtime.
- [ui] Previously, using dynamic partitions with assets that required config would raise an error in the launchpad. This has been fixed.
- [dagster-dbt] Previously, setting a cron_schedule_timezone inside of the config for a dbt model would not result in that property being set on the generated FreshnessPolicy. This has been fixed.
- [dagster-gcp] Added a fallback download url for the GCSComputeLogManager when the session does not have permissions to generate signed urls.
- [dagster-snowflake] In a previous release, functionality was added for the Snowflake I/O manager to attempt to create a schema if it did not already exist. This caused an issue when the schema already existed but the account did not have permission to create the schema. We now check if a schema exists before attempting to create it so that accounts with restricted permissions do not error, but schemas can still be created if they do not exist.
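As a rough illustration of the "async def ops are now invoked with asyncio.run" item above, the sketch below shows one way a caller can treat sync and async functions uniformly. It uses only the standard library; invoke, async_add, and sync_add are hypothetical names, and this is not Dagster's invocation code.

```python
import asyncio
import inspect


def invoke(fn, *args, **kwargs):
    """Call fn; if it is an `async def` function, run it to completion with asyncio.run."""
    if inspect.iscoroutinefunction(fn):
        return asyncio.run(fn(*args, **kwargs))
    return fn(*args, **kwargs)


async def async_add(x, y):
    # Yield control once to show that awaiting works inside the coroutine.
    await asyncio.sleep(0)
    return x + y


def sync_add(x, y):
    return x + y
```

Note that asyncio.run starts a fresh event loop, so a helper like this is only suitable when no event loop is already running in the calling thread.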
Breaking Changes
- validate_run_config no longer accepts pipeline_def or mode arguments. These arguments refer to legacy concepts that were removed in Dagster 1.0, and since then there have been no valid values for them.
Experimental
- Added experimental support for resource requirements in sensors and schedules. Resources can be specified using required_resource_keys and accessed through the context or specified as parameters:

      @sensor(job=my_job, required_resource_keys={"my_resource"})
      def my_sensor(context):
          # Resources requested by key are exposed on context.resources
          files_to_process = context.resources.my_resource.get_files()
          ...

      @sensor(job=my_job)
      def my_sensor(context, my_resource: MyResource):
          # Resources can also be received as typed parameters
          files_to_process = my_resource.get_files()
          ...
Documentation
- Added a page on asset selection syntax to the Concepts documentation.
All Changes
- fb5de00 -
[dagster-gcp-pandas] add timeout config (#12637)
by @jamiedemaria - b91626c -
[docs][tutorial-revamp] Basic tutorial revamp parts 1 through 4 (#12509)
by @tacastillo - f49ec38 -
feat(dbt): add support for
--debug(#12722)
by @rexledesma - 5753806 -
Default useful Dagster helm chart features to on (#12737)
by @gibsondan - 589df01 -
[For 1.2] Allow both protobuf 3 and 4 in dagster (#12466)
by @gibsondan - 953e377 -
[toys repo] Export partitioned assets toys (#12733)
by @salazarm - c9ace2c -
[pythonic config] Allow using 'resource_defs' with resource args in assets (#12679)
by @benpankow - bfc222c -
[refactor] Remove symbols deprecated until 1.2 (#12360)
by @smackesey - 5321cfc -
[dagster-snowflake] fix inconsistencies in snowflake resource (#12633)
by @jamiedemaria - cce3edf -
[ui] Update permissions for launching jobs and materializing assets (#12681)
by @hellendag - 2636f63 -
[For 1.2] Change default run monitoring settings (#11512)
by @gibsondan - 8009dc8 -
[For 1.2] Don't include run/job tags in k8s_job_ops k8s config computations (#12345)
by @gibsondan - 5655fc4 -
Remove Partitioned Schedules from docs, fix dagit staleStatusCauses mocks (#12742)
by @smackesey - aecc22c -
[Sensor Testing] Add "Test again" button (#12735)
by @salazarm - 3e06e5f -
[1.2.0] [refactor] Delete DagsterTypeMaterializer (#12516)
by @smackesey - 2431f5e -
change DynamicPartitionsDefinition.__repr__ (#12754)
by @sryza - a5fdaa8 -
Recommended Project Structure Guide (#12656)
by @odette-elementl - cb54b6f -
[pythonic config] Add test showcasing use of Pydantic validators (#12536)
by @benpankow - e10b0b3 -
fix serialization of TableRecord (#10731)
by @sryza - 3915c7a -
[dagster-gcp-pandas] revert flaky test (#12748)
by @jamiedemaria - 6322734 -
[typing/static] PartitionsDefinition covariant type var (#12284)
by @smackesey - b67f436 -
[rename] LogicalVersion -> DataVersion (#12500)
by @smackesey - aac63ae -
[rename] LogicalVersion -> DataVersion in gql/dagit (#12501)
by @smackesey - f10592f -
[rename] Rename LogicalVersion -> DataVersion in docs (#12503)
by @smackesey - ad9d771 -
[dagster-airflow] add
make_persistent_airflow_db_resource(#12305)
by @Ramshackle-Jamathon - 54cbe5f -
[ui] Remove usePermissionsDEPRECATED (#12771)
by @hellendag - cb8b44c -
[Asset Details Page] Auto-select partition in materialization dialog. (#12734)
by @salazarm - 204fec1 -
feat(databricks)!: remove
create_databricks_job_op(#12600)
by @rexledesma - 383307e -
Make permissions_for_location require keyword args (#12774)
by @gibsondan - 8d696fa -
Remove DynamicPartitionsDefinitions API methods (#12744)
by @clairelin135 - a5e9f77 -
Set default memory/CPU for new ECS tasks based on runtime platform (#12767)
by @gibsondan - a428709 -
cron_schedule validation for time window partitions (#12761)
by @prha - 0181a64 -
update some guide titles and descriptions (#12775)
by @sryza - 909a7a7 -
add primary keys to all run tables (#12711)
by @prha - e13e20c -
document PartitionMappings in partitions concepts page (#12768)
by @sryza - 6531476 -
make sure kv/daemon_heartbeat queries explicitly enumerate columns (#12789)
by @prha - fe6f7c4 -
update API docs for freshness policies (#12788)
by @OwenKephart - 6bcec7d -
[apidoc] asset_selection repository -> Definitions (#11303)
by @yuhan - b2ba3d5 -
Fix typing on mem_io_manager (#12791)
by @dpeng817 - 28821f1 -
[docs] add note about biquery temp tables (#12747)
by @jamiedemaria - c2c365d -
Remove deprecated MetadataEntry constructors, update
entry_datainternal refs (#12724)
by @smackesey - 2f1500e -
[dagster-airflow] persistent db docs (#12485)
by @Ramshackle-Jamathon - 3f8b46d - `[pythonic resources][fix] Fix inheriting attributes when extending Pythonic resources (#12...
1.2.1 (core) / 0.18.1 (libraries)
Bugfixes
- Fixed a bug with postgres storage where daemon heartbeats were failing on instances that had not been migrated with dagster instance migrate after upgrading to 1.2.0.
1.2.0 (core) / 0.18.0 (libraries)
Major Changes since 1.1.0 (core) / 0.17.0 (libraries)
Core
- Added a new dagster dev command that can be used to run both Dagit and the Dagster daemon in the same process during local development. [docs]
- Config and Resources
  - Introduced new Pydantic-based APIs to make defining and using config and resources easier (experimental). [Github discussion]
  - Repository > Definitions [docs]
- Declarative scheduling
  - The asset reconciliation sensor is now 100x more performant in many situations, meaning that it can handle more assets and more partitions.
  - You can now set freshness policies on time-partitioned assets.
  - You can now hover over a stale asset to learn why that asset is considered stale.
- Partitions
  - DynamicPartitionsDefinition allows partitioning assets dynamically - you can add and remove partitions without reloading your definitions (experimental). [docs]
  - The asset graph in the UI now displays the number of materialized, missing, and failed partitions for each partitioned asset.
  - Asset partitions can now depend on earlier time partitions of the same asset. Backfills and the asset reconciliation sensor respect these dependencies when requesting runs [example].
  - TimeWindowPartitionMapping now accepts start_offset and end_offset arguments that allow specifying that time partitions depend on earlier or later time partitions of upstream assets [docs].
- Backfills
  - Dagster now allows backfills that target assets with different partitions, such as a daily asset which rolls up into a weekly asset, as long as the root assets in the selection are partitioned in the same way.
  - You can now choose to pass a range of asset partitions to a single run rather than launching a backfill with a run per partition [instructions].
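To make the start_offset and end_offset arguments of TimeWindowPartitionMapping mentioned above more concrete, here is a stdlib-only sketch of the intended semantics for daily partitions. It is an illustration under assumptions, not Dagster's implementation: upstream_daily_partitions is a hypothetical helper, and both assets are assumed to be partitioned daily with ISO-date keys.

```python
from datetime import date, timedelta


def upstream_daily_partitions(downstream_key: str, start_offset: int = 0, end_offset: int = 0) -> list:
    """List the upstream daily partition keys a downstream partition depends on.

    The start and end of the upstream window are shifted by start_offset and
    end_offset (in partitions) relative to the downstream partition.
    """
    day = date.fromisoformat(downstream_key)
    first = day + timedelta(days=start_offset)
    last = day + timedelta(days=end_offset)
    n_days = (last - first).days + 1
    return [(first + timedelta(days=i)).isoformat() for i in range(n_days)]
```

With start_offset=-1 and end_offset=0, the downstream partition 2022-07-04 maps to the upstream partitions 2022-07-03 and 2022-07-04; with both offsets left at 0 it maps only to 2022-07-04.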
Integrations
- Weights and Biases - A new integration dagster-wandb with Weights & Biases allows you to orchestrate your MLOps pipelines and maintain ML assets with Dagster. [docs]
- Snowflake + PySpark - A new integration dagster-snowflake-pyspark allows you to store and load PySpark DataFrames as Snowflake tables using the snowflake_pyspark_io_manager. [docs]
- Google BigQuery - A new BigQuery I/O manager and new integrations dagster-gcp-pandas and dagster-gcp-pyspark allow you to store and load Pandas and PySpark DataFrames as BigQuery tables using the bigquery_pandas_io_manager and bigquery_pyspark_io_manager. [docs]
- Airflow - The dagster-airflow integration library was bumped to 1.x.x. With that major bump, the library has been refocused on enabling migration from Airflow to Dagster. Refer to the docs for an in-depth migration guide.
- Databricks - Changes:
  - Added op factories to create ops for running existing Databricks jobs (create_databricks_run_now_op), as well as submitting one-off Databricks jobs (create_databricks_submit_run_op).
  - Added a new Databricks guide.
  - The previous create_databricks_job_op op factory is now deprecated.
Docs
- Automating pipelines guide - Check out the best practices for automating your Dagster data pipelines with this new guide. Learn when to use different Dagster tools, such as schedules and sensors, using this guide and its included cheatsheet.
- Structuring your Dagster project guide - Need some help structuring your Dagster project? Learn about our recommendations for getting started and scaling sustainably.
- Tutorial revamp - Goodbye cereals and hello HackerNews! We’ve overhauled our intro to assets tutorial to not only focus on a more realistic example, but to touch on more Dagster concepts as you build your first end-to-end pipeline in Dagster. Check it out here.
Stay tuned, as this is only the first part of the overhaul. We’ll be adding more chapters - including automating materializations, using resources, using I/O managers, and more - in the next few weeks.
Since 1.1.21 (core) / 0.17.21 (libraries)
New
- Freshness policies can now be assigned to assets constructed with @graph_asset and @graph_multi_asset.
- The project_fully_featured example now uses the built-in DuckDB and Snowflake I/O managers.
- A new “failed” state on asset partitions makes it clearer which partitions did not materialize successfully. The number of failed partitions is shown on the asset graph and a new red state appears on asset health bars and status dots.
- Hovering over “Stale” asset tags in the Dagster UI now explains why the annotated assets are stale. Reasons can include more recent upstream data, changes to code versions, and more.
- [dagster-airflow] Support for persisting Airflow db state has been added with make_persistent_airflow_db_resource. This enables support for Airflow features like pools and cross-dagrun state sharing. In particular, retry-from-failure now works for jobs generated from Airflow DAGs.
- [dagster-gcp-pandas] The BigQueryPandasTypeHandler now uses google.bigquery.Client methods load_table_from_dataframe and query rather than the pandas_gbq library to store and fetch DataFrames.
- [dagster-k8s] The Dagster Helm chart now only overrides args instead of both command and args for user code deployments, allowing you to include a custom ENTRYPOINT in your Dockerfile that loads your code.
- The protobuf<4 pin in Dagster has been removed. Either protobuf 3 or protobuf 4 will work with Dagster.
- [dagster-fivetran] Added the ability to specify op_tags to build_fivetran_assets (thanks @Sedosa!)
- @graph_asset and @graph_multi_asset now support passing metadata (thanks @askvinni)!
Bugfixes
- Fixed a bug that caused descriptions supplied to @graph_asset and @graph_multi_asset to be ignored.
- Fixed a bug that caused serialization errors when using TableRecord.
- Fixed an issue where partitions definitions passed to @multi_asset and other functions would register as type errors for mypy and other static analyzers.
- [dagster-aws] Fixed an issue where the EcsRunLauncher failed to launch runs for Windows tasks.
- [dagster-airflow] Fixed an issue where pendulum timezone strings for Airflow DAG start_date would not be converted correctly, causing runs to fail.
- [dagster-airbyte] Fixed an issue where attaching I/O managers to Airbyte assets would result in errors.
- [dagster-fivetran] Fixed an issue where attaching I/O managers to Fivetran assets would result in errors.
Database migration
- Optional database schema migrations, which can be run via dagster instance migrate:
  - Improves Dagit performance by adding a database index which should speed up job run views.
  - Enables dynamic partitions definitions by creating a database table to store partition keys. This feature is experimental and may require future migrations.
  - Adds a primary key id column to the kvs, daemon_heartbeats and instance_info tables, enforcing that all tables have a primary key.
Breaking Changes
- The minimum grpcio version supported by Dagster has been increased to 1.44.0 so that Dagster can support both protobuf 3 and protobuf 4. Similarly, the minimum protobuf version supported by Dagster has been increased to 3.20.0. We are working closely with the gRPC team on resolving the upstream issues keeping the upper-bound grpcio pin in place in Dagster, and hope to be able to remove it very soon.
- Prior to 0.9.19, asset keys were serialized in a legacy format. This release removes support for querying asset events serialized with this legacy format. Contact #dagster-support for tooling to migrate legacy events to the supported version. Users who began using assets after 0.9.19 will not be affected by this change.
- [dagster-snowflake] The execute_query and execute_queries methods of the SnowflakeResource now have consistent behavior based on the values of the fetch_results and use_pandas_result parameters. If fetch_results is True, the standard Snowflake result will be returned. If fetch_results and use_pandas_result are True, a pandas DataFrame will be returned. If fetch_results is False and use_pandas_result is True, an error will be raised. If both are False, no result will be returned.
- [dagster-snowflake] The execute_queries command now returns a list of DataFrames when use_pandas_result is True, rather than appending the results of each query to a single DataFrame.
- [dagster-shell] The default behavior of the execute and execute_shell_command functions is now to include any environment variables in the calling op. To restore the previous behavior, you can pass in env={} to these functions.
- [dagster-k8s] Several Dag...
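The four fetch_results / use_pandas_result combinations listed above for the SnowflakeResource can be summarized as a small decision function. The sketch below only models the documented outcomes: it does not call Snowflake, expected_result_kind is a hypothetical helper, and the exception type raised by the real resource is not stated in these notes, so ValueError is an assumption.

```python
from typing import Optional


def expected_result_kind(fetch_results: bool, use_pandas_result: bool) -> Optional[str]:
    """Describe what a query call returns for a given flag combination."""
    if use_pandas_result and not fetch_results:
        # Documented as an error; the concrete exception type is assumed here.
        raise ValueError("use_pandas_result=True requires fetch_results=True")
    if not fetch_results:
        return None  # both flags False: no result is returned
    if use_pandas_result:
        return "pandas DataFrame"
    return "standard Snowflake result"
```

Writing the combinations out this way makes it easy to see that use_pandas_result only has an effect when fetch_results is also True.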
1.1.21 (core) / 0.17.21 (libraries)
New
- Further performance improvements for build_asset_reconciliation_sensor.
- Dagster now allows you to backfill asset selections that include mapped partition definitions, such as a daily asset which rolls up into a weekly asset, as long as the root assets in your selection share a partition definition.
- Dagit now includes information about the cause of an asset’s staleness.
- Improved the error message for non-matching cron schedules in TimeWindowPartitionMappings with offsets. (Thanks Sean Han!)
- [dagster-aws] The EcsRunLauncher now allows you to configure the runtimePlatform field for the task definitions of the runs that it launches, allowing it to launch runs using Windows Docker images.
- [dagster-azure] Add support for DefaultAzureCredential for adls2_resource (Thanks Martin Picard!)
- [dagster-databricks] Added op factories to create ops for running existing Databricks jobs (create_databricks_run_now_op), as well as submitting one-off Databricks jobs (create_databricks_submit_run_op). See the new Databricks guide for more details.
- [dagster-duckdb-polars] Added a dagster-duckdb-polars library that includes a DuckDBPolarsTypeHandler for use with build_duckdb_io_manager, which allows loading / storing Polars DataFrames from/to DuckDB. (Thanks Pezhman Zarabadi-Poor!)
- [dagster-gcp-pyspark] New PySpark TypeHandler for the BigQuery I/O manager. Store and load your PySpark DataFrames in BigQuery using bigquery_pyspark_io_manager.
- [dagster-snowflake] [dagster-duckdb] The Snowflake and DuckDB IO managers can now load multiple partitions in a single step - e.g. when a non-partitioned asset depends on a partitioned asset or a single partition of an asset depends on multiple partitions of an upstream asset. Loading occurs using a single SQL query and returns a single DataFrame.
Bugfixes
- Previously, if an AssetSelection which matched no assets was passed into define_asset_job, the resulting job would target all assets in the repository. This has been fixed.
- Fixed a bug that caused the UI to show an error if you tried to preview a future schedule tick for a schedule built using build_schedule_from_partitioned_job.
- When a non-partitioned non-asset job has an input that comes from a partitioned SourceAsset, we now load all partitions of that asset.
- Updated the fs_io_manager to store multipartitioned materializations in directory levels by dimension. This resolves a bug on Windows where multipartitioned materializations could not be stored with the fs_io_manager.
- Schedules and sensors previously timed out when attempting to yield many multipartitioned run requests. This has been fixed.
- Fixed a bug where context.partition_key would raise an error when executing on a partition range within a single run via Dagit.
- Fixed a bug that caused the default IO manager to incorrectly raise type errors in some situations with partitioned inputs.
- [ui] Fixed a bug where partition health would fail to display for certain time window partitions definitions with positive offsets.
- [ui] Always show the “Reload all” button on the code locations list page, to avoid an issue where the button was not available when adding a second location.
- [ui] Fixed a bug where users running multiple replicas of dagit would see repeated Definitions reloaded messages on fresh page loads.
- [ui] The asset graph now shows only the last path component of linked assets for better readability.
- [ui] The op metadata panel no longer capitalizes metadata keys.
- [ui] The asset partitions page, asset sidebar and materialization dialog are significantly smoother when viewing assets with a large number of partitions (100k+).
- [dagster-gcp-pandas] The Pandas TypeHandler for BigQuery now respects user provided location information.
- [dagster-snowflake] ProgrammingError was imported from the wrong library; this has been fixed. Thanks @herbert-allium!
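The fs_io_manager change above (one directory level per partition dimension) can be sketched with pathlib. This is an illustration only: multipartition_path is a hypothetical helper, and the "|" separator between dimension keys is an assumption about how a multipartition key is written as a single string. A "|" character is not allowed in Windows file names, which may be why a single-segment layout failed there.

```python
from pathlib import PurePosixPath


def multipartition_path(base_dir: str, asset_name: str, multipartition_key: str, path_cls=PurePosixPath):
    """Build a storage path with one directory level per partition dimension.

    Assumes the multipartition key joins its dimension keys with "|".
    """
    dimension_keys = multipartition_key.split("|")
    return path_cls(base_dir, asset_name, *dimension_keys)
```

Passing pathlib.PureWindowsPath as path_cls shows the same layout with Windows separators, with no "|" left in any path segment.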
Experimental
- You can now set an explicit logical version on Output objects rather than using Dagster’s auto-generated versions.
- New get_asset_provenance method on OpExecutionContext allows fetching logical version provenance for an arbitrary asset key.
- [ui] You can now create dynamic partitions from the partition selection UI when materializing a dynamically partitioned asset.
Documentation
- Added an example of how to use dynamic asset partitions - in the examples/assets_dynamic_partitions folder
- New tutorial for using the BigQuery I/O manager.
- New reference page for BigQuery I/O manager features.
- New automating data pipelines guide
All Changes
- 4343d59 -
dagster-census api docs (#12413)
by @yuhan - 24b7e9b -
graph_asset and graph_multi_asset decorators (#10152)
by @sryza - aa29161 -
[dagster-snowflake-pyspark] fix bug loading partitions (#12472)
by @jamiedemaria - 2197dec -
add graphql fields for querying run tags (#12409)
by @prha - 8889b48 -
Add stale status causes (#11953)
by @smackesey - 323cdc8 -
fix (#12477)
by @salazarm - 8e21900 -
Update Contributing doc with instructions for ruff/pyright (#12481)
by @smackesey - c472edb -
[bigquery] mark bigquery io manager experimental (#12479)
by @jamiedemaria - f2084d7 -
add partial tag autocomplete for run filter input (#12410)
by @prha - 93e7cc1 -
Support env valueFrom in Helm chart (#12425)
by @johannkm - 6601156 -
Update GQL to expose StaleStatus and StaleStatusCause (#11952)
by @smackesey - 0ffe4a7 -
remove timestamp comparisons of code location entries to reduce OSS dagit replica spam (#12407)
by @prha - 6af6f55 -
Fix state status logical version test (#12484)
by @smackesey - a15d965 -
fix ruff (#12486)
by @alangenfeld - 2c97adf -
use opt_nullable_mapping for dagster library versions (#12487)
by @alangenfeld - 965152d -
clarify error when op is missing argument for In (#12456)
by @sryza - 361b0ee -
Remove existing RunConfig class (#12488)
by @benpankow - 75b6a5c -
[pythonic resources] Clean up initialization of env vars, treat resource objects as immutable (#12445)
by @benpankow - e3c8825 -
[structured config] Add support for Selectors w/ pydantic discriminated unions (#11280)
by @benpankow - 2d43d39 -
Allow setting logical version inside op (#12189)
by @smackesey - 39f525a -
Replace usages of
nslookupwith
ncfor user deployments (#11033)
by @michaeljguarino - 2492317 -
Revert "Replace usages of
nslookupwith
ncfor user deployments (#11033)"
by @johannkm - b7fae37 -
[pythonic resources] Last set of class renames (#12490)
by @benpankow - 763283f -
[dagster-azure] Add support for DefaultAzureCredential for adls2_resource (#11309)
by @mpicard - 1d1f5ce -
Add example of customizing task role and execution role arn to the ECS agent docs (#12491)
by @gibsondan - c4e3e87 -
add dagster-duckdb-polars library (#12197)
by @pzarabadip - 3a05583 -
[draft][pythonic config][docs] Introduce intro to Resources doc utilizing Pythonic resources (#12260)
by @benpankow - efed336 -
[pythonic config] Add structured RunConfig object for specifying runtime, job config (#11965)
by @benpankow - 2f4a0e5 -
[draft][pythonic config][docs] Introduce intro to Config doc utilizing Pythonic config (#12349)
by @benpankow - 4439129 -
1.1.20 changelog (#12506)
by @benpankow - 3dc1233 -
refactor(databricks): lift polling methods up to the client (#12382)
by @rexledesma - edac939 -
[fix] fix sphinx airflow version parsing (#12507)
by @benpankow - 215ae70 -
Automation: versioned docs for 1.1.20
by @elementl-devtools - da35211 -
[dagit] Expose range-based asset health, use it for partition status rendering (#12302)
by @bengotow - e3fd740 -
[dagit] Use range-based asset health for asset partitions / job partitions pages (#12434)
by @bengotow - 63b4273 -
Fix 1.1.20 changelog codeblock (#12525)
by @johannkm - 2b1ad7c -
[dagit] Delete generated GraphQL types before regenerating (#12518)
by @hellendag - 024db65 -
[refactor] Delete build_solid_context (#12513)
by @smackesey - 18e7eb6 -
[typing/static] serdes (#12522)
by @smackesey - 3e30026 -
[refactor] make ResolvedRunConfig.to_dict use "ops" (#12514)
by @smackesey - 7e34083 -
Improve multipartition performance for
get_partition(#12431)
by @clairelin135 - e798edf -
feat(databricks): override user agent in resource (#12526)
by @rexledesma - 7036fca -
Bump typing-extensions dep to >=4.4.0 (#12529)
by @smackesey - 781a8ea -
[docs] Snowflake reference page fixes (#12455)
by @jamiedemaria - 2cc338b -
[bugfix] Make assets downstream of partitions never stale in dagit (#12528)
by @smackesey - 65e92a1 -
Enable internal testing for writing asset cached status data (#12497)
by @clairelin135 - 281ec4b -
[typing] fix typing in daemon tests (#12475)
by @dpeng817 - f596b2f -
dynamic partitions toy (#12533)
by @sryza - 7875f44 -
pare down multi-partition runtime type checking in upath IO manager (#12508)
by @sryza - 102c0bc -
[dagit] Upgrade to Jest 29, allow more time for coverage collection (#12534)
by @bengotow - a3b7dfc -
Add description to invariant check (#12496)
by @CodeMySky - e00ba72 -
[fix] Correctly resolve asset jobs with empty selections (#12531)
by @OwenKephart - 9d9b072 -
[dagit] Support backfills on partition-mapped asset selections (#12458)
by @bengotow - 4fc6e3e -
[dagit] Remove unnecessary usage of <TestProvider> (#12519)
by @bengotow - 31022e0 -
Fix multipartitions w/ fs_io_manager on windows (#12414)
by @clairelin135 - dd4ba07 - `Fix mas...
1.1.20 (core) / 0.17.20 (libraries)
New
- The new @graph_asset and @graph_multi_asset decorators make it more ergonomic to define graph-backed assets.
- Dagster will auto-infer dependency relationships between single-dimensionally partitioned assets and multipartitioned assets, when the single-dimensional partitions definition is a dimension of the MultiPartitionsDefinition.
- A new Test sensor / Test schedule button allows you to perform a dry-run of your sensor / schedule. Check out the docs on this functionality here for sensors and here for schedules.
- [dagit] Added (back) tag autocompletion in the runs filter, now with improved query performance.
- [dagit] The Dagster libraries and their versions that were used when loading definitions can now be viewed in the actions menu for each code location.
- New bigquery_pandas_io_manager can store and load Pandas dataframes in BigQuery.
- [dagster-snowflake, dagster-duckdb] SnowflakeIOManagers and DuckDBIOManagers can now default to loading inputs as a specified type if a type annotation does not exist for the input.
- [dagster-dbt] Added the ability to use the “state:” selector
- [dagster-k8s] The Helm chart now supports the full kubernetes env var spec for Dagit and the Daemon. E.g.

      dagit:
        env:
          - name: "FOO"
            valueFrom:
              fieldRef:
                fieldPath: metadata.uid
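The auto-inferred dependency between a single-dimension partitioned asset and a multipartitioned asset, described in the list above, can be pictured as a filter on the shared dimension. The sketch below is a plain-Python illustration, not Dagster's partition mapping code: matching_multipartitions and MULTI_KEYS are hypothetical, and multipartition keys are modeled as dictionaries from dimension name to partition key.

```python
def matching_multipartitions(single_key, dimension, multi_keys):
    """Return the multipartition keys whose value for `dimension` equals `single_key`."""
    return [key for key in multi_keys if key[dimension] == single_key]


# Hypothetical multipartition keys over a "date" and a "region" dimension.
MULTI_KEYS = [
    {"date": "2023-01-01", "region": "us"},
    {"date": "2023-01-01", "region": "eu"},
    {"date": "2023-01-02", "region": "us"},
]
```

Here a daily asset's 2023-01-01 partition would line up with both multipartitions that carry that date, regardless of region.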
Bugfixes
- Previously, graphs would fail to resolve an input with a custom type and an input manager key. This has been fixed.
- Fixes a bug where negative partition counts were displayed in the asset graph.
- Previously, when an asset sensor did not yield run requests, it returned an empty result. This has been updated to yield a meaningful message.
- Fix an issue with a non-partitioned asset downstream of a partitioned asset with self-dependencies causing a GQL error in dagit.
- [dagster-snowflake-pyspark] Fixed a bug where the PySparkTypeHandler was incorrectly loading partitioned data.
- [dagster-k8s] Fixed an issue where run monitoring sometimes failed to detect that the kubernetes job for a run had stopped, leaving the run hanging.
Documentation
- Updated contributor docs to reference our new toolchain (ruff, pyright).
- (experimental) Documentation for the dynamic partitions definition has been added.
- [dagster-snowflake] The Snowflake I/O Manager reference page now includes information on working with partitioned assets.
All Changes
- c488fdb -
disable check_same_thread on in-memory sqlite storage (#12229)
by @alangenfeld - 370093d -
[direct invoke] yield implicit Nothing Output (#12309)
by @alangenfeld - 8a42a96 -
Fix multipartitions run length encoding error (#12329)
by @clairelin135 - 268ac07 -
[freshness-policies] Allow setting freshness policies when using graph-backed assets (#12357)
by @OwenKephart - baab234 -
Add skip reason to asset sensor (#12343)
by @OwenKephart - 49fb47f -
Fix partitions backfill deserialization error (#12238)
by @clairelin135 - 840fda0 -
Move CachingRepositoryData.from_list and from_dict into standalone function (#12321)
by @schrockn - 9732826 -
refactor(databricks): divest from databricks_api in favor of databricks-cli (#12153)
by @rexledesma - b573daa -
nullsafe array index access (#12362)
by @salazarm - 7962972 -
Change schedule button text (#12361)
by @dpeng817 - 17ea06c -
[dagit] add full serialized error to graphql errors (#12228)
by @alangenfeld - 5ebe64a -
[dagster-pandas][dagster-pandera] assign a typing_type for generated pandas dataframe DagsterTypes (#12363)
by @OwenKephart - 5238e6f -
[typing/static] Execution API types (#12330)
by @smackesey - 84aa559 -
[refactor] DependencyDefinition renames (#12338)
by @smackesey - 2779527 -
[refactor] execute_step renames (#12354)
by @smackesey - e81c9b1 -
[typing/runtime] Standardize StepInputSource.load_input_object (#12342)
by @smackesey - 915feb5 -
[refactor] Delete NodeInput.solid_name (#12339)
by @smackesey - 3fe5dcd -
[refactor] NodeDefiniton.iterate_solid_defs -> iterate_op_defs (#12336)
by @smackesey - 85226fd -
[refactor] GraphDefinition method renames (#12335)
by @smackesey - 0d62593 -
[2/n][structured config] Enable struct config resources, IO managers to depend on other resources (#11645)
by @benpankow - c8e4fb2 -
[refactor] local var/private arg solid -> node (#12337)
by @smackesey - ec70f8a -
DagsterLibraryRegistry (#12266)
by @alangenfeld - 47dc694 -
[refactor] misc core solid -> node renames (#12368)
by @smackesey - 15a5c59 -
[refactor] dagster._core.definitions.solid_container -> node_container (#12369)
by @smackesey - ef8da99 -
[refactor] Assorted local var solid -> node (#12370)
by @smackesey - cf847f4 -
Fix intermittent dynamic partitions table SQLite concurrency error (#12367)
by @clairelin135 - 7f398b5 -
change storage signature for run tags (#12348)
by @prha - 447f931 -
1.1.19 Changelog (#12378)
by @OwenKephart - 85c1ac5 -
guide to how assets relate to ops and graphs (#12204)
by @sryza - 39500d8 -
[pythonic config] Rename pythonic config classes (#12235)
by @benpankow - 8783d2f -
updates tests to handle new kubernetes resources field (#12395)
by @alangenfeld - b592861 -
[structured config] Migrate resources from project-fully-featured to struct config resources (#11785)
by @benpankow - fbd6a8f -
refactor(databricks): add types to databricks.py (#12364)
by @rexledesma - cc6ddf9 -
refactor(databricks): consolidate types (#12366)
by @rexledesma - d9f0bda -
add dagster_libraries to ListRepositoriesResponse (#12267)
by @alangenfeld - 18cc0c1 -
[graphql] add RepositoryLocation.dagsterLibraryVersions (#12268)
by @alangenfeld - b31c14f -
[dagit] add dagster libraries menu to code location row (#12315)
by @alangenfeld - 19dac72 -
1.1.19 changelog: reorder code block (#12402)
by @yuhan - c251806 -
refactor(databricks): use databricks_cli's raw api client (#12377)
by @rexledesma - 532ced5 -
[docs] [snowflake] Add partitions to snowflake guide (#12231)
by @jamiedemaria - cf0779b -
Add valid start time check to materialized time partitions subsets (#12403)
by @clairelin135 - 55ec34a -
Add api docs for some PartitionsDefinition and PartitionMapping classes (#12365)
by @sryza - 3fd1174 -
Add text to timestamp dropdown (#12379)
by @dpeng817 - 866a100 -
[refactor] IExecutionStep.solid_handle -> node_handle (#12371)
by @smackesey - 82901d3 -
[asset-reconciliation] Factor in more run statuses (#12412)
by @OwenKephart - c132dce -
[refactor] *ExecutionContext.solid_config -> op_config (#12372)
by @smackesey - fa3418e -
[refactor] ResolvedRunConfig.solids -> ops (#12373)
by @smackesey - 5f4cb11 -
Automation: versioned docs for 1.1.19
by @elementl-devtools - d109e89 -
[refactor] assorted Dagstermill renames (#12380)
by @smackesey - 4ce1f6f -
[refactor] Context
solidrenames (#12374)
by @smackesey - ded407f -
lambda_solid -> solid (#10816)
by @smackesey - a090efa -
[refactor] Assorted pipeline_run -> dagster_run (#12383)
by @smackesey - a1d5d14 -
Make a script to template out new dagster packages (#12389)
by @jamiedemaria - e660cd8 -
[library template] add registry call (#12418)
by @alangenfeld - 074ae45 -
[db io managers] connection refactor (#12258)
by @jamiedemaria - 4a04e26 -
Code location alerting docs (#12411)
by @dpeng817 - be7050e -
[refactor] remove @solid decorator (#10952)
by @smackesey - 9a3a8e2 -
[refactor] Delete PipelineRunsFilter (#12384)
by @smackesey - c39e007 -
[db io managers] add default_load_type (#12356)
by @jamiedemaria - 5eeab19 -
[refactor] Delete RunRecord.pipeline_run (#12385)
by @smackesey - 4ba803c -
[refactor] pipeline_run_from_storage -> dagster_run_from_storage (#12386)
by @smackesey - 1884193 -
[Docs RFC] Dynamic Partitions (#12227)
by @clairelin135 - 2e2eca1 -
Auto infer multipartition <-> single dimension mapping (#12400)
by @clairelin135 - 79f9ecf -
[refactor] execution pipeline_run -> dagster_run (#12388)
by @smackesey - e1d3579 -
[test-api-update] execution_tests/dynamic_tests (#12427)
by @smackesey - d48a889 -
Consider the run worker unhealthy if the job has no active pods but the run is in a non-terminal state (#11510)
by @gibsondan - 4474da1 -
fix: only inspect schema when we may create tables (#12269)
by @plaflamme - 61ed1c6 -
More helpful asset key mismatch errors (#12008)
by @benpankow - 7536b6f -
Add docs for testing schedules/sensors via UI (#12381)
by @dpeng817 - ad6f84f -
document missing breaking change in 1.1.19 changelog (#12424)
by @sryza - 636b58f -
BigQuery IO manager (#11425)
by @jamiedemaria - 8ea58dc -
[dagster-gcp-pandas] API docs fix (#12450)
by @jamiedemaria - 392ee40 -
[graphql] launch backfills over assets with different partitionings, if all roots have same partitioning (#11827)
by @sryza - 1783fad -
Fix resolution error with input manager key and custom dagster type (#12449)
by @clairelin135 - bf84d43 -
[dagster-dbt] Add ability to use the "state:" selector (#12432)
by @OwenKephart - 98bc51c -
Revert "More helpful asset key mismatch errors (#12008)" (#12459)
by @benpankow - 2aba792 -
[dagit] Add missing React keys to prevent new warning toasts (#12210)
by @bengotow - 49a07e6 -
[CustomConfirmationDialog] Allow overriding the button text (#12444)
by @salazarm - 8f5f31b -
add another todo to create_dagster_package (#12453)
by @jamiedemaria - 74f70c9 -
[dagster-gcp-pandas] register library in init (#12469)
by @jamiedemaria - 2fc07dc - `[bugfix] fix projected logical versi...