From e1d6278c9385b47f01d42d7d43235de434a1cf79 Mon Sep 17 00:00:00 2001
From: Alex Langenfeld
Date: Thu, 22 Jun 2023 11:58:48 -0500
Subject: [PATCH] 1.3.11 changelog (#14907)

Co-authored-by: Ben Pankow
---
 CHANGES.md | 138 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 94 insertions(+), 44 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 7c08cc6c15294..383eac64ab213 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -1,5 +1,57 @@
 # Changelog
 
+# 1.3.11 (core) / 0.19.11 (libraries)
+
+### New
+
+- Assets with lazy auto-materialize policies are no longer auto-materialized if they are missing but don’t need to be materialized in order to help downstream assets meet their freshness policies.
+- [ui] The descriptions of auto-materialize policies in the UI now include their skip conditions along with their materialization conditions.
+- [dagster-dbt] Customized asset keys can now be specified for nodes in the dbt project, using `meta.dagster.asset_key`. This field takes in a list of strings that are used as the components of the generated `AssetKey`.
+
+```yaml
+version: 2
+
+models:
+  - name: users
+    config:
+      meta:
+        dagster:
+          asset_key: ["my", "custom", "asset_key"]
+```
+
+- [dagster-dbt] Customized groups can now be specified for models in the dbt project, using `meta.dagster.group`. This field takes in a string that is used as the Dagster group for the generated software-defined asset corresponding to the dbt model.
+
+```yaml
+version: 2
+
+models:
+  - name: users
+    config:
+      meta:
+        dagster:
+          group: "my_group"
+```
+
+### Bugfixes
+
+- Fixed an issue where the `dagster-msteams` and `dagster-mlflow` packages could be installed with incompatible versions of the `dagster` package due to a missing pin.
+- Fixed an issue where the `dagster-daemon run` command sometimes kept code server subprocesses open longer than it needed to, making the process use more memory.
+- Previously, when using `@observable_source_asset`s with AutoMaterializePolicies, it was possible for downstream assets to get “stuck”, not getting materialized when other upstream assets changed, or for multiple downstream materializations to be kicked off in response to the same version being observed multiple times. This has been fixed.
+- Fixed a case where the materialization count for partitioned assets could be wrong.
+- Fixed an error which arose when trying to request resources within run failure sensors.
+- [dagster-wandb] Fixed handling for multi-dimensional partitions. Thanks @chrishiste
+
+### Experimental
+
+- [dagster-dbt] improvements to `@dbt_assets`
+  - `project_dir` and `target_path` in `DbtCliTask` are converted from type `str` to type `pathlib.Path`.
+  - In the case that dbt logs are not emitted as json, the log will still be redirected to be printed in the Dagster compute logs, under `stdout`.
+
+### Documentation
+
+- Fixed a typo in dagster_aws S3 resources. Thanks @akan72
+- Fixed a typo in link on the Dagster Instance page. Thanks @PeterJCLaw
+
 # 1.3.10 (core) / 0.19.10 (libraries)
 
 ### New
@@ -16,9 +68,9 @@ models:
     config:
       dagster_freshness_policy:
         maximum_lag_minutes: 60
-        cron_schedule: '0 9 * * *'
+        cron_schedule: "0 9 * * *"
       dagster_auto_materialize_policy:
-        type: 'lazy'
+        type: "lazy"
 ```
 
 After:
@@ -33,38 +85,38 @@ models:
         dagster:
           freshness_policy:
             maximum_lag_minutes: 60
-            cron_schedule: '0 9 * * *'
+            cron_schedule: "0 9 * * *"
           auto_materialize_policy:
-            type: 'lazy'
+            type: "lazy"
 ```
 
 - Added support for Pythonic Config classes to the `@configured` API, which makes reusing op and asset definitions easier:
 
-    ```python
-    class GreetingConfig(Config):
-        message: str
-
-    @op
-    def greeting_op(config: GreetingConfig):
-        print(config.message)
-
-    class HelloConfig(Config):
-        name: str
-
-    @configured(greeting_op)
-    def hello_op(config: HelloConfig):
-        return GreetingConfig(message=f"Hello, {config.name}!")
-    ```
+  ```python
+  class GreetingConfig(Config):
+      message: str
+
+  @op
+  def greeting_op(config: GreetingConfig):
+      print(config.message)
+
+  class HelloConfig(Config):
+      name: str
+
+  @configured(greeting_op)
+  def hello_op(config: HelloConfig):
+      return GreetingConfig(message=f"Hello, {config.name}!")
+  ```
 
 - Added `AssetExecutionContext` to replace `OpExecutionContext` as the context object passed in to `@asset` functions.
-- `TimeWindowPartitionMapping` now contains an `allow_nonexistent_upstream_partitions` argument that, when set to `True`, allows a downstream partition subset to have nonexistent upstream parents.
+- `TimeWindowPartitionMapping` now contains an `allow_nonexistent_upstream_partitions` argument that, when set to `True`, allows a downstream partition subset to have nonexistent upstream parents.
 - Unpinned the `alembic` dependency in the `dagster` package.
 - [ui] A new “Assets” tab is available from the Overview page.
 - [ui] The Backfills table now includes links to the assets that were targeted by the backfill.
 
 ### Bugfixes
 
-- Dagster is now compatible with a breaking change introduced in `croniter==1.4.0`. Users of earlier versions of Dagster can pin `croniter<1.4`.
+- Dagster is now compatible with a breaking change introduced in `croniter==1.4.0`. Users of earlier versions of Dagster can pin `croniter<1.4`.
 - Fixed an issue introduced in 1.3.8 which prevented resources from being bound to sensors when the specified job required late-bound resources.
 - Fixed an issue which prevented specifying resource requirements on a `@run_failure_sensor`.
 - Fixed an issue where the asset reconciliation sensor failed with a “invalid upstream partitions” error when evaluating time partitions definitions with different start times.
@@ -82,10 +134,10 @@ models:
 
 - Evaluation history for `AutoMaterializePolicy`s will now be cleared after 1 week.
 - [dagster-dbt] Several improvements to `@dbt_assets`:
-    - `profile` and `target` can now be customized on the `DbtCli` resource.
-    - If a `partial_parse.msgpack` is detected in the target directory of your dbt project, it is now copied into the target directories created by `DbtCli` to take advantage of [partial parsing](https://docs.getdbt.com/reference/parsing).
-    - The metadata of assets generated by `@dbt_assets` can now be customized by overriding `DbtManifest.node_info_to_metadata`.
-    - Execution duration of dbt models is now added as default metadata to `AssetMaterialization`s.
+  - `profile` and `target` can now be customized on the `DbtCli` resource.
+  - If a `partial_parse.msgpack` is detected in the target directory of your dbt project, it is now copied into the target directories created by `DbtCli` to take advantage of [partial parsing](https://docs.getdbt.com/reference/parsing).
+  - The metadata of assets generated by `@dbt_assets` can now be customized by overriding `DbtManifest.node_info_to_metadata`.
+  - Execution duration of dbt models is now added as default metadata to `AssetMaterialization`s.
 
 ### Documentation
 
@@ -101,7 +153,6 @@ models:
 
 - Fixed an issue in the `1.3.8` release where the Dagster Cloud agent would sometimes fail to start up with an import error.
 
-
 # 1.3.8 (core) / 0.19.8 (libraries)
 
 ### New
@@ -134,9 +185,9 @@ models:
 
 - `@observable_source_asset`-decorated functions can now return a `DataVersionsByPartition` to record versions for partitions.
 - `@dbt_assets`
-    - `DbtCliTask`'s created by invoking `DbtCli.cli(...)` now have a method `.is_successful()`, which returns a boolean representing whether the underlying CLI process executed the dbt command successfully.
-    - Descriptions of assets generated by `@dbt_assets` can now be customized by overriding `DbtManifest.node_info_to_description`.
-    - IO Managers can now be configured on `@dbt_assets`.
+  - `DbtCliTask`'s created by invoking `DbtCli.cli(...)` now have a method `.is_successful()`, which returns a boolean representing whether the underlying CLI process executed the dbt command successfully.
+  - Descriptions of assets generated by `@dbt_assets` can now be customized by overriding `DbtManifest.node_info_to_description`.
+  - IO Managers can now be configured on `@dbt_assets`.
 
 ### Documentation
 
@@ -160,7 +211,7 @@ models:
 
 ### Bugfixes
 
-- Fixed an issue where setting a resource in an op didn’t work if the Dagster job was only referenced within a schedule or sensor and wasn’t included in the `jobs` argument to `Definitions`.
+- Fixed an issue where setting a resource in an op didn’t work if the Dagster job was only referenced within a schedule or sensor and wasn’t included in the `jobs` argument to `Definitions`.
 - [dagster-slack][dagster-pagerduty][dagster-msteams][dagster-airflow] Fixed issue where pre-built sensors and hooks which created urls to the runs page in the UI would use the old `/instance/runs` path instead of the new `/runs`.
 
 ### Community Contributions
@@ -249,7 +300,7 @@ models:
 - [dagster-airflow] persistent database URI can now be passed via environment variable
 - [dagster-azure] New `ConfigurablePickledObjectADLS2IOManager` that uses pythonic config
 - [dagster-fivetran] Fivetran connectors that are broken or incomplete are now ignored
-- [dagster-gcp] New `DataProcResource` follows the Pythonic resource system. The existing `dataproc_resource` remains supported.
+- [dagster-gcp] New `DataProcResource` follows the Pythonic resource system. The existing `dataproc_resource` remains supported.
 - [dagster-k8s] The K8sRunLauncher and k8s_job_executor will now retry the api call to create a Kubernetes Job when it gets a transient error code (500, 503, 504, or 401).
 - [dagster-snowflake] The `SnowflakeIOManager` now supports `private_key`s that have been `base64` encoded to avoid issues with newlines in the private key. Non-base64 encoded keys are still supported. See the `SnowflakeIOManager` documentation for more information on `base64` encoded private keys.
 - [ui] Unpartitioned assets show up on the backfill page
@@ -259,7 +310,7 @@ models:
 ### Bugfixes
 
 - The server side polling for events during a live run has had its rate adjusted and no longer uses a fixed interval.
-- [dagster-postgres] Fixed an issue where primary key constraints were not being created for the `kvs`, `instance_info`, and `daemon_hearbeats` table for existing Postgres storage instances that were migrating from before `1.2.2`. This should unblock users relying on the existence of a primary key constraint for replication.
+- [dagster-postgres] Fixed an issue where primary key constraints were not being created for the `kvs`, `instance_info`, and `daemon_hearbeats` table for existing Postgres storage instances that were migrating from before `1.2.2`. This should unblock users relying on the existence of a primary key constraint for replication.
 - Fixed a bug that could cause incorrect counts to be shown for missing asset partitions when partitions are in progress
 - Fixed an issue within `SensorResult` evaluation where multipartitioned run requests containing a dynamic partition added in a dynamic partitions request object would raise an invalid partition key error.
 - [ui] When trying to terminate a queued or in-progress run from a Run page, forcing termination was incorrectly given as the only option. This has been fixed, and these runs can now be terminated normally.
@@ -272,7 +323,7 @@ models:
 
 ### Community Contributions
 
-- [dagster-airbyte] When supplying an `airbyte_resource` to `load_assets_from_connections` , you may now provide an instance of the `AirbyteResource` class, rather than just `airbyte_resource.configured(...)` (thanks **[@joel-olazagasti](https://github.com/joel-olazagasti)!)**
+- [dagster-airbyte] When supplying an `airbyte_resource` to `load_assets_from_connections` , you may now provide an instance of the `AirbyteResource` class, rather than just `airbyte_resource.configured(...)` (thanks **[@joel-olazagasti](https://github.com/joel-olazagasti)!)**
 - [dagster-airbyte] Fixed an issue connecting to destinations that support normalization (thanks [@nina-j](https://github.com/nina-j)!)
 - Fix an error in the docs code snippets for IO managers (thanks [out-running-27](https://github.com/out-running-27)!)
 - Added [an example](https://github.com/dagster-io/dagster/tree/master/examples/project_analytics) to show how to build the Dagster's Software-Defined Assets for an analytics workflow with different deployments for a local and prod environment. (thanks [@PedramNavid](https://github.com/PedramNavid)!)
@@ -297,17 +348,17 @@ models:
 - `RunFailureSensorContext` now has a `get_step_failure_events` method.
 - The Pythonic resource system now supports a set of lifecycle hooks which can be used to manage setup and teardown:
 
-    ```python
-    class MyAPIClientResource(ConfigurableResource):
-        api_key: str
-        _internal_client: MyAPIClient = PrivateAttr()
-
-        def setup_for_execution(self, context):
-            self._internal_client = MyAPIClient(self.api_key)
-
-        def get_all_items(self):
-            return self._internal_client.items.get()
-    ```
+  ```python
+  class MyAPIClientResource(ConfigurableResource):
+      api_key: str
+      _internal_client: MyAPIClient = PrivateAttr()
+
+      def setup_for_execution(self, context):
+          self._internal_client = MyAPIClient(self.api_key)
+
+      def get_all_items(self):
+          return self._internal_client.items.get()
+  ```
 
 - Added support for specifying input and output config on `ConfigurableIOManager`.
 - `QueuedRunCoordinator` and `SubmitRunContext` are now exposed as public dagster exports.
@@ -316,8 +367,8 @@ models:
 - [dagster-aws] Allow the S3 compute log manager to specify a `show_url_only: true` config option, which will display a URL to the S3 file in dagit, instead of the contents of the log file.
 - [dagster-aws] `PickledObjectS3IOManager` now fully supports loading partitioned inputs.
 - [dagster-azure] `PickedObjectADLS2IOManager` now fully supports loading partitioned inputs.
-- [dagster-gcp] New `GCSResource` and `ConfigurablePickledObjectGCSIOManager` follow the Pythonic resource system. The existing `gcs_resource` and `gcs_pickle_io_manager` remain supported.
-- [dagster-gcp] New `BigQueryResource` follows the Pythonic resource system. The existing `bigquery_resource` remains supported.
+- [dagster-gcp] New `GCSResource` and `ConfigurablePickledObjectGCSIOManager` follow the Pythonic resource system. The existing `gcs_resource` and `gcs_pickle_io_manager` remain supported.
+- [dagster-gcp] New `BigQueryResource` follows the Pythonic resource system. The existing `bigquery_resource` remains supported.
 - [dagster-gcp] `PickledObjectGCSIOManager` now fully supports loading partitioned inputs.
 - [dagster-postgres] The event watching implementation has been moved from listen/notify based to the polling watcher used by MySQL and SQLite.
 - [dagster-slack] Add `monitor_all_repositories` to `make_slack_on_run_failure_sensor`, thanks @danielgafni!
@@ -346,8 +397,8 @@ models:
 
 - Ever wanted to know more about the files in Dagster projects, including where to put them in your project? Check out the new [Dagster project files reference](https://docs.dagster.io/getting-started/project-file-reference) for more info!
 - We’ve made some improvements to the sidenav / information architecture of our docs!
-    - The **Guides** section now contains several new categories, including **Working with data assets** and **Working with tasks**
-    - The **Community** section is now under **About**
+  - The **Guides** section now contains several new categories, including **Working with data assets** and **Working with tasks**
+  - The **Community** section is now under **About**
 - The Backfills concepts page now includes instructions on how to launch backfills that target ranges of partitions in a single run.
 
 # 1.3.2 (core) / 0.19.2 (libraries)
@@ -371,7 +422,6 @@ models:
 
 ### Breaking Changes
 
-
 - Yielding run requests for experimental dynamic partitions via `run_request_for_partition` now throws an error. Instead, users should yield directly instantiated run requests via `RunRequest(partition_key=...)`.
 - `graph_asset` and `graph_multi_asset` now support specifying `resource_defs` directly (thanks [@kmontag42](https://github.com/KMontag42))!