Releases: dagster-io/dagster

0.7.15

29 May 02:59

New

  • Improve dagster scheduler state reconciliation.

0.7.14

22 May 01:19

New

  • Dagit now allows re-executing an arbitrary subset of steps via step selector syntax, regardless of
    whether the previous pipeline run failed.
  • Added a search filter for the root Assets page
  • Adds tooltip explanations for disabled run actions
  • The last output of the cron job command created by the scheduler is now stored in a file. A new dagster schedule logs {schedule_name} command will show the log file for a given schedule. This helps uncover errors like missing environment variables and import errors.
  • The dagit schedule page will now show inconsistency errors between schedule state and the cron tab that were previously only displayed by the dagster schedule debug command. As before, these errors can be resolved using dagster schedule up.

Bugfix

  • Fixes an issue with config schema validation on Arrays
  • Fixes an issue with initializing K8sRunLauncher when configured via dagster.yaml
  • Fixes a race condition in Airflow injection logic that happens when multiple Operators try to
    create PipelineRun entries simultaneously.
  • Fixed an issue where schedules with invalid config did not log the appropriate error.

0.7.13

14 May 23:13

Breaking Changes

  • The dagster pipeline backfill command no longer takes a mode flag. Instead, it uses the mode specified on the PartitionSetDefinition. Similarly, the runs created from the backfill also use the solid_subset specified on the PartitionSetDefinition.

Bugfix

  • Fixes a bug where using solid subsets when launching pipeline runs would fail config validation.
  • (dagster-gcp) allow multiple "bq_solid_for_queries" solids to co-exist in a pipeline
  • Improve scheduler state reconciliation with the dagster-cron scheduler. The dagster schedule debug command will display issues related to missing cron jobs, extraneous cron jobs, and duplicate cron jobs. Running dagster schedule up will fix any issues.
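
The three kinds of inconsistency named above (missing, extraneous, and duplicate cron jobs) can be illustrated with a small sketch. This is not dagster's implementation; the function reconcile and the schedule names are hypothetical, and it assumes each cron job can be mapped back to a schedule name.

```python
from collections import Counter


def reconcile(expected_schedules, cron_job_names):
    """Compare expected schedule names against names found in a crontab.

    Hypothetical helper for illustration only, not part of dagster.
    """
    counts = Counter(cron_job_names)
    expected = set(expected_schedules)
    found = set(counts)
    return {
        # Schedules that should be running but have no cron job.
        "missing": sorted(expected - found),
        # Cron jobs that do not correspond to any known schedule.
        "extraneous": sorted(found - expected),
        # Known schedules that appear more than once in the crontab.
        "duplicate": sorted(n for n in expected & found if counts[n] > 1),
    }


report = reconcile(
    expected_schedules=["daily_ingest", "hourly_metrics"],
    cron_job_names=["daily_ingest", "daily_ingest", "old_cleanup"],
)
print(report)
```

In this sketch an empty report in all three categories would mean the schedule state and the crontab agree.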

New

  • The dagster-airflow package now supports loading Airflow dags without depending on an initialized Airflow database.
  • Improvements to the longitudinal partitioned schedule view, including live updates, run filtering, and better default states.
  • Added user warning for dagster library packages that are out of sync with the core dagster package.

0.7.12

11 May 23:17

Bugfix

  • We now only render the subset of an execution plan that has actually executed, and persist that subset information along with the snapshot.
  • @pipeline and @composite_solid now correctly capture __doc__ from the function they decorate.
  • Fixed a bug with using solid subsets in the Dagit playground

0.7.11

09 May 21:38

Bugfix

  • Fixed an issue with strict snapshot ID matching when loading historical snapshots, which caused
    errors on the Runs page when viewing historical runs.
  • Fixed an issue where dagster_celery had introduced a spurious dependency on dagster_k8s
    (#2435)
  • Fixed an issue where our Airflow, Celery, and Dask integrations required S3 or GCS storage and
    prevented use of filesystem storage. Filesystem storage is now also permitted, to enable use of
    these integrations with distributed filesystems like NFS (#2436).

0.7.10

09 May 21:38

New

  • RepositoryDefinition now takes schedule_defs and partition_set_defs directly. The loading
    scheme for these definitions via repository.yaml under the scheduler: and partitions: keys
    is deprecated and expected to be removed in 0.8.0.
  • Mark published modules as python 3.8 compatible.
  • The dagster-airflow package supports loading all Airflow DAGs within a directory path, file path,
    or Airflow DagBag.
  • The dagster-airflow package supports loading all 23 DAGs in Airflow example_dags folder and
    execution of 17 of them (see: make_dagster_repo_from_airflow_example_dags).
  • The dagster-celery CLI tools now allow you to pass additional arguments through to the underlying
    celery CLI, e.g., running dagster-celery worker start -n my-worker -- --uid=42 will pass the
    --uid flag to celery.
  • It is now possible to create a PresetDefinition that has no environment defined.
  • Added dagster schedule debug command to help debug scheduler state.
  • The SystemCronScheduler now verifies that a cron job has been successfully added to the
    crontab when turning a schedule on, and shows an error message if unsuccessful.
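
The argument pass-through described above for the dagster-celery CLI follows the common convention of treating everything after a bare -- as belonging to the wrapped command. A minimal sketch of that convention, assuming nothing about how dagster-celery actually parses its arguments (split_passthrough is a hypothetical name):

```python
def split_passthrough(argv):
    """Split argv at the first bare "--" into (own_args, passthrough_args).

    Hypothetical helper for illustration only, not part of dagster-celery.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    # No separator: every argument belongs to the wrapper CLI.
    return list(argv), []


own, extra = split_passthrough(
    ["worker", "start", "-n", "my-worker", "--", "--uid=42"]
)
print(own)    # arguments consumed by the wrapper
print(extra)  # arguments forwarded to the underlying CLI
```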

Breaking Changes

  • A dagster instance migrate is required for this release to support the new experimental assets
    view.
  • Runs created prior to 0.7.8 will no longer render their execution plans as DAGs. We are only
    rendering execution plans that have been persisted. Logs are still available.
  • Path is no longer valid in config schemas. Use str or dagster.String instead.
  • Removed the @pyspark_solid decorator - its functionality, which was experimental, is subsumed by
    requiring a StepLauncher resource (e.g. emr_pyspark_step_launcher) on the solid.

Dagit

  • Merged "re-execute", "single-step re-execute", "resume/retry" buttons into one "re-execute" button
    with three dropdown selections on the Run page.

Experimental

  • Added new asset_key string parameter to Materializations and created a new “Assets” tab in Dagit
    to view pipelines and runs associated with these keys. The API and UI of these asset-based
    features are likely to change, but feedback is welcome and will be used to inform these changes.
  • Added an emr_pyspark_step_launcher that enables launching PySpark solids in EMR. The
    "simple_pyspark" example demonstrates how it’s used.

Bugfix

  • Fixed an issue when running Jupyter notebooks in a Python 2 kernel through dagstermill with dagster
    running in Python 3.
  • Improved error messages produced when dagstermill spins up an in-notebook context.
  • Fixed an issue with retrieving step events from CompositeSolidResult objects.

0.7.9

09 May 21:38

Breaking Changes

  • If you are launching runs using DagsterInstance.launch_run, this method now takes a run id instead of an instance of PipelineRun. Additionally, DagsterInstance.create_run and DagsterInstance.create_empty_run have been replaced by DagsterInstance.get_or_create_run and DagsterInstance.create_run_for_pipeline.
  • If you have implemented your own RunLauncher, there are two required changes:
    • RunLauncher.launch_run takes a pipeline run that has already been created. You should remove any calls to instance.create_run in this method.
    • Instead of calling startPipelineExecution (defined in dagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION) in the run launcher, you should call startPipelineExecutionForCreatedRun (defined in dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION).
    • Refer to the RemoteDagitRunLauncher for an example implementation.

New

  • Improvements to preset and solid subselection in the playground. Subselection now shows an inline preview of the pipeline instead of a modal, and the correct subselection is chosen when selecting a preset.
  • Improvements to log searching, including tokenization and autocompletion when searching for message types and for specific steps.
  • You can now view the structure of pipelines from historical runs, even if that pipeline no longer exists in the loaded repository or has changed structure.
  • Historical execution plans are now viewable, even if the pipeline has changed structure.
  • Added metadata link to raw compute logs for all StepStart events in PipelineRun view and Step view.
  • Improved error handling for the scheduler. If a scheduled run has config errors, the errors are persisted to the event log for the run and can be viewed in Dagit.

Bugfix

  • No longer manually dispose sqlalchemy engine in dagster-postgres
  • Made boto3 dependency in dagster-aws more flexible (#2418)
  • Fixed tooltip UI cleanup in partitioned schedule view

Documentation

  • Brand new documentation site, available at https://docs.dagster.io
  • The tutorial has been restructured into multiple sections, and the examples in intro_tutorial have been rearranged into separate folders to reflect this.

0.7.8

09 May 21:37

Breaking Changes

  • The execute_pipeline_with_mode and execute_pipeline_with_preset APIs have been dropped in
    favor of new top level arguments to execute_pipeline, mode and preset.
  • The use of RunConfig to pass options to execute_pipeline has been deprecated, and RunConfig
    will be removed in 0.8.0.
  • The execute_solid_within_pipeline and execute_solids_within_pipeline APIs, intended to support
    tests, now take new top level arguments mode and preset.

New

  • The dagster-aws Redshift resource now supports providing an error callback to debug failed
    queries.
  • We now persist serialized execution plans for historical runs. They will render correctly even if
    the pipeline structure has changed or if it does not exist in the current loaded repository.
  • Clicking on a pipeline tag in the Runs view will apply that tag as a filter.

Bugfix

  • Fixed a bug where telemetry logger would create a log file (but not write any logs) even when
    telemetry was disabled.

Experimental

  • The dagster-airflow package supports ingesting Airflow dags and running them as dagster pipelines
    (see: make_dagster_pipeline_from_airflow_dag). This is in the early experimentation phase.
  • Improved the layout of the experimental partition runs table on the Schedules detailed view.

Documentation

  • Fixed a grammatical error (Thanks @flowersw!)

0.7.7

09 May 21:37

Breaking Changes

  • The default sqlite and dagster-postgres implementations have been altered to extract the
    event step_key field as a column, to enable faster per-step queries. You will need to run
    dagster instance migrate to update the schema. You may optionally migrate your historical event
    log data to extract the step_key using the migrate_event_log_data function. This will ensure
    that your historical event log data will be captured in future step-key based views. This
    event_log data migration can be invoked as follows:

    from dagster.core.storage.event_log.migration import migrate_event_log_data
    from dagster import DagsterInstance
    
    migrate_event_log_data(instance=DagsterInstance.get())
  • We have made pipeline metadata serializable and persist that along with run information.
    While there are no user-facing features to leverage this yet, it does require an instance
    migration (dagster instance migrate). If you have already run the migration for the event_log
    changes above, you do not need to run it again. Any unforeseen errors related to the new
    snapshot_id in the runs table or the new snapshots table are related to this migration.

  • dagster-pandas ColumnTypeConstraint has been removed in favor of ColumnDTypeFnConstraint and
    ColumnDTypeInSetConstraint.

New

  • You can now specify that dagstermill output notebooks be yielded as an output from dagstermill
    solids, in addition to being materialized.
  • You may now set the extension on files created using the FileManager machinery.
  • dagster-pandas typed PandasColumn constructors now support pandas 1.0 dtypes.
  • The Dagit Playground has been restructured to make the relationship between Presets, Partition
    Sets, Modes, and subsets clearer. All of these buttons have been reconciled and moved to the
    left side of the Playground.
  • Config sections that are required but not filled out in the Dagit playground are now detected
    and labeled in orange.
  • dagster-celery config now supports using env: to load from environment variables.
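
The env: form mentioned above replaces a literal config value with the name of an environment variable to read it from. The sketch below shows one way such a lookup could work. It is not dagster's implementation; resolve_env, the variable name EXAMPLE_BROKER_URL, and the config values are hypothetical, and it assumes a value of the shape {"env": "NAME"} marks a lookup.

```python
import os


def resolve_env(config, environ=None):
    """Recursively replace {"env": "NAME"} entries with environ["NAME"].

    Hypothetical helper for illustration only, not part of dagster.
    Raises KeyError if a referenced variable is not set.
    """
    environ = os.environ if environ is None else environ
    if isinstance(config, dict):
        if set(config) == {"env"}:
            return environ[config["env"]]
        return {key: resolve_env(value, environ) for key, value in config.items()}
    if isinstance(config, list):
        return [resolve_env(value, environ) for value in config]
    return config


resolved = resolve_env(
    {"broker": {"env": "EXAMPLE_BROKER_URL"}, "backend": "rpc://"},
    environ={"EXAMPLE_BROKER_URL": "pyamqp://guest@localhost//"},
)
print(resolved)
```

Passing environ explicitly, as in the usage above, keeps the sketch testable without touching the real process environment.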

Bugfix

  • Fixed a bug where selecting a preset in dagit would not populate tags specified on the pipeline
    definition.
  • Fixed a bug where metadata attached to a raised Failure was not displayed in the error modal in
    dagit.
  • Fixed an issue where reimporting dagstermill and calling dagstermill.get_context() outside of
    the parameters cell of a dagstermill notebook could lead to unexpected behavior.
  • Fixed an issue with connection pooling in dagster-postgres, improving responsiveness when using
    the Postgres-backed storages.

Experimental

  • Added a longitudinal view of runs on the Schedule tab for scheduled, partitioned pipelines.
    Includes views of run status, execution time, and materializations across partitions. The UI is
    in flux and is currently optimized for daily schedules, but feedback is welcome.

0.7.6

03 Apr 19:00

Breaking Changes

  • default_value in Field no longer accepts native instances of python enums. Instead
    the underlying string representation in the config system must be used.
  • default_value in Field no longer accepts callables.
  • The dagster_aws imports have been reorganized; you should now import resources from
    dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch
    modules.
  • The dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can
    now just use any boto3 S3 API directly on a S3 resource, e.g.
    context.resources.s3.list_objects_v2. (#2292)

New

  • New Playground view in dagit showing an interactive config map
  • Improved storage and UI for showing schedule attempts
  • Added the ability to set default values in InputDefinition
  • Added CLI command dagster pipeline launch to launch runs using a configured RunLauncher
  • Added ability to specify pipeline run tags using the CLI
  • Added a pdb utility to SolidExecutionContext to help with debugging, available within a solid as context.pdb
  • Added PresetDefinition.with_additional_config to allow for config overrides
  • Added resource name to log messages generated during resource initialization
  • Added grouping tags for runs that have been retried / reexecuted.

Bugfix

  • Fixed a bug where date range partitions with a specified end date were clipping the last day
  • Fixed an issue where some schedule attempts that failed to start would be marked running forever.
  • Fixed the @weekly partitioned schedule decorator
  • Fixed timezone inconsistencies between the runs view and the schedules view
  • Integers are now accepted as valid values for Float config fields
  • Fixed an issue when executing dagstermill solids with config that contained quote characters.
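
The Float change in the list above amounts to widening a type check so that an integer such as 3 passes where previously only 3.0 would. A minimal sketch of that kind of check, not dagster's actual validator (is_valid_float_config is a hypothetical name, and rejecting booleans is an assumption made here because bool is a subclass of int in Python):

```python
def is_valid_float_config(value):
    """Return True if value is acceptable for a float-typed config field.

    Hypothetical helper for illustration only, not part of dagster.
    """
    # bool is a subclass of int, so it is excluded explicitly (an assumption).
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float))


for candidate in (3, 3.5, True, "3"):
    print(repr(candidate), is_valid_float_config(candidate))
```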

dagstermill

  • The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel flag.

dagster-dbt

  • dbt_solid now has a Nothing input to allow for sequencing

dagster-k8s

  • Added get_celery_engine_config to select celery engine, leveraging Celery infrastructure

Documentation

  • Improvements to the airline and bay bikes demos
  • Improvements to our dask deployment docs (Thanks jswaney!!)