create time filtering for bulk actions table #23560
Conversation
```python
if status:
    query = query.where(BulkActionsTable.c.status == status.value)
query = db_select([BulkActionsTable.c.body, BulkActionsTable.c.timestamp])
if status or (filters and filters.status):
```
Unfortunate that we have to do all of this checking around status. I assume we'd have to do a deprecation warning cycle if we wanted status filtering to be done via the `BulkActionsFilter`?
I think so? I don't have a great sense for how often people are manually calling this from user code with the status arg. It's not marked as a `@public` method on the instance, but it is exposed on the cloud graphql implementation. If you wanted to be super-conservative, you could log some volume metrics in Cloud first before deprecating.
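For anyone landing here later, a minimal sketch of what that deprecation path could look like: prefer `filters.status`, and warn when the legacy `status` argument is used. The enum and filter classes below are stand-ins, not the real dagster classes.

```python
# Minimal sketch only; the enum/filter classes are stand-ins for the real ones.
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BulkActionStatus(Enum):
    REQUESTED = "REQUESTED"
    COMPLETED = "COMPLETED"


@dataclass(frozen=True)
class BulkActionsFilter:
    status: Optional[BulkActionStatus] = None


def resolve_status(
    status: Optional[BulkActionStatus],
    filters: Optional[BulkActionsFilter],
) -> Optional[BulkActionStatus]:
    """Prefer filters.status; warn on (and eventually remove) the legacy status arg."""
    if status is not None:
        warnings.warn(
            "Passing `status` is deprecated; use BulkActionsFilter(status=...) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if filters is not None and filters.status is not None and filters.status != status:
            raise ValueError("Conflicting values for `status` and `filters.status`")
        return status
    return filters.status if filters is not None else None
```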
Similarly, should we take this opportunity to make it a `statuses` (plural) filter?
Hmm. Will we need them to render the different tabs in the Runs view?
I'm not sure if we need the plural statuses, but `RunsFilter` does statuses and this internal fn also uses statuses, so it could be nice to align.
The Runs page supports filtering for multiple statuses, so I'm updating the BulkActions filter to support multiple statuses in this stacked PR: #23772
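As a point of reference, a plural filter usually reduces to an IN clause at the query layer. The snippet below only illustrates that shape (it is not the code in #23772), assuming a simplified SQLAlchemy table in place of the real `BulkActionsTable`.

```python
# Illustrative only: a plural `statuses` filter mapped onto an IN clause.
# The table definition here is a simplified stand-in for BulkActionsTable.
from typing import Optional, Sequence

import sqlalchemy as db

metadata = db.MetaData()
BulkActionsTable = db.Table(
    "bulk_actions",
    metadata,
    db.Column("id", db.Integer, primary_key=True),
    db.Column("status", db.String(32)),
    db.Column("body", db.Text),
    db.Column("timestamp", db.TIMESTAMP),
)


def apply_status_filter(query, statuses: Optional[Sequence[str]] = None):
    # No statuses means no filtering; one or many statuses map uniformly to IN (...).
    if statuses:
        return query.where(BulkActionsTable.c.status.in_(statuses))
    return query
```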
This looks good, but can you split apart the core / graphql changes into separate PRs?
We'd also want to add tests for the storage implementations in our abstract test suite that flexes the backfill filtering. This ensures that our cloud implementations are also under test.
There should be accompanying internal PRs for this, right?
Yeah, I'm finishing it up and will post soon.
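On the test-suite note above: the real coverage belongs in dagster's abstract storage test suite so the cloud implementations are exercised too, but the self-contained example below shows the shape of assertion such a test makes, using a fake in-memory storage rather than the actual fixtures.

```python
# Not the dagster test suite; an in-memory stand-in just to show the assertion shape.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class FakeBackfill:
    backfill_id: str
    created: datetime


class FakeBackfillStorage:
    def __init__(self) -> None:
        self._backfills: List[FakeBackfill] = []

    def add_backfill(self, backfill: FakeBackfill) -> None:
        self._backfills.append(backfill)

    def get_backfills(self, created_after: Optional[datetime] = None) -> List[FakeBackfill]:
        # Return only backfills created strictly after the cutoff, if one is given.
        return [
            b
            for b in self._backfills
            if created_after is None or b.created > created_after
        ]


def test_get_backfills_created_after_filter() -> None:
    storage = FakeBackfillStorage()
    now = datetime.now(timezone.utc)
    storage.add_backfill(FakeBackfill("old", now - timedelta(hours=2)))
    storage.add_backfill(FakeBackfill("new", now))

    results = storage.get_backfills(created_after=now - timedelta(hours=1))
    assert [b.backfill_id for b in results] == ["new"]
```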
Mapping this out, getting filter parity with runs is still a big, incomplete endeavor. I don't think any of that should block this PR though.
```
@@ -38,6 +40,13 @@ def from_graphql_input(graphql_str):
        return BulkActionStatus(graphql_str)


@record
class BulkActionsFilter:
    status: Optional[BulkActionStatus] = None
```
Probably we will need to expose most of the fields that we currently do on the RunsFilter. Support for those fields is probably out of scope for this PR though.
The question about supporting single vs multiple statuses though will probably depend on the mapping of BulkActionStatus to the various types in the Runs view UI status filters.
We should flag that as a thing we need to figure out (status), as well as some of the other filters (e.g. job_name, tags, etc).
Yep. To add more context in case someone looks at this PR in the future: the main goal of adding this filter now is to remove this block of code https://github.com/dagster-io/dagster/pull/23560/files#diff-c46424223e58f81b69323dee30256f0a38733eac79198279ebbcf5cec19ee16eL461-L492
I noted these things down for the filtering discussion.
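To make the open questions concrete, a fuller filter might eventually look something like the sketch below. Only `statuses` and `created_after` come from this PR stack; `job_name` and `tags` are just the open items flagged above, and the real class uses dagster's `@record` rather than a plain dataclass.

```python
# Speculative shape only; fields beyond statuses/created_after are open questions.
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class BulkActionsFilter:
    statuses: Optional[Sequence[str]] = None   # see stacked PR #23772
    created_after: Optional[datetime] = None   # added in this PR
    job_name: Optional[str] = None             # open question
    tags: Optional[Mapping[str, str]] = None   # open question
```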
```
@@ -373,6 +373,7 @@ def get_backfills(
        status: Optional[BulkActionStatus] = None,
        cursor: Optional[str] = None,
        limit: Optional[int] = None,
        filters: Optional[BulkActionsFilter] = None,
```
Should we consider changing the order of args? And maybe (in a separate PR) switching callers from using status to using filters?
Yeah, I think ideally filters would be before status, but there could be callsites that don't use kwargs, right? e.g. `get_backfills(BulkActionStatus.COMPLETED)`
I don't see any callsites that don't use kwargs.
We would potentially worry about user code callsites, but `get_backfills` is not marked as `@public`, so I think you could add a changelog note and be fine about changing it.
I'll stack a branch to do this.
branch for re-ordering #23773
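One low-risk way to do the re-ordering (sketched below, not necessarily what #23773 does) is to make the parameters keyword-only, so positional callsites fail loudly instead of silently passing the wrong argument.

```python
# Sketch only: keyword-only parameters make the re-ordering safe against positional
# callers; type names are referenced as strings to keep this standalone.
from typing import Optional, Sequence


class RunStorageSketch:
    def get_backfills(
        self,
        *,
        filters: Optional["BulkActionsFilter"] = None,
        status: Optional["BulkActionStatus"] = None,  # legacy, kept during deprecation
        cursor: Optional[str] = None,
        limit: Optional[int] = None,
    ) -> Sequence["PartitionBackfill"]:
        raise NotImplementedError
```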
```python
if cursor:
    cursor_query = db_select([BulkActionsTable.c.id]).where(
        BulkActionsTable.c.key == cursor
    )
    query = query.where(BulkActionsTable.c.id < cursor_query)
if filters and filters.created_after:
    query = query.where(BulkActionsTable.c.timestamp > filters.created_after)
```
@prha the time-based filters for the `RunsTable` replace the timezone (`filters.created_after.replace(tzinfo=None)`), but the columns in the `RunsTable` are DATETIME. In the `BulkActionsTable` they are TIMESTAMP, and I found that the postgres run storage tests failed when I replaced the timezone with None. The tests all pass when I leave the tzinfo as is, but I haven't been able to find a reason why online. Do you foresee any issues with not doing the timezone conversion here?
More info: when we add the backfill to the DB we call `datetime_from_timestamp`, which returns a UTC datetime (`timestamp=datetime_from_timestamp(partition_backfill.backfill_timestamp),`).
In the graphene layer, I'm doing the same conversion, so anything coming from the UI will also get converted to UTC and filtering will be correct: https://github.com/dagster-io/dagster/pull/23682/files#diff-61fefa33db2a378b1c50f360e3aa830124007f7bc8f4195005286b529b6e60cdR388
But maybe the right thing to do is to have a custom constructor on `BulkActionsFilter` that takes a timestamp instead of a datetime and does the conversion there, so that all of the datetimes on the filter are guaranteed to be converted to UTC.
I think either way works...
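For reference, the "custom constructor" option would look roughly like the sketch below. The names are stand-ins rather than the code that landed; the point is just that every datetime stored on the filter ends up timezone-aware UTC, matching how `datetime_from_timestamp` writes the column.

```python
# Sketch of the proposed constructor: accept a Unix timestamp and normalize to UTC.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class BulkActionsFilter:
    created_after: Optional[datetime] = None

    @classmethod
    def from_timestamps(
        cls, created_after_timestamp: Optional[float] = None
    ) -> "BulkActionsFilter":
        # Convert the raw timestamp (seconds since epoch) to an aware UTC datetime.
        created_after = (
            datetime.fromtimestamp(created_after_timestamp, tz=timezone.utc)
            if created_after_timestamp is not None
            else None
        )
        return cls(created_after=created_after)
```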
Summary & Motivation
Adds the ability to filter on create time to the BulkActions table so that we don't need to filter out results manually when serving the RunsFeed. Does this by adding a `BulkActionsFilter` (like `RunsFilter`) so that if we want to add more filtering capabilities we can add them to this object rather than as new parameters on the `get_backfills` method.
Companion internal PR: https://github.com/dagster-io/internal/pull/11031
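A hypothetical call from user code, assuming the instance-level `get_backfills` forwards `filters` to the run storage; the import path shown here is an assumption and may differ from what shipped.

```python
# Hypothetical usage; the import path and forwarding behavior are assumptions.
from datetime import datetime, timedelta, timezone

from dagster import DagsterInstance
from dagster._core.execution.backfill import BulkActionsFilter  # assumed location

instance = DagsterInstance.get()
cutoff = datetime.now(timezone.utc) - timedelta(days=1)

# Fetch only backfills created in the last day instead of filtering client-side.
recent = instance.get_backfills(filters=BulkActionsFilter(created_after=cutoff))
```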
How I Tested These Changes