create time filtering for bulk actions table #23560
Conversation
```python
if status:
    query = query.where(BulkActionsTable.c.status == status.value)
query = db_select([BulkActionsTable.c.body, BulkActionsTable.c.timestamp])
if status or (filters and filters.status):
```
Unfortunate that we have to do all of this checking around status. I assume we'd have to do a deprecation warning cycle if we wanted status filtering to be done via the `BulkActionsFilter`?
I think so? I don't have a great sense for how often people are manually calling this from user code with the status arg. It's not marked as a `@public` method on the instance, but it is exposed on the cloud graphql implementation. If you wanted to be super-conservative, you could log some volume metrics in Cloud first before deprecating.
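For anyone landing here later, a minimal sketch of what that deprecation path could look like: prefer `filters.status`, and warn when the legacy `status` argument is used. The enum and filter classes below are stand-ins, not the real dagster classes.

```python
# Minimal sketch only; the enum/filter classes are stand-ins for the real ones.
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BulkActionStatus(Enum):
    REQUESTED = "REQUESTED"
    COMPLETED = "COMPLETED"


@dataclass(frozen=True)
class BulkActionsFilter:
    status: Optional[BulkActionStatus] = None


def resolve_status(
    status: Optional[BulkActionStatus],
    filters: Optional[BulkActionsFilter],
) -> Optional[BulkActionStatus]:
    """Prefer filters.status; warn on (and eventually remove) the legacy status arg."""
    if status is not None:
        warnings.warn(
            "Passing `status` is deprecated; use BulkActionsFilter(status=...) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if filters is not None and filters.status is not None and filters.status != status:
            raise ValueError("Conflicting values for `status` and `filters.status`")
        return status
    return filters.status if filters is not None else None
```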
Similarly, should we take this opportunity to make it a `statuses` (plural) filter?
Hmm. Will we need them to render the different tabs in the Runs view?
I'm not sure if we need the plural statuses, but `RunsFilter` does statuses and this internal fn also uses statuses, so it could be nice to align.
The Runs page supports filtering for multiple statuses, so I'm updating the BulkActions filter to support multiple statuses in this stacked PR: #23772
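As a point of reference, a plural filter usually reduces to an IN clause at the query layer. The snippet below only illustrates that shape (it is not the code in #23772), assuming a simplified SQLAlchemy table in place of the real `BulkActionsTable`.

```python
# Illustrative only: a plural `statuses` filter mapped onto an IN clause.
# The table definition here is a simplified stand-in for BulkActionsTable.
from typing import Optional, Sequence

import sqlalchemy as db

metadata = db.MetaData()
BulkActionsTable = db.Table(
    "bulk_actions",
    metadata,
    db.Column("id", db.Integer, primary_key=True),
    db.Column("status", db.String(32)),
    db.Column("body", db.Text),
    db.Column("timestamp", db.TIMESTAMP),
)


def apply_status_filter(query, statuses: Optional[Sequence[str]] = None):
    # No statuses means no filtering; one or many statuses map uniformly to IN (...).
    if statuses:
        return query.where(BulkActionsTable.c.status.in_(statuses))
    return query
```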
This looks good, but can you split apart the core / graphql changes into separate PRs?
We'd also want to add tests for the storage implementations in our abstract test suite that flexes the backfill filtering. This ensures that our cloud implementations are also under test.
There should be accompanying internal PRs for this, right?
Yeah, I'm finishing it up and will post soon.
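On the test-suite note above: the real coverage belongs in dagster's abstract storage test suite so the cloud implementations are exercised too, but the self-contained example below shows the shape of assertion such a test makes, using a fake in-memory storage rather than the actual fixtures.

```python
# Not the dagster test suite; an in-memory stand-in just to show the assertion shape.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class FakeBackfill:
    backfill_id: str
    created: datetime


class FakeBackfillStorage:
    def __init__(self) -> None:
        self._backfills: List[FakeBackfill] = []

    def add_backfill(self, backfill: FakeBackfill) -> None:
        self._backfills.append(backfill)

    def get_backfills(self, created_after: Optional[datetime] = None) -> List[FakeBackfill]:
        # Return only backfills created strictly after the cutoff, if one is given.
        return [
            b
            for b in self._backfills
            if created_after is None or b.created > created_after
        ]


def test_get_backfills_created_after_filter() -> None:
    storage = FakeBackfillStorage()
    now = datetime.now(timezone.utc)
    storage.add_backfill(FakeBackfill("old", now - timedelta(hours=2)))
    storage.add_backfill(FakeBackfill("new", now))

    results = storage.get_backfills(created_after=now - timedelta(hours=1))
    assert [b.backfill_id for b in results] == ["new"]
```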
Mapping this out, getting filter parity with runs is still a big, incomplete endeavor. I don't think any of that should block this PR though.
```
@@ -38,6 +40,13 @@ def from_graphql_input(graphql_str):
        return BulkActionStatus(graphql_str)


@record
class BulkActionsFilter:
    status: Optional[BulkActionStatus] = None
```
Probably we will need to expose most of the fields that we currently do on the RunsFilter. Support for those fields is probably out of scope for this PR though.
The question about supporting single vs multiple statuses though will probably depend on the mapping of BulkActionStatus to the various types in the Runs view UI status filters.
We should flag that as a thing we need to figure out (status), as well as some of the other filters (e.g. job_name, tags, etc).
Yep. To add more context in case someone looks at this PR in the future: the main goal of adding this filter now is to remove this block of code https://github.com/dagster-io/dagster/pull/23560/files#diff-c46424223e58f81b69323dee30256f0a38733eac79198279ebbcf5cec19ee16eL461-L492
I noted these things down for the filtering discussion.
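To make the open questions concrete, a fuller filter might eventually look something like the sketch below. Only `statuses` and `created_after` come from this PR stack; `job_name` and `tags` are just the open items flagged above, and the real class uses dagster's `@record` rather than a plain dataclass.

```python
# Speculative shape only; fields beyond statuses/created_after are open questions.
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class BulkActionsFilter:
    statuses: Optional[Sequence[str]] = None   # see stacked PR #23772
    created_after: Optional[datetime] = None   # added in this PR
    job_name: Optional[str] = None             # open question
    tags: Optional[Mapping[str, str]] = None   # open question
```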
```
@@ -373,6 +373,7 @@ def get_backfills(
        status: Optional[BulkActionStatus] = None,
        cursor: Optional[str] = None,
        limit: Optional[int] = None,
        filters: Optional[BulkActionsFilter] = None,
```
Should we consider changing the order of args? And maybe (in a separate PR) switching callers from using status to using filters?
Yeah, I think ideally filters would be before status, but there could be callsites that don't use kwargs, right? e.g. `get_backfills(BulkActionStatus.COMPLETED)`
I don't see any callsites that don't use kwargs.
We would potentially worry about user code callsites, but `get_backfills` is not marked as `@public`, so I think you could add a changelog note and be fine about changing it.
I'll stack a branch to do this.
branch for re-ordering #23773
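One low-risk way to do the re-ordering (sketched below, not necessarily what #23773 does) is to make the parameters keyword-only, so positional callsites fail loudly instead of silently passing the wrong argument.

```python
# Sketch only: keyword-only parameters make the re-ordering safe against positional
# callers; type names are referenced as strings to keep this standalone.
from typing import Optional, Sequence


class RunStorageSketch:
    def get_backfills(
        self,
        *,
        filters: Optional["BulkActionsFilter"] = None,
        status: Optional["BulkActionStatus"] = None,  # legacy, kept during deprecation
        cursor: Optional[str] = None,
        limit: Optional[int] = None,
    ) -> Sequence["PartitionBackfill"]:
        raise NotImplementedError
```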
```python
if cursor:
    cursor_query = db_select([BulkActionsTable.c.id]).where(
        BulkActionsTable.c.key == cursor
    )
    query = query.where(BulkActionsTable.c.id < cursor_query)
if filters and filters.created_after:
    query = query.where(BulkActionsTable.c.timestamp > filters.created_after)
```
@prha the time-based filters for the `RunsTable` replace the timezone (`filters.created_after.replace(tzinfo=None)`), but the columns in the `RunsTable` are DATETIME. In the `BulkActionsTable` they are TIMESTAMP, and I found that the postgres run storage tests failed when I replaced the timezone with None. The tests all pass when I leave the tzinfo as is, but I haven't been able to find a reason why online. Do you foresee any issues with not doing the timezone conversion here?
More info: when we add the backfill to the DB we call `datetime_from_timestamp`, which returns a UTC datetime (`timestamp=datetime_from_timestamp(partition_backfill.backfill_timestamp),`).
In the graphene layer, I'm doing the same conversion, so anything coming from the UI will also get converted to UTC and filtering will be correct: https://github.com/dagster-io/dagster/pull/23682/files#diff-61fefa33db2a378b1c50f360e3aa830124007f7bc8f4195005286b529b6e60cdR388
But maybe the right thing to do is to have a custom constructor on `BulkActionsFilter` that takes a timestamp instead of a datetime and does the conversion there, so that all of the datetimes on the filter are guaranteed to be converted to UTC.
I think either way works...
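For reference, the "custom constructor" option would look roughly like the sketch below. The names are stand-ins rather than the code that landed; the point is just that every datetime stored on the filter ends up timezone-aware UTC, matching how `datetime_from_timestamp` writes the column.

```python
# Sketch of the proposed constructor: accept a Unix timestamp and normalize to UTC.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class BulkActionsFilter:
    created_after: Optional[datetime] = None

    @classmethod
    def from_timestamps(
        cls, created_after_timestamp: Optional[float] = None
    ) -> "BulkActionsFilter":
        # Convert the raw timestamp (seconds since epoch) to an aware UTC datetime.
        created_after = (
            datetime.fromtimestamp(created_after_timestamp, tz=timezone.utc)
            if created_after_timestamp is not None
            else None
        )
        return cls(created_after=created_after)
```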
Summary & Motivation
Adds the ability to filter on create time to the BulkActions table so that we don't need to filter out results manually when serving the RunsFeed. Does this by adding a `BulkActionsFilter` (like `RunsFilter`) so that if we want to add more filtering capabilities we can add them to this object rather than as new parameters on the `get_backfills` method.
Companion internal PR: https://github.com/dagster-io/internal/pull/11031
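A hypothetical call from user code, assuming the instance-level `get_backfills` forwards `filters` to the run storage; the import path shown here is an assumption and may differ from what shipped.

```python
# Hypothetical usage; the import path and forwarding behavior are assumptions.
from datetime import datetime, timedelta, timezone

from dagster import DagsterInstance
from dagster._core.execution.backfill import BulkActionsFilter  # assumed location

instance = DagsterInstance.get()
cutoff = datetime.now(timezone.utc) - timedelta(days=1)

# Fetch only backfills created in the last day instead of filtering client-side.
recent = instance.get_backfills(filters=BulkActionsFilter(created_after=cutoff))
```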
How I Tested These Changes