[1/n] Split RootWorkspaceQuery by code location #22296

salazarm · 2024-06-05T09:16:41Z

Summary & Motivation

This PR speeds up workspace fetching by splitting the RootWorkspaceQuery into multiple parallel fetches by code location. We now rely on the CodeLocationStatusQuery to determine these locations. To minimize delays, we cache the code location status query in IndexedDB, enabling immediate fetches after the initial load. It also enables us to fetch only changed code locations rather than needing to fetch all locations when a single one changes.
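As a rough illustration of the per-location fan-out described above, here is a minimal sketch. It is not the code in this PR: `LocationData`, `FetchLocation`, and `fetchLocationsInParallel` are hypothetical names, and the real implementation goes through Apollo and the IndexedDB cache.

```typescript
// Hypothetical, simplified shape of one location's workspace data.
type LocationData = {name: string; repositories: string[]};

// Stand-in for the per-location GraphQL request (assumed signature).
type FetchLocation = (name: string) => Promise<LocationData>;

// Start all per-location requests at once and collect results as they settle.
// A failing location is recorded instead of rejecting the whole batch.
async function fetchLocationsInParallel(
  locationNames: string[],
  fetchLocation: FetchLocation,
): Promise<{loaded: Record<string, LocationData>; failed: string[]}> {
  const settled = await Promise.all(
    locationNames.map(async (name) => {
      try {
        return {name, data: await fetchLocation(name)};
      } catch {
        return {name, data: null};
      }
    }),
  );

  const loaded: Record<string, LocationData> = {};
  const failed: string[] = [];
  for (const {name, data} of settled) {
    if (data) {
      loaded[name] = data;
    } else {
      failed.push(name);
    }
  }
  return {loaded, failed};
}
```

With this shape, one slow or broken location does not hold up the others, which is the main point of splitting the root query.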

The WorkspaceProvider is now responsible for refetching code locations when they change by comparing status updates to detect changes. This responsibility was previously with useCodeLocationsStatus, but to avoid a cyclical dependency, it has been moved to the WorkspaceProvider. useCodeLocationsStatus will continue handling toast updates and returning status information for downstream consumers, but now it uses data from the workspace context instead of making direct queries. (In a future PR we could collapse some of this functionality but I wanted to minimize the number of code changes in this PR).
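The status comparison could look roughly like the sketch below. `StatusEntry` and `diffLocationStatuses` are hypothetical names with a simplified shape, assuming each status carries a name and an update timestamp; the actual provider logic may use other fields.

```typescript
// Simplified, assumed shape of one code location status entry.
type StatusEntry = {name: string; updateTimestamp: number};

// Compare two status snapshots and report which locations need a refetch
// (new or updated) and which have disappeared (their cached data can go).
function diffLocationStatuses(
  previous: StatusEntry[],
  next: StatusEntry[],
): {toFetch: string[]; removed: string[]} {
  const previousByName = new Map<string, StatusEntry>();
  previous.forEach((entry) => previousByName.set(entry.name, entry));
  const nextNames = new Set(next.map((entry) => entry.name));

  const toFetch = next
    .filter((entry) => {
      const old = previousByName.get(entry.name);
      return !old || old.updateTimestamp !== entry.updateTimestamp;
    })
    .map((entry) => entry.name);

  const removed = previous
    .filter((entry) => !nextNames.has(entry.name))
    .map((entry) => entry.name);

  return {toFetch, removed};
}
```

Unchanged locations appear in neither list, so a poll that finds no differences triggers no workspace fetches.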

The skip parameter has been removed from useCodeLocationsStatus since it was never set to true in either cloud or OSS environments, both of which poll for code location updates.

Tests now require mocking more queries to create the desired workspace context. To simplify this, a buildWorkspaceMocks function has been added. It accepts location entries as arguments and creates the necessary code location statuses query mock and workspace query mocks by location.
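To give a sense of what such a helper produces, here is a hedged sketch. It is not the real buildWorkspaceMocks: `buildWorkspaceMocksSketch`, `LocationEntry`, `QueryMock`, and the `LocationWorkspaceQuery` operation name are assumptions made for illustration, and the real mocks use actual GraphQL documents.

```typescript
// Assumed, simplified location entry passed in by a test.
type LocationEntry = {name: string; updateTimestamp: number; repositories: string[]};

// Assumed mock shape, loosely modeled on request/result pairs.
type QueryMock = {
  request: {operationName: string; variables: Record<string, unknown>};
  result: {data: unknown};
};

// Build one status-query mock plus one workspace-query mock per location.
function buildWorkspaceMocksSketch(entries: LocationEntry[]): QueryMock[] {
  const statusMock: QueryMock = {
    request: {operationName: 'CodeLocationStatusQuery', variables: {}},
    result: {
      data: {
        locationStatuses: entries.map((entry) => ({
          name: entry.name,
          updateTimestamp: entry.updateTimestamp,
        })),
      },
    },
  };

  const locationMocks: QueryMock[] = entries.map((entry) => ({
    // 'LocationWorkspaceQuery' is a placeholder operation name.
    request: {operationName: 'LocationWorkspaceQuery', variables: {name: entry.name}},
    result: {data: {name: entry.name, repositories: entry.repositories}},
  }));

  return [statusMock, ...locationMocks];
}
```

A test would pass the returned array to whatever mock provider it already uses, so each test only has to describe its locations.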

Had to update testing-library/react to get tests to pass because the version we were on had some goofy react act stuff that was fixed.

Next PRs in stack will be:

Cloud PR updating workspace mocks / imports (dagster-io/internal#10126)
Fetching schedule/sensor state independently in LeftNavItem and useDaemonStatus (currently they depend on the workspace context, but this doesn't refresh anymore)
Making OverviewSensors/OverviewSchedules/OverviewJobs etc. use the root workspace instead of fetching everything independently from scratch every time.

How I Tested These Changes

Ran cloud on localhost and updated code locations. Tested cached loads both with changed locations and with unchanged locations, making sure we didn't make extra queries when they weren't necessary.
Ran dagster-dev and made sure manual code location reloading still worked.
Existing IndexedDB tests still pass, covering the changes in that file.
Relying on existing tests using WorkspaceProvider to make sure product functionality is still intact.

A cloud PR will follow updating workspace mocks there

github-actions · 2024-06-05T09:19:34Z

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-ehfll0hqt-elementl.vercel.app
https://salazarm-workspace-context-shiz.components-storybook.dagster-docs.io

Built with commit 2997ccd.
This pull request is being automatically deployed with vercel-action

github-actions · 2024-06-05T09:19:56Z

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-mtvc05cl3-elementl.vercel.app
https://salazarm-workspace-context-shiz.core-storybook.dagster-docs.io

Built with commit 2997ccd.
This pull request is being automatically deployed with vercel-action

js_modules/dagster-ui/packages/ui-core/src/overview/OverviewSensors.tsx

alangenfeld

Exciting! There's a lot of complexity here, so I think other reviewers' eyes would be good.

js_modules/dagster-ui/packages/ui-core/src/search/useIndexedDBCachedQuery.tsx

js_modules/dagster-ui/packages/ui-core/src/workspace/WorkspaceQueries.tsx

js_modules/dagster-ui/packages/ui-core/src/workspace/WorkspaceContext.tsx

alangenfeld · 2024-06-05T17:01:43Z

Other thing to consider: what if we change the queries but use old cache data that doesn't have those fields? Do we need to include a hash of the query or something in the cache identifier?

salazarm · 2024-06-05T17:10:40Z

@alangenfeld Yeah, we need to be careful about that. The current mechanism is to update the "version" argument passed to the IndexedDB cached query, but ideally this could be a hash of the query, or perhaps the query itself. If what's in the cache doesn't match, we don't use the cache.
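The version check described here can be sketched as follows. This is an assumption-laden illustration, not the useIndexedDBCachedQuery implementation: `CachedEntry` and `readCache` are hypothetical, and a `Map` stands in for IndexedDB.

```typescript
// Assumed cache record: the payload plus the version it was written with.
type CachedEntry = {version: number; data: unknown};

// Return cached data only when it was written with the expected version.
// A stale or missing entry yields undefined, so the caller falls back to the network.
function readCache<T>(
  store: Map<string, CachedEntry>,
  key: string,
  expectedVersion: number,
): T | undefined {
  const entry = store.get(key);
  if (!entry || entry.version !== expectedVersion) {
    return undefined;
  }
  return entry.data as T;
}
```

Bumping the version whenever a query's fields change means old cache entries are ignored rather than read with missing fields.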

alangenfeld · 2024-06-05T17:15:09Z

can you just rewrite the app to Relay and persisted queries quick k thx

salazarm · 2024-06-05T17:17:48Z

@alangenfeld If you can figure out the backend then yes, it should be pretty straightforward xP

alangenfeld · 2024-06-05T17:25:12Z

looks like there might be some decent options with apollo to have operation hashes at codegen time
https://www.apollographql.com/docs/react/api/link/persisted-queries/
https://www.npmjs.com/package/@apollo/generate-persisted-query-manifest

runtime hashing might not be crazy either - i feel like our queries are not huge generally
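For a sense of what runtime hashing might involve, here is a small sketch. The function names and key format are hypothetical, and the hash is an FNV-1a style mix over UTF-16 code units: enough to tell query versions apart in a cache key, but not cryptographic and not what Apollo's persisted queries use.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units, returned as 8 hex chars.
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ('0000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Collapse whitespace so formatting-only changes do not invalidate the cache.
function normalizeQuery(query: string): string {
  return query.replace(/\s+/g, ' ').trim();
}

// Hypothetical cache key: a readable name plus the hash of the query text.
function cacheKeyForQuery(name: string, query: string): string {
  return name + ':' + hashText(normalizeQuery(query));
}
```

Any change to the selected fields changes the key, so stale entries are simply never looked up; they would still need to be cleaned out separately.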

salazarm · 2024-06-06T08:33:29Z

js_modules/dagster-ui/packages/ui-core/src/nav/useCodeLocationsStatus.tsx

 type EntriesById = Record<string, LocationStatusEntry>;

-export const useCodeLocationsStatus = (skip = false): StatusAndMessage | null => {
-  const {locationEntries, refetch} = useContext(WorkspaceContext);
+export const codeLocationStatusAtom = atom<CodeLocationStatusQuery | undefined>({


This exists for perf. I don't want to put this in WorkspaceContext since it's polled every few seconds and would end up causing a lot of components to re-render (most of which aren't using React.memo)

salazarm · 2024-06-07T04:43:40Z

Tests added, this PR is ready for full review

bengotow

This looks awesome! I added comments inline -- generally it feels like we're doing a bit of extra work to build this on top of idb-lru-cache because we don't want the LRU behavior, but the implementation looks correct.

I'm a little concerned about building a generalized caching solution on top of indexeddb instead of something like the Cache APIs. If someone (eg: me) comes along and throws a date or asset key, etc into a useIndexedDBCachedQuery key by accident, we could really eat up a lot of disk space.

The WorkspaceContext seems great and it's going to make such a huge difference!

[side rant] This really makes me miss the old REST API days... If we had a REST GET per code location and the backend put the code location modification date into the ETag header (and then checked it in subsequent requests), the browser would essentially have done all this for us via 304 Not Modified. Really unfortunate that we get no cache primitives at all with GraphQL and have to build this in userland code :(

bengotow · 2024-06-08T03:23:44Z

js_modules/dagster-ui/packages/ui-core/src/nav/useCodeLocationsStatus.tsx

 type EntriesById = Record<string, LocationStatusEntry>;

-export const useCodeLocationsStatus = (skip = false): StatusAndMessage | null => {
-  const {locationEntries, refetch} = useContext(WorkspaceContext);
+export const codeLocationStatusAtom = atom<CodeLocationStatusQuery | undefined>({


Seems like a good call to me!

js_modules/dagster-ui/packages/ui-core/src/search/useIndexedDBCachedQuery.tsx

js_modules/dagster-ui/packages/ui-core/src/workspace/WorkspaceContext.tsx

Split RootWorkspaceQuery by code location + move responsibility to WorkspaceProvider

4640552

salazarm requested review from alangenfeld, bengotow and prha June 5, 2024 09:16

salazarm changed the title ~~Split RootWorkspaceQuery by code location + move responsibility to Wo…~~ Split RootWorkspaceQuery by code location Jun 5, 2024

salazarm changed the title ~~Split RootWorkspaceQuery by code location~~ [1/n] Split RootWorkspaceQuery by code location Jun 5, 2024

salazarm added 2 commits June 5, 2024 05:36

types

bfd1d69

ts

e087d41

alangenfeld reviewed Jun 5, 2024

salazarm added 6 commits June 5, 2024 13:52

Merge branch 'master' of github.com:dagster-io/dagster into salazarm/workspace-context-shiz

8d0016d

ts

6b85752

update test

80d60d2

delete indexeddb for removed locations

d2ee1cb

missing fragment

89b9f07

comment

79eabd8

salazarm added 5 commits June 6, 2024 23:21

tests

08a85d1

Merge branch 'master' of github.com:dagster-io/dagster into salazarm/workspace-context-shiz

5555631

comment

4d7af4c

more defensive...

ec731a7

.

2afd178

salazarm added 2 commits June 7, 2024 00:55

fix toast

2d1879d

.

552d1ec

salazarm requested a review from alangenfeld June 7, 2024 20:02

bengotow approved these changes Jun 9, 2024

clear cache data

4c01aea

salazarm force-pushed the salazarm/workspace-context-shiz branch from 7b0be14 to 4c01aea Compare June 10, 2024 01:45

test mock

2997ccd

salazarm merged commit 30cc62a into master Jun 10, 2024
3 checks passed

salazarm deleted the salazarm/workspace-context-shiz branch June 10, 2024 02:13
