[1/n] Split RootWorkspaceQuery by code location #22296
Conversation
Deploy preview for dagit-storybook ready! ✅ Built with commit 2997ccd.
Deploy preview for dagit-core-storybook ready! ✅ Built with commit 2997ccd.
Exciting! There's a lot of complexity here, so I think other reviewers' eyes would be good.
Other thing to consider: what if we change the queries but use old cache data that doesn't have those fields? Do we need to include a hash of the query or something in the cache identifier?
@alangenfeld Yeah, we need to be careful about that. The current mechanism is to update the "version" argument passed to the IndexedDB cached query, but ideally this could be a hash of the query, or perhaps the query itself. If what's in the cache doesn't match, then we don't use the cache.
can you just rewrite the app to Relay and persisted queries quick k thx |
@alangenfeld If you can figure out the backend then yes, it should be pretty straightforward xP |
Looks like there might be some decent options with Apollo to generate operation hashes at codegen time. Runtime hashing might not be crazy either; I feel like our queries are generally not huge.
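
To make the "hash of the query" idea concrete, here is a minimal sketch (not code from this PR) of deriving a cache key from the printed query document with the Web Crypto API, so any change to the query automatically invalidates previously cached data. The helper name and how it would plug into useIndexedDBCachedQuery are assumptions.

```ts
import {DocumentNode, print} from 'graphql';

// Hypothetical helper: derive a cache key that changes whenever the query
// document changes, so cached data with a stale shape is never reused.
export async function cacheKeyForQuery(name: string, query: DocumentNode): Promise<string> {
  const encoded = new TextEncoder().encode(print(query));
  // SHA-256 via the Web Crypto API (available in browsers and modern Node).
  const digest = await crypto.subtle.digest('SHA-256', encoded);
  const hex = Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('');
  return `${name}/${hex}`;
}

// Usage sketch: the hash-derived key replaces a manually bumped "version" argument.
// const key = await cacheKeyForQuery('CodeLocationStatusQuery', CODE_LOCATION_STATUS_QUERY);
```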
type EntriesById = Record<string, LocationStatusEntry>;

export const useCodeLocationsStatus = (skip = false): StatusAndMessage | null => {
  const {locationEntries, refetch} = useContext(WorkspaceContext);

export const codeLocationStatusAtom = atom<CodeLocationStatusQuery | undefined>({
This exists for perf. I don't want to put this in WorkspaceContext since it's polled every few seconds and would end up causing a lot of components to re-render (most of which aren't using React.memo)
Seems like a good call to me!
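
As background on the atom-based approach discussed above, here is a minimal sketch of the pattern (assuming a Recoil-style atom, as the object-literal argument in the diff suggests; names and types are illustrative, not the PR's exact code): the polled status lives outside WorkspaceContext, so only components that subscribe to the atom re-render on each poll.

```ts
import {atom, useRecoilValue, useSetRecoilState} from 'recoil';

// Illustrative stand-in for the generated GraphQL type.
type CodeLocationStatusQuery = {locationStatusesOrError: unknown};

// Polled every few seconds, but kept out of WorkspaceContext so that only
// explicit subscribers re-render when it updates.
export const codeLocationStatusAtom = atom<CodeLocationStatusQuery | undefined>({
  key: 'codeLocationStatusAtom',
  default: undefined,
});

// The poller writes each new status result into the atom...
export const useSetCodeLocationStatus = () => useSetRecoilState(codeLocationStatusAtom);

// ...and only components calling this hook re-render on each poll tick.
export const useCodeLocationStatusValue = () => useRecoilValue(codeLocationStatusAtom);
```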
Tests added, this PR is ready for full review.
This looks awesome! I added comments inline -- generally it feels like we're doing a bit of extra work to build this on top of idb-lru-cache because we don't want the LRU behavior, but the implementation looks correct.

I'm a little concerned about building a generalized caching solution on top of IndexedDB instead of something like the Cache API. If someone (e.g. me) comes along and throws a date or asset key, etc. into a useIndexedDBCachedQuery key by accident, we could really eat up a lot of disk space.

The WorkspaceContext seems great and it's going to make such a huge difference!

[side rant] This really makes me miss the old REST API days... If we had a REST GET per code location and the backend put the code location modification date into the ETag header (and then checked it on subsequent requests), the browser would essentially have done all this for us via 304 Not Modified. Really unfortunate that we get no cache primitives at all with GraphQL and have to build this in userland code :(
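
For readers who have not worked with that flow: the ETag/If-None-Match handshake described above lets the browser revalidate a cached response with one cheap round trip and a 304 Not Modified. A hypothetical (non-Dagster) Express endpoint to illustrate the idea:

```ts
import express from 'express';

const app = express();

// Stand-in for whatever would resolve a code location server-side (assumption).
const loadCodeLocation = (name: string) => ({name, updatedTimestamp: Date.now()});

// Hypothetical endpoint: one GET per code location, revalidated via ETag.
app.get('/api/code-locations/:name', (req, res) => {
  const location = loadCodeLocation(req.params.name);
  // Use the location's modification timestamp as a cheap ETag.
  const etag = `"${location.updatedTimestamp}"`;

  if (req.get('If-None-Match') === etag) {
    // Unchanged since the browser's cached copy: send headers only, no body.
    res.status(304).end();
    return;
  }

  res.set('ETag', etag);
  res.json(location);
});
```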
Summary & Motivation
This PR speeds up workspace fetching by splitting the RootWorkspaceQuery into multiple parallel fetches by code location. We now rely on the CodeLocationStatusQuery to determine these locations. To minimize delays, we cache the code location status query in IndexedDB, enabling immediate fetches after the initial load. It also enables us to fetch only changed code locations rather than needing to fetch all locations when a single one changes.
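
As a rough sketch of that flow (names, types, and the version field are illustrative, not the PR's exact hooks or queries): fetch the per-location statuses first, then fetch every location in parallel, serving unchanged locations straight from the cache.

```ts
// Illustrative types and fetchers; the real hooks and queries in the PR differ.
type LocationStatusEntry = {name: string; versionKey: string};
type LocationData = unknown;

type Fetchers = {
  fetchStatuses: () => Promise<LocationStatusEntry[]>; // CodeLocationStatusQuery
  fetchLocation: (name: string) => Promise<LocationData>; // per-location workspace query
  readCache: (name: string) => {versionKey: string; data: LocationData} | undefined;
  writeCache: (name: string, versionKey: string, data: LocationData) => void;
};

// Statuses first, then one workspace fetch per location, all in parallel.
// Locations whose cached version still matches are served from the cache.
export async function fetchWorkspaceByLocation(f: Fetchers): Promise<Map<string, LocationData>> {
  const statuses = await f.fetchStatuses();
  const entries = await Promise.all(
    statuses.map(async (status): Promise<[string, LocationData]> => {
      const cached = f.readCache(status.name);
      if (cached && cached.versionKey === status.versionKey) {
        return [status.name, cached.data]; // unchanged: no network round trip
      }
      const data = await f.fetchLocation(status.name);
      f.writeCache(status.name, status.versionKey, data);
      return [status.name, data];
    }),
  );
  return new Map(entries);
}
```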
The WorkspaceProvider is now responsible for refetching code locations when they change by comparing status updates to detect changes. This responsibility was previously with useCodeLocationsStatus, but to avoid a cyclical dependency, it has been moved to the WorkspaceProvider. useCodeLocationsStatus will continue handling toast updates and returning status information for downstream consumers, but now it uses data from the workspace context instead of making direct queries. (In a future PR we could collapse some of this functionality but I wanted to minimize the number of code changes in this PR).
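
A sketch of the kind of status comparison the provider can do to decide what to refetch (field names are assumptions; the real entries come from CodeLocationStatusQuery):

```ts
// Field names here are assumptions; the real entries come from CodeLocationStatusQuery.
type LocationStatusEntry = {name: string; updateTimestamp: number};

type StatusDiff = {added: string[]; removed: string[]; updated: string[]};

// Compare the previous and latest status responses to decide which locations changed.
export function diffLocationStatuses(
  previous: LocationStatusEntry[],
  next: LocationStatusEntry[],
): StatusDiff {
  const prevByName = new Map(previous.map((entry) => [entry.name, entry] as const));
  const nextByName = new Map(next.map((entry) => [entry.name, entry] as const));

  const added = next.filter((entry) => !prevByName.has(entry.name)).map((entry) => entry.name);
  const removed = previous.filter((entry) => !nextByName.has(entry.name)).map((entry) => entry.name);
  const updated = next
    .filter((entry) => {
      const prev = prevByName.get(entry.name);
      return prev !== undefined && prev.updateTimestamp !== entry.updateTimestamp;
    })
    .map((entry) => entry.name);

  return {added, removed, updated};
}

// The provider would then refetch only `added` and `updated` locations and drop `removed` ones.
```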
The skip parameter has been removed from useCodeLocationsStatus since it was never set to true in either cloud or OSS environments, both of which poll for code location updates.
Tests now require mocking more queries to create the desired workspace context. To simplify this, a buildWorkspaceMocks function has been added. It accepts location entries as arguments and creates the necessary code location status query mock and workspace query mocks by location.
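
A sketch of what such a helper can look like when tests use Apollo's MockedProvider (the mock shapes, field names, and signature here are illustrative; the actual buildWorkspaceMocks is defined in the PR):

```ts
import {MockedResponse} from '@apollo/client/testing';
import {DocumentNode} from 'graphql';

// Illustrative location entry; the real type is generated from the GraphQL schema.
type LocationEntry = {name: string; updateTimestamp: number};

export function buildWorkspaceMocksSketch(
  entries: LocationEntry[],
  queries: {statusQuery: DocumentNode; locationQuery: DocumentNode},
): MockedResponse[] {
  // One mock for the code location status query covering all entries...
  const statusMock: MockedResponse = {
    request: {query: queries.statusQuery},
    result: {data: {locationStatusesOrError: {nodes: entries}}},
  };

  // ...plus one workspace query mock per location.
  const locationMocks: MockedResponse[] = entries.map((entry) => ({
    request: {query: queries.locationQuery, variables: {name: entry.name}},
    result: {data: {workspaceLocationEntryOrError: entry}},
  }));

  return [statusMock, ...locationMocks];
}
```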
Had to update @testing-library/react to get tests to pass, because the version we were on had some goofy React act() behavior that has since been fixed.
Next PRs in stack will be:
- [Cloud PR updating workspace mocks / imports](dagster-io/internal#10126)
- Fetching schedule/sensor state independently in LeftNavItem and useDaemonStatus (currently they depend on the workspace context, but this doesn't refresh anymore)
- Making OverviewSensors/OverviewSchedules/OverviewJobs etc. use the root workspace instead of fetching everything independently from scratch every time.
How I Tested These Changes
- Ran local host cloud and updated code locations. Tested cached loads with locations different and with locations the same, making sure we didn't make extra queries when not necessary.
- Ran dagster-dev and made sure manual code location reloading still worked.
- Existing IndexedDB tests still pass, covering the changes in that file.
- Relying on existing tests using WorkspaceProvider to make sure product functionality is still intact. A cloud PR will follow updating workspace mocks there.