Pre computed file facts as options #246
base: main
Conversation
More context: #236 (comment). Pasting here for a quick read.

@Boshen, for reference, this is an implementation of the same Facts pattern used in Hacklang and (more complexly) in turborepo, and it is behind how Haste operates at Meta. The overall structure is as follows:
Depending on what info you extract, this leaves us with ~50MB of serialisable state which can answer all the cross-file questions we have on our large repo. Locally, devs rotate this cache (cache -> update -> cache). We also have our CI output this (just a dumb zstd S3 store of it, which packs to ~15MB), which allows our devs to start from their mergebase-with-master's cache when they switch branches or take large jumps across many commits (as described in watchman's SCMQuery docs).

For some numbers, with oxc_parser we can do this in 8 seconds (without a cache), 2 seconds (with a cache) and 200ms for a delta update. Downstream steps (resolution, linting, etc.) can read that cache and do the work they need, already knowing the most important info for them, which is why we're now limited by resolution (which we do at run-time for each downstream step, because it isn't cachable).

We use (/ plan to use) this for:
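The cache-rotation and delta-update flow described above could be sketched roughly like this. This is a minimal std-only sketch; the type and function names (`FileFacts`, `FactsCache`, `apply_delta`) are hypothetical, not the actual implementation, and real facts would be serialised (e.g. serde + zstd) for the remote store:

```rust
use std::collections::HashMap;

// Hypothetical per-file facts extracted by the parser.
// Field names are illustrative, not the PR's actual schema.
#[derive(Debug, Clone, PartialEq, Default)]
struct FileFacts {
    imports: Vec<String>,
    exports: Vec<String>,
}

// A facts cache keyed by file path. In practice this state would be
// serialised and rotated locally / stored in CI (zstd in S3).
#[derive(Default)]
struct FactsCache {
    files: HashMap<String, FileFacts>,
}

impl FactsCache {
    // Delta update: only re-extract facts for files that changed since
    // the cached revision, reusing everything else (the "200ms delta
    // update" path, versus a full 8-second cold extraction).
    fn apply_delta<F>(&mut self, changed: &[String], deleted: &[String], extract: F)
    where
        F: Fn(&str) -> FileFacts,
    {
        for path in deleted {
            self.files.remove(path);
        }
        for path in changed {
            self.files.insert(path.clone(), extract(path));
        }
    }
}

fn main() {
    let mut cache = FactsCache::default();
    // Stand-in for running the parser over a file.
    let extract = |_path: &str| FileFacts::default();

    // Initial population, then a delta: one file added, one removed.
    cache.apply_delta(&["a.ts".to_string()], &[], &extract);
    cache.apply_delta(&["c.ts".to_string()], &["a.ts".to_string()], &extract);

    assert!(cache.files.contains_key("c.ts"));
    assert!(!cache.files.contains_key("a.ts"));
    println!("cache has {} files", cache.files.len());
}
```

A CI job would serialise `cache.files` at each mergebase so a dev switching branches only pays for the delta.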
Question: I assume you are using it as a crate? I understood the code and requirements. The next step is to decide whether to expose these as a trait / plugin API or behind a feature flag. We'll make the decision together with Tom once we understand the broader picture.
@Boshen yes, we are using it as a crate. I would be keen to follow the conversation you folks have if it's possible. FWIW there are even more optimisations possible using this approach:
For cases where we don't want to resolve external dependencies, the whole thing can work without even cloning the repo.
Motivation
Make oxc-resolver work faster by using pre-computed facts to avoid reading the file system.
Background
We have two types of pre-computed facts in a cache: package facts (values extracted from `package.json` files) and file facts (the list of files in the repo).
We calculate these facts using `oxc-parser` and store them in a remote cache for our repo (which has ~200k files and growing). Anytime someone creates a branch and changes some files, we only recompute the facts for the changed files and reuse the others. Because of this process, computing facts is very cheap once we have a cache in place.

Using these facts, we speed up the resolution process by:

- Avoiding `read_to_string` for `package.json` files by using values from package facts
- Speeding up the `is_file` function by asserting against the list of files
- Avoiding `is_dir` calls

Benchmarks
Resolution for 430k `.resolve` calls for relative imports and package imports (packages that exist in package facts):

- Using facts: ~2s
- Without using facts: ~8s
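The fast paths above (answering `is_file` from the precomputed file list, and reading `package.json` values from package facts instead of the file system) could be sketched as follows. The struct and field names here are illustrative assumptions, not oxc-resolver's actual API:

```rust
use std::collections::{HashMap, HashSet};

// Illustrative facts store backing the resolver fast paths.
struct Facts {
    // Every file path in the repo: answers `is_file` without syscalls.
    files: HashSet<String>,
    // Pre-parsed package.json fields, avoiding `read_to_string` at
    // resolve time. Keyed by package directory; value is a stand-in
    // for the "main" entry (real facts would carry more fields).
    package_main: HashMap<String, String>,
}

impl Facts {
    // A pure in-memory set lookup instead of a stat() call.
    fn is_file(&self, path: &str) -> bool {
        self.files.contains(path)
    }

    // Resolve a bare package import via package facts, then confirm
    // the target exists using the in-memory file list.
    fn resolve_package(&self, pkg_dir: &str) -> Option<String> {
        let main = self.package_main.get(pkg_dir)?;
        let candidate = format!("{pkg_dir}/{main}");
        self.is_file(&candidate).then_some(candidate)
    }
}

fn main() {
    let facts = Facts {
        files: HashSet::from(["node_modules/lodash/index.js".to_string()]),
        package_main: HashMap::from([(
            "node_modules/lodash".to_string(),
            "index.js".to_string(),
        )]),
    };

    assert!(facts.is_file("node_modules/lodash/index.js"));
    assert_eq!(
        facts.resolve_package("node_modules/lodash").as_deref(),
        Some("node_modules/lodash/index.js")
    );
}
```

Because every check is a hash lookup, the 430k-call benchmark spends no time in the file system for files and packages already covered by the facts.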