Tracking issue: #8106
Motivation
Faster rebuilds: reduce rebuild time from O(project) to O(changes).
User Guide
We will introduce experiments.incremental to let users opt in to incremental rebuilds, and we will stabilize it in three phases:
Currently, since Rspack v1.0, incremental is enabled by default in the make and emitAssets stages, and we will enable more stages by default as incremental becomes stable in them.
The detailed config is:
Detailed Design
Reads and Writes
Consider the bundling process as reading and writing data on the module graph and chunk graph, and distinguish between read tasks and write tasks. When a write task changes data, the read tasks that depend on that data need to be updated, and the read tasks that depend on those need updating in turn, until a task's input hits the cache and there is no need to keep bubbling up (cutoff). This is a very direct way to implement incremental builds. It requires abstracting the process into tasks: the process is no longer a set of simple functions, and the dependency relationships between tasks have to be stored in a structure. The first run generates a task graph, and subsequent updates only affect dependent tasks.
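The idea above can be sketched as a tiny task graph. This is only an illustrative model, not how Rspack or Turbopack is implemented: the Task and TaskGraph types, the update function, and the parse/render/chunk tasks are all hypothetical names. Dirty tasks are re-run in dependency order, and propagation stops as soon as a task writes the same output as before.

```typescript
// Hypothetical model: a task reads the outputs of its deps and writes one output.
interface Task {
  deps: string[];                     // tasks whose outputs this task reads
  run: (inputs: string[]) => string;  // computes (writes) this task's output
}

interface TaskGraph {
  tasks: Record<string, Task>;
  order: string[];                    // topological order: dependencies first
  outputs: Record<string, string>;    // last output written by each task
}

// Re-runs the dirty tasks and returns the names of the tasks that were executed.
function update(graph: TaskGraph, dirtyRoots: string[]): string[] {
  const dirty: Record<string, boolean> = {};
  for (const root of dirtyRoots) dirty[root] = true;
  const executed: string[] = [];
  for (const current of graph.order) {
    if (!dirty[current]) continue;
    const task = graph.tasks[current];
    const next = task.run(task.deps.map((dep) => graph.outputs[dep]));
    executed.push(current);
    // Cutoff: an unchanged output means dependents do not need to run.
    if (next === graph.outputs[current]) continue;
    graph.outputs[current] = next;
    for (const other of graph.order) {
      if (graph.tasks[other].deps.indexOf(current) !== -1) dirty[other] = true;
    }
  }
  return executed;
}

// Example: a comment-only edit changes the source but not the parsed result.
let source = "export const a = 1; // v1";
const graph: TaskGraph = {
  tasks: {
    parse: { deps: [], run: () => source.replace(/\/\/.*$/, "").trim() },
    render: { deps: ["parse"], run: ([code]) => "(function(){" + code + "})()" },
    chunk: { deps: ["render"], run: ([rendered]) => "/* chunk */" + rendered },
  },
  order: ["parse", "render", "chunk"],
  outputs: {},
};

const firstRun = update(graph, graph.order);   // every task runs once
source = "export const a = 1; // v2";
const commentOnly = update(graph, ["parse"]);  // only parse runs, then cutoff
source = "export const a = 2; // v2";
const codeChange = update(graph, ["parse"]);   // the change bubbles up to chunk
```

In this sketch the comment-only edit re-runs parse alone, while the real code change re-runs parse, render, and chunk.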
Turbopack does this, which is why most of its functions need the #[turbo_task] macro, and its overall architecture is pull-based (or query-based) because it needs to declare the dependency relationships between tasks. The chunk rendering task and the module rendering tasks within the chunk have explicit dependency relationships, so a different rendering result for a module causes a re-render of the corresponding chunk. Additionally, if a task is a global effect, then turbo_task cannot save it; at most, it can use the cache provided by turbo_task. Therefore, incremental builds still need optimized algorithms to avoid global effects.
However, Rspack is not considering this approach. Rspack, like webpack, is push-based (or pass-based): each stage advances one after another, and the chunk rendering tasks and the module rendering tasks within the chunk do not have clear dependency relationships. The module rendering results of the earlier stage are not visible to the chunk rendering in the later stage. Adopting the Turbopack approach would be a significant architectural change, it would introduce too many macros, and building the incremental computation engine itself is extremely complex.
If Turbopack implements fully automatic incremental builds through the turbo-tasks incremental computation engine, our goal could be a simpler, semi-automatic incremental build that is more aligned with the existing architecture, using the tasks in each hook as the granularity.
The Design
Distinguish between reading and writing data, record data writes as mutations, and before executing the tasks in each hook, calculate the data affected by these mutations. Only the tasks for the affected data are re-executed. The core of the implementation lies in identifying and updating the affected data at each stage. Therefore, the switch for a stage in the config controls whether to find the affected data at that stage. If enabled, the stage searches for the affected data; if not, it uses all the data.
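A minimal sketch of this design, under stated assumptions: the Mutation and Stage types, runStage, and modulesWrittenBy are hypothetical names for illustration and are not Rspack's internal API. A stage receives the mutations recorded so far, picks its targets (the affected modules when its switch is on, all modules otherwise), and records its own writes as new mutations for later stages.

```typescript
// Hypothetical record of one data write made by a stage.
interface Mutation {
  kind: string;      // name of the write operation
  moduleId: string;  // module whose data was written
}

interface Stage {
  name: string;
  writeKind: string;                                  // kind of mutation this stage records
  findAffected: (mutations: Mutation[]) => string[];  // maps earlier writes to work items
}

// Runs one stage and returns the modules it processed.
function runStage(
  stage: Stage,
  incremental: boolean,
  allModules: string[],
  mutations: Mutation[],
): string[] {
  // Switch on: only the affected modules. Switch off: all modules.
  const targets = incremental ? stage.findAffected(mutations) : allModules.slice();
  for (const moduleId of targets) {
    // The real per-module work would happen here; its write is recorded
    // as a mutation so that later stages can find what they must redo.
    mutations.push({ kind: stage.writeKind, moduleId });
  }
  return targets;
}

// Helper: the distinct modules written by a given kind of mutation.
function modulesWrittenBy(kind: string): (mutations: Mutation[]) => string[] {
  return (mutations) => {
    const result: string[] = [];
    for (const mutation of mutations) {
      if (mutation.kind === kind && result.indexOf(mutation.moduleId) === -1) {
        result.push(mutation.moduleId);
      }
    }
    return result;
  };
}

const codegen: Stage = {
  name: "codegen",
  writeKind: "moduleCodegen",
  findAffected: modulesWrittenBy("buildModule"),
};
const allModules = ["a", "b", "c", "d"];
const recorded: Mutation[] = [{ kind: "buildModule", moduleId: "c" }];

const incrementalTargets = runStage(codegen, true, allModules, recorded.slice());
const fullTargets = runStage(codegen, false, allModules, recorded.slice());
```

With the switch on, only module c is processed; with it off, the stage falls back to all four modules.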
- [...] fileDependencies, contextDependencies, and missingDependencies.
- [...] buildModule) as mutations.
- [...] buildModule), calculate the modules that need to be updated.
- [...] setAsync) as mutations.
- [...] a -> b -> c -> d, if we add a top-level await in c, then the affected modules that need to be updated are a, b, and c. Only d is not affected, since adding a top-level await changes all the generated code from the module up to its root module.
- [...] buildModule), calculate the modules that need to be updated.
- [...] processModuleExports) as mutations.
- [...] a -> b -> c -> d, if we change the exports of module d, then the affected modules that need to be updated are c and d, and if module c has a re-export, then b is also affected.
- [...] buildModule, setAsync, and processModuleExports), calculate the modules that need to be updated.
- [...] moduleCodegen) as mutations.
- [...] a -> b -> c -> d -> e -> f, if we add a top-level await in c and change the exports of module f, then we need to update the codegen result of modules a, b, c, e, and f, where a, b, and c are caused by mutations recorded by infer async modules, and e and f are caused by mutations recorded by provided exports.
In the example above, most of the modules need to be updated, but in real projects only a small number of modules need to be updated. This significantly reduces the computation during a rebuild and improves rebuild performance.
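The module graph examples above can be reproduced with a small sketch. It assumes only the two propagation rules stated in the text: a top-level await affects the module and every importer up to the root, while an exports change affects the module and its direct importers, continuing further only through modules that re-export. All function names here are hypothetical.

```typescript
// importers[m] lists the modules that import m.
type Importers = Record<string, string[]>;

function buildImporters(edges: [string, string][]): Importers {
  const importers: Importers = {};
  for (const [from, to] of edges) {
    if (!importers[from]) importers[from] = [];
    if (!importers[to]) importers[to] = [];
    importers[to].push(from);
  }
  return importers;
}

// Walks from the changed modules toward the roots. A changed module always
// affects its importers; other modules pass the change on only when
// propagatesThrough(module) is true.
function collectAffected(
  changed: string[],
  importers: Importers,
  propagatesThrough: (moduleId: string) => boolean,
): string[] {
  const affected: Record<string, boolean> = {};
  const queue: string[] = [];
  for (const moduleId of changed) {
    affected[moduleId] = true;
    queue.push(moduleId);
  }
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const isChanged = changed.indexOf(current) !== -1;
    if (!isChanged && !propagatesThrough(current)) continue;
    for (const importer of importers[current] || []) {
      if (!affected[importer]) {
        affected[importer] = true;
        queue.push(importer);
      }
    }
  }
  return Object.keys(affected).sort();
}

// Top-level await: every importer up to the root is affected.
function affectedByAsync(changed: string[], importers: Importers): string[] {
  return collectAffected(changed, importers, () => true);
}

// Exports change: propagation continues only through re-exporting modules.
function affectedByExports(changed: string[], importers: Importers, reExporters: string[]): string[] {
  return collectAffected(changed, importers, (m) => reExporters.indexOf(m) !== -1);
}

// Codegen has to redo the union of what the earlier stages marked as affected.
function union(first: string[], second: string[]): string[] {
  const merged: Record<string, boolean> = {};
  for (const m of first.concat(second)) merged[m] = true;
  return Object.keys(merged).sort();
}

const chain4 = buildImporters([["a", "b"], ["b", "c"], ["c", "d"]]);
const asyncInC = affectedByAsync(["c"], chain4);                    // a, b, c
const exportsOfD = affectedByExports(["d"], chain4, []);            // c, d
const exportsOfDReExported = affectedByExports(["d"], chain4, ["c"]); // b, c, d

const chain6 = buildImporters([["a", "b"], ["b", "c"], ["c", "d"], ["d", "e"], ["e", "f"]]);
const codegenTargets = union(
  affectedByAsync(["c"], chain6),
  affectedByExports(["f"], chain6, []),
);                                                                   // a, b, c, e, f
```

Only d is skipped in the last case, which matches the codegen example in the text.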
As you can see, affected-based incremental is not tied to the cache at all: it only finds the affected data and updates it, so it can be used with or without a cache.
Summary
If you have read "Build Systems à la Carte: Theory and Practice", you will find that Rspack previously achieved 'minimality' through the cache, but did not achieve 'early cutoff'. The lack of early cutoff led to insufficient minimality, which is a common problem in many existing bundlers and one of the reasons why bundler rebuilds are slow. Affected-based incremental collects the mutations from each stage and connects the stages. This makes later stages aware of what earlier stages did, which enables early cutoff of unrelated tasks in subsequent stages and makes Rspack rebuilds faster.