feature request - support remote persistent workers #776
Comments
I brought up the topic of remote persistent workers in the Remote Execution API Working Group meeting on 2024-10-08. The current status of remote persistent worker support under Bazel is that of the 2021-03-06-remote-persistent-workers proposal. However, it is not a properly standardized feature under the remote execution protocol. Some remote execution systems already support that proposal: to my knowledge, BuildFarm, BuildBuddy, and EngFlow. Others have not yet implemented it, and some are hesitant to implement it until it’s standardized.

The outcome of the discussion in the remote execution working group was that there is appetite to standardize this feature. There are some concerns with the current remote persistent worker proposal; in particular, potential resource leakage and the lack of multiplex worker support were mentioned. An advantage of the 2021-03-06 proposal is that it is automatically backward compatible with remote execution systems that do not support persistent workers.

There is also a difference in the worker protocol itself. The Bazel worker protocol uses length-prefixed protobuf objects over stdin/stdout, which can cause issues when workers inadvertently write to stdout, e.g. due to underlying libraries used. Buck2, on the other hand, uses a gRPC protocol over Unix domain sockets, which doesn’t have that issue. One path that was proposed in the meeting was to first define a persistent worker protocol standard and then use that in the remote execution protocol.

To my knowledge, Meta is currently revisiting the persistent worker protocol and the internal remote persistent worker support. This may be a good opportunity to also keep the open source remote execution protocol in mind. In the working group meeting @mostynb, @allada, and @ulfjack stated that they would be interested in participating in the discussion.
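To illustrate the stdout concern with length-prefixed framing, here is a minimal sketch in Python. It is not Bazel's actual implementation: the payload is a plain byte string standing in for a serialized protobuf message, and the helper names (`write_frame`, `read_frame`, and so on) are hypothetical. It only shows the general mechanism, namely that a reader expecting a varint length prefix will misinterpret any stray bytes a library prints ahead of the frame.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream) -> int:
    """Read one varint from a binary stream."""
    shift = 0
    result = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a length prefix")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_frame(stream, payload: bytes) -> None:
    """Write one length-prefixed frame: varint length, then the payload."""
    stream.write(encode_varint(len(payload)) + payload)


def read_frame(stream) -> bytes:
    """Read one length-prefixed frame, failing if the payload is truncated."""
    length = read_varint(stream)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a payload")
    return payload


if __name__ == "__main__":
    # Stand-in for a serialized response message (assumption for illustration).
    payload = b"serialized-response"

    # Clean channel: the frame round-trips.
    clean = io.BytesIO()
    write_frame(clean, payload)
    clean.seek(0)
    print(read_frame(clean) == payload)

    # Polluted channel: a library printed to stdout before the frame.
    polluted = io.BytesIO()
    polluted.write(b"warning: deprecated\n")
    write_frame(polluted, payload)
    polluted.seek(0)
    try:
        read_frame(polluted)
    except EOFError as err:
        # The first stray byte was parsed as a bogus length prefix.
        print("framing broken:", err)
```

In the polluted case the reader takes the first character of the log line as the frame length, so the real frame is never found. A transport that does not share the process's stdout, such as the gRPC-over-Unix-socket approach mentioned above, avoids this class of failure by construction.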
@christolliday if this sounds interesting to Meta, could you perhaps share your thoughts on a possible future standardized remote persistent worker feature in the remote execution protocol? |
I'm also interested in participating in discussions surrounding remote persistent workers. |
Please count me as interested as well (from the Bazel side). |
To provide an update here: there's been progress on getting the prototype implementation in #787 closer to a mergeable state by adding testing in Buck2 CI. There is still a bit of coordination in progress on the CI side. Once that's in, the next thing I am planning to do on this topic is to review the existing persistent worker protocols, collect their capabilities, and think about potential additional needs for the remote execution use case. |
Buck2 supports persistent workers; however, these are only available for locally executed actions. In contrast, Bazel supports remote persistent workers; see also the BuildBuddy docs.
Persistent workers can provide large performance benefits: the Bazel documentation reports 2-4x speed-ups for Java, and preliminary experiments on our Haskell builds have shown about 3x speed-ups. Without support for persistent workers in remote execution environments, users have to make a trade-off between the speed-up provided by a persistent worker and the speed-up provided by scaling to many build nodes. It would be preferable if both speed-ups could be combined by supporting persistent workers in remote execution environments.