Tail call VM #17849
@@ -313,6 +313,18 @@ char *alloca();
 # define ZEND_FASTCALL
 #endif

+#if __has_attribute(preserve_none) && !defined(__SANITIZE_ADDRESS__)
There is an incompatibility between preserve_none
and ASAN, which crashes Clang. I will report the issue.
@@ -8212,9 +8212,9 @@ ZEND_VM_HANDLER(150, ZEND_USER_OPCODE, ANY, ANY)
 case ZEND_USER_OPCODE_LEAVE:
 ZEND_VM_LEAVE();
 case ZEND_USER_OPCODE_DISPATCH:
-ZEND_VM_DISPATCH(opline->opcode, opline);
+ZEND_VM_DISPATCH_OPCODE(opline->opcode, opline);
Renamed this rarely used macro so I could re-use its name
@@ -1,3 +1,5 @@
+#include "Zend/zend_vm_opcodes.h"
This makes language servers / IDEs happy when viewing zend_vm_execute.h
$str .= "#include <main/php_config.h>\n";
$str .= "#include \"Zend/zend_portability.h\"\n";
This makes language servers / IDEs happy when viewing zend_vm_opcodes.h
Interesting work! I suppose this will require special support for JIT.
Yes this does require some changes to the JIT to accommodate the new opcode handler signature and how FP/IP are passed around. I plan to implement them unless there are major issues with the current approach. The fact that
The second one seems reasonable to me.
@@ -21,6 +21,9 @@
 #ifndef ZEND_VM_OPCODES_H
 #define ZEND_VM_OPCODES_H

+#include <main/php_config.h>
Is it possible to avoid this dependency on main?
HYBRID VM generates two handlers for each opcode (a C function with the standard ABI plus a non-standard GOTO one). JIT uses one or the other when suitable. Technically, a tail call does the same GOTO, so the same approach might work. Clang doesn't support global register variables. LLVM may achieve a similar thing using a custom calling convention that pins arguments to registers (this technique is used for Haskell, Erlang, HHVM ...). Unfortunately, I didn't find a way to introduce a new calling convention without patching LLVM (cool OOP style). Using them in Clang was also problematic. It was a long time ago and maybe something has changed.
BTW LLVM/Clang should support local register variables. So maybe the GOTO and HYBRID VMs may be adapted.
This implements the technique described in https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html, which addresses the issues described in http://lua-users.org/lists/lua-l/2011-02/msg00742.html. Python recently implemented this, which resulted in a 9-15% performance improvement: https://blog.reverberate.org/2025/02/10/tail-call-updates.html.
It turns out that @dstogov already addressed these issues by using a different technique, enabled when compiling with GCC, so this will not improve performance with that compiler, but it makes PHP on Clang as fast as on GCC.
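As a rough illustration of the technique referenced above, here is a minimal sketch of a tail-calling interpreter for a toy bytecode. It is not PHP's VM: every name in it (sk_frame, sk_op, SK_DISPATCH, sk_run, the three toy opcodes) is hypothetical and chosen only for this example. It assumes a compiler that accepts __attribute__((musttail)) on return statements; on other compilers the macro expands to nothing, the calls are only tail calls if the optimizer makes them so, and the stack may grow with the number of executed ops.

```c
#include <assert.h>
#include <stddef.h>

/* Use musttail where available; otherwise fall back to a plain return
 * (assumption: acceptable for a short demo, not for a real VM). */
#if defined(__has_attribute)
# if __has_attribute(musttail)
#  define SK_MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef SK_MUSTTAIL
# define SK_MUSTTAIL
#endif

typedef struct sk_op sk_op;
typedef struct sk_frame { long acc; long counter; } sk_frame;

/* Every handler has the same signature: frame pointer and instruction
 * pointer are arguments, so they can stay in registers across ops. */
typedef long (*sk_handler)(sk_frame *fp, const sk_op *ip);
struct sk_op { sk_handler handler; long operand; };

/* Jump to the handler of the op at ip without growing the stack. */
#define SK_DISPATCH(fp, ip) SK_MUSTTAIL return (ip)->handler((fp), (ip))

/* acc += operand, then continue with the next op. */
static long sk_op_add(sk_frame *fp, const sk_op *ip)
{
    fp->acc += ip->operand;
    ip++;
    SK_DISPATCH(fp, ip);
}

/* Decrement the counter; while it is still positive, jump by the
 * relative offset in operand, otherwise fall through to the next op. */
static long sk_op_loop(sk_frame *fp, const sk_op *ip)
{
    if (--fp->counter > 0) {
        ip += ip->operand;
    } else {
        ip++;
    }
    SK_DISPATCH(fp, ip);
}

/* Stop the interpreter: the only handler that really returns. */
static long sk_op_halt(sk_frame *fp, const sk_op *ip)
{
    (void)ip;
    return fp->acc;
}

/* Adds step to the accumulator `iterations` times and returns it. */
long sk_run(long iterations, long step)
{
    const sk_op code[] = {
        { sk_op_add,  step },
        { sk_op_loop, -1   },
        { sk_op_halt, 0    },
    };
    sk_frame frame = { 0, iterations };

    assert(iterations > 0);
    return code[0].handler(&frame, code);
}
```

Calling sk_run(3, 5) walks add, loop, add, loop, add, loop, halt and yields 15. The point of the sketch is the shape of the handlers: each one ends in a tail call to the next handler rather than returning to a central loop.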
Benchmarks
Zend/bench.php: PHP/Clang was 77% slower in this benchmark, now only 1% slower.
Symfony Demo: PHP/Clang was 5% slower in this benchmark.
Current interpreter
The interpreter is generated by Zend/zend_vm_gen.php. Multiple modes are supported, but the default (and only supported) mode is the hybrid one, which generates both a call-based interpreter and a GCC-specific interpreter. Which one is actually compiled depends on the compiler being used.
In the call-based interpreter, opcode handlers are separate functions, the next opline to execute is stored in execute_data, and execute_data is passed as an argument to op handlers:
Handlers typically load execute_data->opline, execute the operation, update execute_data->opline, and return.
There is quite a lot of overhead: the call instruction pushes a return address on the stack, the function saves/spills registers, etc. E.g. the code of ZEND_INIT_FCALL_SPEC_CONST_HANDLER() starts with
Also, opline needs to be loaded/stored from/to memory.
The GCC interpreter manages to eliminate the overhead. opline->handler is a computed-goto target, which calls the actual handler. Hot handlers are inlined, FP/IP (execute_data/opline) are register variables, handlers take no arguments and have no return value:
Changes
Here I add a variation of the call-based interpreter, enabled when using clang-19:
- execute_data and opline are passed as op handler arguments, so they are always in registers unless they are spilled on the stack
- preserve_none calling convention: reduces register saves/spills.
- The musttail attribute is used to force tail calling.
Unfortunately musttail rejects calls to functions whose signature is not compatible with the caller's, so it's not possible to tail call VM helpers that have extra parameters. Instead, we use a trampoline when calling these: the helper returns a struct{opline,handler} (in two registers) which is then tail called by the caller. Since helpers always return (unless they call other helpers), the stack doesn't grow indefinitely:
I introduce a ZEND_VM_DISPATCH() macro that is used by ZEND_VM_NEXT_OPCODE() and related macros. This macro tail calls the next opline by default. In VM helpers with extra parameters, ZEND_VM_DISPATCH() is redefined to return the trampoline value instead:
Caveats
- __attribute__((preserve_none)) is not stable, so we might not use it in exported functions. This has implications for JIT and user opcode handlers. We might need to generate wrappers with a stable convention.
- opline as argument and __attribute__((preserve_none))) to reduce the differences between the call-based interpreter and the clang one.
TODO
- opline as argument, without other changes. Maybe do that by default?
- __attribute__((preserve_none)), without other changes
Future scope:
- preserve_none / preserve_most / slow paths
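The trampoline described under Changes can be sketched roughly as follows. This is a toy model under stated assumptions, not the code generated by Zend/zend_vm_gen.php: the types and functions (tr_frame, tr_op, tr_next, tr_helper_add_scaled, trampoline_run) are hypothetical names made up for the example, and preserve_none is left out so the sketch compiles on compilers that lack it. A helper that takes an extra parameter cannot be the target of a musttail call from a handler, so it returns the next op and its handler in a small struct, and the calling handler performs the tail call.

```c
#include <assert.h>
#include <stddef.h>

/* Use musttail where available; otherwise a plain return (assumption:
 * good enough for a demo, where stack growth does not matter). */
#if defined(__has_attribute)
# if __has_attribute(musttail)
#  define TR_MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef TR_MUSTTAIL
# define TR_MUSTTAIL
#endif

typedef struct tr_op tr_op;
typedef struct tr_frame { long acc; } tr_frame;
typedef long (*tr_handler)(tr_frame *fp, const tr_op *ip);
struct tr_op { tr_handler handler; long operand; };

/* Trampoline value: where to continue, and which handler to run there.
 * Two pointers, so it can typically be returned in two registers. */
typedef struct tr_next { const tr_op *ip; tr_handler handler; } tr_next;

/* Helper with an extra parameter. Its signature differs from
 * tr_handler, so instead of dispatching it returns the continuation. */
static tr_next tr_helper_add_scaled(tr_frame *fp, const tr_op *ip, long scale)
{
    tr_next next;

    fp->acc += ip->operand * scale;
    next.ip = ip + 1;
    next.handler = next.ip->handler;
    return next;
}

/* Handlers call the helper normally, then tail call what it returned.
 * The helper's frame is gone by then, so the stack does not grow. */
static long tr_op_add_double(tr_frame *fp, const tr_op *ip)
{
    tr_next next = tr_helper_add_scaled(fp, ip, 2);
    assert(next.handler != NULL);
    TR_MUSTTAIL return next.handler(fp, next.ip);
}

static long tr_op_add_triple(tr_frame *fp, const tr_op *ip)
{
    tr_next next = tr_helper_add_scaled(fp, ip, 3);
    assert(next.handler != NULL);
    TR_MUSTTAIL return next.handler(fp, next.ip);
}

/* Stop the interpreter and return the accumulator. */
static long tr_op_halt(tr_frame *fp, const tr_op *ip)
{
    (void)ip;
    return fp->acc;
}

/* Runs the toy program "acc = 2*a + 3*b". */
long trampoline_run(long a, long b)
{
    const tr_op code[] = {
        { tr_op_add_double, a },
        { tr_op_add_triple, b },
        { tr_op_halt,       0 },
    };
    tr_frame frame = { 0 };

    return code[0].handler(&frame, code);
}
```

In this sketch trampoline_run(5, 1) computes 2*5 + 3*1. The design point is that only functions sharing the handler signature are ever tail called; anything with extra parameters is reached through an ordinary call that returns before the next dispatch.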