xdist
works by spawning one or more workers, which are controlled
by the master. Each worker is responsible for performing
a full test collection and afterwards running tests as dictated by the master.
The execution flow is:
-
master spawns one or more workers at the beginning of the test session. The communication between master and worker nodes makes use of execnet and its gateways. The actual interpreters executing the code for the workers might be remote or local.
-
Each worker itself is a mini pytest runner. workers at this point perform a full test collection, sending back the collected test-ids back to the master which does not perform any collection itself.
-
The master receives the result of the collection from all nodes. At this point the master performs some sanity check to ensure that all workers collected the same tests (including order), bailing out otherwise. If all is well, it converts the list of test-ids into a list of simple indexes, where each index corresponds to the position of that test in the original collection list. This works because all nodes have the same collection list, and saves bandwidth because the master can now tell one of the workers to just execute test index 3 index of passing the full test id.
-
If dist-mode is each: the master just sends the full list of test indexes to each node at this moment.
-
If dist-mode is load: the master takes around 25% of the tests and sends them one by one to each worker in a round robin fashion. The rest of the tests will be distributed later as workers finish tests (see below).
-
Note that
pytest_xdist_make_scheduler
hook can be used to implement custom tests distribution logic. -
workers re-implement
pytest_runtestloop
: pytest's default implementation basically loops over all collected items in thesession
object and executes thepytest_runtest_protocol
for each test item, but in xdist workers sit idly waiting for master to send tests for execution. As tests are received by workers,pytest_runtest_protocol
is executed for each test. Here it worth noting an implementation detail: workers always must keep at least one test item on their queue due to how thepytest_runtest_protocol(item, nextitem)
hook is defined: in order to pass thenextitem
to the hook, the worker must wait for more instructions from master before executing that remaining test. If it receives more tests, then it can safely callpytest_runtest_protocol
because it knows what thenextitem
parameter will be. If it receives a "shutdown" signal, then it can execute the hook passingnextitem
asNone
. -
As tests are started and completed at the workers, the results are sent back to the master, which then just forwards the results to the appropriate pytest hooks:
pytest_runtest_logstart
andpytest_runtest_logreport
. This way other plugins (for examplejunitxml
) can work normally. The master (when in dist-mode load) decides to send more tests to a node when a test completes, using some heuristics such as test durations and how many tests each worker still has to run. -
When the master has no more pending tests it will send a "shutdown" signal to all workers, which will then run their remaining tests to completion and shut down. At this point the master will sit waiting for workers to shut down, still processing events such as
pytest_runtest_logreport
.
Why does each worker do its own collection, as opposed to having the master collect once and distribute from that collection to the workers?
If collection was performed by master then it would have to serialize collected items to send them through the wire, as workers live in another process. The problem is that test items are not easily (impossible?) to serialize, as they contain references to the test functions, fixture managers, config objects, etc. Even if one manages to serialize it, it seems it would be very hard to get it right and easy to break by any small change in pytest.