Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: run with pytest-xdist and split dask client / not client tests #1278

Merged
merged 11 commits into from
Feb 19, 2025

Conversation

andrzejnovak
Copy link
Member

Will it run faster?

@andrzejnovak
Copy link
Member Author

Faster by about ~half @lgray

Before:
image

After:
image

@ikrommyd
Copy link
Collaborator

@andrzejnovak
Copy link
Member Author

Also whoa, some of these are expensive

88.72s call     tests/test_jetmet_tools.py::test_corrected_jets_factory[True]
71.57s call     tests/test_lookup_tools.py::test_rochester
64.48s call     tests/test_jetmet_tools.py::test_corrected_jets_factory[False]
52.56s call     tests/test_analysis_tools.py::test_packed_selection_cutflow_dak[False]
47.27s call     tests/test_analysis_tools.py::test_packed_selection_nminusone_dak[False]
39.85s call     tests/test_analysis_tools.py::test_packed_selection_nminusone_dak[True]
26.03s call     tests/test_analysis_tools.py::test_packed_selection_cutflow_dak[True]
14.82s call     tests/test_dataset_tools.py::test_apply_to_fileset[proc_and_schema0]
12.86s call     tests/test_analysis_tools.py::test_packed_selection_cutflow_dak_uproot_only[False]
11.18s call     tests/test_analysis_tools.py::test_packed_selection_nminusone_dak_uproot_only[False]
10.60s call     tests/test_dataset_tools.py::test_apply_to_fileset[proc_and_schema1]
10.17s call     tests/test_btag.py::test_BTagScalefactor
9.36s call     tests/test_dataset_tools.py::test_recover_failed_chunks
8.45s call     tests/test_nanoevents.py::test_missing_eventIds_warning_dask
8.32s call     tests/test_nanoevents_vector.py::test_photon_zero_mass_charge[False]
7.63s call     tests/test_dataset_tools.py::test_apply_to_fileset_hinted_form
6.79s call     tests/test_nanoevents.py::test_read_nanomc[root]
5.90s call     tests/test_nanoevents_vector.py::test_dask_metric_table_and_nearest[False]
5.61s call     tests/test_lumi_tools.py::test_1259_avoid_pickle_numba_dict
5.52s call     tests/test_nanoevents_vector.py::test_photon_zero_mass_charge[True]
5.35s call     tests/test_analysis_tools.py::test_packed_selection_cutflow_dak_uproot_only[True]
5.09s call     tests/test_dataset_tools.py::test_preprocess_calculate_form
4.88s call     tests/test_analysis_tools.py::test_packed_selection_nminusone_dak_uproot_only[True]
4.35s call     tests/test_lumi_tools.py::test_lumilist_client_fromfile

@pfackeldey
Copy link
Contributor

pfackeldey commented Feb 19, 2025

This is great! the integration-tests are pretty much bound mainly by the runtime of the coffea tests, so I'm very happy to see such improvements 🎉

We'd have to use pytest-xdist also for the integration-tests then. It would be nice if it's possible to speedup the tests directly too?

@lgray
Copy link
Collaborator

lgray commented Feb 19, 2025

This is interesting - I will read about xdist...

@lgray
Copy link
Collaborator

lgray commented Feb 19, 2025

Some of those tests are necessarily insanely expensive but the reason they're expensive is because of the number of branches being read in. We could likely thin the file to just the branches being used (or slightly more) and it would significantly speed up computation time while covering the same bits of code.

Repeatedly checking delayed on/off is also going to get worse when we add virtual arrays.

@lgray
Copy link
Collaborator

lgray commented Feb 19, 2025

ah, ok, we are faster because we are multiprocessing. This may work against us in some of the tests since they spawn dask clients @andrzejnovak.

@andrzejnovak andrzejnovak changed the title ci: try pytest-xdist ci: run with pytest-xdist and split dask client / not client tests Feb 19, 2025
@andrzejnovak
Copy link
Member Author

This is ready (should be squashed in the merge)

@NJManganelli NJManganelli mentioned this pull request Feb 19, 2025
4 tasks
@lgray lgray merged commit da39e37 into master Feb 19, 2025
22 checks passed
pfackeldey pushed a commit that referenced this pull request Feb 20, 2025
…1278)

* ci: try pytest-xdist

* Update ci.yml

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client

* ci: paralellize tests that don't depend on dask-client
@lgray
Copy link
Collaborator

lgray commented Feb 21, 2025

@andrzejnovak could you make a corresponding PR over in https://github.com/scikit-hep/integration-tests ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants