ci: Rerun tests that fail due to networking issues. #2029

Open · wants to merge 6 commits into main

Conversation

@sam-hey (Contributor) commented Feb 11, 2025

#2012: Rerun test cases that frequently fail in the pipeline. After 3 unsuccessful reruns, the test cases will fail.

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Closes: #2012
cc: @KennethEnevoldsen @Samoed @isaac-chung
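
This kind of selective rerun is typically implemented with the pytest-rerunfailures plugin. As a rough sketch of the mechanism (the flags and exception list below are illustrative; the exact Makefile/CI wiring in this PR may differ):

```sh
# Sketch only: rerun failing tests up to 3 times, but only when the failure
# matches a known transient network error (requires pytest-rerunfailures).
pytest --reruns 3 --only-rerun "requests.exceptions.ReadTimeout"
```

`--only-rerun` takes a regex matched against the failure and can be passed multiple times to cover additional transient errors (such as the JSONDecodeError mentioned further down).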

@sam-hey (Contributor, Author) commented Feb 11, 2025

I observed a requests.exceptions.ReadTimeout and the following failed test:

Failed test: tests/test_tasks/test_all_abstasks.py::test_dataset_availability
Error: AssertionError: Datasets not available on Hugging Face.

If there are any additional issues, let me know, and I'll add them to the PR.

@Samoed (Collaborator) commented Feb 11, 2025

It seems that we also need to rerun tests that fail with JSONDecodeError, because sometimes results don't load correctly, as in the last CI run.

@sam-hey (Contributor, Author) commented Feb 11, 2025

There are some more integration tests:

* [test_benchmark_integration_with_sentencetransformers](https://github.com/embeddings-benchmark/mteb/blob/main/tests/test_benchmark/test_benchmark_integration_with_sentencetransformers.py)

* [test_benchmark_integration_with_datasets](https://github.com/embeddings-benchmark/mteb/blob/main/tests/test_benchmark/test_benchmark_integration_with_datasets.py)

@Samoed, what exceptions do you see for these tests? I'd like to cover them via the CLI call, since there will be more than one test, to ensure that all errors are caught.

@Samoed (Collaborator) commented Feb 11, 2025

I was thinking of something similar, like pytest.mark.flaky, but if this is covered via the CLI, then everything is fine.
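
For reference, the marker-based variant would look roughly like this. This is a sketch only, not the code added in this PR, and it assumes a pytest-rerunfailures version recent enough to support only_rerun on the marker:

```python
import pytest


# Sketch only: retry a single network-dependent test up to 3 times, but only
# when the failure matches a known transient network error.
@pytest.mark.flaky(
    reruns=3,
    only_rerun=["requests.exceptions.ReadTimeout"],
)
def test_dataset_availability():
    ...  # body that talks to the Hugging Face Hub
```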

@sam-hey (Contributor, Author) commented Feb 11, 2025

Currently, I am handling the requests.exceptions.ReadTimeout error through the CLI call, as that is the issue I observed in the Actions runs. If additional exceptions occur, they will also need to be identified and added to the rerun list.

Diff excerpt under review:

```
@@ -84,6 +84,12 @@ async def check_datasets_are_available_on_hf(tasks):
    assert False, f"Datasets not available on Hugging Face:\n{pretty_print}"


@pytest.mark.flaky(
```

Contributor:

Any reason why we can't simply use this only? I would definitely prefer that.

@sam-hey (Contributor, Author):

I would advocate doing this only for specific cases; otherwise, we would need to add the marker to many functions, and everyone adding a new test would have to consider it as well. Instead, I’ll add it as an argument in pyproject.toml. This should make it cleaner and apply to all cases automatically.

Contributor:

I think we only have 2 integration tests: the CLI test and, I believe, the reproducible-workflow test. I would much rather add specific handling to these than add generic handling overall. That also documents, when a test fails locally with e.g. HTTPConnectionError, that this is an expected error for that test. If we add it in pyproject.toml, most people will not see it.

@sam-hey (Contributor, Author):

I would argue that this is precisely the approach we should be pursuing. Since the issue appears to be related to networking on a part of the system that we do not control, I believe this is an error that new contributors should not need to account for when adding new tests. Additionally, long-time contributors might overlook this when reviewing the PR.

Therefore, I strongly recommend adding the relevant configurations to the pyproject.toml file. This would allow maintainers to easily document errors that affect all tests. Our goal should be to ensure that contributors, especially those who may not anticipate such issues, can contribute without unnecessary complexity.

Moreover, the test output indicates when a test was executed multiple times (the reruns) before failing. While I do not expect the test to fail after this change, should it happen, we should consider increasing the timeouts to prevent future issues.
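
For concreteness, the pyproject.toml-based variant argued for here would look roughly like the following. This is a sketch only, assuming pytest-rerunfailures; the exact rerun count and exception list in this PR may differ:

```toml
# Sketch only, not the exact configuration proposed in this PR.
[tool.pytest.ini_options]
addopts = "--reruns 3 --only-rerun requests.exceptions.ReadTimeout --only-rerun json.decoder.JSONDecodeError"
```

With this in place, no individual test needs a marker, which is the point being made above; the trade-off raised in the review thread is that the behaviour is then less visible next to the tests themselves.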

Contributor:

Well, it is always possible to add it after this PR, and it is not an overly huge problem if it gets merged. I also generally don't think that integration tests should be added by new contributors.

@Samoed, I would love to get a second opinion here.

Collaborator:

We have some tests that download data from HF or GitHub, most obviously test_benchmark_integration_with_sentencetransformers and test_benchmark_integration_with_datasets, and then others. I went by files rather than individual test functions, and I think there are almost 10 (or maybe more) tests that involve a network connection. I don't think we should add a decorator to each of these tests; we can use some global config (CLI or pyproject) instead.

@isaac-chung changed the title from "fix: Rerun tests that fail due to networking issues." to "ci: Rerun tests that fail due to networking issues." on Feb 11, 2025
@isaac-chung (Collaborator) commented:

I think there are enough eyes on this PR. I'll sit this one out.

Development

Successfully merging this pull request may close these issues.

CI: Seperate out tests that fail due to HTTP calls