ci: Rerun tests that fail due to networking issues. #2029

Open · wants to merge 6 commits into main

Conversation

@sam-hey (Contributor) commented Feb 11, 2025

#2012: Rerun test cases that frequently fail in the pipeline. After 3 unsuccessful reruns, the test cases will fail.

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Closes: #2012
cc: @KennethEnevoldsen @Samoed @isaac-chung
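
This kind of selective rerun is typically implemented with the pytest-rerunfailures plugin. As a rough sketch of the mechanism (the flags and exception list below are illustrative; the exact Makefile/CI wiring in this PR may differ):

```sh
# Sketch only: rerun failing tests up to 3 times, but only when the failure
# matches a known transient network error (requires pytest-rerunfailures).
pytest --reruns 3 --only-rerun "requests.exceptions.ReadTimeout"
```

`--only-rerun` takes a regex matched against the failure and can be passed multiple times to cover additional transient errors (such as the JSONDecodeError mentioned further down).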

@sam-hey (Contributor, Author) commented Feb 11, 2025

I observed a requests.exceptions.ReadTimeout and the following failed test:

Failed test: tests/test_tasks/test_all_abstasks.py::test_dataset_availability
Error: AssertionError: Datasets not available on Hugging Face.

If there are any additional issues, let me know, and I'll add them to the PR.

@Samoed (Collaborator) commented Feb 11, 2025

It seems that we also need to rerun tests that fail with JSONDecodeError, because sometimes results don't load correctly, as in the last CI run.

@sam-hey (Contributor, Author) commented Feb 11, 2025

There are some more integration tests:

* [test_benchmark_integration_with_sentencetransformers](https://github.com/embeddings-benchmark/mteb/blob/main/tests/test_benchmark/test_benchmark_integration_with_sentencetransformers.py)

* [test_benchmark_integration_with_datasets](https://github.com/embeddings-benchmark/mteb/blob/main/tests/test_benchmark/test_benchmark_integration_with_datasets.py)

@Samoed, what exceptions do you see for these tests? I'd like to cover them via the CLI call, since there will be more than one test, to ensure that all errors are caught.

@Samoed (Collaborator) commented Feb 11, 2025

I was thinking of something similar, like pytest.mark.flaky, but if this is covered via the CLI, then everything is fine.
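
For reference, the marker-based variant would look roughly like this. This is a sketch only, not the code added in this PR, and it assumes a pytest-rerunfailures version recent enough to support only_rerun on the marker:

```python
import pytest


# Sketch only: retry a single network-dependent test up to 3 times, but only
# when the failure matches a known transient network error.
@pytest.mark.flaky(
    reruns=3,
    only_rerun=["requests.exceptions.ReadTimeout"],
)
def test_dataset_availability():
    ...  # body that talks to the Hugging Face Hub
```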

@sam-hey (Contributor, Author) commented Feb 11, 2025

Currently, I am handling the requests.exceptions.ReadTimeout error through the CLI call, as that is the issue I observed in the Actions runs. If additional exceptions occur, they will also need to be identified and added to the rerun list.

Diff excerpt under review:

```
@@ -84,6 +84,12 @@ async def check_datasets_are_available_on_hf(tasks):
    assert False, f"Datasets not available on Hugging Face:\n{pretty_print}"


@pytest.mark.flaky(
```

Contributor:

Any reason why we can't simply use this only? I would definitely prefer that.

@sam-hey (Contributor, Author):

I would advocate doing this only for specific cases; otherwise, we would need to add the marker to many functions, and everyone adding a new test would have to consider it as well. Instead, I’ll add it as an argument in pyproject.toml. This should make it cleaner and apply to all cases automatically.

Contributor:

I think we only have 2 integration tests: the CLI test and, I believe, the reproducible-workflow test. I would much rather add specific handling to these than add generic handling overall. That also documents, when a test fails locally with e.g. HTTPConnectionError, that this is an expected error for that test. If we add it in pyproject.toml, most people will not see it.

@sam-hey (Contributor, Author):

I would argue that this is precisely the approach we should be pursuing. Since the issue appears to be related to networking on a part of the system that we do not control, I believe this is an error that new contributors should not need to account for when adding new tests. Additionally, long-time contributors might overlook this when reviewing the PR.

Therefore, I strongly recommend adding the relevant configurations to the pyproject.toml file. This would allow maintainers to easily document errors that affect all tests. Our goal should be to ensure that contributors, especially those who may not anticipate such issues, can contribute without unnecessary complexity.

Moreover, the test output indicates when a test was executed multiple times (the reruns) before failing. While I do not expect the test to fail after this change, should it happen, we should consider increasing the timeouts to prevent future issues.
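
For concreteness, the pyproject.toml-based variant argued for here would look roughly like the following. This is a sketch only, assuming pytest-rerunfailures; the exact rerun count and exception list in this PR may differ:

```toml
# Sketch only, not the exact configuration proposed in this PR.
[tool.pytest.ini_options]
addopts = "--reruns 3 --only-rerun requests.exceptions.ReadTimeout --only-rerun json.decoder.JSONDecodeError"
```

With this in place, no individual test needs a marker, which is the point being made above; the trade-off raised in the review thread is that the behaviour is then less visible next to the tests themselves.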

Contributor:

Well, it is always possible to add it after this PR, and it is not an overly huge problem if it gets merged. I also generally don't think that integration tests should be added by new contributors.

@Samoed, I would love to get a second opinion here.

Collaborator:

We have some tests that download data from HF or GitHub, most obviously test_benchmark_integration_with_sentencetransformers and test_benchmark_integration_with_datasets, and then others. I went by files rather than individual test functions, and I think there are almost 10 (or maybe more) tests that involve a network connection. I don't think we should add a decorator to each of these tests; we can use some global config (CLI or pyproject) instead.

@isaac-chung changed the title from "fix: Rerun tests that fail due to networking issues." to "ci: Rerun tests that fail due to networking issues." on Feb 11, 2025
@isaac-chung (Collaborator) commented:

I think there are enough eyes on this PR. I'll sit this one out.

Development

Successfully merging this pull request may close these issues.

CI: Seperate out tests that fail due to HTTP calls