-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
provide unique seed to each test #600
Comments
Using a random number per test does not guarantee isolation in remote systems. The birthday problem makes the chance of collision between many tests high, at least for 32-bit identifiers. Why not use shared counter between tests to absolutely guarantee unique numbers? I am not exactly against the proposal though. It might make sense for the reason that unique random seeds will make a test suite use more random data, increasing the amount of fuzz testing. |
Actual scenario I encountered: Even accounting for the birthday problem, the chances of collision are very small. If the email addresses are unique then understanding the logs is a lot easier. You just look for the email address that relates to the test. You could even keep the database around and look at records related to that account. But if the "random" email addresses are the same in every test, that's unexpected and makes debugging a lot harder. |
I think I have the same request than @brycedrennan, but if not, I could make another issue, tell me: I would like to be able to call with --randomly-seed=1976373286, but to have that the seed used for all test
I think it allows to have tests with similar code, but to make sure that the real seeds that will be used in them are different, for more test coverage. It's basically what I had done in our https://github.com/zama-ai/concrete-ml/blob/main/conftest.py#L171
I would be happy to participate or do a PR for that, I'll just need a bit of guidance to make the right implementation choice. |
OK so actually, what I did with
If requested, I'll be happy to try to add it to the project! |
@bcm-at-zama That sounds like about the same request here. Rather than combine the filename and test name like you’ve done in your fixture, we should instead use the pytest “test ID”, which includes the path already, plus extra details like test parameters IIRC. I agree with @brycedrennan that we can actually ignore the birthday problem. I still don't like the idea of trying to rely on differing random seeds to avoid collisions. But it does sound like a good idea to increase test coverage. @bcm-at-zama , would you like to try creating a PR? Please include tests, a changelog note, and any relevant updates to the README. |
@adamchainz yeah its a good point. a global counter in a fixture (possibly combined with some more random data) would be a more thorough solution. On the other hand, sometimes tests are distributed among machines and a global counter would actually be a bit tricky to implement so it's nice to have options. |
Great @adamchainz : I don't know about test ID but I'll have a look. I'll try to make this PR in April! |
Description
Per #531 its unexpected that randomly generated values in different tests would be identical. It's common in test suites to use randomly generated identifiers to help maintain test isolation and provide easier-to-understand logs.
Examples:
One could use
--randomly-dont-reset-seed
but then they would lose the reproducibility and repeatability.I propose that the node_id of the test (and the test run seed) be used to deterministically generate a unique seed for that test (instead of all tests being reset to the same fixed value).
The text was updated successfully, but these errors were encountered: