Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make fingerprint feature deterministic #151

Merged
merged 1 commit into from
Jan 22, 2025
Merged

Conversation

LennartPurucker
Copy link
Collaborator

Using Python's built-in hash is not consistent between runs due to salting.

Thus, the fingerprint feature using the built-in hash is not deterministic, making the input data non-deterministic and making the predictions non-deterministic between runs. Hashlib's hash is consistent between runs and thus does not face this problem.

This broke during the refactoring. The commit below is essentially reverting a commit that happened during the refactor.

FYI: @eddiebergman

Copy link
Collaborator

@LeoGrin LeoGrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lennart, LGTM! I would maybe just add a test that TabPFN is deterministic when setting the random state? I thought that scikit-learn checks would already check that but they were passing before. (I checked quickly and the only failing test test_sklearn_compatible_estimator[TabPFNRegressor()-check_methods_sample_order_invariance] still fails with this PR)

@LennartPurucker
Copy link
Collaborator Author

We cannot test for this, as it requires separate Python runs (not just separate calls).

And test_sklearn_compatible_estimator[TabPFNRegressor()-check_methods_sample_order_invariance] fails due to precision and not randomness.

@LeoGrin
Copy link
Collaborator

LeoGrin commented Jan 22, 2025

Ah yeah ok the test does look hard to do! LGTM then!

@eddiebergman
Copy link
Collaborator

eddiebergman commented Jan 22, 2025

Ah damn, apologies. @LeoGrin here's the reference Lennart's refering to:
https://docs.python.org/3/reference/datamodel.html#object.hash

Scroll to the bottom of that function

@LennartPurucker LennartPurucker merged commit aadaf75 into main Jan 22, 2025
5 checks passed
@LennartPurucker LennartPurucker deleted the fix_fp_feature branch January 22, 2025 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants