-
Notifications
You must be signed in to change notification settings - Fork 54
Balanced fps from Issue #518 #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
baybe/utils/sampling_algorithms.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this deserves a CHANGELOG entry under ### Changed
# Examples where this can make a difference is three points forming an equilateral | ||
# triangle or four points spanning a unit cube. Here, tie-breaking operations such | ||
# as `np.max` can lead to different results depending on the order. | ||
permutation_idxs = np.random.permutation(len(points)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how the removal of this is necessary, but the new implementation only affects edge cases where there are degenerate distances in points
, right?
So for all other cases I think we still want the determinism to hold as tested here
So one could simply keep this case and skip it if there are degenerate distances in points
. Alternatively, one could drop points corresponding to such problematic cases and always have this test afterwards
should wait for what @AdrianSosic's opinion on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the change affects only the cases with degenerate distances. Since the test checks such a case (a set of one-hot encoded points), I opted for removing it. Alternatively, one could also add a test with a different set of points, for which it should be deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @StefanPSchmid, thanks for opening the PR and apologize the delay (last week was crazy, no time for nothing ...).
So there are two thoughts coming to mind I'd like to discuss with both of you:
- When there are no degenerate distances, I think testing for determinism is not super crucial (but wouldn't hurt either) because the determinism test was added specifically for those degenerate cases only. That said, I'm most concerned about the reason why we actually added this, which I unfortunately do not remember precisely 🙈. I think it had something to do with how distances appeared in substance parameters and it was added in the context where we fought against parameter ordering issues. Perhaps you remember, @Scienfitz? Whatever we do, we should take care not to destroy that part again...
- Also, if we add the stochasticity, it might be worth considering whether we want to make that configurable somehow, e.g. via some additional Boolean flag that controls tie-breaking behavior (with the new default of using stochastic breaking). Any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @AdrianSosic for the comments. I hope this week is going better for you.
Regarding your thoughts:
- I think for this we should wait for @Scienfitz to see why you did this in the first place. I agree though that I don't want it to break your code :)
- Agreed, that would be a good idea. It could then be a simple boolean flag similar to
initialization
that already is in this function and triggers an if that uses the deterministic (old) or stochastic (new) flag. On that note, I want to bring to your attention that theFPSRecommender
, which callsfarthest_point_sampling
, actually never uses theinitialization
argument (in https://github.com/emdgroup/baybe/blob/main/baybe/recommenders/pure/nonpredictive/sampling.py, line 77). So, for the user to actually be able to change something, theFPSRecommender
class would also require some changes, unless I am overlooking something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AdrianSosic I can think of no more additional reason why you wanted full determinism, it just seems nice to have it. The permutation poblem was also sort of solved from another perspective by auto-sorting parameters. But in the degen distance situation a random tie break seems more reasonable to me.
--> If we make it configurable its best of both worlds, right? The initial point selection is already configurable (at least in the utility).
For me its literally just about the test here, if only the edge case is affected imo the permutation test should stay and should only be skipped if there are degen distances.
As for adding the flag, it would probably require two lines changed in FPSRecommender
:
- adding the flag as cattrs attribute (similar eg to here) (guess we would also set default to True), naming: something like
random_tie_break
or similar - adjusting the call to
farthest_point_sampling
passing the flag as new argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you added this test specifically to test deterministic outcomes for degenerate cases, we could also do a test if it is deterministic, when that flag is applied. This would combine the test of determinism and degenerate case.
Once we cleared up what exactly to test, I am happy to implement all the changes we discussed in the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So because I don't remember any other reason that speaks against it, all suggestions sound like a good idea to me 😄 And @StefanPSchmid: thanks for pointing out the missing argument, we can also add that in the same PR I guess 👍🏼
Change to farthest point sampling as outlined in Issue 518.
Since this change makes fps undeterministic, the corresponding test was removed as well.