Skip to content

Balanced fps from Issue #518 #519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

StefanPSchmid
Copy link

Change to farthest point sampling as outlined in Issue 518.
Since this change makes fps undeterministic, the corresponding test was removed as well.

@CLAassistant
Copy link

CLAassistant commented Mar 27, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this deserves a CHANGELOG entry under ### Changed

# Examples where this can make a difference is three points forming an equilateral
# triangle or four points spanning a unit cube. Here, tie-breaking operations such
# as `np.max` can lead to different results depending on the order.
permutation_idxs = np.random.permutation(len(points))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how the removal of this is necessary, but the new implementation only affects edge cases where there are degenerate distances in points, right?

So for all other cases I think we still want the determinism to hold as tested here

So one could simply keep this case and skip it if there are degenerate distances in points. Alternatively, one could drop points corresponding to such problematic cases and always have this test afterwards
should wait for what @AdrianSosic's opinion on this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the change affects only the cases with degenerate distances. Since the test checks such a case (a set of one-hot encoded points), I opted for removing it. Alternatively, one could also add a test with a different set of points, for which it should be deterministic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @StefanPSchmid, thanks for opening the PR and apologize the delay (last week was crazy, no time for nothing ...).

So there are two thoughts coming to mind I'd like to discuss with both of you:

  • When there are no degenerate distances, I think testing for determinism is not super crucial (but wouldn't hurt either) because the determinism test was added specifically for those degenerate cases only. That said, I'm most concerned about the reason why we actually added this, which I unfortunately do not remember precisely 🙈. I think it had something to do with how distances appeared in substance parameters and it was added in the context where we fought against parameter ordering issues. Perhaps you remember, @Scienfitz? Whatever we do, we should take care not to destroy that part again...
  • Also, if we add the stochasticity, it might be worth considering whether we want to make that configurable somehow, e.g. via some additional Boolean flag that controls tie-breaking behavior (with the new default of using stochastic breaking). Any thoughts here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AdrianSosic for the comments. I hope this week is going better for you.

Regarding your thoughts:

  • I think for this we should wait for @Scienfitz to see why you did this in the first place. I agree though that I don't want it to break your code :)
  • Agreed, that would be a good idea. It could then be a simple boolean flag similar to initialization that already is in this function and triggers an if that uses the deterministic (old) or stochastic (new) flag. On that note, I want to bring to your attention that the FPSRecommender, which calls farthest_point_sampling, actually never uses the initialization argument (in https://github.com/emdgroup/baybe/blob/main/baybe/recommenders/pure/nonpredictive/sampling.py, line 77). So, for the user to actually be able to change something, the FPSRecommender class would also require some changes, unless I am overlooking something.

Copy link
Collaborator

@Scienfitz Scienfitz Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AdrianSosic I can think of no more additional reason why you wanted full determinism, it just seems nice to have it. The permutation poblem was also sort of solved from another perspective by auto-sorting parameters. But in the degen distance situation a random tie break seems more reasonable to me.
--> If we make it configurable its best of both worlds, right? The initial point selection is already configurable (at least in the utility).

For me its literally just about the test here, if only the edge case is affected imo the permutation test should stay and should only be skipped if there are degen distances.

As for adding the flag, it would probably require two lines changed in FPSRecommender:

  • adding the flag as cattrs attribute (similar eg to here) (guess we would also set default to True), naming: something like random_tie_break or similar
  • adjusting the call to farthest_point_sampling passing the flag as new argument.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you added this test specifically to test deterministic outcomes for degenerate cases, we could also do a test if it is deterministic, when that flag is applied. This would combine the test of determinism and degenerate case.

Once we cleared up what exactly to test, I am happy to implement all the changes we discussed in the PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So because I don't remember any other reason that speaks against it, all suggestions sound like a good idea to me 😄 And @StefanPSchmid: thanks for pointing out the missing argument, we can also add that in the same PR I guess 👍🏼

@Scienfitz Scienfitz added the enhancement Expand / change existing functionality label Apr 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Expand / change existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants