fix: Skip callbacks with dead weakrefs while iterating over callbacks #2310
@wernerd-cern Can you please report what the actual bug is and give a reproducer as well for documentation? It would be better to make an Issue that clearly documents the bug and then have this PR close it. This PR will need tests as well.
Codecov Report
Patch coverage:
Additional details and impacted files

| | main | #2310 | +/- |
|---|---|---|---|
| Coverage | 97.61% | 98.28% | +0.66% |
| Files | 69 | 69 | |
| Lines | 4535 | 4538 | +3 |
| Branches | 802 | 803 | +1 |
| Hits | 4427 | 4460 | +33 |
| Misses | 65 | 45 | -20 |
| Partials | 43 | 33 | -10 |
For reviewer reference, the last time this code was edited was in PR #1035.
When events (such as 'tensorlib_changed') are called, a bug occurred previously where objects (i.e. the `TensorViewer`) were already dead. In this case an error occurred when member functions were using attributes such as `self._sorted_indices`. This bug is fixed by catching the cases in which the object is invalidated by earlier callbacks.
Seems like this is indeed needed, to be able to do what we do in our tests without providing all of `conftest.py`'s machinery. So this should go in and then a patch release made from it. 👍
This code doesn't have full coverage though, so appropriate tests will need to be added.
Aside: Thanks for making https://gitlab.cern.ch/dawerner/statanabenchmark public. This is a nice project. :)
This PR doesn't seem to introduce any new functionality that does not already exist. `__call__` here in the changes is including changes covered by the `_flush` function. The `_flush()` should be called before trying to access the callbacks to determine the valid callbacks first. So I'm not sure how you ran into a situation that's covered by this PR that isn't already covered by the existing code...
In addition, the case you're describing is covered by our tests right now.
I'm going to put this into draft until we can determine if Issue #2310 is actually reproducible in any way that |
No, it's actually not the same logic. Sorry, our bad, the comment should have been more precise in the code. Maybe:
I suspect the problem is that some callbacks delete objects that own the arguments of other callbacks, which is why flushing only at the beginning is not sufficient.
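To make the suspected mechanism concrete, here is a minimal sketch. It does not use pyhf: `Owner`, `Viewer`, and the `registry` list are hypothetical stand-ins, and the immediate collection of the `Viewer` assumes CPython's reference counting. It shows that a check made before iterating can see every weakref alive, while the first callback then invalidates the target of the second.

```python
import weakref


class Viewer:
    """Hypothetical stand-in for an object such as TensorViewer."""

    def on_change(self):
        return "viewer updated"


class Owner:
    """Hypothetical object holding the only strong reference to a Viewer."""

    def __init__(self):
        self.viewer = Viewer()

    def on_change(self):
        # Dropping the last strong reference frees the Viewer
        # (immediately under CPython's reference counting).
        self.viewer = None


owner = Owner()

# Each entry is (weak reference to the object, method name).
registry = [
    (weakref.ref(owner), "on_change"),
    (weakref.ref(owner.viewer), "on_change"),
]

# A flush at the beginning would find every ref alive and remove nothing.
alive_at_start = [ref() is not None for ref, _ in registry]

# Run only the first callback; it invalidates the target of the second one.
first_ref, first_name = registry[0]
getattr(first_ref(), first_name)()

second_ref, _ = registry[1]
second_target_after_first = second_ref()  # dead weakref now returns None
```

In this sketch the second entry only goes dead after iteration has started, so a flush that ran beforehand had no chance to remove it.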
Again, this doesn't explain or change the logic. All callbacks are called through this function. Anytime you trigger the callbacks, the Without the changes, the flow is
With the changes, the flow is
Either way, |
Thanks for making the bullet point list, maybe it's now easier for me to explain what I think is the crucial difference. To quote the list you made and annotate with what I think is the flaw in the logic:
After the change, the flush will be called after iterating over the callbacks, catching any new dead refs that were the results of calling the callbacks.
Yes, after iterating over all callbacks, not after each one... so before the change is identical to after the change. They will get caught by the next access for callbacks before the change (delayed), but that's still ok. This is why I'm still confused. I don't see a scenario not currently caught by the original code.
Here is a scenario that would not be caught by the original code:
I didn't dig into the pyhf code deep enough to exactly tell you what |
Ok, but again, you've been stating that moving the flush to the end catches this case when the original code does not -- this is an incorrect statement. The difference is that you're skipping deadrefs as you iterate through the callbacks depending on if the ordering of the callbacks causes some refs to go dead during processing. I bet if you just simply add in logic to skip refs that are dead, and remove all other changes -- that you'll catch exactly this case. Moving around the flush will do nothing though.
No, I never intended to say that (sorry if it came across like that). The position of the flush actually doesn't matter, it's all about the explicit dead arg check in the loop. We only put the flush at the end now to be a bit more efficient: like this, the flush() will also flush the new dead refs that came from calling the callbacks, so that the next time You might as well put the flush also back on top. But I'm happy we are now on the same page about skipping dead refs 👍 Sorry for not explaining explicitly why we moved the flush to the end.
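The behaviour both sides converge on can be sketched roughly as follows. This is an illustrative registry, not pyhf's actual implementation: the `Callbacks` class, its `append` signature, and the `Owner`/`Viewer` helpers are assumptions made for the example. The explicit `is None` check inside the loop is the part that matters; the trailing flush is only housekeeping and could equally sit at the top.

```python
import weakref


class Callbacks:
    """Hypothetical minimal registry; not pyhf's actual implementation."""

    def __init__(self):
        self._callbacks = []  # list of (weakref to object, method name)

    def append(self, method):
        # Keep only a weak reference to the bound method's object.
        self._callbacks.append((weakref.ref(method.__self__), method.__name__))

    def _flush(self):
        # Drop entries whose object has already been collected.
        self._callbacks = [(r, n) for r, n in self._callbacks if r() is not None]

    def __call__(self):
        called = []
        for ref, name in self._callbacks:
            obj = ref()
            if obj is None:
                # The object died during an earlier callback: skip it.
                continue
            getattr(obj, name)()
            called.append(name)
        # Flushing afterwards also removes refs that died during this pass.
        self._flush()
        return called


class Viewer:
    def refresh(self):
        pass


class Owner:
    def __init__(self):
        self.viewer = Viewer()

    def drop_viewer(self):
        # Releases the only strong reference to the Viewer.
        self.viewer = None


owner = Owner()
events = Callbacks()
events.append(owner.drop_viewer)
events.append(owner.viewer.refresh)

called = events()
remaining = len(events._callbacks)
```

Under CPython, `called` contains only `"drop_viewer"` here, because the `Viewer` is gone by the time its entry is reached, and the flush at the end leaves a single live entry for the next invocation.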
Approved, but with the caveat that this is somewhat impossible to properly force a test force, as it is only catching the scenario of an intermediate callback triggering a garbage-collect.
Co-authored-by: Jonas Rembser <[email protected]>
Thanks, @kratsg and @matthewfeickert !
Approved, but with the caveat that this is somewhat impossible to properly force a test force, as it is only catching the scenario of an intermediate callback triggering a garbage-collect.
@kratsg if you are also happy to not attempt to figure out a way to deterministically test this then we can just note it in the commit message (I've updated it) and accept this.
Thanks @wernerd-cern and @guitargeek. I'll prepare `pyhf` `v0.7.4` and get that out before the end of the week.
As pointed out by @kratsg, also the new title is maybe not completely appropriate because it's about the skipping of dead weakrefs and not the flushing. Maybe |
… callbacks (#2321)
* Backport PR #2310
* Check refs while processing callbacks in case a callee is de-ref'd during the callback process.
* Additionally flush after processing callbacks in case any callback de-ref's a callee.
* Note that no additional tests are added for this case as the problem arises from an intermediate callback triggering a garbage-collect. It is unclear how to force this scenario deterministically in testing.
* Add Jonas Rembser to contributors list.

Co-authored-by: Daniel Werner <[email protected]>
Co-authored-by: Jonas Rembser <[email protected]>
@wernerd-cern @guitargeek |
Description
Resolves #2311
The bug fixed by this PR occurred in my work in the repository https://gitlab.cern.ch/dawerner/statanabenchmark.
I am currently working on a summer student project at CERN benchmarking statistical analysis tools in Python including pyhf.
When performing these benchmarks, pyhf is called multiple times in a row with different backends. In some cases an error then occurs because objects are called through weakrefs that are already dead and thus are `NoneType` objects.

When events (such as 'tensorlib_changed') are called, a bug occurred previously where objects (i.e. the `TensorViewer`) were already dead. In this case an error occurred when member functions were using attributes such as `self._sorted_indices` in `src/pyhf/tensor/common.py:33`. This bug is fixed by catching the cases in which the object is invalidated by earlier callbacks.
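For readers unfamiliar with the failure mode, the following standalone sketch shows what a dead weakref looks like from Python. The `Viewer` class is a hypothetical stand-in, not pyhf's `TensorViewer`, and the immediate collection after `del` assumes CPython. It illustrates the general `NoneType` symptom only and is not a reproducer of the pyhf traceback.

```python
import weakref


class Viewer:
    """Hypothetical stand-in with an attribute named like the one in the report."""

    def __init__(self):
        self._sorted_indices = [0, 1, 2]


viewer = Viewer()
ref = weakref.ref(viewer)

# While a strong reference exists, calling the weakref returns the object.
alive_value = ref()._sorted_indices

# Remove the last strong reference; under CPython the object is freed at once.
del viewer
dead = ref()  # a dead weakref returns None when called

try:
    dead._sorted_indices
    error = None
except AttributeError as exc:
    error = str(exc)
```

Any code that dereferences the weakref without checking for `None` first ends up in the `except` branch above.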