-
Notifications
You must be signed in to change notification settings - Fork 2.2k
skip setup signal notifier for detached container #4661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
skip setup signal notifier for detached container #4661
Conversation
08ab900
to
b2d5924
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's weird. We set up signal forwarding (for create/run/exec) and then we don't use it.
But, if notifySocker is non-nil, we set up signal forwarding and when we do forward signals for some short time (until we exit).
I can't review it because the current code makes no sense to me :(
I don’t think so, if notifySocket is non-nil, we also don’t consume any signals. There are two places checking detach in ‘forward()’:
The signal consumer is in: Line 95 in b2d5924
If I miss something, please tell me. |
Ah, yes, you are right, if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (yet I wish someone can refactor the code dealing with notifySocket, it seems it is orthogonal to signals and yet it is mixed together)
Refactored. I think we should setup notify socket first before we handler and forward signals for non detached containers, so I didn't split them to two functions, but refactor to keep one path for detached container. |
e66d7ad
to
8c15ed7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the perf improvement, much appreciated :)
I'll try to review again later with more time. However, it feels like several different things are changed here. If you can separate your patches as explained there, IMHO it will greatly simplify review.
7dfb066
to
8838ab2
Compare
Oh, yes, these codes were written during my business trip. |
b04ff50
to
48beb0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might result in a better code if we add one more patch in the middle:
diff --git a/utils_linux.go b/utils_linux.go
index feb6ef80..2674bd3f 100644
--- a/utils_linux.go
+++ b/utils_linux.go
@@ -205,14 +205,13 @@ type runner struct {
subCgroupPaths map[string]string
}
-func (r *runner) run(config *specs.Process) (int, error) {
- var err error
+func (r *runner) run(config *specs.Process) (_ int, retErr error) {
defer func() {
- if err != nil {
+ if retErr != nil {
r.destroy()
}
}()
- if err = r.checkTerminal(config); err != nil {
+ if err := r.checkTerminal(config); err != nil {
return -1, err
}
process, err := newProcess(*config)
to disambiguate the use or err
.
So it seems that this is a bug? |
48beb0d
to
6d60c2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for splitting it up! It's way simpler to review now.
I'm trying to digest if there are simpler ways for some parts. I'll post if I have ideas, but seems okay for me :)
Probably not, I just don't like using return -1, errors.New("oops") the check in So, this code is error-prone as it is. It won't be if we name a return variable and check it. |
This is a bug for sometimes, for example: |
66e13e5
to
7f34471
Compare
utils_linux.go
Outdated
defer func() { | ||
// We should terminate the process once we got an error. | ||
// It's safe to do this because we ignored all errors in | ||
// this function. | ||
if retErr != nil { | ||
r.terminate(process) | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get this. Now this defer call will be used in places where we didn't have the r.terminate()
call before. Like 84ffd99#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bR254 , 84ffd99#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bR265 and 84ffd99#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bR272
Is this on purpose? Not sure what the "we ignored all errors in this function" means, like the callers of this function ignore the errors? Here we are not ignoring errors, we are returning errors in several cases.
What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the "we ignored all errors in this function" means, like the callers of this function ignore the errors? Here we are not ignoring errors, we are returning errors in several cases.
We have ignored any errors here:
Lines 333 to 336 in 7f34471
func (r *runner) terminate(p *libcontainer.Process) { | |
_ = p.Signal(unix.SIGKILL) | |
_, _ = p.Wait() | |
} |
If process.ops == nil
, it will also return an error:
Lines 148 to 151 in 7f34471
func (p Process) Signal(sig os.Signal) error { | |
if p.ops == nil { | |
return errInvalidProcess | |
} |
Not sure I get this. Now this defer call will be used in places where we didn't have the
r.terminate()
call before.
As we have ignored any errors in r.terminate(process)
, so we can call it safely in any places when got an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we wouldn't just add this defer after the process has been created. Can you please elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we wouldn't just add this defer after the process has been created.
I think I don't understand your question here. I think I just has done what you want to do, I added this defer after the process has been created. Perhaps I misunderstood what you said. If yes, please tell me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry I wasn't clear. I mean, the process is created after this line: https://github.com/opencontainers/runc/pull/4661/files#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bL282
Why not add the defer function there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it now, thanks, it has the same effect as before now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some more comments. Thanks!
7f34471
to
3e9350e
Compare
Had another look, left this comment. The diff says it is outdated, that is why I'm highlighting it here, but it seems to be still relevant. |
We should use a named return value of error, or else we can't catch all errors when calling defer function, for example we used a block scope var name `err` for `setupPidfdSocket`. Signed-off-by: lifubang <[email protected]>
Signed-off-by: lifubang <[email protected]>
In fact, notifySocket has no relationship to signalHandler, we can move it out of signalHandler to make the code more clear. Signed-off-by: lifubang <[email protected]>
For detached container, we don't need to setup signal notifier, because there is no customer to consume the signals in `forward()`. Signed-off-by: lifubang <[email protected]>
3e9350e
to
e4266af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly LGTM now, thanks!
Left some questions/nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, I think this is a net improvement.
I think I found a bug, though. Let me know if that is not the case.
Sorry I took long, but it really takes me a lot of more time to review a PR where several different changes are not split in different commit. See my previous comment with links that explain better what I mean.
But to give a concrete example here, let's see commit e4266af. It is doing at least these things:
- Move enableSubreaper outside of newSignalHandler
- Moving code from the main function to the defer function
- Skipping the forward call for !detached containers
Splitting in different commits makes the whole chunk way simpler to review. If in one commit I just need to check that when you move code from A to B, that was okay, that is great. But if you move code, at the same time adjust that code slightly and move some other code... it is more likely I'll miss something. It's simpler for bugs to squeeze in.
If something is not enough to split in a different commit, then a mention in the commit message is very helpful to me as a reviewer. That is enough in lot of very simple cases.
If I see a commit that says "skip X for !detached containers", I expect not much more than a commit that adds an if !detached { ....
. Here there are quite some more changes than just that, but what is the reason? I can't judge, as nothing is mentioned. So it hard as a reviewer to know. I try to guess reasons, try to see if I agree with what I guess is your reason and review that. But you might have done it for a completely different reason, that is not what I guessed, and maybe in that case I'd have suggested something else.
My point here is that there are also lot of things lost by not being explicit in the commits.
Also, if I did find a bug (see the comment), it took me like 3 review rounds (I think it was always there). And a big factor in that is that the commit introducing the bug is doing several other things and the code in question is also being moved to another function (the defer) and therefore, you need to keep track of many things and you can easily miss one. If that is a bug, I think I did miss it the other times, Kir probably too.
I think at this point, it's really worth to further split the commit in more commits, keeping different logical changes in different commits. It's easier to review, it's easier if we find a bug to know that you wanted to do (and clearly spot the bug in retrospective, because we know what the intent of that logical change was), etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, left some more comments
defer func() { | ||
// We should terminate the process once we got an error. | ||
if retErr != nil { | ||
r.terminate(process) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is duplicated now. The function deferred previously will also do the exact same thing. Is this on purpose?
if err != nil { | ||
// For a non-detached container, or we get an error, we | ||
// should destroy the container. | ||
if !detach || retErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we always want to do this for !detach containers? What if we fail before we start the process? Is this handled fine by r.destroy()? Or does this kill something else too, and we should do it here?
I don't know, maybe we want to just do it later in the other defer? I'd like to understand why one or the other, though. But that is what we were doing before, though, the destroy was not called for this part of the code.
For detached container, we don't need to setup signal notifier, because there is no customer to consume the signals in
forward()
.