Skip to content

skip setup signal notifier for detached container #4661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented Mar 5, 2025

For detached container, we don't need to setup signal notifier, because there is no customer to consume the signals in forward().

@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch 2 times, most recently from 08ab900 to b2d5924 Compare March 5, 2025 14:39
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's weird. We set up signal forwarding (for create/run/exec) and then we don't use it.

But, if notifySocker is non-nil, we set up signal forwarding and when we do forward signals for some short time (until we exit).

I can't review it because the current code makes no sense to me :(

@lifubang
Copy link
Member Author

lifubang commented Mar 6, 2025

But, if notifySocker is non-nil, we set up signal forwarding and when we do forward signals for some short time (until we exit).

I don’t think so, if notifySocket is non-nil, we also don’t consume any signals. There are two places checking detach in ‘forward()’:

  1. runc/signals.go

    Lines 73 to 75 in b2d5924

    if detach && h.notifySocket == nil {
    return 0, nil
    }
  2. runc/signals.go

    Lines 82 to 86 in b2d5924

    if h.notifySocket != nil {
    if detach {
    _ = h.notifySocket.run(pid1)
    return 0, nil
    }

The signal consumer is in:

for s := range h.signals {

If I miss something, please tell me.

@kolyshkin
Copy link
Contributor

Ah, yes, you are right, if detach is true, it exits either way. But the code around notifySocket is spaghetti (in a bad way).

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (yet I wish someone can refactor the code dealing with notifySocket, it seems it is orthogonal to signals and yet it is mixed together)

@lifubang
Copy link
Member Author

lifubang commented Mar 7, 2025

yet I wish someone can refactor the code dealing with notifySocket, it seems it is orthogonal to signals and yet it is mixed together

Refactored. I think we should setup notify socket first before we handler and forward signals for non detached containers, so I didn't split them to two functions, but refactor to keep one path for detached container.
PTAL

@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch 4 times, most recently from e66d7ad to 8c15ed7 Compare March 16, 2025 07:30
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the perf improvement, much appreciated :)

I'll try to review again later with more time. However, it feels like several different things are changed here. If you can separate your patches as explained there, IMHO it will greatly simplify review.

@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch 3 times, most recently from 7dfb066 to 8838ab2 Compare March 18, 2025 14:50
@lifubang
Copy link
Member Author

However, it feels like several different things are changed here. If you can separate your patches as explained there, IMHO it will greatly simplify review.

Oh, yes, these codes were written during my business trip.
I split it into two commits now, do you think it's more clear now?

@lifubang lifubang requested review from kolyshkin and rata March 18, 2025 14:53
@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch 2 times, most recently from b04ff50 to 48beb0d Compare March 18, 2025 15:54
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might result in a better code if we add one more patch in the middle:

diff --git a/utils_linux.go b/utils_linux.go
index feb6ef80..2674bd3f 100644
--- a/utils_linux.go
+++ b/utils_linux.go
@@ -205,14 +205,13 @@ type runner struct {
        subCgroupPaths  map[string]string
 }
 
-func (r *runner) run(config *specs.Process) (int, error) {
-       var err error
+func (r *runner) run(config *specs.Process) (_ int, retErr error) {
        defer func() {
-               if err != nil {
+               if retErr != nil {
                        r.destroy()
                }
        }()
-       if err = r.checkTerminal(config); err != nil {
+       if err := r.checkTerminal(config); err != nil {
                return -1, err
        }
        process, err := newProcess(*config)

to disambiguate the use or err.

@lifubang
Copy link
Member Author

to disambiguate the use or err.

So it seems that this is a bug?

@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch from 48beb0d to 6d60c2e Compare March 19, 2025 11:06
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting it up! It's way simpler to review now.

I'm trying to digest if there are simpler ways for some parts. I'll post if I have ideas, but seems okay for me :)

@kolyshkin
Copy link
Contributor

to disambiguate the use or err.

So it seems that this is a bug?

Probably not, I just don't like using err in a defer -- we mean to check what the function will return, not what the value of some variable named err is. I mean, if someone will add a code like

  return -1, errors.New("oops")

the check in defer won't work properly.

So, this code is error-prone as it is. It won't be if we name a return variable and check it.

@lifubang
Copy link
Member Author

Probably not

This is a bug for sometimes, for example:
https://github.com/opencontainers/runc/blob/main/utils_linux.go#L264

@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch 2 times, most recently from 66e13e5 to 7f34471 Compare March 22, 2025 00:18
utils_linux.go Outdated
Comment on lines 232 to 243
defer func() {
// We should terminate the process once we got an error.
// It's safe to do this because we ignored all errors in
// this function.
if retErr != nil {
r.terminate(process)
}
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get this. Now this defer call will be used in places where we didn't have the r.terminate() call before. Like 84ffd99#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bR254 , 84ffd99#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bR265 and 84ffd99#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bR272

Is this on purpose? Not sure what the "we ignored all errors in this function" means, like the callers of this function ignore the errors? Here we are not ignoring errors, we are returning errors in several cases.

What am I missing?

Copy link
Member Author

@lifubang lifubang Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the "we ignored all errors in this function" means, like the callers of this function ignore the errors? Here we are not ignoring errors, we are returning errors in several cases.

We have ignored any errors here:

runc/utils_linux.go

Lines 333 to 336 in 7f34471

func (r *runner) terminate(p *libcontainer.Process) {
_ = p.Signal(unix.SIGKILL)
_, _ = p.Wait()
}

If process.ops == nil, it will also return an error:

func (p Process) Signal(sig os.Signal) error {
if p.ops == nil {
return errInvalidProcess
}

Not sure I get this. Now this defer call will be used in places where we didn't have the r.terminate() call before.

As we have ignored any errors in r.terminate(process), so we can call it safely in any places when got an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we wouldn't just add this defer after the process has been created. Can you please elaborate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we wouldn't just add this defer after the process has been created.

I think I don't understand your question here. I think I just has done what you want to do, I added this defer after the process has been created. Perhaps I misunderstood what you said. If yes, please tell me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry I wasn't clear. I mean, the process is created after this line: https://github.com/opencontainers/runc/pull/4661/files#diff-bcda3d076af1f465c76f38c2404e7cd82b431bfd484d5c810871f9953162136bL282

Why not add the defer function there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it now, thanks, it has the same effect as before now!

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some more comments. Thanks!

@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch from 7f34471 to 3e9350e Compare March 25, 2025 01:24
@rata
Copy link
Member

rata commented Mar 31, 2025

Had another look, left this comment. The diff says it is outdated, that is why I'm highlighting it here, but it seems to be still relevant.

lifubang added 4 commits April 1, 2025 15:16
We should use a named return value of error, or else we can't
catch all errors when calling defer function, for example we
used a block scope var name `err` for `setupPidfdSocket`.

Signed-off-by: lifubang <[email protected]>
In fact, notifySocket has no relationship to signalHandler, we
can move it out of signalHandler to make the code more clear.

Signed-off-by: lifubang <[email protected]>
For detached container, we don't need to setup signal notifier, because
there is no customer to consume the signals in `forward()`.

Signed-off-by: lifubang <[email protected]>
@lifubang lifubang force-pushed the skip-setup-signalNotify-for-detached branch from 3e9350e to e4266af Compare April 1, 2025 15:16
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly LGTM now, thanks!

Left some questions/nits

@kolyshkin kolyshkin requested a review from rata April 18, 2025 23:48
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I think this is a net improvement.

I think I found a bug, though. Let me know if that is not the case.

Sorry I took long, but it really takes me a lot of more time to review a PR where several different changes are not split in different commit. See my previous comment with links that explain better what I mean.

But to give a concrete example here, let's see commit e4266af. It is doing at least these things:

  • Move enableSubreaper outside of newSignalHandler
  • Moving code from the main function to the defer function
  • Skipping the forward call for !detached containers

Splitting in different commits makes the whole chunk way simpler to review. If in one commit I just need to check that when you move code from A to B, that was okay, that is great. But if you move code, at the same time adjust that code slightly and move some other code... it is more likely I'll miss something. It's simpler for bugs to squeeze in.

If something is not enough to split in a different commit, then a mention in the commit message is very helpful to me as a reviewer. That is enough in lot of very simple cases.

If I see a commit that says "skip X for !detached containers", I expect not much more than a commit that adds an if !detached { .... . Here there are quite some more changes than just that, but what is the reason? I can't judge, as nothing is mentioned. So it hard as a reviewer to know. I try to guess reasons, try to see if I agree with what I guess is your reason and review that. But you might have done it for a completely different reason, that is not what I guessed, and maybe in that case I'd have suggested something else.

My point here is that there are also lot of things lost by not being explicit in the commits.

Also, if I did find a bug (see the comment), it took me like 3 review rounds (I think it was always there). And a big factor in that is that the commit introducing the bug is doing several other things and the code in question is also being moved to another function (the defer) and therefore, you need to keep track of many things and you can easily miss one. If that is a bug, I think I did miss it the other times, Kir probably too.

I think at this point, it's really worth to further split the commit in more commits, keeping different logical changes in different commits. It's easier to review, it's easier if we find a bug to know that you wanted to do (and clearly spot the bug in retrospective, because we know what the intent of that logical change was), etc.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, left some more comments

Comment on lines +294 to +297
defer func() {
// We should terminate the process once we got an error.
if retErr != nil {
r.terminate(process)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated now. The function deferred previously will also do the exact same thing. Is this on purpose?

if err != nil {
// For a non-detached container, or we get an error, we
// should destroy the container.
if !detach || retErr != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always want to do this for !detach containers? What if we fail before we start the process? Is this handled fine by r.destroy()? Or does this kill something else too, and we should do it here?

I don't know, maybe we want to just do it later in the other defer? I'd like to understand why one or the other, though. But that is what we were doing before, though, the destroy was not called for this part of the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants