-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Add creating status to ensure state.json exists when runc kill. #4645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add creating status to ensure state.json exists when runc kill. #4645
Conversation
5e09087
to
5b791ec
Compare
I have yet to take a closer look but it is much better than #4535 already! |
Please add a description into commit message. |
} | ||
|
||
func (i *creatingState) destroy() error { | ||
_ = signalAllProcesses(i.c.cgroupManager, unix.SIGKILL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong.
- There are no processes in the cgroup yet, why are you killing them?
- We should only use
signalAllProcesses
for cases when the container doesn't have its own PID namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

Between the creation of cgroup and waitForChildExit, the pid of runc init[STAGE_PARENT], runc init[STAGE_CHILD], and runc init[STAGE_INIT] will exist in the cgroup.
runc delete does not know which process to send SIGKILL to. Considering the possible existence of processes like runc init PARENT, CHILD, etc., it is necessary to kill the processes in cgroup.procs one by one.
When I was contemplating about writing a function to terminate all processes in cgroup.procs, I discovered that signalAllProcesses already does exactly that, so I directly called it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right.
Maybe add a comment, something like
// Use signalAllProcesses here because various stages of `runc init` might be in the cgroup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the missing comment.
@cyphar PTAL
Signed-off-by: jianghao65536 <[email protected]>
5b791ec
to
9544dba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Left a comment, it will be great if we can have tests too :)
@@ -242,3 +242,20 @@ func (n *loadedState) destroy() error { | |||
} | |||
return n.c.state.destroy() | |||
} | |||
|
|||
type creatingState struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This state is awfully similar to the created state. I haven't had a deep look into this, but any reason to not use the created state?
I think if we remove the c.Init assignment as this PR does, and we run the p.container.UpdateState()
as this PR also does, maybe it will also fix the problem?
It seems in that case, the refreshstate will return the created state, the only difference is that the destroy function is _ = i.c.initProcess.signal(unix.SIGKILL)
. If that is a problem, maybe the signalAllProcesses can work for the other things in the created state too?
Am I missing something?
@jianghao65536 ping? |
Just a reminder for maintainers, once the code here looks good, we should remember to add tests before we merge this (if possible). |
@jianghao65536 friendly ping, may I ask if you’re still working on this? This issue is currently blocking some of my work. If you don’t have the bandwidth to move this forward, I’d be happy to open a new PR to address it and include tests for it. Thank you! |
Sorry for the delay, I don't have a good way to add test cases here, I really appreciate you taking over this fix. @HirazawaUi cc @kolyshkin @rata |
@HirazawaUi sure, please go ahead |
@rata @jianghao65536 Thanks, but with China's public holidays approaching, I might implement this fix after the holidays (next week). |
closing in favor of #4757 |
(This is a continuation of #4535)
Fix 4534
If runc gets killed before it sets the cgroup properties during the container creation process, the cgroup ends up in a frozen state. As a result, subsequent attempts to delete the container using 'runc delete' will fail because the state.json file hasn't been created yet.
I've introduced a 'creatingstate'. It's only used by 'runc delete'. Currently, for the containers that are created, 'Stopped' is still the initial set status. I originally wanted to replace it with 'Creating', but due to the mutual reference between 'parentProcess' and 'Container', it is not easy to implement.
Test cases will be supplemented later。