
[Eventd] Race condition causes start failure for eventd #20988

Open
gpunathilell opened this issue Dec 2, 2024 · 3 comments
Labels: Issue for 202405, Triaged

Comments

@gpunathilell
Contributor

Description

A specific race condition causes the following start failure for eventd to appear in the logs:
ERR container: docker cmd: start for eventd failed with 500 Server Error for http+docker://localhost/v1.43/containers/e6c2c0c3006061fd86d1736f8372ca4a10a7d8709d9a4c30c1da6d996576a2f0/start: Internal Server Error ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write 19294: write /sys/fs/cgroup/blkio/docker/e6c2c0c3006061fd86d1736f8372ca4a10a7d8709d9a4c30c1da6d996576a2f0/cgroup.procs: no such device: unknown")

featured triggers a systemd daemon reload:
https://github.com/sonic-net/sonic-host-services/blob/89aead2c34eb95102328c4730fce534190ee5dac/scripts/featured#L368
And eventd is started because it is bound to sonic.target:
BindsTo=sonic.target

The issue could be due to a race between docker start and systemd: systemd removes cgroups during daemon-reload while docker start is writing to them. A similar issue is described in:

https://www.findbugzero.com/operational-defect-database/vendors/rh/defects/RHEL-16781
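If the failure really is a transient race, one possible workaround is to retry the start only when the error matches the cgroup write failure quoted above. The following is a minimal sketch under that assumption, not the fix adopted by the project: `start_fn`, `start_with_retry`, and `is_cgroup_race_error` are hypothetical names, and the matched substrings are copied from the error message in this issue.

```python
import time

# Substrings taken from the error messages quoted in this issue.
CGROUP_ERROR_MARKERS = (
    "unable to apply cgroup configuration",
    "cgroup.procs",
)


def is_cgroup_race_error(message: str) -> bool:
    """Return True if the message looks like the cgroup write failure above."""
    return all(marker in message for marker in CGROUP_ERROR_MARKERS)


def start_with_retry(start_fn, attempts=3, delay_s=0.5, sleep=time.sleep):
    """Call start_fn(); retry only when the failure matches the cgroup error.

    start_fn is a hypothetical callable that raises on failure, for example
    a wrapper around the docker start call. Other errors propagate unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return start_fn()
        except Exception as exc:
            # Give up on the last attempt or on any unrelated error.
            if attempt == attempts or not is_cgroup_race_error(str(exc)):
                raise
            sleep(delay_s)
```

The retry count and delay are placeholders. A retry only hides the race and does not remove it.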

The issue was seen only once and could not be reproduced.

The image used was the latest master.

@zbud-msft zbud-msft self-assigned this Dec 3, 2024
@prabhataravind prabhataravind added the Triaged this issue has been triaged label Dec 4, 2024
@vivekrnv
Contributor

Also seen in 202405.

2024 Dec 18 12:26:06.411388 sonic INFO featured: Reloading systemd configuration files ...
2024 Dec 18 12:26:06.449925 sonic INFO systemd[1]: Reloading.
2024 Dec 18 12:26:06.491666 sonic ERR container: docker cmd: start for bgp failed with 400 Client Error for http+docker://localhost/v1.43/containers/5e6c1ded953236e1ddb3728c7007f3d689ce03c0e302b76610047e1cc7b4e426/start: Bad Request ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write 46213: openat2 /sys/fs/cgroup/blkio/docker/5e6c1ded953236e1ddb3728c7007f3d689ce03c0e302b76610047e1cc7b4e426/cgroup.procs: no such file or directory: unknown")
2024 Dec 18 12:26:06.491851 sonic INFO dockerd[1302]: time="2024-12-18T12:26:06.489256318+02:00" level=error msg="Error setting up exec command in container bgp: Container 5e6c1ded953236e1ddb3728c7007f3d689ce03c0e302b76610047e1cc7b4e426 is not running"
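To show how close together the reload and the failure are, the timestamps in the log lines above can be compared directly. This is a small sketch: `parse_syslog_ts` and `gap_ms` are hypothetical helper names, the failure line is shortened here for readability, and month-name parsing assumes an English or C locale.

```python
from datetime import datetime

TS_FORMAT = "%Y %b %d %H:%M:%S.%f"


def parse_syslog_ts(line: str) -> datetime:
    """Parse the leading 'YYYY Mon DD HH:MM:SS.ffffff' timestamp of a log line."""
    # The timestamp is the first four whitespace-separated fields.
    return datetime.strptime(" ".join(line.split()[:4]), TS_FORMAT)


def gap_ms(earlier_line: str, later_line: str) -> float:
    """Return the time between two log lines in milliseconds."""
    delta = parse_syslog_ts(later_line) - parse_syslog_ts(earlier_line)
    return delta.total_seconds() * 1000.0


reload_line = "2024 Dec 18 12:26:06.449925 sonic INFO systemd[1]: Reloading."
# Message text shortened for this example; the timestamp is from the log above.
failure_line = "2024 Dec 18 12:26:06.491666 sonic ERR container: docker cmd: start for bgp failed"

print(f"{gap_ms(reload_line, failure_line):.3f} ms")  # prints "41.741 ms"
```

The start failure is logged about 42 ms after systemd reports "Reloading.", which is consistent with the two operations overlapping.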

@zbud-msft
Contributor

Doesn't seem like a specific issue with eventd.

@vivekrnv
Contributor

Yes, it does not. I think it's related to "systemctl daemon-reload" and docker start happening at the same time.
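One way to keep the two operations from overlapping would be to serialize them behind a shared advisory lock. The sketch below only illustrates the idea: `exclusive_lock` and the lock path are hypothetical, and both the code that runs the daemon reload and the code that starts containers would have to take the same lock for it to have any effect.

```python
import contextlib
import fcntl


@contextlib.contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on path for the duration of the block."""
    with open(path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Hypothetical usage on both sides of the race (path is an assumption):
#
#   with exclusive_lock("/run/lock/reload-vs-start.lock"):
#       subprocess.run(["systemctl", "daemon-reload"], check=True)
#
#   with exclusive_lock("/run/lock/reload-vs-start.lock"):
#       start_container()  # hypothetical wrapper around docker start
```

`flock` locks are advisory, so a caller that does not take the lock is not blocked by it.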
