
Zombie process in/from Docker container #1428

Closed
mk3media opened this issue Aug 14, 2022 · 13 comments · May be fixed by #2890


mk3media commented Aug 14, 2022

I just noticed a zombie process on my server. After some further investigation I found the cause of the problem: the zombie process belongs to the umami docker container. Here is the output of top in the container's console:

Load average: 0.10 0.03 0.01 2/519 282
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
237 226 nextjs S 20.3g 1080% 1 0% /usr/local/bin/node serve
226 28 nextjs S 306m 16% 1 0% /usr/local/bin/node /opt/
1 0 nextjs S 305m 16% 0 0% node /opt/yarn-v1.22.19/b
28 1 nextjs S 284m 15% 0 0% /usr/local/bin/node /app/
276 0 nextjs S 1680 0% 0 0% sh
282 276 nextjs R 1608 0% 0 0% top
197 1 nextjs Z 0 0% 1 0% [node]

Same problem is present on two different systems. Any suggestions?
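For anyone who wants to confirm this on their own host: a zombie is a process that has already exited but whose parent has not yet called wait() for it, so the kernel keeps its table entry around. A quick, umami-agnostic way to list zombies together with the parent that should be reaping them (standard procps `ps`):

```shell
# List zombie (Z-state) processes along with their parent PID.
# The PPID column is the process that has not yet called wait().
ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'
```

In the outputs in this thread, the parent is the yarn process running as PID 1 inside the container, which is why the zombie is never reaped.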

mk3media changed the title from "Zombie process" to "Zombie process in/from Docker container" Aug 16, 2022

boly38 commented Aug 18, 2022

I got the same issue after upgrading umami from v1.36.1 to v1.37.0.

here is an example:

   9370 ?        Sl     0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6b2d9f052****fefccf7 -address /run/containerd/containerd.sock
   9398 ?        Ssl    0:00  \_ node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   9468 ?        Sl     0:00      \_ /usr/local/bin/node /app/node_modules/.bin/npm-run-all check-db update-tracker start-server
   9828 ?        Sl     0:00      |   \_ /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
   9839 ?        Sl     0:03      |       \_ /usr/local/bin/node server.js
   9668 ?        Zs     0:00      \_ [node] <defunct>
   9791 ?        Zs     0:00      \_ [node] <defunct>

btw the container logs seem OK:

docker logs my_umami
yarn run v1.22.19
$ npm-run-all check-db update-tracker start-server
$ node scripts/check-db.js
✓ DATABASE_URL is defined.
✓ Database connection successful.
✓ Database tables found.
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "postgres", schema "public" at "my_pgsql:5432"

2 migrations found in prisma/migrations

Following migration have not yet been applied:
02_add_event_data

To apply migrations in development run yarn prisma migrate dev.
To apply migrations in production run yarn prisma migrate deploy.



Running update...
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "postgres", schema "public" at "my_pgsql:5432"

2 migrations found in prisma/migrations

Applying migration `02_add_event_data`

The following migration have been applied:

migrations/
  └─ 02_add_event_data/
    └─ migration.sql

All migrations have been successfully applied.

✓ Database is up to date.
$ node scripts/update-tracker.js
$ node server.js
Listening on port 3000
(...some table json definition..)
  • reboot the VM
  • 1 zombie :'(
   1347 ?        Sl     0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6b2d9f052****feed7550555441b2acb7fefccf7 -address /run/containerd/containerd.sock
   1421 ?        Ssl    0:00  \_ node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   1899 ?        Sl     0:00      \_ /usr/local/bin/node /app/node_modules/.bin/npm-run-all check-db update-tracker start-server
   2187 ?        Sl     0:00      |   \_ /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
   2198 ?        Sl     0:03      |       \_ /usr/local/bin/node server.js
   2158 ?        Zs     0:00      \_ [node] <defunct>

FYI, about the image content:

$ docker exec -it my_umami sh
/app $ npm list -g --depth 0
npm WARN config global `--global`, `--local` are deprecated. Use `--location=global` instead.
/usr/local/lib
+-- [email protected]
`-- [email protected]

/app $ yarn --version
1.22.19

Could you tell us a way to troubleshoot, or how to help you reproduce it?


mk3media commented Sep 6, 2022

Updated to 1.38.0 and the zombie process still remains :(


boly38 commented Sep 14, 2022

zombie still there in 1.38 too.

In package.json, I tried replacing the npm-run-all binary (doc) in the start-docker target with run-s, to run the node commands sequentially and try to identify the root cause.
Then:

  • restart the umami docker container
  • then quickly docker exec -it myumami sh (to get a shell in the umami docker container)
  • then repeat the ps xaf command

What I see is:

/app $ ps xaf
PID   USER     TIME  COMMAND
    1 nextjs    0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   27 nextjs    0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
   38 nextjs    0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run check-db
   49 nextjs    0:00 /usr/local/bin/node scripts/check-db.js
   82 nextjs    0:02 /usr/local/bin/node /app/node_modules/.bin/prisma migrate status
   89 nextjs    0:00 sh
  105 nextjs    0:00 [sh]
  106 nextjs    0:00 ps xaf
/app $ ps xaf
PID   USER     TIME  COMMAND
    1 nextjs    0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   27 nextjs    0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
   38 nextjs    0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run check-db
   49 nextjs    0:00 /usr/local/bin/node scripts/check-db.js
   82 nextjs    0:04 /usr/local/bin/node /app/node_modules/.bin/prisma migrate status
   89 nextjs    0:00 sh
  197 nextjs    0:00 /usr/local/bin/node /app/node_modules/prisma/build/child {"product":"prisma","version":"4.3.1","cli_install_type":"local","information":"","local_timestamp":"2022-09-14T18:30:58Z","project_
  208 nextjs    0:00 ps xaf
/app $ ps xaf
PID   USER     TIME  COMMAND
    1 nextjs    0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   27 nextjs    0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
   89 nextjs    0:00 sh
  197 nextjs    0:00 [node]
  227 nextjs    0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
  234 nextjs    0:00 ps xaf
/app $ ps xaf

Between the first and second ps I see that the node ...bin/prisma migrate status (from check-db) child process, number 197, is the one that becomes the [node] zombie.

Not easy to dig deeper:

  • I even tried to add a proc.kill('SIGTERM') on the run 'exit' event, but without benefit: this zombie may be a detached prisma subprocess. (By the time 'exit' fires, the child has already terminated; a zombie is only removed when its parent calls wait() on it, so sending it another signal cannot reap it.)
    // proc.on('exit', () => resolve(buffer.join('')));
    proc.on('exit', () => { proc.kill('SIGTERM'); buffer.push("run is done"); resolve(buffer.join('')); });

@mk3media

After upgrading to 1.39.3 there are now 2 zombie processes. Did also a reboot of the host – no change.

@AntoninHuaut

> After upgrading to 1.39.3 there are now 2 zombie processes. Did also a reboot of the host – no change.

I have 2 zombie processes with 1.38.0

@mk3media

Today, on one machine, one zombie process disappeared (without any restart/reboot), so one still remains. On another machine with the same setup, same host OS, etc., there are still 2 zombie processes.


boly38 commented Dec 17, 2022

I found a possible workaround for the zombie issue, given the following context:

  • A) assume that, following an umami version update, you have successfully started umami a first time and migrated your database model
  • B) now you would like to run umami without the zombie caused by the migration step.

Patch the umami startup sequence to skip the check-db stage:

# open a shell on your umami container
docker exec -it umami sh
vi package.json
# duplicate "start-docker": line as "start-dockerBackup": (yy + p)
# update "start-docker": line by removing "check-db" (->  + dw )
# :wq
# CTRL D 
docker-compose stop
docker-compose start
# no more zombie

This confirms that the zombie comes from the migration process.

We could imagine an improvement where a given environment variable drives whether check-db runs or is skipped.

Example:
UMAMI_CHECK_DB (default: true)
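A sketch of what that could look like, as a small wrapper around the current start-docker sequence (both UMAMI_CHECK_DB and the start_umami wrapper are hypothetical proposals, not existing umami options):

```shell
#!/bin/sh
# Hypothetical wrapper for the start-docker sequence.
# UMAMI_CHECK_DB is the proposed variable (default: true);
# set it to false to skip the check-db stage once the
# migrations have already been applied.
start_umami() {
  if [ "${UMAMI_CHECK_DB:-true}" = "true" ]; then
    npm-run-all check-db update-tracker start-server
  else
    npm-run-all update-tracker start-server
  fi
}
```

The function only decides which npm-run-all invocation runs; wiring it into package.json's start-docker target would replace the manual vi patch above.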

@github-actions

This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Aug 19, 2023
@github-actions

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Aug 26, 2023

boly38 commented Jan 18, 2024

I stayed for a long time on version 1.40 on some sites without issue or maintenance,
and today I just migrated to v2.9.0: following the data migration and a docker refresh & recreate, I didn't see any zombie on my road :)

A special thanks to the high-quality Umami project, and especially the dedicated migration guide doc/repo, which was just perfect 👏 🥇

@simonwiles

I still have the zombie on v2.9.0 (two separate instances exhibiting the same behaviour), fwiw.


boly38 commented Jan 23, 2024

Unfortunately you're right @simonwiles.

After double-checking my VM, it's true that the zombie still appears (with some delay after docker compose up -d).

@bparmentier

The issue is still present in 2.12.1. But running the container with the --init flag seems to fix it!

bparmentier added a commit to bparmentier/umami that referenced this issue Aug 22, 2024
When the `start-docker` script is executed, some process is not properly
cleaned up and ends up in a zombie state.

Using the `init` flag when launching the container runs an init process
inside the container that will forward signals to node and reap
processes.

Fixes: umami-software#1428
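For docker compose users, the equivalent of docker run --init is the init service attribute (a sketch; the service and image names here are illustrative, so adjust them to your own compose file):

```yaml
services:
  umami:
    image: ghcr.io/umami-software/umami:postgresql-latest
    init: true   # run a minimal init as PID 1 that reaps zombie children
```

With init: true, Docker starts a minimal init process as PID 1 in the container; orphaned children are reparented to it and reaped, which is exactly what the yarn/npm-run-all chain running as PID 1 does not do.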