[BUG] not possible to run two compose projects on the same machine anymore, hang #12627

Open
pbering opened this issue Mar 11, 2025 · 14 comments

@pbering

pbering commented Mar 11, 2025

Description

When executing step 5, it just hangs forever (I tried leaving it for ~45 min) with:

[+] Running 1/2
 ✔ Network stack2_default        Created     0.2s
 - Container stack2-service20-1  Starting

I can press Ctrl+C to terminate it, and then the output ends with "exit status 130", like so:

[+] Running 1/2
 ✔ Network stack2_default        Created     0.2s
 - Container stack2-service20-1  Starting  102.2s
exit status 130

I can't even run docker compose -p stack1 --file ./compose.stack1.yaml down or docker compose -p stack2 --file ./compose.stack2.yaml down; it hangs in the "Stopping" state...

The only way to get the Docker engine to respond again is to reboot the host.

If I do the same as stack1 and stack2 with plain docker run commands:

docker run --rm -d -e ASPNETCORE_HTTP_PORTS=80 -p 7001:80 mcr.microsoft.com/dotnet/samples:aspnetapp-9.0-nanoserver-ltsc2022
docker run --rm -d -e ASPNETCORE_HTTP_PORTS=80 -p 8001:80 -p 8002:8002 mcr.microsoft.com/dotnet/samples:aspnetapp-9.0-nanoserver-ltsc2022

Then everything works as expected.

If you remove the second port in stack2, it also works as expected, strangely enough...

Steps To Reproduce

  1. Set the engine to Windows: docker desktop engine use windows
  2. Create ./compose.stack1.yaml with contents:

     services:
       service10:
         image: mcr.microsoft.com/dotnet/samples:aspnetapp-9.0-nanoserver-ltsc2022
         ports:
           - "7001:80"
         environment:
           - ASPNETCORE_URLS=http://+:80

  3. Create ./compose.stack2.yaml with contents:

     services:
       service20:
         image: mcr.microsoft.com/dotnet/samples:aspnetapp-9.0-nanoserver-ltsc2022
         ports:
           - "8001:80"
           - "8002:8002"
         environment:
           - ASPNETCORE_URLS=http://+:80

  4. Start project stack1: docker compose -p stack1 --file ./compose.stack1.yaml up -d
  5. Start project stack2: docker compose -p stack2 --file ./compose.stack2.yaml up -d

Compose Version

Docker Compose version v2.33.1-desktop.1

Docker Environment

Client:
 Version:    28.0.1
 Context:    desktop-windows
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v0.9.4
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-ai.exe
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.21.1-desktop.2
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.33.1-desktop.1
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-compose.exe
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.38
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-debug.exe
  desktop: Docker Desktop commands (Beta) (Docker Inc.)
    Version:  v0.1.5
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-desktop.exe
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-dev.exe
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.27
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-extension.exe
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-feedback.exe
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.4.0
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-init.exe
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-sbom.exe
  scout: Docker Scout (Docker Inc.)
    Version:  v1.16.3
    Path:     C:\Users\pberi\.docker\cli-plugins\docker-scout.exe

Server:
 Containers: 7
  Running: 5
  Paused: 0
  Stopped: 2
 Images: 180
 Server Version: 28.0.1
 Storage Driver: windowsfilter
  Windows: 
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 26100 (26100.1.amd64fre.ge_release.240331-1435)
 Operating System: Microsoft Windows Version 24H2 (OS Build 26100.3323)
 OSType: windows
 Architecture: x86_64
 CPUs: 24
 Total Memory: 63.92GiB
 Name: amd
 ID: 2588f1d0-8302-469b-935a-2dcec18226d4
 Docker Root Dir: D:\Data\Docker
 Debug Mode: false
 Labels:
  com.docker.desktop.address=npipe://\\.\pipe\docker_cli
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Anything else?

No response

@danieletorelli

This also happens on Linux, apparently when pulling large images for existing services.

It corrupts the state of other containers as well, and for me the only way to recover is to recreate all containers and networks on the host by running docker compose down for every project I have and then docker compose up -d again.

It's extremely annoying, and it only happens with recent versions, probably since 28.0.0.

@ndeloof
Contributor

ndeloof commented Mar 12, 2025

@danieletorelli which image are you using to reproduce ?

@ndeloof ndeloof self-assigned this Mar 12, 2025
@danieletorelli

danieletorelli commented Mar 12, 2025

@danieletorelli which image are you using to reproduce ?

@ndeloof Recently, it always happens when I try to pull a new Home Assistant image: ghcr.io/home-assistant/home-assistant:stable

So I think you could try creating a compose file that uses an old image and then running docker compose up -d. When Compose tries to pull the new image while the container is still running the old one, it usually hangs and then returns 130 or "unexpected EOF", and I experience the defect.

@danieletorelli

@ndeloof It's important to note that, to completely restore the containers to a healthy state, after running down I have to pull with docker compose pull; then it goes well and I can start everything again with up. So it seems that implicitly pulling a new image with the up --pull always command, while the container is already running an old image, is the triggering factor.

@pbering
Author

pbering commented Mar 12, 2025

I just found out that the issue is still present if you first start stack1 and then run the second docker run line from the description. It seems that Compose is breaking all networks...

@ndeloof
Contributor

ndeloof commented Mar 13, 2025

@danieletorelli I tried to reproduce with homeassistant image.

$ docker compose up -d
[+] Running 33/33
 ✔ serviceA Pulled                       97.0s
[+] Running 2/2
 ✔ Network truc_default       Created     0.0s
 ✔ Container truc-serviceA-1  Started     1.2s

$ docker ps
CONTAINER ID   IMAGE                                                   COMMAND   CREATED          STATUS          PORTS     NAMES
d705813fdc88   homeassistant/home-assistant:2025.4.0.dev202503120232   "/init"   26 seconds ago   Up 25 seconds             truc-serviceA-1

$ # update compose to use latest tag
$ docker compose up -d
[+] Running 13/13
 ✔ serviceA Pulled
...
✔ c773d2994042 Pull complete             67.0s
[+] Running 1/1
 ✔ Container truc-serviceA-1  Started

@pbering You say it hangs in the "Stopping" state...
When you get into this state, can you please capture the container details with docker inspect ...?
The only difference I can see versus a plain docker run ... is that Ctrl+C is sent to the container, whereas with Compose it is the compose command line you interrupt, which sends a ContainerStop API call to the engine. Maybe you can try to reproduce using docker run ... without detaching, and in a separate terminal run docker stop <ID>?

@pbering
Author

pbering commented Mar 13, 2025

@ndeloof That's not possible. When it is stuck in the "Stopping" state after trying to run docker compose down, and I open a new terminal and run docker ps -a, I see the container is in the "Created" state; when I try docker inspect <id> on that ID, well, then THAT hangs 🤷‍♂️

I'm not sure what you mean by:

"The only distinction I can see vs a plain docker run ... is that Ctrl+C is sent to container, while when using compose this is the compose command line you interrupt, which sends a ContainerStop API call to engine. Maybe you can try to reproduce using docker run ... without detach, and in a separate terminal run docker stop <ID> ?"

If I ONLY use docker run, everything works fine... but as soon as anything is started with docker compose, everything started after that which publishes ports stops working, with both docker compose and docker run...

@pbering
Author

pbering commented Mar 13, 2025

Please also notice my statement "If you remove the second port in stack2, it also works as expected, strangely enough..."...

I can run multiple compose projects as long as each service has ONLY 1 published port; as soon as any service has more than 1 published port, everything hangs.

@ndeloof
Contributor

ndeloof commented Mar 13, 2025

If docker inspect <id> fails, that seems to demonstrate a Docker engine issue with some resource being deadlocked.

If you remove the second port in stack2, it also works as expected, strangely enough...

Indeed, this I can't explain; maybe some race condition in the Docker engine?

@danieletorelli

@ndeloof It happened again with another, much smaller, image.

[+] Pulling 6/8
 ⠹ adguardhome-sync [⣿⣿⣿⣿⣿⣿⣿] 6.827MB / 6.827MB Pulling                                       10.2s 
   ✔ 6e771e15690e Already exists                                                               0.0s 
   ✔ 67bbc8d12cbe Pull complete                                                                0.6s 
   ✔ b463c797062a Pull complete                                                                0.7s 
   ✔ 3eed5fc676fc Pull complete                                                                1.3s 
   ✔ f0595360dbac Pull complete                                                                1.9s 
   ✔ 93c22ece84d9 Pull complete                                                                2.3s 
   ⠋ 1a69639606d0 Extracting     475B/475B                                                     8.4s 
unexpected EOF

and I was able to determine that dockerd segfaults:

Mar 14 09:26:05 222 systemd-coredump[692445]: [🡕] Process 724 (dockerd) of user 0 dumped core.
Mar 14 09:26:05 222 systemd[1]: docker.service: Main process exited, code=dumped, status=11/SEGV
Mar 14 09:26:05 222 systemd[1]: docker.service: Failed with result 'core-dump'.
Mar 14 09:26:05 222 systemd[1]: docker.service: Consumed 16min 33.530s CPU time.
Mar 14 09:26:07 222 systemd[1]: docker.service: Scheduled restart job, restart counter is at 1.

I'm not sure anymore that this is the same issue described by @pbering, so please let me know if you'd prefer me to open another issue.

@ndeloof
Contributor

ndeloof commented Mar 14, 2025

@danieletorelli please report this to github.com/moby/moby. Even if there is something wrong with Compose, the engine should not crash just because the client API is misused in some way. cc @thaJeztah

@danieletorelli

@ndeloof, thank you. I see it has already been reported there in moby/moby#49513.

@pbering
Author

pbering commented Mar 17, 2025

if docker inspect <id> fails, that seems to demonstrate a docker engine issue with some resource "dead locked".

If you remove the second port in stack2, it also works as expected, strangely enough...

indeed, this I can't explain, but some race condition in docker engine ?

@ndeloof I haven't been able to reproduce this using only docker run; it only happens if something was started by Compose, so I don't think it is a Docker engine issue...

@ndeloof
Contributor

ndeloof commented Mar 17, 2025

@pbering I'm not saying there's nothing wrong with Compose or that we can't provide a fix, but if you see the Docker engine stuck, this demonstrates an issue on the engine side: the engine should not break because of some inadequate API calls. Remember that Compose makes many API calls within a very short timeframe, compared to a manual reproduction with docker run ... commands. This has already revealed race conditions in the engine in the past.
