Skip to content

Subprocess kill win32 #729

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jingx8885
Copy link

@jingx8885 jingx8885 commented May 16, 2025

Motivation and Context

How Has This Been Tested?

Breaking Changes

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

jingx8885 added 8 commits May 8, 2025 12:05
- Refactor process termination logic to handle both parent and child processes
- Increase timeout for graceful termination from 1s to 2s
- Consolidate process termination functions to avoid zombie processes
- Add proper type hints for process list handling

This change ensures that all child processes are properly terminated
when killing a parent process on Windows, preventing zombie processes
from being left behind.
@jingx8885
Copy link
Author

@dsp-ant require code review

@jingx8885 jingx8885 closed this May 17, 2025
@jingx8885 jingx8885 reopened this May 17, 2025
@jingx8885
Copy link
Author

@dsp-ant require code review, thinks.

@ihrpr ihrpr modified the milestones: r-05-25, stdio shutdown May 23, 2025
Copy link
Contributor

@felixweinberger felixweinberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jingx8885, thank you for working on this and submitting a PR for process tree termination!

We're currently reviewing PR #555 which takes a simpler, cross-platform approach to the hanging process issue by standardizing the termination logic across all platforms. However, it doesn't modify any Windows behavior, so your change may still be needed.

Before we add the complexity of process tree handling and the psutil dependency (which also requires setup of Microsoft C++ 14.0 setup apparently in my local testing, so adding more setup complexity), could you help us understand:

  1. The Windows platform tests are still failing in this PR - could you investigate those failures first? I checked out this PR on my local machine and the tests do still fail.
  2. Do you have specific test cases or reproduction scenarios where child processes remain hanging after the parent is terminated? If we could create a test or sample script to illustrate the issue you're seeing clearly, that would help us massively in figuring out the right fix.

Given the additional complexity of adding psutil, we'd prefer to only add complexity if we have concrete cases with failing tests that demonstrate it's needed.

Also I realize this PR has been in queue for a while - if you'd be able to rebase on main to address 1 & 2 above that would be greatly appreciated!

@jingx8885
Copy link
Author

  1. PR test is passed.

  2. This issue is difficult to write test cases for, but I can describe the specific problem. During development of a Python MCP server using the standard uv-pyproject structure, I conducted integration testing with the mcp-agent project. When configuring the MCP server in stdio mode and launching it via uv run --env-file=xxx mcp_server.py, the service initialization creates a Python subprocess that spawns a uv.exe process, which in turn launches the actual MCP stdio service process. The issue occurs when terminating the parent subprocess - it fails to kill the child processes, leaving both the uv.exe and MCP stdio services as zombie processes.
    @felixweinberger

felixweinberger added a commit that referenced this pull request Jun 30, 2025
- Add process group support on POSIX systems via start_new_session=True
- Implement _terminate_process_with_children() and _kill_process_and_children()
  functions that properly handle child process cleanup
- On POSIX: Kill entire process group at once to avoid orphaning children
  when parent dies first
- On Windows: Use taskkill /T for process tree termination
- Add pid property to FallbackProcess for consistent interface
- Add proper type annotations (Process | FallbackProcess)
- Replace existing termination logic with new unified approach

This addresses issues from PRs #850 (npx child processes) and #729
(Windows process trees) in a unified, dependency-free way.

Reported-by:surya-prakash-susarla
Reported-by:jingx8885
Github-Issue:#547
felixweinberger added a commit that referenced this pull request Jul 1, 2025
- Add process group support on POSIX systems via start_new_session=True
- Implement _terminate_process_with_children() and _kill_process_and_children()
  functions that properly handle child process cleanup
- On POSIX: Kill entire process group at once to avoid orphaning children
  when parent dies first
- On Windows: Use taskkill /T for process tree termination
- Add pid property to FallbackProcess for consistent interface
- Add proper type annotations (Process | FallbackProcess)
- Replace existing termination logic with new unified approach

This addresses issues from PRs #850 (npx child processes) and #729
(Windows process trees) in a unified, dependency-free way.

Reported-by:surya-prakash-susarla
Reported-by:jingx8885
Github-Issue:#547

Add regression tests for child process termination

- test_stdio_client_child_process_cleanup: Verifies that child processes
  spawned by the parent (like npx spawning node) are properly terminated
- test_stdio_client_nested_process_tree: Tests that deeply nested process
  trees (parent -> child -> grandchild) are all terminated
- test_stdio_client_early_parent_exit: Tests the race condition where
  parent exits during cleanup but children are still terminated via
  process group

These tests verify the fix works correctly across different subprocess
spawning scenarios without requiring external dependencies like psutil.
felixweinberger added a commit that referenced this pull request Jul 1, 2025
- Add process group support on POSIX systems via start_new_session=True
- Implement _terminate_process_with_children() and _kill_process_and_children()
  functions that properly handle child process cleanup
- On POSIX: Kill entire process group at once to avoid orphaning children
  when parent dies first
- On Windows: Use taskkill /T for process tree termination
- Add pid property to FallbackProcess for consistent interface
- Add proper type annotations (Process | FallbackProcess)
- Replace existing termination logic with new unified approach

This addresses issues from PRs #850 (npx child processes) and #729
(Windows process trees) in a unified, dependency-free way.

Reported-by:surya-prakash-susarla
Reported-by:jingx8885
Github-Issue:#547

Add regression tests for child process termination

- test_stdio_client_child_process_cleanup: Verifies that child processes
  spawned by the parent (like npx spawning node) are properly terminated
- test_stdio_client_nested_process_tree: Tests that deeply nested process
  trees (parent -> child -> grandchild) are all terminated
- test_stdio_client_early_parent_exit: Tests the race condition where
  parent exits during cleanup but children are still terminated via
  process group

These tests verify the fix works correctly across different subprocess
spawning scenarios without requiring external dependencies like psutil.
felixweinberger added a commit that referenced this pull request Jul 1, 2025
- Add process group support on POSIX systems via start_new_session=True
- Implement _terminate_process_with_children() and _kill_process_and_children()
  functions that properly handle child process cleanup
- On POSIX: Kill entire process group at once to avoid orphaning children
  when parent dies first
- On Windows: Use taskkill /T for process tree termination
- Add pid property to FallbackProcess for consistent interface
- Add proper type annotations (Process | FallbackProcess)
- Replace existing termination logic with new unified approach

This addresses issues from PRs #850 (npx child processes) and #729
(Windows process trees) in a unified, dependency-free way.

Reported-by:surya-prakash-susarla
Reported-by:jingx8885
Github-Issue:#547

Add regression tests for child process termination

- test_stdio_client_child_process_cleanup: Verifies that child processes
  spawned by the parent (like npx spawning node) are properly terminated
- test_stdio_client_nested_process_tree: Tests that deeply nested process
  trees (parent -> child -> grandchild) are all terminated
- test_stdio_client_early_parent_exit: Tests the race condition where
  parent exits during cleanup but children are still terminated via
  process group

These tests verify the fix works correctly across different subprocess
spawning scenarios without requiring external dependencies like psutil.
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored by: @jingx8885, @surya-prakash-susarla
felixweinberger added a commit that referenced this pull request Jul 1, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
@felixweinberger
Copy link
Contributor

felixweinberger commented Jul 1, 2025

Thank you again @jingx8885 for submitting this PR and for your detailed responses on what failures you were seeing - extremely helpful.

I spent yesterday and today trying to unify all the different approaches and fixes we have pending in this process termination space at the moment, as there are several interrelated fixes that either conflict or depend on each other - specifically #555, #729, #765, and #850.

I've added your change to #1044 as a draft with you as a co-author + added extensive regression testing. Would you be OK with consolidating this change into #1044 for the comprehensive testing & process handling introduced there?

If you're able to, I'd also love your feedback whether the unified approach in #1044 successfully addresses the issue you're seeing in this PR.

@jingx8885
Copy link
Author

@felixweinberger
Thank you again for consolidating these changes and including me as a co-author on #1044. I've now had a chance to review the draft #1044 code in detail.

Great news: I can confirm that my specific changes from this PR have been correctly integrated into the unified approach of #1044. More importantly, the logic addressing the process termination issue I encountered is fully present.

Critical Validation: I want to highlight that this exact logic (or its direct equivalent now in #1044) has been running successfully in both our production environment and daily development/testing cycles for a month. It has proven to be robust and reliable on Windows 11 under our workload conditions. This gives me high confidence that #1044 effectively resolves the core issue I originally aimed to fix.

Therefore, I fully support merging these changes via #1044 and agree to close this PR in favor of it.

I remain very keen to collaborate on finalizing #1044. Please let me know how I can best contribute next – whether it's further testing on specific platforms/scenarios, reviewing recent updates, or assisting with documentation.

Looking forward to getting this comprehensive fix merged!

Best regards,

@jingx8885
Copy link
Author

Close this merge, support #1044 to fix this bug.

@jingx8885 jingx8885 closed this Jul 1, 2025
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
felixweinberger added a commit that referenced this pull request Jul 4, 2025
When terminating MCP servers, child processes were being orphaned
because only the parent process was killed. This caused resource
leaks and prevented proper cleanup, especially with tools like npx
that spawn child processes for the actual server implementation.

This was happening on both POSIX and Windows systems - however
because of implementation details, resolving this is non-trivial
and requires introducing psutil to introduce cross-platform
utilities for dealing with children and process trees.

This addresses critical issues where MCP servers using process
spawning tools would leave zombie processes running after client
shutdown.

resolves #850
resolves #729

Co-authored-by: jingx8885 <[email protected]>
Co-authored-by: Surya Prakash Susarla <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants