-
Notifications
You must be signed in to change notification settings - Fork 2k
Fix stdio_client kill process after timeout #555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @cristipufu for working on this!
Do you have steps to reproduce? How this changes were tested?
await process.wait() | ||
except TimeoutError: | ||
# Force kill if it doesn't terminate | ||
process.kill() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same TerminateProcess
is called on Windows... Did this worked locally?
https://anyio.readthedocs.io/en/stable/api.html#anyio.abc.Process.kill
Slightly orthogonal but this does not handle child process spawns which can cause the parent to leave ghost processes which are being re-parented to the root process. Addressed it here: #850 |
Add comprehensive tests that validate stdio client cleanup behavior: - Universal timeout test ensures cleanup completes within reasonable time even with long-running processes - Immediate completion test ensures fast processes remain fast and timeout mechanism doesn't introduce delays - Cross-platform compatibility with different commands for Windows vs Unix These tests will help ensure PR #555's timeout mechanism works correctly without regressing normal operation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Changes from PR #555 by @cristipufu: - Remove Windows-specific terminate_windows_process function - Apply uniform 2-second timeout mechanism across all platforms: - Try process.terminate() first (SIGTERM on Unix, similar on Windows) - Wait up to 2 seconds for graceful termination - If timeout occurs, force process.kill() (SIGKILL equivalent) This ensures consistent behavior between Windows and Unix systems when cleaning up stdio client processes, preventing hanging when MCP servers don't respond to SIGTERM. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
This change demonstrates a simpler approach than PR #555's timeout mechanism: 1. **Evidence**: Extensive testing shows PR #559 stream cleanup alone prevents hanging, even with servers that ignore SIGTERM and keep streams open. 2. **Simplification**: Removes all timeout logic and Windows-specific termination function in favor of unified `process.terminate()` + stream cleanup. 3. **Benefits**: - Less code complexity (no timeout handling, no platform branching) - Preserves proven stream cleanup protection from PR #559 - Makes behavior consistent across all platforms - All existing timeout tests still pass 4. **Risk reduction**: Avoids changing process termination semantics while maintaining hanging protection through stream cleanup. The core insight: process hanging was caused by stream management issues (solved by PR #559), not termination timing issues (targeted by PR #555).
NOTE: this test FAIL without #555 This test verifies that stdio_client properly handles processes that ignore SIGTERM but respond to SIGINT, preventing hanging issues with Node.js servers and other interactive tools that expect Ctrl+C signals.
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill()
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill()
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cristipufu, thank you for working on this and for submitting this PR!
Having looked into this in some detail I think your simplification makes sense:
- There shouldn't be any change in behavior on Windows from this change, so we're good on that end
- #559 already made some changes to stream cleanup and I believe addresses SIGTERM hangs already on UNIX systems
- As you mention on #372 though, MCP servers that ignore SIGTERM may still be an issue on UNIX systems
- I was able to create a test (see below) that simulates a service that expects SIGINT and ignores SIGTERM - this PR fixes that case.
I created a draft PR #1044 with new regression tests in an attempt to actually reproduce the hanging resolved by both this change (#555), #559, and #765.
- #559 resolves
test_stdio_client_universal_cleanup
- #555 (this PR) resolves
test_stdio_client_sigint_only_process
- #765 resolves
test_stdio_client_graceful_stdin_exit
andtest_stdio_client_stdin_close_ignored
Please take a look if that reflects what you were going for here - if yes could you rebase your PR here on latest main?
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
**NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
cherry-pick of #555 Add regression tests for stdio cleanup hanging **NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
cherry-pick of #555 Add regression tests for stdio cleanup hanging **NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
cherry-pick of #555 Add regression tests for stdio cleanup hanging **NOTE: These tests FAIL without the changes introduced in #559** - test_stdio_client_universal_timeout - test_stdio_client_immediate_completion await read_stream.aclose() await write_stream.aclose() await read_stream_writer.aclose() await write_stream_reader.aclose() These tests verify that stdio_client completes cleanup within reasonable time for both slow-terminating and fast-exiting processes, preventing the hanging issues reported in #559. **NOTE: This test FAILS without the changes introduced in #555** - test_stdio_client_sigint_only_process try: process.terminate() with anyio.fail_after(2.0): await process.wait() except TimeoutError: # If process doesn't terminate in time, force kill it process.kill() This test verifies that on UNIX systems MCP servers that don't respect SIGTERM but e.g. SIGINT still get terminated after a grace period.
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals or perform lengthy cleanup operations. Original PR: #555 by @cristipufu Github-Issue:#559 Github-Issue:#555
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals or perform lengthy cleanup operations. Also adds regression tests for #559. resolves #555
@felixweinberger rebased |
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals or perform lengthy cleanup operations. Also adds regression tests for #559. resolves #555 Co-authored by: @cristipufu
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals or perform lengthy cleanup operations. Also adds regression tests for #559. resolves #555 Co-authored-by: Cristian Pufu <[email protected]>
Thank you @cristipufu for your response and again for submitting this PR! I spent yesterday and today trying to unify all the different approaches and fixes we have pending in this process termination space at the moment, as there are several interrelated fixes that either conflict or depend on each other - specifically #555, #729, #765, and #850. I've added your change to #1044 as a draft with you as a co-author + added extensive regression testing. Would you be OK with consolidating this change into #1044 for the comprehensive testing & process handling introduced there? |
@felixweinberger yes, of course, thanks! |
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals or perform lengthy cleanup operations. Also adds regression tests for #559. resolves #555 Co-authored-by: Cristian Pufu <[email protected]>
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals or perform lengthy cleanup operations. Also adds regression tests for #559. resolves #555 Co-authored-by: Cristian Pufu <[email protected]>
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. This was already being handled on Windows, but not Unix systems. This Commit unifies the two approaches, removing special logic for windows process termination. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals. resolves #555 Also adds regression tests for #559. Co-authored-by: Cristian Pufu <[email protected]>
The stdio cleanup was hanging indefinitely when processes ignored termination signals or took too long to exit. This caused the MCP client to freeze during shutdown, especially with servers that don't handle SIGTERM properly. This was already being handled on Windows, but not Unix systems. This Commit unifies the two approaches, removing special logic for windows process termination. The fix introduces a 2-second timeout for process termination. If a process doesn't exit gracefully within this window, it's forcefully killed. This ensures the client always completes cleanup in bounded time while still giving well-behaved servers a chance to exit cleanly. This resolves hanging issues reported when MCP servers ignore standard termination signals. resolves #555 Also adds regression tests for #559. Co-authored-by: Cristian Pufu <[email protected]>
Motivation and Context
Be consistent across platforms (Windows, Linux, etc). Some MCP Servers don't react to SIGTERM, so
stdio_client
will hang indefinitely when exiting the context manager.Reference: #372
How Has This Been Tested?
Yes
Breaking Changes
No
Types of changes
Checklist