Skip to content

fix: handle the client disconnect so that the server does not crash. #380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

wolfdancer
Copy link

@wolfdancer wolfdancer commented May 6, 2025

I noticed that the Inspector can only connect to a server once. Once disconnected, nothing appears to work. Upon some debugging, I realized that it is because the server got an error in its SSE message in the case where the client is disconnected.

Motivation and Context

With this change, the "Restart" and "Disconnect" buttons at the bottom of the SideBar will work as expected.

How Has This Been Tested?

Yes. The test scenario is pretty straightforward:

  1. After starting the application, configure it to connect to a MCP server using stdio mode
  2. Click Restart and confirm that everything still works as expected (before the fix, everything stops working at this point)
  3. Click Disconnect, the button changes to Connect as expected
  4. Click Connect and confirm everything works as expected (before the fix Connect button does not do anything at this point)

Breaking Changes

No.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Before I realized it was the server that was crashing, I was looking into client-server communication. In that process, I cleaned up the console logging so that it is clear which API on the server was called without having to remember the code. I believe that makes an easier troubleshooting experience although I understand those changes are not related to the bug fix. I am happy to submit a separate PR for the logging enhancement if that is better.

@cliffhall
Copy link
Contributor

cliffhall commented May 15, 2025

Hi @wolfdancer!

While the changes look sane, I am not able to duplicate the inciting condition.

Here is a video, where I repeatedly connect and disconnect from the server reference implementation server-everyhing with the current Inspector.

Doing this test prompted me to make a PR for that server to replace console.log with console.error and remove the auto-close on client disconnect for the SSE server. But even without these changes, I could connect/disconnect/connect just fine on STDIO and StreamableHTTP, it was just the SSE server that didn't accept reconnects.

If the server stays up, the Inspector doesn't seem to have a problem reconnecting for me. Can you provide a way to duplicate the problem in order to test that your changes are fixing it?

@wolfdancer
Copy link
Author

Hi @cliffhall, I can confirm that the issue is fixed with the latest code based on my test with a MCP server that I am putting together. Let me take another look at the code to see if some other change actually fixed this bug.

@wolfdancer
Copy link
Author

Hi again @cliffhall, I am not sure why at the moment, but apparently this only happens with the github MCP server. The reason that I though it was fixed was beacuse I used a MCP server that I was writing and that MCP server was just a python application.

Please see attached screen recording. In the recording, I cloned this repo and ran the server/client separately. Through comparison I realized it was not the way I started the server, but the MCP server itself that made the difference.

inspector-disconnect-bug-with-github-mcp-server.mov

Since I have Github MCP server installed on my Claude client, I was able to just launch inspector in the following command and reproduce the issue
npx @modelcontextprotocol/inspector --config ~/Library/Application\ Support/Claude/claude_desktop_config.json --server github
I'll look for other docker MCP server and see if it is docker related.

@wolfdancer
Copy link
Author

wolfdancer commented May 19, 2025

hi again @cliffhall, I think I have figured out that the issue is from how the server handles the stderr output from the MCP server. The Github MCP server must print something in stderr, which the inspector server is sending over to the inspector client and fails in the case where the client is disconnected, causing the server to crash.

This can be reproduced with my example MCP server that prints out messages in stderr: https://github.com/wolfdancer/mcp-example.py. The command to launch is vu run arxiv_server.py once clone the repo. It is a standard example MCP server in python.

From the tests and revewing the code, I have realized two things that I missed the first time:

  • The code that handles the client disconnection ("Handle client disconnection") is the most critical change. It is around line 258.
  • In the case that the client crashes, which can be produced by killing the client, the server can have some additional resiliency code. But my originar PR was not good enough. With some testing (and help from Claude!), I have updated my PR to align with the async call.

The PR as of now (2025-05-18 11:46pm Pacific) should be good to go. Please let me know if you have any feedback. Cheers.

@cliffhall
Copy link
Contributor

@wolfdancer Not being a Python person, I don't have vu on my system. You say the Github server in the servers repo is also causing the problem, perhaps you can do a screen capture or provide us with steps to reproduce?

@olaservo @evalstate If either of you have vu installed, maybe you could give @wolfdancer's example server a test to see if you can reproduce this problem?

@wolfdancer
Copy link
Author

Hi there. I did a round of testing and it only happens with the official Github MCP Server, which is written in Go, and my testing server, which is written in Python. This cannot be reproduced with TypeScript MCP servers, which are all the servers in the servers repo.

It appears to be the way node handles the kill signal. I cannot get the MCP server written in TypeScript to catch the signal when the inspector is shutting it down.

If @olaservo or @evalstate , if you have your own python MCP server, something like this should do the trick: https://github.com/wolfdancer/mcp-example.py/blob/main/arxiv_server.py#L101-L109

@cliffhall
Copy link
Contributor

This cannot be reproduced with TypeScript MCP servers, which are all the servers in the servers repo.

@wolfdancer There are a handful of python servers in the servers repo:

@wolfdancer
Copy link
Author

hi @cliffhall, thanks for the list. I should have looked further.

I applied the change to git server and was able to reproduce the issue. Assuming you have docker, I think this will work without the need to mess with python environments

  • pull in my commit from here: modelcontextprotocol/servers@5b74e6c
  • build docker image: docker build -t mcp/git .
  • run published inspector to reproduce the issue
    • npx @modelcontextprotocol/inspector docker run -i --rm mcp/git
    • click connect
    • click disconnect
    • try to click connect again and the error will occur
  • run against changes in this PR to confirm the fix
    • At the project root, start updated inspector npm run dev
    • Use the same docker image built in step above: command "docker", argument "run -i --rm mcp/git"
    • The connect/disconnect button works all the time now.

Please let me know if this clarifies. I can do a screen capturing with the above steps if that helps.

@cliffhall
Copy link
Contributor

  • pull in my commit from here: modelcontextprotocol/servers@5b74e6c

  • build docker image: docker build -t mcp/git .

  • run published inspector to reproduce the issue

    • npx @modelcontextprotocol/inspector docker run -i --rm mcp/git
    • click connect
    • click disconnect
    • try to click connect again and the error will occur

Ok. Breaks as expected.

Breaking.mov
  • run against changes in this PR to confirm the fix

    • At the project root, start updated inspector npm run dev
    • Use the same docker image built in step above: command "docker", argument "run -i --rm mcp/git"
    • The connect/disconnect button works all the time now.

Doesn't break with your fix.

NotBreaking.mov

cliffhall
cliffhall previously approved these changes May 20, 2025
Copy link
Contributor

@cliffhall cliffhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

@cliffhall
Copy link
Contributor

@wolfdancer just need to run prettier to pass the CI.

@wolfdancer
Copy link
Author

Sorry about that. I forgot to run prettier with the async update commit. Updated.

@wolfdancer wolfdancer requested a review from cliffhall May 20, 2025 16:28
@wolfdancer
Copy link
Author

weird... based on CoPilot, it is something to do with ToolsTab.tsx. I'll take a further look later today.

@wolfdancer
Copy link
Author

Hi @cliffhall, can you approve when you get a chance? I ran the prettier. I also figured out how to fix the second PR build. I am not sure why it would suddenly fail but I did confirm the build failure locally, and confirmed that the build pass with the change in 80377bc

@cliffhall
Copy link
Contributor

I am not sure why it would suddenly fail but I did confirm the build failure locally, and confirmed that the build pass with the change in 80377bc

This issue is currently keeping us from merging any PRs. The problem stems from a recent breaking change added to the SDK. You can see more about it in this draft PR I made yesterday to try and address it. We won't be merging it because the changes are going to be fixed in the SDK shortly. You can revert the change in your PR as well.

@wolfdancer
Copy link
Author

I have reverted the latest commit. I will squash the commits when the PR is ready for merge.

@cliffhall cliffhall enabled auto-merge May 21, 2025 20:27
@cliffhall cliffhall disabled auto-merge May 21, 2025 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants