Description
I have been investigating an issue in our self-hosted Interval setup that leaves the client in limbo, where it is not serving any actions.
To give some more information, we're hosting both the client and the server on AWS ECS. When the issue happened, we noticed that AWS had restarted our server instance due to some network maintenance. After the server restarted, the client stopped serving actions and we saw the "Nothing here yet" message, which indicates that the client was not properly connected.
After some investigation, we saw that the server had failed to initialize the host. Here's the error log that we could see on the server:
error: Failed handling INITIALIZE_HOST:
{
"error": {
"message": "\nInvalid `prisma.hostInstance.upsert()` invocation:\n\n\nQuery interpretation error. Error for binding '2': AssertionError(\"Expected a valid parent ID to be present for create follow-up for upsert query.\")",
"name": "PrismaClientKnownRequestError",
"stack": "PrismaClientKnownRequestError: \nInvalid `prisma.hostInstance.upsert()` invocation:\n\n\nQuery interpretation error. Error for binding '2': AssertionError(\"Expected a valid parent ID to be present for create follow-up for upsert query.\")\n at Ln.handleRequestError (/usr/local/lib/node_modules/@interval/server/node_modules/@prisma/client/runtime/library.js:121:7753)\n at Ln.handleAndLogRequestError (/usr/local/lib/node_modules/@interval/server/node_modules/@prisma/client/runtime/library.js:121:7061)\n at Ln.request (/usr/local/lib/node_modules/@interval/server/node_modules/@prisma/client/runtime/library.js:121:6745)\n at async l (/usr/local/lib/node_modules/@interval/server/node_modules/@prisma/client/runtime/library.js:130:9633)\n at async INITIALIZE_HOST (/usr/local/lib/node_modules/@interval/server/dist/src/wss/wss.js:436:50)\n at async DuplexRPCClient.handleReceivedCall (/usr/local/lib/node_modules/@interval/server/node_modules/@interval/sdk/dist/classes/DuplexRPCClient.js:90:29)\n at async DuplexRPCClient.onmessage (/usr/local/lib/node_modules/@interval/server/node_modules/@interval/sdk/dist/classes/DuplexRPCClient.js:111:21)\n at async Promise.all (index 0)"
},
"instanceId": "<instance id>",
"organizationId": "<organization id>"
}
And this on the client:
Reproducing
To reproduce, you can simulate the issue in Interval Server by adding
throw new Error('Reproducing')
to the INITIALIZE_HOST handler: https://github.com/interval/server/blob/77ae7f0f080f8530bec9f8d64b40f1f63524984a/src/wss/wss.ts#L535-L540
This forces the host initialization to fail.
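To make the failure mode concrete, here is a minimal standalone sketch of what the injected error does. It does not use Interval's real code: `simulatedHandlers`, `callSimulated`, and the `PING` entry are hypothetical stand-ins, assuming only that initialization and ping are handled independently on the same connection.

```typescript
// Hypothetical stand-in for the server's RPC handlers; not Interval's real code.
type SimHandler = () => string;

const simulatedHandlers: Record<string, SimHandler> = {
  // Mirrors the injected failure: host initialization always throws.
  INITIALIZE_HOST: () => {
    throw new Error("Reproducing");
  },
  // The connection itself is fine, so a ping-style call still answers.
  PING: () => "pong",
};

// Calls a handler by name and reports the outcome instead of throwing.
function callSimulated(name: string): { ok: boolean; detail: string } {
  const handler = simulatedHandlers[name];
  if (handler === undefined) {
    return { ok: false, detail: `no handler named ${name}` };
  }
  try {
    return { ok: true, detail: handler() };
  } catch (err) {
    return { ok: false, detail: err instanceof Error ? err.message : String(err) };
  }
}

console.log(callSimulated("INITIALIZE_HOST").ok); // false
console.log(callSimulated("PING").ok); // true
```

In this sketch the ping-style call keeps succeeding even though initialization failed, which is the same mismatch we see in the real setup.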
The problem
The main problem is that Interval's public API gives us no way to detect this situation. For example, IntervalClient#ping still succeeds, because it only checks that the websocket connection is open and responsive. There is no way to check that a given client is both connected and initialized/online.
This is a problem in our setup because we use IntervalClient#ping in our health check. It yields a false positive: the check passes even though the client cannot actually serve its actions.
Ideas
One way to solve this would be to accept a callback that is called when initialization fails, perhaps as an addition to #initializeHost. If that is too specific, a new method for inspecting or reacting to the RPC responses would also work.
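As a rough illustration of the callback idea, the sketch below shows how a health check could combine the ping result with initialization state. Everything here is hypothetical: `HostHealth`, `markOnline`, `markInitFailed`, and `markReconnecting` are not part of the Interval SDK, and the sketch assumes the SDK would invoke such hooks when host initialization succeeds, fails, or restarts.

```typescript
// Hypothetical sketch: none of these names exist in the Interval SDK.
type HostState = "connecting" | "online" | "init_failed";

class HostHealth {
  private state: HostState = "connecting";
  private lastError: Error | undefined = undefined;

  // Would be called by a (proposed) "initialization succeeded" hook.
  markOnline(): void {
    this.state = "online";
    this.lastError = undefined;
  }

  // Would be called by the (proposed) "initialization failed" callback.
  markInitFailed(error: Error): void {
    this.state = "init_failed";
    this.lastError = error;
  }

  // Would be called when the socket drops and the client starts reconnecting.
  markReconnecting(): void {
    this.state = "connecting";
    this.lastError = undefined;
  }

  // A live socket alone is not enough: the host must also be initialized.
  isHealthy(socketAlive: boolean): boolean {
    return socketAlive && this.state === "online";
  }

  // Short status string suitable for a health check response body.
  describe(): string {
    return this.lastError ? `${this.state}: ${this.lastError.message}` : this.state;
  }
}

const hostHealth = new HostHealth();
hostHealth.markInitFailed(new Error("INITIALIZE_HOST failed"));
console.log(hostHealth.isHealthy(true)); // false: socket is up, but the host never initialized
```

In a health check, `socketAlive` would come from the existing ping call, so the endpoint only reports healthy when both the connection and the initialization have succeeded.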