Failed to upload to api/service/queue. Statuscode: InternalServerError #76
I was able to isolate this problem on a Sunday, when there's almost no other load, and it seemed to be caused by one of the queues. I changed its configuration to use the newest package version and sent a null echo, and the issue is now gone. As far as the issue is concerned, I think we're happy to close this, but I'd be interested to know why this worked. Are those configuration updates something we should do with other queues as well? Or is it about updating the config but not necessarily related to the configured package version? 🤔 I got the fix idea from this follow-up line in the logs:
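For context, the "null echo" here is presumably an echo request with an empty message sent to the queue. A minimal sketch of what such a request could look like over HTTP is below; the address, route, and payload shape are assumptions for illustration, not the middleware's documented API:

```python
# Minimal sketch of sending a "null echo" to a middleware queue endpoint.
# The URL, route, and payload shape are assumptions for illustration only;
# substitute the actual queue endpoint and credentials of your cashbox.
import requests

MIDDLEWARE_URL = "http://localhost:1500"  # hypothetical local middleware address


def send_null_echo() -> object:
    response = requests.post(
        f"{MIDDLEWARE_URL}/echo",   # hypothetical echo route
        json={"Message": None},     # "null echo": an echo request with an empty message
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(send_null_echo())
```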
Unfortunately this wasn't a permanent fix - after some time both errors started showing up again for the same queue / cashbox.
One more little update: these 2 kinds of errors:
Hey, thanks for the update. With the added information we were able to find the problem 🎉 We'll release a fix this week or the beginning of next week. I'll notify you here once it's out.
Hey, my estimate was off by a bit 😅 We've just released
Hey! Just released
Hey, we've also not forgotten about this one 😁 We just need to find the time to tackle it.
Hey! Any news on this?
We've started to observe the same issue after upgrading from 1.3.44 (
After digging into this issue further, we noticed that the requests against Helipad (
When this issue is observed, our requests against the ByoDC middleware are timing out (our HTTP client timeout is set to 10 seconds due to operational reasons); can we please ask for your input/help on this?
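For reference, this is roughly what a client call against the ByoDC middleware with a 10-second timeout looks like; a minimal sketch, with the address and route as placeholders rather than the real API:

```python
# Minimal sketch of a client call against the ByoDC middleware with a
# 10-second timeout, as described above. URL and route are placeholders.
import requests

BYODC_URL = "http://byodc.internal:1500"  # hypothetical in-cluster address


def sign_receipt(receipt_request: dict) -> dict:
    try:
        response = requests.post(
            f"{BYODC_URL}/sign",   # hypothetical signing route
            json=receipt_request,
            timeout=10,            # 10 s client timeout, set for operational reasons
        )
        response.raise_for_status()
        return response.json()
    except requests.Timeout:
        # This is the failure mode described above: the middleware does not
        # answer within 10 s and the client gives up.
        raise
```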
Hey, can you check if these problems are fixed in the
Hey @volllly,
Timeouts
5xx responses in logs
Hey @volllly, we tried to find a correlation between the ByoDC request timeouts and the errors, and noticed the following:
At this point, we're not sure if the timeouts are caused by the Helipad uploads, but just wanted to provide input. We'll change the log verbosity to see if we can catch any further logs.
Hey @volllly, we observed a similar behavior; once we restarted the pods, we did not have any request timeouts against ByoDC for ~3 days. After 3 days, we started to observe timeouts, and one of the pods restarted after it became unhealthy.
Before the pod restarted, there was a small spike in the memory usage (the chart below shows the memory usage percentage, where 100% corresponds to 1Gi):
After the pod restarted, the spikes in CPU usage dropped even though the pod was receiving the exact same traffic for 13 out of 15 cashboxes:
The CPU usage seems to show an increasing number of spikes over time for the affected pod:
And there is an increase in CPU usage spikes for all pods, but it depends on the # of transactions processed by the cashboxes they currently hold:
It looks like the increase in average CPU usage can also be grouped in the following clusters:
We also observed an increase in memory usage over time, which may be an indicator of a leak 🤔
There are spikes in the increases that are caused by zero receipt requests (
Attached logs for the pod that was restarted below:
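As a side note, a minimal sketch of how the memory-usage percentage mentioned above (100% = 1Gi) can be computed from inside a pod is shown below; the cgroup paths depend on the node setup and are only illustrative:

```python
# Sketch of deriving the "memory usage percentage" inside a pod, assuming a
# 1Gi limit. Tries the cgroup v2 path first, then falls back to cgroup v1.
# Paths depend on the node's cgroup setup; treat this as illustrative only.
from pathlib import Path

LIMIT_BYTES = 1 * 1024 ** 3  # 1Gi, i.e. the value that corresponds to 100%


def current_memory_bytes() -> int:
    v2 = Path("/sys/fs/cgroup/memory.current")
    v1 = Path("/sys/fs/cgroup/memory/memory.usage_in_bytes")
    source = v2 if v2.exists() else v1
    return int(source.read_text().strip())


def memory_percent() -> float:
    return 100.0 * current_memory_bytes() / LIMIT_BYTES


if __name__ == "__main__":
    print(f"memory usage: {memory_percent():.1f}% of 1Gi")
```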
@gregzip @mertcanyalhi we've released
Also, we just deployed an update to the sandbox Helipad that should fix the 500/503 upload errors.
@volllly Unfortunately, we're still seeing the issue. We tried to get a memory dump multiple times, but we're hitting the memory limit when taking a dump after the memory usage increases. We were able to get a dump from the pod with the lowest memory usage and uploaded it, but I'm not sure if it'll help. We'll increase the memory limit once again and try to get a memory dump once we observe increasing memory usage.
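One way to avoid running into the limit while dumping is to trigger the dump as soon as usage crosses a threshold, rather than after it has already grown. A rough sketch, assuming a containerized .NET process and the dotnet-dump global tool being available in the pod; the PID, threshold, and output path are placeholders:

```python
# Rough sketch: trigger `dotnet-dump collect` as soon as memory usage crosses
# a threshold, so the dump is taken well before the pod approaches its 1Gi
# limit. PID, threshold, and output path are assumptions for illustration.
import subprocess
import time
from pathlib import Path

PID = 1                      # assumed PID of the .NET process inside the pod
LIMIT_BYTES = 1 * 1024 ** 3  # 1Gi pod memory limit
THRESHOLD = 0.6              # dump once usage exceeds 60% of the limit


def usage_bytes() -> int:
    v2 = Path("/sys/fs/cgroup/memory.current")
    v1 = Path("/sys/fs/cgroup/memory/memory.usage_in_bytes")
    return int((v2 if v2.exists() else v1).read_text().strip())


while True:
    if usage_bytes() > THRESHOLD * LIMIT_BYTES:
        subprocess.run(
            ["dotnet-dump", "collect", "-p", str(PID), "-o", "/tmp/early.dmp"],
            check=True,
        )
        break
    time.sleep(30)
```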
@volllly Uploaded more memory dumps for multiple pods that were captured in the past 3 days.
@volllly The following graph shows the memory usage of a ByoDC pod, and once the memory usage started to increase, we started to observe the following more frequently:
@volllly We noticed that the # of
The growing # of
Yesterday, while there was low traffic (85 signature requests in total from 3 cashboxes), we captured the
The following breakdown shows the # of
The following breakdown shows the # of
Hey, we've pushed a fix to Helipad; you should not see 500 and 503 errors anymore.
Hey @mertcanyalhi, are you still experiencing this issue, or did the fix deployed in November solve the problem?
Hey @TSchmiedlechner
We're currently using
We're still seeing heaps of errors daily like:
At some point I just stopped looking at them and decided to only act when clients are seriously affected, but it's something that'd be good to resolve - the very reason we monitor this is to avoid situations where a client has to call us 🙃 (we're at
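A minimal sketch of the kind of log-based check described here, which counts occurrences of the upload error and only alerts above a threshold instead of paging on every single failure; the log path, pattern, and threshold are assumptions:

```python
# Sketch of a log-based alert: count the specific upload error and only alert
# when it exceeds a threshold, instead of paging on every single occurrence.
# Log path, pattern, and threshold are assumptions for illustration.
from pathlib import Path

LOG_FILE = Path("/var/log/middleware/middleware.log")  # hypothetical path
PATTERN = "Failed to upload to api/service/queue"
THRESHOLD = 50  # tolerate sporadic failures, alert on sustained ones


def count_upload_errors() -> int:
    with LOG_FILE.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if PATTERN in line)


if __name__ == "__main__":
    errors = count_upload_errors()
    if errors > THRESHOLD:
        print(f"ALERT: {errors} upload failures found (threshold {THRESHOLD})")
```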
Describe the bug
We're seeing multiple errors with almost no description, pretty much just:
Failed to upload to api/service/queue. Statuscode: InternalServerError
To Reproduce
Let pods process requests for some time.
Expected behavior
Requests are processed correctly.
Screenshots
N/A
STDOUT/STDERR
(yes, the response message is empty)
POSSystem (please complete the following information):
Cashbox Information (please complete the following information):
Hard to say; we send hundreds of requests, some of them fail because of #74 and #75, and the error message doesn't contain it.
Additional context
It's less important than #74 but still annoying as it triggers alerts on our side.