Helm chart with redis fails due to 'failed to set key to value' #336
Comments
I've tested this by additionally specifying the image tag 0.12.1:
... and it appears to work as expected, so it looks like this is specific to the default version in the current helm chart.
Sorry, the above comment isn't correct - I believed it was working with 0.12.1, but now I'm seeing the same behaviour, so it seems to be transient / timing-related on startup. Scaling the scraper deployment down to 0 and back to 1 after deploying fixes it consistently, so it might be related to #184
@mrsheepuk thanks for reporting. It does look like a timing issue, but I would assume the scraper component recovers once it's able to connect to redis. Isn't that the case? Or does the helm install process fail entirely?
Thanks for responding @sagikazarmark - the helm install completes successfully and the pods are all up and running as expected; it's just the logged errors that make it look like it's not working. I'll double-check whether it effectively recovers or not - I was basing this on the logs, but it may be that nothing is logged on success, so I assumed it wasn't working when in fact it had recovered correctly. I'll report back shortly after confirming this.
Well, I'm not completely sure, TBH. The scraper scrapes the providers at a predefined interval (24 hours by default), so chances are that if the first scrape fails but the pod doesn't exit, it simply waits for the next cycle, which is not ideal. Let me know if it recovers.
That's what I assumed was happening. Now that I want to reproduce the problem, of course, it works perfectly every time with no errors logged on 0.12.1. Testing again with the default version (0.9.15), I can see that it doesn't recover cleanly: everything is 'up and running', but there's no data persisted and available through the front end, and the log indicates it's not trying to re-scrape or re-persist anything. I imagine it would recover after 24 hours, as you suggest. I'll keep trying to reproduce this on 0.12.1 - it certainly seems to happen much less often on that version.
There are no changes between 0.9.x and 0.12.x that would "fix" the recovery, so if it works, it's probably because of timing (i.e. it takes time to download the new image). I think the scraper should be resilient enough to retry scraping if the result cannot be saved, or to hold it in memory until it can.
This is the first time I've looked at the cloudinfo code, so I might have something wrong, but it looks to me like any failures writing into redis are logged but essentially ignored, for example (I've added a few comments inline):
And the way this is being called:
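The snippets originally referenced here didn't survive the export. As a rough sketch of the log-and-ignore pattern being described (all type and function names below are hypothetical, not taken from the cloudinfo codebase):

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// productStore mimics the shape of a store backend whose writes can fail
// (hypothetical interface; the real ones live in the cloudinfo repo).
type productStore interface {
	Set(key string, value interface{}) error
}

// failingStore always fails, standing in for an unreachable redis.
type failingStore struct{}

func (failingStore) Set(key string, value interface{}) error {
	return errors.New("connection refused")
}

// storeOrLog illustrates the pattern under discussion: the write error is
// logged and then dropped, so the caller never learns the save failed
// and nothing is retried.
func storeOrLog(s productStore, key string, value interface{}) {
	if err := s.Set(key, value); err != nil {
		log.Printf("failed to set key to value: key=%q err=%v", key, err)
	}
}

func main() {
	storeOrLog(failingStore{}, "/providers/google/...", 0.09)
	// Execution continues as if the write had succeeded.
	fmt.Println("scrape 'succeeded' despite the failed write")
}
```

Because `storeOrLog` returns nothing, callers have no way to detect the failure, which matches the behaviour in the logs above.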
The 'sledgehammer' approach to this specific problem would be something that repeatedly pings a test value into redis on startup, waiting for it to be available/healthy before completing startup - but that wouldn't help if there were a transient problem with the redis instance at some later point in time. A more nuanced approach would require changing all of these StoreXxx interfaces to return success/failure, so that wherever they are called could retry as needed. If I get time later, I'll see if I can put together a PR attempting one or other of these options.
@sagikazarmark I've raised a PR that simply fails on startup if the product store is not available; let me know what you think.
Describe the bug
Installing cloudinfo using the helm chart at 0.7.1 with the default version of the cloudinfo image (0.9.15) fails to create a working environment when redis is enabled in the chart: the scraper cannot write to it, logging errors like the one below for every value it tries to write:
{"application":"cloudinfo","cistore":"redis","environment":"production","key":"/banzaicloud.com/cloudinfo/providers/google/regions/asia-northeast3/prices/n1-highcpu-2","level":"error","msg":"failed to set key to value","time":"2020-06-22T09:21:22Z","value":{"onDemandPrice":0.0909891416015625,"spotPrice":{}}}
Steps to reproduce the issue:
helm install cloudinftest banzaicloud-stable/cloudinfo -f ./cloudinfo-deploy.yml
... with the following values in cloudinfo-deploy.yml (including correct and valid google credentials):
Expected behavior
Scraper works as expected and posts values into redis for the application.
Screenshots
n/a
Additional context
Tested only with Google Cloud enabled at this time.