Address redis cache out of memory error #58

Open
abarciauskas-bgse opened this issue Jul 25, 2024 · 0 comments

Contributor

abarciauskas-bgse commented Jul 25, 2024

Problem:

  • prod-titiler-xarray started returning 500s with the error "command not allowed when used memory > 'maxmemory'.". I found this while changing the VEDA UI codebase to accommodate a new titiler-cmr layer type.
  • The CloudWatch logs showed that this was a Redis error: "redis.exceptions.OutOfMemoryError: command not allowed when used memory > 'maxmemory'."
  • I decided to try rebooting, since I believed this would clear the cache.
  • Rebooting resolved the errors, but it would still be good to know whether the cache ran out of memory from all existing cached requests or from a single request that was too large.
  • The graph of database memory usage shows a steady increase to capacity, so I believe this was just the result of reaching the total capacity of the in-memory database.
[Screenshot, 2024-07-25: graph of database memory usage]
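One way to answer the "many cached requests or one large request" question next time is to look at per-key memory before rebooting. The sketch below is only an illustration: it assumes the cache is reachable through a redis-py style client (suggested by the `redis.exceptions.OutOfMemoryError` in the logs), and the helper name `memory_report` and its parameters are hypothetical. It relies on the client's `info("memory")`, `scan_iter()` and `memory_usage()` methods.

```python
from typing import Any, Dict, List, Tuple


def memory_report(client: Any, sample_size: int = 1000, top: int = 5) -> Dict[str, Any]:
    """Summarise memory use and list the largest keys among a sample.

    `client` is assumed to behave like a redis-py client (hypothetical
    wiring; the real titiler-xarray cache wrapper may differ).
    """
    info = client.info("memory")  # INFO memory
    sizes: List[Tuple[Any, int]] = []
    for i, key in enumerate(client.scan_iter(count=100)):  # incremental SCAN
        if i >= sample_size:
            break
        size = client.memory_usage(key)  # MEMORY USAGE <key>, bytes or None
        if size is not None:
            sizes.append((key, size))
    # Largest keys first, so a single oversized entry stands out.
    sizes.sort(key=lambda item: item[1], reverse=True)
    return {
        "used_memory": info["used_memory"],
        "maxmemory": info.get("maxmemory", 0),
        "largest_keys": sizes[:top],
    }
```

If one key accounts for most of `used_memory`, a single large request is the likely cause; if the sizes are similar and numerous, the cache simply filled up over time, which matches the steady increase in the graph.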

Solutions:

  1. Increase capacity via node size. Right now it's using a t3.small, which has 1.37GB of memory. I think we should do this if we plan to use titiler-xarray in production, which we have no plans to do now but likely will in the future.
  2. Add a TTL to cache.set (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Strategies.html#Strategies.WithTTL). Should it be 5 minutes? A week?
  3. Add a CloudWatch alarm to notify us when database memory usage is near capacity.
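For the alarm option, a rough sketch of the parameters that could be passed to boto3's CloudWatch `put_metric_alarm` call is shown here. It assumes the ElastiCache metric `DatabaseMemoryUsagePercentage` in the `AWS/ElastiCache` namespace is the one plotted in the graph; the helper name, the 80% threshold, the cluster ID and the SNS topic ARN are all placeholders, not values from this deployment.

```python
from typing import Any, Dict


def memory_alarm_params(
    cluster_id: str,
    sns_topic_arn: str,
    threshold_percent: float = 80.0,  # assumed threshold, tune as needed
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    cluster_id and sns_topic_arn are hypothetical inputs supplied by the caller.
    """
    return {
        "AlarmName": f"{cluster_id}-redis-memory-high",
        "Namespace": "AWS/ElastiCache",
        "MetricName": "DatabaseMemoryUsagePercentage",
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint
        "EvaluationPeriods": 3,   # alarm after 3 consecutive breaches
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Usage (needs boto3 and AWS credentials, so it is not executed here):
#   import boto3
#   params = memory_alarm_params("my-cluster-id", "arn:aws:sns:...:my-topic")
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain function makes them easy to review or port to whatever infrastructure tooling the project already uses.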

@vincentsarago @sharkinsspatial do you think the TTL should be 5 minutes, a day, or a week?
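To make the TTL option concrete, here is a minimal sketch assuming the cache ends up calling a redis-py style `set`, which accepts an `ex` argument for expiry in seconds. The wrapper name `set_with_ttl` and the one-day default are placeholders for discussion, not the project's actual code or an agreed value.

```python
from typing import Any

ONE_DAY_SECONDS = 24 * 60 * 60  # placeholder default; the right TTL is the open question


def set_with_ttl(client: Any, key: str, value: bytes, ttl_seconds: int = ONE_DAY_SECONDS) -> int:
    """Store a value that expires after ttl_seconds.

    `client` is assumed to expose redis-py's set(name, value, ex=...).
    Returns the TTL that was applied.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    client.set(key, value, ex=ttl_seconds)  # SET key value EX ttl_seconds
    return ttl_seconds
```

With this shape, switching between 5 minutes (`300`), a day (`86400`) or a week (`604800`) is a one-argument change, so the value could also come from configuration.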
