Improve monitoring strategy based on extension of /healthcheck endpoint #348

Open
jimmie opened this issue Jun 10, 2023 · 0 comments
jimmie commented Jun 10, 2023

💡 Description

Once the service is deployed, ECS will call the /healthcheck endpoint routinely. This is a good opportunity to improve automated monitoring by building alerts on top of those calls. Consider the following workflow:

  1. ECS submits a request to the /healthcheck endpoint
    a. This request specifies thresholds (e.g. % of available JVM memory, % of free space on any OpenSearch shard)
  2. As part of fulfilling this request, the service:
    a. Sends informative, well-defined messages to stdout
    b. These messages are written to the service log, which is captured in CloudWatch
    c. The messages indicate any errors, such as a violation of a threshold specified in 1a or an inability to access OpenSearch
  3. Alerts are placed on the CloudWatch logs to detect any of the error message patterns produced in step 2
  4. Alert messages are sent to operations so that situations needing attention (e.g. increasing disk space) are addressed, or at least known about
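
To make steps 2 and 3 more concrete, here is a minimal, language-neutral sketch in Python (the actual service presumably runs on the JVM, so this only illustrates the logic). Everything named here is hypothetical and not part of the existing code: the `HEALTHCHECK ERROR <CODE>` message format, the `Thresholds` fields, and the `evaluate` and `error_codes` helpers. The regular expression stands in for whatever pattern the CloudWatch alert would eventually match.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical message format: "HEALTHCHECK ERROR <CODE> key=value ...".
# A fixed prefix plus a stable code keeps the alert pattern simple.
ERROR_PATTERN = re.compile(r"^HEALTHCHECK ERROR (?P<code>[A-Z_]+)")


@dataclass
class Thresholds:
    """Thresholds supplied with the healthcheck request (step 1a)."""
    min_jvm_memory_available_pct: float
    min_shard_free_space_pct: float


def evaluate(
    thresholds: Thresholds,
    jvm_memory_available_pct: float,
    shard_free_space_pct: Optional[Dict[str, float]],
) -> List[str]:
    """Return the log lines for one healthcheck (step 2).

    shard_free_space_pct maps shard name to % free space; None is used
    here (as an assumption) to mean OpenSearch could not be reached.
    """
    lines = []
    if jvm_memory_available_pct < thresholds.min_jvm_memory_available_pct:
        lines.append(
            "HEALTHCHECK ERROR JVM_MEMORY_LOW "
            f"available_pct={jvm_memory_available_pct:.1f} "
            f"threshold_pct={thresholds.min_jvm_memory_available_pct:.1f}"
        )
    if shard_free_space_pct is None:
        lines.append("HEALTHCHECK ERROR OPENSEARCH_UNREACHABLE")
    else:
        for shard, free_pct in sorted(shard_free_space_pct.items()):
            if free_pct < thresholds.min_shard_free_space_pct:
                lines.append(
                    "HEALTHCHECK ERROR SHARD_SPACE_LOW "
                    f"shard={shard} free_pct={free_pct:.1f} "
                    f"threshold_pct={thresholds.min_shard_free_space_pct:.1f}"
                )
    return lines or ["HEALTHCHECK OK"]


def error_codes(log_lines: List[str]) -> List[str]:
    """Extract error codes the way a log-based alert would (step 3)."""
    codes = []
    for line in log_lines:
        match = ERROR_PATTERN.match(line)
        if match:
            codes.append(match.group("code"))
    return codes


if __name__ == "__main__":
    limits = Thresholds(min_jvm_memory_available_pct=10.0,
                        min_shard_free_space_pct=15.0)
    # Step 2a: messages go to stdout, where the log pipeline picks them up.
    for line in evaluate(limits, 5.0, {"shard-0": 40.0, "shard-1": 12.5}):
        print(line)
```

In a real deployment, step 3 would most likely be a CloudWatch Logs metric filter with an alarm on top of it rather than application code. The point of the sketch is that a fixed prefix and a small set of stable error codes make the patterns easy to write and keep in sync with the service.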
Projects
Status: ToDo
2 participants