Improve monitoring strategy based on extension of /healthcheck endpoint #348

jimmie · 2023-06-10T00:35:18Z

💡 Description

The routine polling of the /healthcheck endpoint by ECS (once deployed) is an opportunity to substantially improve automated monitoring through alerting. Consider the following workflow:

1. ECS submits a request to the /healthcheck endpoint.
   a. The request specifies thresholds (e.g. % of available JVM memory, % of free space on any OpenSearch shard).
2. As part of fulfilling this request, the service:
   a. Sends informative, well-defined messages to stdout.
   b. These messages are captured in the service log file, which is in turn captured in CloudWatch.
   c. The messages indicate any errors, such as a violation of any of the thresholds specified in 1a or an inability to access OpenSearch.
3. Alerts are placed on the CloudWatch logs to detect any of the error message patterns produced in step 2.
4. Alert messages are sent to operations so that situations needing attention (e.g. increasing disk space) are addressed, or at least brought to their attention.
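Steps 1 and 2 can be sketched as follows. This is a minimal, hypothetical illustration, not the service's actual /healthcheck implementation: the metric names (`jvm_memory_used_pct`, `shard_free_space_pct`), the `Threshold` type, the `HEALTHCHECK_ERROR` marker, and the "max"/"min" semantics are all assumptions made for the example. It shows how caller-supplied thresholds could be checked against observed values, with each violation (or unreadable metric, e.g. because OpenSearch is unreachable) written to stdout as one fixed-format line.

```python
import sys
from dataclasses import dataclass
from typing import Dict, List, Optional, TextIO

# Hypothetical fixed marker that a CloudWatch Logs filter could match on.
ERROR_MARKER = "HEALTHCHECK_ERROR"


@dataclass(frozen=True)
class Threshold:
    """A caller-supplied limit for one metric (names here are illustrative only)."""
    metric: str
    limit: float
    kind: str = "max"  # "max": value must not exceed limit; "min": value must not drop below it

    def __post_init__(self) -> None:
        if self.kind not in ("max", "min"):
            raise ValueError(f"unknown threshold kind: {self.kind!r}")


def evaluate(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one well-defined error line per violated or unreadable threshold."""
    errors = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            # The metric could not be collected, e.g. OpenSearch was unreachable.
            errors.append(f"{ERROR_MARKER} metric={t.metric} status=unavailable")
        elif (t.kind == "max" and value > t.limit) or (t.kind == "min" and value < t.limit):
            errors.append(f"{ERROR_MARKER} metric={t.metric} value={value} {t.kind}={t.limit}")
    return errors


def healthcheck(metrics: Dict[str, float], thresholds: List[Threshold],
                out: Optional[TextIO] = None) -> bool:
    """Write each error line to stdout (or `out`) and return True only if nothing failed."""
    stream = out if out is not None else sys.stdout
    errors = evaluate(metrics, thresholds)
    for line in errors:
        print(line, file=stream)
    return not errors


if __name__ == "__main__":
    # In this example OpenSearch is assumed unreachable, so no shard metric was collected.
    observed = {"jvm_memory_used_pct": 91.5}
    requested = [
        Threshold("jvm_memory_used_pct", 85.0, "max"),
        Threshold("shard_free_space_pct", 10.0, "min"),
    ]
    print("healthy" if healthcheck(observed, requested) else "unhealthy")
```

Emitting every problem with the same leading marker keeps step 3 simple: a CloudWatch Logs metric filter with a term pattern such as `"HEALTHCHECK_ERROR"` could count these lines, and an alarm on that metric could notify operations for step 4. The exact filter and alarm configuration is an assumption and would need to be checked against the deployed log format.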

jimmie added B14.0 task i&t.skip labels Jun 10, 2023

jimmie assigned jordanpadams Jun 10, 2023

jordanpadams added p.should-have icebox and removed B14.0 labels Jul 31, 2023

jordanpadams added this to EN Portfolio Backlog Nov 20, 2023

github-project-automation bot moved this to Release Backlog in EN Portfolio Backlog Nov 20, 2023
