change rds cpu critical to warning and increase evaluation period #1783

sastels · 2025-02-26T15:39:35Z

Summary | Résumé

The RDS CPU critical alarm (> 95%) went off yesterday evening, but the system stayed stable. We can downgrade it to a warning so that, if other problems are occurring, we will still be aware of the RDS CPU load.

We also change this alarm and the other RDS CPU alarm (> 80%) to require the load to stay above the threshold for 5 consecutive minutes.
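To illustrate what "above the threshold for 5 minutes in a row" means for the alarm, here is a minimal Python sketch of the evaluation rule. It is a simplified model and not CloudWatch's actual implementation: the function name `alarm_fires` and the sample CPU readings are hypothetical, and the old setting (`evaluation_periods = 1` with `datapoints_to_alarm` unset) is assumed to behave as "1 of 1 datapoints". The threshold of 95, the 60-second period, and the `notBreaching` handling of missing data are taken from the plan output.

```python
from typing import Optional, Sequence


def alarm_fires(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> bool:
    """Return True if the most recent window would put the alarm in ALARM.

    Simplified model: only the last `evaluation_periods` datapoints are
    considered, missing values (None) count as not breaching, and the
    comparison is strictly greater-than (GreaterThanThreshold).
    """
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value is not None and value > threshold)
    return breaching >= datapoints_to_alarm


# Hypothetical one-minute average CPU readings, in percent.
short_spike = [40, 50, 97, 98, 60]
sustained = [96, 97, 99, 98, 96.5]

# Old settings (assumed 1 of 1): a single reading above 95 is enough.
print(alarm_fires(short_spike[:3], 95, evaluation_periods=1, datapoints_to_alarm=1))

# New settings (5 of 5): a two-minute spike no longer triggers the alarm,
# but five consecutive minutes above the threshold does.
print(alarm_fires(short_spike, 95, evaluation_periods=5, datapoints_to_alarm=5))
print(alarm_fires(sustained, 95, evaluation_periods=5, datapoints_to_alarm=5))
```

With a 60-second period, requiring 5 of 5 datapoints means a short burst of CPU load is ignored, while load that stays high for five minutes still sends a notification.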

Test instructions | Instructions pour tester la modification

View the updated CloudWatch alarms in staging / prod.

Release Instructions | Instructions pour le déploiement

None.

Reviewer checklist | Liste de vérification du réviseur

This PR does not break existing functionality.
This PR does not violate GC Notify's privacy policies.
This PR does not raise new security concerns. Refer to our GC Notify Risk Register document on our Google Drive.
This PR does not significantly alter performance.
Additional required documentation resulting from these changes is covered (such as the README, setup instructions, a related ADR or the technical documentation).

⚠ If boxes cannot be checked off before merging the PR, they should be moved to the "Release Instructions" section with appropriate steps required to verify before release. For example, changes to celery code may require tests on staging to verify that performance has not been affected.

github-actions · 2025-02-26T16:06:44Z

staging: rds

✅ Terraform Init: success
✅ Terraform Validate: success
✅ Terraform Format: success
✅ Terraform Plan: success
✅ Conftest: success

⚠️ Warning: resources will be destroyed by this change!

Plan: 3 to add, 3 to change, 3 to destroy

Show summary

| CHANGE | NAME |
|--------|------|
| delete | `aws_cloudwatch_metric_alarm.high-db-cpu-critical[0]`<br>`aws_cloudwatch_metric_alarm.high-db-cpu-critical[1]`<br>`aws_cloudwatch_metric_alarm.high-db-cpu-critical[2]` |
| update | `aws_cloudwatch_metric_alarm.high-db-cpu-warning[0]`<br>`aws_cloudwatch_metric_alarm.high-db-cpu-warning[1]`<br>`aws_cloudwatch_metric_alarm.high-db-cpu-warning[2]` |
| add | `aws_cloudwatch_metric_alarm.very-high-db-cpu-warning[0]`<br>`aws_cloudwatch_metric_alarm.very-high-db-cpu-warning[1]`<br>`aws_cloudwatch_metric_alarm.very-high-db-cpu-warning[2]` |

Show plan

Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
  - destroy

Terraform will perform the following actions:

  # aws_cloudwatch_metric_alarm.high-db-cpu-critical[0] will be destroyed
  # (because aws_cloudwatch_metric_alarm.high-db-cpu-critical is not in configuration)
  - resource "aws_cloudwatch_metric_alarm" "high-db-cpu-critical" {
      - actions_enabled                       = true -> null
      - alarm_actions                         = [
          - "arn:aws:sns:ca-central-1:239043911459:alert-critical",
        ] -> null
      - alarm_description                     = "CPU usage of the RDS instance > 95%" -> null
      - alarm_name                            = "high-db-cpu-critical-instance-0" -> null
      - arn                                   = "arn:aws:cloudwatch:ca-central-1:239043911459:alarm:high-db-cpu-critical-instance-0" -> null
      - comparison_operator                   = "GreaterThanThreshold" -> null
      - datapoints_to_alarm                   = 0 -> null
      - dimensions                            = {
          - "DBInstanceIdentifier" = "notification-canada-ca-staging-instance-0"
        } -> null
      - evaluation_periods                    = 1 -> null
      - id                                    = "high-db-cpu-critical-instance-0" -> null
      - insufficient_data_actions             = [] -> null
      - metric_name                           = "CPUUtilization" -> null
      - namespace                             = "AWS/RDS" -> null
      - ok_actions                            = [] -> null
      - period                                = 60 -> null
      - statistic                             = "Average" -> null
      - tags                                  = {} -> null
      - tags_all                              = {} -> null
      - threshold                             = 95 -> null
      - treat_missing_data                    = "notBreaching" -> null
        # (4 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.high-db-cpu-critical[1] will be destroyed
  # (because aws_cloudwatch_metric_alarm.high-db-cpu-critical is not in configuration)
  - resource "aws_cloudwatch_metric_alarm" "high-db-cpu-critical" {
      - actions_enabled                       = true -> null
      - alarm_actions                         = [
          - "arn:aws:sns:ca-central-1:239043911459:alert-critical",
        ] -> null
      - alarm_description                     = "CPU usage of the RDS instance > 95%" -> null
      - alarm_name                            = "high-db-cpu-critical-instance-1" -> null
      - arn                                   = "arn:aws:cloudwatch:ca-central-1:239043911459:alarm:high-db-cpu-critical-instance-1" -> null
      - comparison_operator                   = "GreaterThanThreshold" -> null
      - datapoints_to_alarm                   = 0 -> null
      - dimensions                            = {
          - "DBInstanceIdentifier" = "notification-canada-ca-staging-instance-1"
        } -> null
      - evaluation_periods                    = 1 -> null
      - id                                    = "high-db-cpu-critical-instance-1" -> null
      - insufficient_data_actions             = [] -> null
      - metric_name                           = "CPUUtilization" -> null
      - namespace                             = "AWS/RDS" -> null
      - ok_actions                            = [] -> null
      - period                                = 60 -> null
      - statistic                             = "Average" -> null
      - tags                                  = {} -> null
      - tags_all                              = {} -> null
      - threshold                             = 95 -> null
      - treat_missing_data                    = "notBreaching" -> null
        # (4 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.high-db-cpu-critical[2] will be destroyed
  # (because aws_cloudwatch_metric_alarm.high-db-cpu-critical is not in configuration)
  - resource "aws_cloudwatch_metric_alarm" "high-db-cpu-critical" {
      - actions_enabled                       = true -> null
      - alarm_actions                         = [
          - "arn:aws:sns:ca-central-1:239043911459:alert-critical",
        ] -> null
      - alarm_description                     = "CPU usage of the RDS instance > 95%" -> null
      - alarm_name                            = "high-db-cpu-critical-instance-2" -> null
      - arn                                   = "arn:aws:cloudwatch:ca-central-1:239043911459:alarm:high-db-cpu-critical-instance-2" -> null
      - comparison_operator                   = "GreaterThanThreshold" -> null
      - datapoints_to_alarm                   = 0 -> null
      - dimensions                            = {
          - "DBInstanceIdentifier" = "notification-canada-ca-staging-instance-2"
        } -> null
      - evaluation_periods                    = 1 -> null
      - id                                    = "high-db-cpu-critical-instance-2" -> null
      - insufficient_data_actions             = [] -> null
      - metric_name                           = "CPUUtilization" -> null
      - namespace                             = "AWS/RDS" -> null
      - ok_actions                            = [] -> null
      - period                                = 60 -> null
      - statistic                             = "Average" -> null
      - tags                                  = {} -> null
      - tags_all                              = {} -> null
      - threshold                             = 95 -> null
      - treat_missing_data                    = "notBreaching" -> null
        # (4 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.high-db-cpu-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "high-db-cpu-warning" {
      ~ alarm_description                     = "CPU usage of the RDS instance > 80%" -> "CPU usage of the RDS instance > 80% for 5 minutes"
      ~ datapoints_to_alarm                   = 0 -> 5
      ~ evaluation_periods                    = 1 -> 5
        id                                    = "high-db-cpu-warning-instance-0"
        tags                                  = {}
        # (19 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.high-db-cpu-warning[1] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "high-db-cpu-warning" {
      ~ alarm_description                     = "CPU usage of the RDS instance > 80%" -> "CPU usage of the RDS instance > 80% for 5 minutes"
      ~ datapoints_to_alarm                   = 0 -> 5
      ~ evaluation_periods                    = 1 -> 5
        id                                    = "high-db-cpu-warning-instance-1"
        tags                                  = {}
        # (19 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.high-db-cpu-warning[2] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "high-db-cpu-warning" {
      ~ alarm_description                     = "CPU usage of the RDS instance > 80%" -> "CPU usage of the RDS instance > 80% for 5 minutes"
      ~ datapoints_to_alarm                   = 0 -> 5
      ~ evaluation_periods                    = 1 -> 5
        id                                    = "high-db-cpu-warning-instance-2"
        tags                                  = {}
        # (19 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.very-high-db-cpu-warning[0] will be created
  + resource "aws_cloudwatch_metric_alarm" "very-high-db-cpu-warning" {
      + actions_enabled                       = true
      + alarm_actions                         = [
          + "arn:aws:sns:ca-central-1:239043911459:alert-warning",
        ]
      + alarm_description                     = "CPU usage of the RDS instance > 95% for 5 minutes"
      + alarm_name                            = "very-high-db-cpu-warning-instance-0"
      + arn                                   = (known after apply)
      + comparison_operator                   = "GreaterThanThreshold"
      + datapoints_to_alarm                   = 5
      + dimensions                            = {
          + "DBInstanceIdentifier" = "notification-canada-ca-staging-instance-0"
        }
      + evaluate_low_sample_count_percentiles = (known after apply)
      + evaluation_periods                    = 5
      + id                                    = (known after apply)
      + metric_name                           = "CPUUtilization"
      + namespace                             = "AWS/RDS"
      + period                                = 60
      + statistic                             = "Average"
      + tags_all                              = (known after apply)
      + threshold                             = 95
      + treat_missing_data                    = "notBreaching"
    }

  # aws_cloudwatch_metric_alarm.very-high-db-cpu-warning[1] will be created
  + resource "aws_cloudwatch_metric_alarm" "very-high-db-cpu-warning" {
      + actions_enabled                       = true
      + alarm_actions                         = [
          + "arn:aws:sns:ca-central-1:239043911459:alert-warning",
        ]
      + alarm_description                     = "CPU usage of the RDS instance > 95% for 5 minutes"
      + alarm_name                            = "very-high-db-cpu-warning-instance-1"
      + arn                                   = (known after apply)
      + comparison_operator                   = "GreaterThanThreshold"
      + datapoints_to_alarm                   = 5
      + dimensions                            = {
          + "DBInstanceIdentifier" = "notification-canada-ca-staging-instance-1"
        }
      + evaluate_low_sample_count_percentiles = (known after apply)
      + evaluation_periods                    = 5
      + id                                    = (known after apply)
      + metric_name                           = "CPUUtilization"
      + namespace                             = "AWS/RDS"
      + period                                = 60
      + statistic                             = "Average"
      + tags_all                              = (known after apply)
      + threshold                             = 95
      + treat_missing_data                    = "notBreaching"
    }

  # aws_cloudwatch_metric_alarm.very-high-db-cpu-warning[2] will be created
  + resource "aws_cloudwatch_metric_alarm" "very-high-db-cpu-warning" {
      + actions_enabled                       = true
      + alarm_actions                         = [
          + "arn:aws:sns:ca-central-1:239043911459:alert-warning",
        ]
      + alarm_description                     = "CPU usage of the RDS instance > 95% for 5 minutes"
      + alarm_name                            = "very-high-db-cpu-warning-instance-2"
      + arn                                   = (known after apply)
      + comparison_operator                   = "GreaterThanThreshold"
      + datapoints_to_alarm                   = 5
      + dimensions                            = {
          + "DBInstanceIdentifier" = "notification-canada-ca-staging-instance-2"
        }
      + evaluate_low_sample_count_percentiles = (known after apply)
      + evaluation_periods                    = 5
      + id                                    = (known after apply)
      + metric_name                           = "CPUUtilization"
      + namespace                             = "AWS/RDS"
      + period                                = 60
      + statistic                             = "Average"
      + tags_all                              = (known after apply)
      + threshold                             = 95
      + treat_missing_data                    = "notBreaching"
    }

Plan: 3 to add, 3 to change, 3 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: plan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan.tfplan"

Show Conftest results

WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.logs_exports"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.db-free-local-storage-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.db-free-local-storage-critical[1]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.db-free-local-storage-critical[2]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.db-free-local-storage-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.db-free-local-storage-warning[1]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.db-free-local-storage-warning[2]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-db-cpu-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-db-cpu-warning[1]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-db-cpu-warning[2]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-dbload-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-dbload-critical[1]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-dbload-critical[2]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-dbload-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-dbload-warning[1]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.high-dbload-warning[2]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.low-db-memory-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.low-db-memory-critical[1]"]
WARN - plan.json - main - Missing Common Tags:...

github-actions · 2025-02-26T16:22:03Z

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

github-actions · 2025-02-26T16:22:24Z

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

P0NDER0SA

sweet :)

jimleroyer

Thank you -- let's see how that goes.

ben851

LGTM

sastels added 2 commits February 26, 2025 10:37

change rds cpu critical to warning

58b4b77

evaluate over 5 minutes

f05144a

sastels marked this pull request as ready for review February 26, 2025 16:21

sastels requested a review from jimleroyer as a code owner February 26, 2025 16:21

sastels requested a review from a team February 26, 2025 16:22

sastels changed the title from "change rds cpu critical to warning" to "change rds cpu critical to warning and increase evaluation period" Feb 26, 2025

P0NDER0SA approved these changes Feb 26, 2025

jimleroyer approved these changes Feb 26, 2025

ben851 approved these changes Feb 26, 2025

sastels merged commit c62e23b into main Feb 26, 2025
32 checks passed

sastels deleted the rds-cpu-critical-to-warning branch February 26, 2025 18:24
