Update RDS SSL/TLS certs #3643

Open
JN-Hernandez opened this issue Aug 15, 2024 · 3 comments

JN-Hernandez commented Aug 15, 2024

Overview

An email notification was received regarding expiring RDS certs:

You are receiving this message because your AWS Account has one or more Amazon RDS, or Amazon Aurora database instances in the US-EAST-1 Region using a SSL/TLS Certificate that is expiring on August 22, 2024.

As such, we will need to update the RDS SSL/TLS certs prior to Aug 22, 2024.

Is your feature request related to a problem? Please describe.

Preliminary investigation shows that both the stg and prd environments are using the rds-ca-2019 cert, which expires on August 22, 2024:

mmw-stg

$ aws rds describe-db-instances --region us-east-1 | grep DBInstanceIdentifier
            "DBInstanceIdentifier": "dd1gc3iuv75ep7t",
            "ReadReplicaDBInstanceIdentifiers": [],

$ aws rds describe-db-instances --db-instance-identifier dd1gc3iuv75ep7t | grep CACertificateIdentifier
            "CACertificateIdentifier": "rds-ca-2019",

mmw-prd
Screenshot 2024-08-15 at 11 59 15

CLI access works for the staging environment, but appears to be unavailable for production:

$ aws rds describe-db-instances --region us-east-1 | grep DBInstanceIdentifier

An error occurred (InvalidClientTokenId) when calling the DescribeDBInstances operation: The security token included in the request is invalid.
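
An InvalidClientTokenId error generally means the access key the CLI sent was not recognized, which points at the credentials in use rather than at RDS permissions. A minimal sketch for checking which identity a CLI profile resolves to is below; the profile names in the usage comment are assumptions and do not come from this issue.

```shell
#!/usr/bin/env bash
# Report which IAM identity (if any) an AWS CLI profile resolves to.
# Profile names are hypothetical; substitute the ones in ~/.aws/config.
check_profile() {
  local profile="$1" arn
  if arn=$(aws sts get-caller-identity --profile "$profile" \
             --query Arn --output text 2>/dev/null); then
    echo "$profile: OK ($arn)"
  else
    echo "$profile: credentials rejected or missing"
    return 1
  fi
}

# Usage (assumed profile names):
#   check_profile mmw-stg
#   check_profile mmw-prd
```

If the production profile fails here too, the key is likely stale or belongs to a different account, and the Console (or SSO) is the remaining route.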

The AWS Console shows that we also lack permission to view or modify IAM:
Screenshot 2024-08-15 at 12 09 21

RDS SSL certificate rotation for production will therefore need to be completed via the AWS Console. If it turns out that we cannot modify production ourselves, we should be ready to hand work instructions to the client so they can perform the rotation.

Describe the solution you'd like

We need to update the cert to rds-ca-rsa2048-g1, which will not expire for 40 years.
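
The expiry of either CA can be checked from the CLI before committing to it. This is a sketch only: the function name is made up for illustration, and the region default is taken from the notification email.

```shell
#!/usr/bin/env bash
# Print the ValidTill timestamp of an RDS CA certificate.
# $1 = certificate identifier, $2 = region (defaults to us-east-1).
ca_valid_till() {
  aws rds describe-certificates \
    --certificate-identifier "$1" \
    --region "${2:-us-east-1}" \
    --query 'Certificates[0].ValidTill' \
    --output text
}

# Usage:
#   ca_valid_till rds-ca-2019
#   ca_valid_till rds-ca-rsa2048-g1
```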

Additional Context

  • This change will incur downtime, per testing efforts on a separate Element 84 project.

JN-Hernandez commented Aug 15, 2024

Work Plan

Summary

An email notification was received regarding expiring RDS certs:

You are receiving this message because your AWS Account has one or more Amazon RDS, or Amazon Aurora database instances in the US-EAST-1 Region using a SSL/TLS Certificate that is expiring on August 22, 2024.

As such, we will need to update the RDS SSL/TLS certs prior to Aug 22, 2024.

Preliminary Investigation

Command to determine applications actively connected using SSL:

SELECT datname, usename, ssl, client_addr 
  FROM pg_stat_ssl INNER JOIN pg_stat_activity ON pg_stat_ssl.pid = pg_stat_activity.pid
  WHERE ssl is true and usename<>'rdsadmin';
  • 4. Check size of the database to help determine size of impact
\l+
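
When only the size of the connected database matters, a single query avoids scanning the full `\l+` listing. The wrapper below is a sketch; the function name is hypothetical, and connection flags are passed straight through to psql.

```shell
#!/usr/bin/env bash
# Print the size of the current database in human-readable form.
# Any arguments (e.g. -h <db_host> -U <db_user> -d <db_name>) go to psql.
db_size() {
  psql -At "$@" \
    -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
}

# Usage (placeholders, not real values):
#   db_size -h <db_host> -U <db_user> -d <db_name>
```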

Steps

Pre-Implementation

  • 1. Notify client concerning downtime

  • 2. Add the staging and production DB user and password information to 1Password

    • Staging: MMW - Staging DB User
    • PROD: MMW - PROD DB User
  • 3. Update external IP address via CloudFormation

    • Old: 50.243.53.17/32
    • New: 50.174.247.186/32
    • Log into the appropriate AWS account
    • Navigate to CloudFormation > Stacks
    • Select the DataPlane stack
    • Click Update Stack
    • Keep Use existing template checked and click Next
    • Update OfficeCidr to 50.174.247.186/32, then click Next
    • Click Next again
    • Review changes, then click Submit
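
The console steps for OfficeCidr could also be scripted, which may help if the change has to be repeated per environment. This is a sketch under assumptions: the function names are invented, the stack name must be whatever the DataPlane stack is actually called in the account, and stacks containing IAM resources may additionally need a `--capabilities` flag.

```shell
#!/usr/bin/env bash
# Emit one --parameters entry per key: override a single key and keep
# every other parameter at its previous value.
# $1 = key to override, $2 = new value, remaining args = all parameter keys.
build_params() {
  local override_key="$1" override_value="$2" key
  shift 2
  for key in "$@"; do
    if [ "$key" = "$override_key" ]; then
      echo "ParameterKey=$key,ParameterValue=$override_value"
    else
      echo "ParameterKey=$key,UsePreviousValue=true"
    fi
  done
}

# Update only OfficeCidr on an existing stack, reusing its template.
# $1 = stack name (assumption: the real name of the DataPlane stack), $2 = CIDR.
update_office_cidr() {
  local stack="$1" cidr="$2" keys
  keys=$(aws cloudformation describe-stacks --stack-name "$stack" \
           --query 'Stacks[0].Parameters[].ParameterKey' \
           --output text) || return 1
  # Word splitting of $keys and of the build_params output is intentional.
  aws cloudformation update-stack --stack-name "$stack" \
    --use-previous-template \
    --parameters $(build_params OfficeCidr "$cidr" $keys)
}

# Usage:
#   update_office_cidr <stack_name> 50.174.247.186/32
```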

Implementation

  • 1. Review certificate to confirm it points to the old certificate rds-ca-2019
    • Staging DB Identifier: dd1gc3iuv75ep7t
    • PROD DB Identifier: ddaa37lyfbdqr3
aws rds describe-db-instances --db-instance-identifier <db_identifier> | grep CACertificateIdentifier
  • 2. Modify the DB instance to the new cert rds-ca-rsa2048-g1
aws rds modify-db-instance \
          --db-instance-identifier <db_identifier> \
          --ca-certificate-identifier rds-ca-rsa2048-g1 \
          --apply-immediately
  • 3. Review the RDS database status and confirm that it returns to available
aws rds describe-db-instances --db-instance-identifier <db_identifier> | grep DBInstanceStatus
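
The three implementation steps can be chained into one function so the same procedure runs against staging and PROD. This is a sketch, not the procedure that was actually executed; the function name is hypothetical. One caveat: the waiter can return immediately if the instance status has not yet left available, so re-checking the status afterwards is still worthwhile.

```shell
#!/usr/bin/env bash
# Rotate the CA certificate of one RDS instance. Skips the modify call if
# the instance already uses the target CA, then waits for "available".
# $1 = DB instance identifier, $2 = target CA (defaults to rds-ca-rsa2048-g1).
rotate_ca() {
  local db_id="$1" target_ca="${2:-rds-ca-rsa2048-g1}" current_ca
  current_ca=$(aws rds describe-db-instances \
                 --db-instance-identifier "$db_id" \
                 --query 'DBInstances[0].CACertificateIdentifier' \
                 --output text) || return 1
  if [ "$current_ca" = "$target_ca" ]; then
    echo "$db_id already uses $target_ca"
    return 0
  fi
  echo "$db_id: $current_ca -> $target_ca"
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --ca-certificate-identifier "$target_ca" \
    --apply-immediately >/dev/null || return 1
  # Polls DescribeDBInstances until the instance reports "available".
  aws rds wait db-instance-available --db-instance-identifier "$db_id"
}

# Usage:
#   rotate_ca dd1gc3iuv75ep7t                  # staging
#   rotate_ca dd1gc3iuv75ep7t rds-ca-2019      # rollback
```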

Post-Implementation

  • 1. Set the default RDS certificate to rds-ca-rsa2048-g1 so that newly created databases use the new certificate
aws rds modify-certificates \
          --certificate-identifier rds-ca-rsa2048-g1 \
          --region us-east-1
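
To confirm that the override took effect, the account default can be read back. This sketch assumes the DescribeCertificates response includes a DefaultCertificateForNewLaunches field, which I believe recent AWS CLI versions return; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the CA that new RDS instances in the region will receive by default.
# $1 = region (defaults to us-east-1).
default_ca_for_new_instances() {
  aws rds describe-certificates \
    --region "${1:-us-east-1}" \
    --query 'DefaultCertificateForNewLaunches' \
    --output text
}

# Usage:
#   default_ca_for_new_instances us-east-1
```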

Criteria for Success

  • 1. Confirm that we can connect to the database post-implementation

Connect to the bastion host

  • Staging 1Password Entry: Stroud_ModelMyWatershed: mmw-stg.pem (AWS)
    • Staging Bastion IP: 52.200.175.177
    • Staging Bastion Hostname: mmw-stg
  • PROD 1Password Entry: Stroud_ModelMyWatershed: mmw-prd.pem (AWS)
    • PROD Bastion IP: 3.94.62.233
    • PROD Bastion Hostname: mmw-prd
# Connect to the VPN
# Download the appropriate PEM file from 1Password
chmod 400 ~/Downloads/<pem_file>
mv ~/Downloads/<pem_file> ~/.ssh
# Update the ~/.ssh/config file to include the entries written in the 1Password notes
ssh <bastion_hostname>

Connect to the database from the bastion host

# List TMUX sessions and see if there is a PSQL session
tmux ls
# Connect to the existing PSQL session
tmux a -t psql
  • 2. Confirm the application works as expected post-implementation
    • Navigate to the site
    • Perform hard refreshes
    • Check the database connections and confirm SSL shows t (True)

Note: The command may fail if the PSQL session was idle too long. Simply re-run it if need be.

SELECT datname, usename, ssl, client_addr 
  FROM pg_stat_ssl INNER JOIN pg_stat_activity ON pg_stat_ssl.pid = pg_stat_activity.pid
  WHERE usename<>'rdsadmin';
  • 3. Detach from the PSQL TMUX session
    • Hit Ctrl+B
    • Hit D
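
The SSL check in step 2 can be reduced to a per-state tally, which makes non-SSL clients easier to spot than reading the full table. A minimal sketch follows, assuming psql's unaligned, tuples-only output (`-At`), where booleans print as t or f and columns are separated by `|`; the function name and connection placeholders are not from this issue.

```shell
#!/usr/bin/env bash
# Tally connections per SSL state from "ssl|client_addr" rows on stdin.
tally_ssl() {
  awk -F'|' '
    $1 == "t" { ssl++ }
    $1 == "f" { plain++; addrs = addrs " " $2 }
    END {
      printf "ssl=%d non_ssl=%d\n", ssl, plain
      if (plain > 0) print "non-SSL clients:" addrs
    }'
}

# Usage from the bastion (connection flags are placeholders):
#   psql -At -h <db_host> -U <db_user> -d <db_name> -c \
#     "SELECT ssl, client_addr
#        FROM pg_stat_ssl JOIN pg_stat_activity USING (pid)
#       WHERE usename <> 'rdsadmin';" | tally_ssl
```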

Risk

High Risk - this change will require database downtime and any connections that use SSL will need to be updated.

  1. Any connections to the database that use SSL and are not updated will no longer be able to connect.
  2. Failure to implement this change will result in certificate expiration on Aug 22, 2024, resulting in SSL connection failures.

Rollback

  • 1. Perform Implementation steps, updating the certificate back to rds-ca-2019
aws rds modify-db-instance \
          --db-instance-identifier <db_identifier> \
          --ca-certificate-identifier rds-ca-2019 \
          --apply-immediately

Additional Details

JN-Hernandez commented

Paired with @rajadain to complete this work for staging; PROD was a solo effort. During the staging implementation, I was added to AWS SSO for the Stroud account. Post-implementation testing after the PROD deployment shows that the tile servers are connecting without SSL.

Added Hernández to AWS SSO for Stroud account

As I only had the MMW login credentials from 1Password, Terence added me to AWS SSO so that I could access the account that way.

PROD tile servers not connecting via SSL

After implementation in PROD, found that the tile servers are not connecting via SSL:

modelmywatershed=> SELECT datname, usename, ssl, client_addr
  FROM pg_stat_ssl INNER JOIN pg_stat_activity ON pg_stat_ssl.pid = pg_stat_activity.pid
  WHERE usename<>'rdsadmin';
     datname      |     usename      | ssl | client_addr
------------------+------------------+-----+-------------
 modelmywatershed | modelmywatershed | f   | 10.0.1.11
 modelmywatershed | modelmywatershed | f   | 10.0.3.222
 modelmywatershed | modelmywatershed | f   | 10.0.3.185
 modelmywatershed | modelmywatershed | f   | 10.0.1.73
 modelmywatershed | modelmywatershed | t   | 10.0.0.36
(5 rows)

Notified @rajadain about findings.

rajadain commented

It's quite likely that the PROD tile servers will convert to SSL once we deploy all the new work from staging, since the staging tile servers do use SSL. I tested the site otherwise and it is working well. I think we can call this work done.
