Customize service module #474

Closed
rocketnova wants to merge 3 commits

rocketnova commented Nov 14, 2023

Ticket

N/A

Changes

What was added, updated, or removed in this PR.

  • Add ability to pass in container environment variables to the service module
  • Add ability to pass in container secrets from Parameter Store to the service module
  • Add a VPC security group ingress rule that allows inbound requests to VPC endpoints from the app (needed for access to Parameter Store)
  • Add ability to run non-custom docker images
  • Add ability to customize healthcheck
  • Add ability to disable read-only containers

Context for reviewers

Testing instructions, background context, more in-depth details of the implementation, and anything else you'd like to call out or ask reviewers.

On a number of my recent projects, I've needed the ability to run non-custom docker images (e.g. flowable). This means I've needed the service module to be a little bit more flexible.

Specifically, I've needed to be able to:

  • Pass in a different docker image (such as from docker hub)
  • Pass in environment variables I can use to configure the container
  • Pass in secrets that are stored in AWS Parameter Store to the container
  • Modify the healthcheck settings (for instance, when /health doesn't exist but /healthcheck or /homepage does)
  • Modify the container to not be read-only (some containers will fail if run in read-only mode)

Testing

Provide evidence that the code works as expected. Explain what was done for testing and the results of the test plan. Include screenshots, GIF demos, shell commands or output to help show the changes working as expected. ProTip: you can drag and drop or paste images into this textbox.

I've created a matching test PR in the platform-test repo that demonstrates how to test these changes: navapbc/platform-test#65

lorenyu commented Nov 21, 2023

Nice. This PR looks meaty so it'll take me more time to review it. But some initial thoughts:

  • customizing environment variables has been on my wishlist / backlog for a bit so i'm excited to see what you did with that
  • customizing the healthcheck has also been a pain point. we've had issues with some base images not having wget (so they would use curl or something else instead), so forcing apps to have wget seemed limiting. not sure if your solution addresses that but looking forward to seeing if it does
  • i'm kind of a networking newb but i was a little surprised to see that we needed the vpc endpoint rule. i thought that the subnet that the app service was in had access to the public internet so i thought it would be able to access all of aws services.

lorenyu left a comment

Lot of good ideas in here! Thanks for putting up this PR

# By default, Terraform creates a workspace named “default.” If a non-default workspace is not created this prefix will equal “default”,
# if you choose not to use workspaces set this value to "dev"

if you choose not to use workspaces set this value to "dev"

i think this line came from my PR but i don't know what i meant by that lol

Comment on lines +65 to +77
# Uncomment the following resource if you want to grant the service access to SSM parameters:
#data "aws_security_groups" "aws_services" {
#  filter {
#    name   = "group-name"
#    values = ["${module.project_config.aws_services_security_group_name_prefix}*"]
#  }
#
#  filter {
#    name   = "vpc-id"
#    values = [data.aws_vpc.default.id]
#  }
#}

Over time, I've started to try to minimize the amount of "uncomment"/"comment" instructions in the template. I've noticed a few benefits to this:

  • It makes it easier to update the template from the project when the diffs between project files and template files are kept to as few files as possible
  • It makes it easier to develop and test the template itself, since updating the platform-test* repos benefits from the same thing as the first bullet

So for something like this, I'd move this into app-config. We could have a config something like module.app_config.use_parameter_store.

This is important for a second reason: the current implementation would break if the application doesn't have a database. If you look in the networks module, you'll see that the list of AWS services that we need to create VPC endpoints for is based on the app config. For example, if there are two apps and one of them has a database, we'll create "ssm", "kms", and "secretsmanager" VPC endpoints, but if neither app has a database we won't create any VPC endpoints. So to make this work, we'll also want to condition the creation of the "ssm" endpoint on whether any of the apps has use_parameter_store set to true.

The pseudo-code would be something like:

# Collect the AWS services that need VPC endpoints across all apps
aws_service_integrations = set()
for app_config in app_configs:
  if app_config.use_parameter_store:
    aws_service_integrations.add("ssm")
  if app_config.has_database:
    aws_service_integrations.update({"ssm", "kms", "secretsmanager"})
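
In Terraform itself, the same set-building logic could be sketched roughly as follows. This is only an illustration: the `app_configs` variable and its shape are assumptions made for the example, not the template's actual interface, and `use_parameter_store` is the flag proposed above.

```hcl
# Hypothetical input: one entry per application.
variable "app_configs" {
  type = list(object({
    use_parameter_store = bool
    has_database        = bool
  }))
}

locals {
  # Union of the AWS services that need VPC endpoints across all apps.
  aws_service_integrations = setunion(
    # Any app that reads from Parameter Store needs the "ssm" endpoint.
    anytrue([for app in var.app_configs : app.use_parameter_store]) ? ["ssm"] : [],
    # Any app with a database needs all three endpoints.
    anytrue([for app in var.app_configs : app.has_database]) ? ["ssm", "kms", "secretsmanager"] : [],
  )
}
```

With this shape, a project where no app has a database but one sets `use_parameter_store = true` would get only the "ssm" endpoint.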

Comment on lines +121 to +124
# Add custom container environment variables like so:
# container_env_vars = [
#   { name : "CUSTOM_ENV_VAR", value : "100" },
# ]

I imagine some environment variables will need to have different values for each environment. In that case we can't put it in this file, otherwise all environments would be forced to use the same value. Rather than use .tfvars files, we'd want to define the environment variables in app-config, per this ADR. Then dev, staging, and prod can all have different values for the environment variables. And you can define defaults and validation rules for environment variables in env-config.

For example, you can do something like

# env-config/variables.tf

variable "custom_env_var" {
  description = "Some custom env var"
  type        = number
  default     = 100
  validation {
    condition     = var.custom_env_var > 0
    error_message = "The custom_env_var must be a positive number."
  }
}

# env-config/outputs.tf
output "env_vars" {
  # An output block needs an explicit value argument
  value = [
    {
      name  = "CUSTOM_ENV_VAR"
      value = var.custom_env_var
    }
  ]
}
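
To show how per-environment values could then differ, here is a rough sketch of the app-config side. The file names, module names, and the `environment_configs` output are assumptions for illustration; only `custom_env_var` and `env_vars` come from the example above.

```hcl
# app-config/dev.tf (hypothetical file and module names)
module "dev_config" {
  source = "./env-config"

  # Override the default for dev only.
  custom_env_var = 10
}

# app-config/prod.tf
module "prod_config" {
  source = "./env-config"
  # custom_env_var is omitted, so the default (100) applies.
}

# app-config/outputs.tf
output "environment_configs" {
  value = {
    dev  = module.dev_config
    prod = module.prod_config
  }
}
```

The service layer could then look up something like `module.app_config.environment_configs[var.environment_name].env_vars` and pass it to the service module, assuming an `environment_name` variable exists there.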

@@ -1,7 +1,8 @@
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
data "aws_ecr_repository" "app" {
  name  = var.image_repository_name
  count = var.external_image_url == "" ? 1 : 0

nit: stylistically it feels strange to base the existence of data.aws_ecr_repository on var.external_image_url. it seems more direct and robust to base it on whether image_repository_name was passed in. in other words,

  1. make var.image_repository_name nullable (set default to null)

  2. update this line to be:

    Suggested change
    - count = var.external_image_url == "" ? 1 : 0
    + count = var.image_repository_name != null ? 1 : 0
  3. add a validation rule to the variables that requires either var.external_image_url or var.image_repository_name to be set
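
Put together, the three steps might look something like the sketch below. The defaults and the `image_repository_url` local are assumptions. Note that a validation block referencing a different variable requires Terraform 1.9 or later; on older versions a `precondition` on a resource would be one alternative.

```hcl
variable "image_repository_name" {
  type        = string
  description = "The name of the container image repository"
  default     = null # step 1: nullable
}

variable "external_image_url" {
  type    = string
  default = "" # assumed default, based on the == "" check in the PR

  # Step 3: require one of the two inputs (cross-variable reference, Terraform >= 1.9).
  validation {
    condition     = var.external_image_url != "" || var.image_repository_name != null
    error_message = "Either external_image_url or image_repository_name must be set."
  }
}

data "aws_ecr_repository" "app" {
  # Step 2: only look up the ECR repository when a name was passed in.
  count = var.image_repository_name != null ? 1 : 0
  name  = var.image_repository_name
}

locals {
  # Hypothetical local: pick the ECR URL when available, else the external one.
  image_repository_url = (
    var.image_repository_name != null
    ? data.aws_ecr_repository.app[0].repository_url
    : var.external_image_url
  )
}
```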

@@ -16,6 +16,12 @@ variable "image_repository_name" {
description = "The name of the container image repository"
}

variable "external_image_url" {

nit: i'd call this image_repository_url.

  1. i don't think we need "external" in the name since i don't think there's any requirement that it has to be external.
  2. i think we need to qualify that this is an "image repository url" not an "image url", since [image url] = [image repository url] + ":" + [image tag]
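
As a small illustration of that distinction, assuming the variable were renamed as suggested and that an `image_tag` variable is available (both are assumptions here):

```hcl
variable "image_repository_url" {
  type        = string
  description = "Image repository URL without a tag (for example a Docker Hub or ECR repository)"
}

variable "image_tag" {
  type        = string
  description = "Tag of the image to deploy"
}

locals {
  # [image url] = [image repository url] + ":" + [image tag]
  image_url = "${var.image_repository_url}:${var.image_tag}"
}
```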

@@ -67,3 +73,65 @@ variable "db_vars" {
})
default = null
}

variable "container_env_vars" {
type = list(map(string))

nit: i think we can make this type definition more explicit with something like

Suggested change
- type = list(map(string))
+ type = list(object({ name = string, value = string }))

see https://developer.hashicorp.com/terraform/language/expressions/type-constraints

}

variable "container_secrets" {
type = list(map(string))

nit: i think we can make this type definition more explicit with something like

Suggested change
- type = list(map(string))
+ type = list(object({ name = string, value_from = string }))

note that I also think we should stick with snake case (value_from) instead of camel case (valueFrom), since I think that's more common within terraform code

see https://developer.hashicorp.com/terraform/language/expressions/type-constraints
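
One consequence of using snake case in the variable is that the module has to translate back to the camelCase key that ECS container definitions expect. A minimal sketch, assuming empty-list defaults and a hypothetical `container_definition` local:

```hcl
variable "container_env_vars" {
  type    = list(object({ name = string, value = string }))
  default = []
}

variable "container_secrets" {
  type    = list(object({ name = string, value_from = string }))
  default = []
}

locals {
  # Hypothetical fragment of a container definition. It would be passed
  # through jsonencode([...]) into aws_ecs_task_definition.container_definitions.
  container_definition = {
    name        = "app"
    environment = var.container_env_vars

    # ECS expects "valueFrom", so map the snake_case attribute back.
    secrets = [
      for secret in var.container_secrets : {
        name      = secret.name
        valueFrom = secret.value_from
      }
    ]
  }
}
```

A caller could then write `container_secrets = [{ name = "API_KEY", value_from = "/my-app/api-key" }]`, where the parameter name is made up for the example.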

Comment on lines +70 to +78
resource "aws_vpc_security_group_ingress_rule" "vpc_endpoints_ingress_from_app" {
  security_group_id = var.aws_services_security_group_id
  description       = "Allow inbound requests to VPC endpoints from application ${var.service_name}"

  from_port                    = 443
  to_port                      = 443
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.app.id
}

question: for my own understanding, what happens if we don't have this rule? i had thought that we allowed all outbound traffic from the app subnet and security group. like the block right before this is

resource "aws_security_group" "app" {
  ...
  egress {
    description = "Allow all outgoing traffic from application"
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}

does that not already allow the app to access the AWS services without needing to go through VPC endpoints?

Comment on lines +89 to +92
variable "aws_services_security_group_id" {
  type        = string
  description = "Security group ID for VPC endpoints that access AWS Services"
}

optional: we could potentially add a validation rule that says that this is required if container_secrets is non-empty
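
A rough sketch of such a rule, assuming the security group variable is made nullable and `container_secrets` defaults to an empty list. Referencing another variable inside a validation block needs Terraform 1.9 or later, so treat this as one possible approach rather than the PR's implementation.

```hcl
variable "container_secrets" {
  type    = list(object({ name = string, value_from = string }))
  default = []
}

variable "aws_services_security_group_id" {
  type        = string
  description = "Security group ID for VPC endpoints that access AWS Services"
  default     = null # assumed: optional unless secrets are used

  validation {
    # Passes when no secrets are configured, or when the ID is provided.
    condition     = length(var.container_secrets) == 0 || var.aws_services_security_group_id != null
    error_message = "aws_services_security_group_id is required when container_secrets is non-empty."
  }
}
```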

Comment on lines +121 to +125
variable "healthcheck_matcher" {
  type        = string
  description = "The response codes that indicate healthy to the ALB"
  default     = "200-299"
}

question: what are some other possible matchers we'd use? have you seen other examples in your experience with other apps?

Reply (Contributor):

I remember going through a fair bit of this healthcheck customization for flowable (https://github.com/navapbc/pfml-starter-kit-app/commit/ea2f9064a6884f94434e03a4ff2ee63f675511a5), though I can't remember why we had that set to 200-401 there. I think probably because we couldn't disable the security on the endpoint, but at least it's not a 500 (or 404) error so something is working.
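
For context, a matcher like this typically ends up in the target group's health check. The sketch below assumes a hypothetical `healthcheck_path` variable, a `vpc_id` variable, and placeholder port and name values; only `healthcheck_matcher` comes from the PR.

```hcl
variable "vpc_id" {
  type = string
}

variable "healthcheck_path" {
  type        = string
  description = "Path the ALB requests to check health (hypothetical variable)"
  default     = "/health"
}

variable "healthcheck_matcher" {
  type        = string
  description = "The response codes that indicate healthy to the ALB"
  default     = "200-299"
}

resource "aws_lb_target_group" "app" {
  name_prefix = "app-" # placeholder
  port        = 8080   # placeholder
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path    = var.healthcheck_path
    matcher = var.healthcheck_matcher
  }
}
```

Setting `healthcheck_matcher = "200-401"`, as in the flowable case mentioned above, would treat an authentication error from a secured endpoint as healthy while still failing on 404 or 500 responses.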

rocketnova commented

Update: @lorenyu Thank you for the feedback on this PR! I haven't had time to address it yet, but have it on my radar.

lorenyu commented Oct 30, 2024

This PR is probably too old to merge easily as is, but I created a follow-up ticket (#772) to review it for valuable things that we can pull into future PRs.

lorenyu closed this Oct 30, 2024
lorenyu deleted the rocket/container-customizations branch October 30, 2024 16:06
3 participants