ContainerImageLambdas

Introduction

The aim of this document is to explain the usage of Container Image-based Lambdas within the Mobile Notifications platform and to document issues and future improvements for this infrastructure.

The Container Image-based lambdas are ordinary AWS Lambdas, except that the code which runs the lambda is delivered as a Docker image rather than as a Jar file or a Node package. This provides a great deal of flexibility over the environment in which the code runs. The build process for the lambda creates the Docker image and then uploads it to an Elastic Container Registry (ECR) repository, from which it can be used to launch a lambda.

Motivation

In our case the motivation for using container images was to circumvent the size limits associated with Jar-based lambdas. The Jar-based deployment package is limited to 50MB (250MB unzipped), compared to the 10GB available for a container image. Using Scala-based lambdas and bringing in various dependencies makes it quite easy to hit this limit, so switching over to container images provides a good solution to this problem.

Infrastructure

Building

The Docker images are built in an extra step that has been added to the SBT-based build configuration using the SBT Native Packager's Docker support. Once the code is compiled, a Dockerfile is generated based on the configuration in the build file, and the Scala deliverables for the application are copied into the image. The image is based on the official Java Lambda image provided by AWS. This contains all of the components necessary to respond to the lambda's trigger events and to call the JVM handler function with the appropriate data. Using this image means that we don't need to change anything in the implementation of the lambda function, as it provides the lambda code with exactly the same execution environment that it would have when running as a "regular" lambda. We tell the lambda runtime which handler it should execute by passing it as the argument in the CMD section of the Dockerfile. By default, when the image is built, it is configured to use a test handler to aid in running and testing the image locally:

// build.sbt
ExecCmd("CMD", "com.gu.notifications.worker.ContainerLambdaTest::handleRequest"),
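
For context, this ExecCmd sits inside the dockerCommands sequence that SBT Native Packager uses to generate the Dockerfile. A minimal sketch of what the surrounding configuration might look like (the COPY path and the exact list of commands here are illustrative assumptions, not the project's actual build file):

// build.sbt (illustrative sketch only)
import com.typesafe.sbt.packager.docker.{Cmd, ExecCmd}

enablePlugins(JavaAppPackaging, DockerPlugin)

dockerCommands := Seq(
  // Base the image on the official AWS Java Lambda image.
  Cmd("FROM", "public.ecr.aws/lambda/java:latest"),
  // Copy the staged application jars to where the Lambda runtime looks for them
  // (the source path here is illustrative).
  Cmd("COPY", "opt/docker/lib", "${LAMBDA_TASK_ROOT}/lib"),
  // Default handler, used when running and testing the image locally.
  ExecCmd("CMD", "com.gu.notifications.worker.ContainerLambdaTest::handleRequest")
)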

This is configured on a per-lambda basis by modifying a property in the CloudFormation template when the lambda is defined. See below for more details.

Once this configuration is included in the build, we have a new command, docker:publishLocal, which can be executed from SBT and which will compile the application, build the Docker image, and store it in the local Docker repository. Note that this does mean that in order to build the image you need to have Docker running locally on the machine on which you are doing the build. This is handled by Teamcity for the production builds, but for local testing see the Testing section for tips.

After creating the image, it needs to be published to an Elastic Container Registry (ECR) repository before it can be used as the base for a lambda. Within the AWS console you can see the container repository, called notificationworker-lambda-images, and the images within it. The images are tagged with the Teamcity build number, which allows a specific build to be deployed via RiffRaff.

In production, these builds are created by executing deploy.sh during the Teamcity build. This script is stored in the repository; it authenticates with the AWS registry and then publishes the image from the local Docker repository to the AWS one. The build number is populated in the environment by Teamcity and is used by the script to tag the image. The running of this script is configured as a build step within Teamcity.
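
For illustration only, since this project pushes via deploy.sh rather than from SBT: sbt-native-packager can also be configured to tag the image with the build number and point it at an ECR registry, so that Docker / publish produces the same registry/tag combination (authentication with ECR would still have to happen separately). A minimal sketch, where the registry URL is a placeholder and BUILD_NUMBER is assumed to be the environment variable Teamcity provides:

// build.sbt (illustrative alternative; this project uses deploy.sh instead)

// Push to the ECR registry hosting the notificationworker-lambda-images repository
// (account ID and region are placeholders).
dockerRepository := Some("000000000000.dkr.ecr.eu-west-1.amazonaws.com")
Docker / packageName := "notificationworker-lambda-images"

// Tag the image with the CI build number so that a specific build can be deployed.
Docker / version := sys.env.getOrElse("BUILD_NUMBER", "DEV")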

CloudFormation

Once the image is deployed to a container repository it can be used as the code image for a lambda within the CloudFormation. This is achieved by setting the PackageType to Image, and then populating the Code object's ImageUri field with a link to the corresponding ECR image, followed by a colon and the tag that identifies a specific instance of the image. In this case the Teamcity build number is used as the tag (as mentioned above), and this is obtained from a BuildId parameter which is automatically populated by RiffRaff during the deployment process.

  WorkerLambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub ${Stack}-${App}-ctr-${Stage}
      PackageType: Image
      Code:
        ImageUri: !Join [':', [!ImportValue NotificationLambdaRepositoryUri, !Ref BuildId]]

The container repository itself is also created through CloudFormation and therefore also lives in a CloudFormation stack. As a result, we can import an exported value from another stack to automatically fetch the URI of the repository (this serves to document the connection between the two parts of the infrastructure, and also means we never have to update it manually in the various places that reference it, should it ever change for some reason). The repository is defined in the ecr.yaml file in the repo, and the various attributes associated with the repository are exported ("output"ed) at the end of this file.

The other main piece of configuration associated with a container lambda in the CloudFormation stack is the ImageConfig property, which allows us to override some parts of the image configuration on a per-lambda basis. It is here that we customise the handler function for each lambda.

This keeps the same behaviour as before: previously we had multiple lambdas which used the same code JAR file but were configured with different handler functions to implement different functionality. Now, there are multiple lambdas within the notifications project which use the same underlying base image, and again the difference is the handler function that is called when the lambda is executed.

      ImageConfig:
        Command: [!Ref FullyQualifiedHandler]
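
For reference, the fully qualified handler string follows the usual Java Lambda handler format of class::method. A minimal sketch of the kind of class it points at (the class name and input/output types here are illustrative, not the project's actual handlers):

// Illustrative handler; its fully qualified name, e.g.
// "com.example.ExampleHandler::handleRequest", is what ImageConfig's Command supplies.
package com.example

import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}

class ExampleHandler extends RequestHandler[java.util.Map[String, String], String] {
  override def handleRequest(input: java.util.Map[String, String], context: Context): String = {
    context.getLogger.log(s"Received input: $input")
    "ok"
  }
}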

Local Testing

Building the image

If you have Docker running on your machine, then building the image locally is as trivial as running docker:publishLocal from the SBT prompt.

However, Docker is inherently a Linux tool, and we develop on Macs. A popular, and essentially effortless, solution to this is Docker Desktop, but it is not free for commercial purposes.

An alternative is to set up Docker within a virtual machine, for example by using VirtualBox. This works fine (you would then run SBT and build the Docker image on Linux), but it means setting up and maintaining a Linux installation.

A further alternative is to use Vagrant, a simple tool for automating the management of disposable VMs. It can be easily installed on a Mac with brew install vagrant. Once installed, you need to create a file called Vagrantfile which describes how to configure a VM for running your project; it is common to put this in the root of a project's GitHub repo.

We did put together a branch which contains a very simple Vagrantfile (the one in the branch is full of useful comments as well!):

sbt_version = "1.6.2"

Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |v|
    v.memory = 1024 * 2
  end

  config.vm.box = "debian/bullseye64"
  config.vm.hostname = "mobile-n10n.box"

  config.vm.provision "shell", inline: "
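    # Install Docker, a headless JDK 11, and the sbt launcher; then let the
    # vagrant user talk to the Docker daemon.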
    apt-get update
    apt-get -y install curl unzip docker.io openjdk-11-jdk-headless ntp
    curl -sLO 'https://github.com/sbt/sbt/releases/download/v#{sbt_version}/sbt-#{sbt_version}.zip'
    unzip -o sbt-#{sbt_version}.zip
    echo 'PATH=$HOME/sbt/bin:$PATH' >>~vagrant/.bashrc
    usermod -a -G docker vagrant
  "
end

This will build a very simple VM which contains Docker and the bits you need to run SBT. Once the Vagrantfile is placed in your repo, you can run the VM with something like:

$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'debian/bullseye64'...
[...]
$ vagrant ssh
$ # you are now running on the box with docker, yay!
$ cd /vagrant
$ sbt
[info] [launcher] getting org.scala-sbt sbt 1.6.1  (this may take some time)...
[info] [launcher] getting Scala 2.12.15 (for sbt)...
sbt thinks that server is already booting because of this exception:
sbt.internal.ServerAlreadyBootingException: java.io.IOException: org.scalasbt.ipcsocket.NativeErrorException: [13] Permission denied
[...]
Create a new server? y/n (default y)
y
[info] welcome to sbt 1.6.1 (Debian Java 11.0.14)
[...]

Notes:

  • I never managed to solve the "sbt thinks that server is already booting because of this exception:" error, but if you just answer yes to the "Create a new server?" question, it will work.
  • You need to have the "guest extensions" installed within the VM to get the synced folder support working easily (which means you can use the same source directory in the VM and on the host machine). The easiest way I found to do this is by using a plugin: vagrant plugin install vagrant-vbguest.

Once you have built the image, it is reasonably simple to run it from within Docker and then trigger your lambda; AWS's documentation on testing container image lambdas locally has some good advice on this.
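
For example, the AWS-provided base image bundles the Lambda Runtime Interface Emulator, so the container can be started locally (e.g. with its port 8080 mapped to port 9000 on the host) and then invoked by posting a test event to the emulator's endpoint. A minimal sketch using the JDK's HTTP client (the port mapping and payload are assumptions):

// Illustrative: post a test event to a container image lambda running locally
// behind the Runtime Interface Emulator (assumes port 8080 is mapped to 9000).
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object InvokeLocalLambda extends App {
  val request = HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:9000/2015-03-31/functions/function/invocations"))
    .POST(HttpRequest.BodyPublishers.ofString("""{"test": "event"}"""))
    .build()

  val response = HttpClient.newHttpClient()
    .send(request, HttpResponse.BodyHandlers.ofString())

  // Print whatever the handler returned.
  println(response.body())
}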

Issues and Future Work

Using an up-to-date base image

One big advantage of the regular lambda style of coding is that we don't need to have any ownership of, or visibility of, the environment that runs our code. We don't need to know which OS it runs on, or which version of the OS or JVM. Instead, all of this is handled at a lower level, and we just deliver a JAR file that expects a certain defined input and provides a certain defined output. Everything else is provided by AWS automatically. A very important benefit of this is that every time the lambda runs it will automatically get the latest OS version, including security updates.

When using container-based lambdas this is not the case, because the OS image is built by us when the Docker image is constructed, and every time that specific build of the Docker image is reused, the lambda will get the same environment.

The main mitigations we have at our disposal are to:

  • Ensure that we are using the latest version of the underlying base image every time we build the image
  • Rebuild the image frequently so that it is refreshed often

The first part is handled during the build process. By specifying the symbolic tag public.ecr.aws/lambda/java:latest as the image name, we get the latest version of the image as it exists at the time the build runs.

As a side note, the actual image being used here is described by AWS as:

AWS provided base images for Lambda contain all the required components to run your functions packaged as container images on AWS Lambda. These base images contain the Amazon Linux Base operating system, the runtime for a given language, dependencies and the Lambda Runtime Interface Client (RIC), which implements the Lambda Runtime API. The Lambda Runtime Interface Client allows your runtime to receive requests from and send requests to the Lambda service.

[...]

AWS will regularly provide security patches and other updates for these base images. These images are similar to the AWS Lambda execution environment on the cloud to allow customers to easily packaging functions to the container image. However, we may choose to optimize the container images by changing the components or dependencies included. When deployed to AWS Lambda these images will be run as-is.

As such it should represent roughly the same environment that a regular JVM lambda runs under, as long as we are using the latest image.

The second point should be addressed by scheduling a regular rebuild and redeploy of the project, to ensure that the base image is refreshed even if no changes have been merged into the repo to otherwise trigger a redeploy.

In order to facilitate this work we capture the checksum of the base image that was used during the build (stored as latestVersionOfLambdaSDK in the build.sbt) and burn it into the metadata for the image using a label:

lazy val lambdaDockerCommands = dockerCommands := Seq(
  Cmd    ( "FROM",   latestVersionOfLambdaSDK),
  Cmd    ( "LABEL",  s"sdkBaseVersion=${latestVersionOfLambdaSDK}"),
  // ... followed by the COPY and CMD commands described earlier
)

ISSUE: Flagging out of date container images

What is missing, however, is a way to catch and alert on the situation where this falls behind. In other words, there is no equivalent of Amiable which would flag up out-of-date images.

It seemed like this would be an easy thing to resolve. Essentially it is a simple problem: we know which version of the base image was used when the lambda image was created. Therefore, in theory, all we have to do is discover how old this base image version is by checking when it was published (alerting when it is too old), and find out whether a later version has been published (alerting when it is out of date). This information is easily available to humans in the AWS ECR gallery (https://gallery.ecr.aws/lambda/java).

However, surprisingly, this information does not seem to be available programmatically for other people's (in this case Amazon's) public repositories.

We have an outstanding ticket for this.

Using this on other projects

While it would not be a problem to recreate all of this for another project, it would be much more fun to create an SBT plugin that would automate (and become the canonical implementation of) all of this SBT configuration.

There are alternatives to the sbt-native-packager plugin that might make this more lightweight too. This is not so relevant in this case, because sbt-native-packager was already in use, but it would be more relevant if attempting to apply this approach to a thinner project.