ContainerImageLambdas
The aim of this document is to explain the usage of Container Image-based Lambdas within the Mobile Notifications platform and to document issues and future improvements for this infrastructure.
Container Image-based lambdas are ordinary AWS Lambdas, except that the code which runs the lambda is delivered as a Docker image rather than as a Jar file or a Node package. This provides a great deal of flexibility for the environment in which the code runs. The build process for the lambda creates the Docker image and then uploads it to an Elastic Container Registry (ECR) repository, from which the lambda can then be launched.
In our case the motivation for using container images was to circumvent the size limits associated with Jar-based lambdas. The jar-based deployment package is limited to 50MB (250MB unzipped), compared to the 10GB available for a container image. Using Scala-based lambdas and bringing in various dependencies makes it quite easy to hit this limit, so switching over to container images provides a good solution to this problem.
The Docker images are built in an extra step that has been added to
the SBT-based build configuration using the SBT Native Packager’s
Docker support. Once the code is compiled, a Dockerfile will be
generated based on the configuration in the build file and the Scala
deliverables for the application are copied into this image. The image
is based on the official Java Lambda image provided by AWS. This
contains all of the components necessary to respond to the lambda’s
trigger events and to call the JVM handler function with the
appropriate data. Using this image means that we don’t need to change
anything in the implementation of the lambda function as this image
provides the lambda code with exactly the same execution environment
that it would normally have when running as a "regular" lambda. We
tell the lambda runtime which handler it should execute by passing it as the argument to the CMD instruction in the generated Dockerfile. By default, when the image is built, it is configured to use a test handler to aid in running and testing the image locally:
```scala
// build.sbt
ExecCmd("CMD", "com.gu.notifications.worker.ContainerLambdaTest::handleRequest"),
```
This is configured on a per-lambda basis by modifying a property in the CloudFormation template when the lambda is defined. See below for more details.
Once this configuration is included in the build, we have a new command, "docker:build", which can be executed from SBT and which will compile the application and build the Docker image, storing it in the local Docker repository. Note that this does mean that in order to build the image you need to have Docker running locally on the machine on which you are doing the build. This is handled by TeamCity for the production builds, but for local testing see the Testing section for tips.
After creating the image, it needs to be published to an Elastic Container Registry (ECR) repository before it can be used as the base for a lambda. Within the AWS console you can see the container repository called notificationworker-lambda-images and, within this repository, the images. The images are tagged with the TeamCity build number, which allows a specific build to be deployed via RiffRaff.
In production, these images are published by executing deploy.sh during the TeamCity build. This script is stored in the repository; it authenticates with the AWS registry and then publishes the image from the local Docker repository to the AWS one. The build number is populated in the environment by TeamCity and this is used by the script to tag the image. The running of this script is configured as a build step within TeamCity.
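The script boils down to a standard ECR login/tag/push sequence. As a minimal sketch of the idea (the account ID, region and image names below are placeholders, not the real script):

```sh
#!/usr/bin/env bash
set -e

# Placeholders: substitute the real account, region and repository names.
REPO_URI="000000000000.dkr.ecr.eu-west-1.amazonaws.com/notificationworker-lambda-images"
TAG="${BUILD_NUMBER}"   # populated in the environment by TeamCity

# Authenticate the local Docker daemon against the AWS registry
aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin "${REPO_URI%%/*}"

# Tag the locally built image (name illustrative) with the build number and push it to ECR
docker tag notificationworker-lambda:latest "${REPO_URI}:${TAG}"
docker push "${REPO_URI}:${TAG}"
```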
Once the image is deployed to a container repository it can be used as the code image for a lambda within the CloudFormation template. This is achieved by setting PackageType to Image, and then populating the Code object's ImageUri field with a link to the corresponding ECR image, followed by a colon and then the tag that identifies a specific build of the image. In this case the TeamCity build number is used as the tag (as mentioned above), and this is obtained from a BuildId parameter which is automatically populated by RiffRaff during the deployment process.
```yaml
WorkerLambda:
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: !Sub ${Stack}-${App}-ctr-${Stage}
    PackageType: Image
    Code:
      ImageUri: !Join [':', [!ImportValue NotificationLambdaRepositoryUri, !Ref BuildId]]
```
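For reference, the BuildId parameter referenced above is just an ordinary string parameter in the same template; a sketch of the declaration (the description text here is illustrative) might look like:

```yaml
Parameters:
  BuildId:
    Type: String
    Description: Image tag to deploy (the TeamCity build number), populated automatically by RiffRaff
```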
The container repository itself is also created through CloudFormation and therefore also lives in a CloudFormation stack. As a result, we can import an exported value from another stack to automatically fetch the URI of the repository (this serves to document the connection between the two parts of the infrastructure, and also means we don't ever have to update it manually in the various places that reference it, if it ever changes for some reason). The repository is defined in the ecr.yaml file in the repo and the various attributes associated with the repo are exported (as stack Outputs) at the end of this file.
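As a rough sketch of the shape of ecr.yaml (the resource name here is illustrative rather than copied from the real file), the repository and its exported URI look something like this:

```yaml
Resources:
  NotificationLambdaRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: notificationworker-lambda-images

Outputs:
  NotificationLambdaRepositoryUri:
    # Exported so that the lambda templates can use !ImportValue to reference the repository
    Value: !GetAtt NotificationLambdaRepository.RepositoryUri
    Export:
      Name: NotificationLambdaRepositoryUri
```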
The other main bit of configuration associated with a container lambda that is defined in a CloudFormation stack is the ImageConfig property, which allows us to override some parts of the image configuration on a per-lambda basis. It is here that we customise the handler function for each lambda.
This keeps the same behaviour as before: previously we had multiple lambdas which used the same code JAR file, but which were configured to use different handler functions to implement different functionality. Now, there are multiple lambdas within the notifications project which use the same underlying base image and again the difference is the handler function that is called when the lambda is executed.
```yaml
ImageConfig:
  Command: [!Ref FullyQualifiedHandler]
```
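So, for example, another lambda in the same project can reuse exactly the same ImageUri and simply supply a different handler through its own ImageConfig; the handler class name below is purely illustrative:

```yaml
ImageConfig:
  # Points the Lambda runtime at a different handler in the same image
  Command: ["com.gu.notifications.worker.SomeOtherHandler::handleRequest"]
```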
If you have Docker running on your machine, then building the image locally is as trivial as running docker:publishLocal from the SBT prompt.
However, Docker is inherently a Linux tool, and we develop on Macs. A popular, and essentially effortless, solution to this is Docker Desktop, but it is not free for commercial purposes.
An alternative is to set up Docker within a virtual machine, for example by using VirtualBox. This works fine; you would then run SBT and build the Docker image on Linux. But this means setting up and maintaining a Linux VM yourself.
A further alternative would be to use Vagrant, which is a simple tool for automating the management of disposable VMs. It can be easily installed on a Mac with brew install vagrant and, once installed, you need to create a file called Vagrantfile somewhere which describes how to configure a VM for running your project. As such it is common to put these in the root of a project's GitHub repo.
We did put together a branch which contains a very simple Vagrantfile (the one in the branch is full of useful comments as well!):
```ruby
sbt_version = "1.6.2"

Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |v|
    v.memory = 1048 * 2
  end
  config.vm.box = "debian/bullseye64"
  config.vm.hostname = "mobile-n10n.box"
  config.vm.provision "shell", inline: "
    apt-get update
    apt-get -y install curl unzip docker.io openjdk-11-jdk-headless ntp
    curl -sLO 'https://github.com/sbt/sbt/releases/download/v1.6.2/sbt-#{sbt_version}.zip'
    unzip -o sbt-#{sbt_version}.zip
    echo 'PATH=$HOME/sbt/bin:$PATH' >>~vagrant/.bashrc
    usermod -a -G docker vagrant
  "
end
```
This will build a very simple VM which contains Docker and the bits you need to run SBT. Once the Vagrantfile is placed in your repo, you can run the VM with something like:

```
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'debian/bullseye64'...
[...]
$ vagrant ssh
$ # you are now running on the box with docker, yay!
$ cd /vagrant
$ sbt
[info] [launcher] getting org.scala-sbt sbt 1.6.1 (this may take some time)...
[info] [launcher] getting Scala 2.12.15 (for sbt)...
sbt thinks that server is already booting because of this exception:
sbt.internal.ServerAlreadyBootingException: java.io.IOException: org.scalasbt.ipcsocket.NativeErrorException: [13] Permission denied
[...]
Create a new server? y/n (default y)
y
[info] welcome to sbt 1.6.1 (Debian Java 11.0.14)
[...]
```

Notes:
- I never managed to solve the "sbt thinks that server is already booting because of this exception:" error, but if you just answer yes to the "new server?" question, it will work.
- You need to have the VirtualBox Guest Additions installed within the VM to get the synced folder working easily (which means you can use the same source directory in the VM and on the host machine). The easiest way I found to do this is by using a plugin: vagrant plugin install vagrant-vbguest.
Once you have built the image, it is reasonably simple to run it from within Docker and then trigger your lambda; there is some good advice on this in the AWS documentation on testing container images locally.
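As a minimal sketch of the idea (the local image name and the empty payload are illustrative): the AWS base images bundle the Lambda Runtime Interface Emulator, so you can start the container and POST a test event to the emulator's invocation endpoint:

```sh
# Start the locally built image; the base image's entrypoint runs the
# Runtime Interface Emulator, which listens on port 8080 inside the container.
docker run --rm -p 9000:8080 notificationworker-lambda:latest

# In another terminal, send a test event to the handler configured via CMD/ImageConfig.
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
```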
One big advantage of the regular lambda style of coding is that we don't need to have any ownership of, or visibility into, the environment that runs our code. We don't need to know which OS it runs on, or which version of the OS or JVM. Instead, all of this is handled at a lower level, and we just deliver a JAR file that expects a certain defined input and provides a certain defined output. Everything else is provided by AWS automatically. A very important benefit of this is that every time the lambda runs it will automatically get the latest OS version, including security updates etc.
When using container-based lambdas this is not the case, because our OS image is built by us when the docker image is constructed, and every time that specific build of the docker image is re-used, the lambda will get the same environment.
The main tools that we have at our disposal to combat this issue are to:
- Ensure that we are using the latest version of the underlying base image every time we build the image
- Frequently rebuild the image so that it is refreshed often
The first part is handled during the build process. By specifying the symbolic tag public.ecr.aws/lambda/java:latest as the image name, we will get the latest version of the image as it exists at the time the build runs.
As a side note, the actual image being used here is described by AWS as follows:
AWS provided base images for Lambda contain all the required components to run your functions packaged as container images on AWS Lambda. These base images contain the Amazon Linux Base operating system, the runtime for a given language, dependencies and the Lambda Runtime Interface Client (RIC), which implements the Lambda Runtime API. The Lambda Runtime Interface Client allows your runtime to receive requests from and send requests to the Lambda service.
[...]
AWS will regularly provide security patches and other updates for these base images. These images are similar to the AWS Lambda execution environment on the cloud to allow customers to easily packaging functions to the container image. However, we may choose to optimize the container images by changing the components or dependencies included. When deployed to AWS Lambda these images will be run as-is.
As such it should represent roughly the same environment that a regular JVM lambda runs under, as long as we are using the latest image.
The second point should be addressed by scheduling a regular rebuild and redeploy of the project, to ensure that the base image is refreshed even if no changes have been merged into the repo that would otherwise trigger a redeploy.
In order to facilitate this work we capture the checksum of the base image that was used during the build (stored as latestVersionOfLambdaSDK in the build.sbt) and burn it into the metadata for the image using a label:
```scala
lazy val lambdaDockerCommands = dockerCommands := Seq(
  Cmd("FROM", latestVersionOfLambdaSDK),
  Cmd("LABEL", s"sdkBaseVersion=${latestVersionOfLambdaSDK}"),
  // ... remaining commands (copying the application, setting CMD, etc.) elided
)
```
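Because the base image version is recorded as a label, it can be read back from any built image; for example (image name illustrative):

```sh
# Read the recorded base-image version back out of a built image's metadata
docker inspect --format '{{ index .Config.Labels "sdkBaseVersion" }}' notificationworker-lambda:latest
```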
What is missing, however, is a way to catch and alert on the situation where this falls behind. In other words, there is no equivalent of Amiable which would flag up out-of-date images.
It seemed like this would be an easy thing to resolve; essentially it is a simple problem. We know which version of the base image was used when the lambda image was created. Therefore, in theory, all we have to do is discover how old that base image version is by checking when it was published (alerting when it is too old), and then find out whether a later version has been published (alerting when it is out of date). This information is easily available to humans in the AWS ECR gallery: https://gallery.ecr.aws/lambda/java
However, surprisingly, this information does not seem to be available programmatically for other people's (i.e. in this case Amazon's) public repositories.
We have an outstanding ticket for this.
While it would not be a problem to recreate all of this for another project, it would be much more fun to create an SBT plugin that would automate (and become the canonical implementation of) all of this SBT configuration.
There are alternatives to the sbt-native-packager plugin that might make this more lightweight too. This is not so relevant in this case, because sbt-native-packager was already in use, but it would be more relevant if attempting to apply this approach to a thinner project.