Building software for the cloud is daunting for a variety of reasons. The difficulties extend beyond choosing the fundamental approach and the underlying services: providing a decent local developer experience and provisioning infrastructure in a deterministic, repeatable way pose considerable challenges of their own.
As part of my recent investigation into MongoDB Atlas, a managed cloud service for running and scaling the eponymous NoSQL database, I built a REST API for storing and retrieving events sent from IoT devices. The goal of my spike was to put together a microservice that:
- automatically scales when periodically faced with unforeseen spikes in traffic
- can be run locally with no dependencies upon cloud infrastructure
- uses a single infrastructure as code tool to provision the required cloud resources across both AWS and MongoDB Atlas
- adheres to security best practices when establishing connections between AWS and Atlas
This post won’t serve as a step-by-step tutorial for attaining said goals, but will instead provide a high-level overview of my project, allowing one to dig deeper within the repository and to even deploy it to one’s own AWS and Atlas accounts.
The REST API provides a single /events endpoint that supports two operations.
IoT devices, such as fridges and smartwatches, can make an HTTP POST request to the endpoint to report arbitrary events and accompanying values, such as the current internal temperature or step counts, identifying themselves with a unique device ID:
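As a rough illustration, a Node.js client might assemble such a request as follows. This is a minimal sketch: the property names (`deviceID`, `name`, `value`) and the base URL are assumptions made for the example, not the project's confirmed schema.

```javascript
// Hypothetical payload shape; the real property names may differ.
function buildAddEventRequest(baseUrl, deviceID, name, value) {
  return {
    url: `${baseUrl}/events`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // The event is sent as a JSON document in the request body.
      body: JSON.stringify({ deviceID, name, value }),
    },
  };
}

// Usage (Node.js 18+ ships a global fetch):
// const { url, options } = buildAddEventRequest(
//   'http://localhost:8080', 'fridge-01', 'temperature', 3.5);
// const res = await fetch(url, options);
```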
It’s also possible to retrieve events for a distinct device ID with a GET request; this is primarily for frontend applications such as monitoring dashboards, as well as for troubleshooting and general auditing:
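A retrieval request could be built along these lines. Again, this is only a sketch: the `deviceID` query parameter name is an assumption, and the real API may expect a different key.

```javascript
// Hypothetical query parameter name; the real API may use a different key.
function buildGetEventsUrl(baseUrl, deviceID) {
  const url = new URL(`${baseUrl}/events`);
  // searchParams takes care of encoding the device ID.
  url.searchParams.set('deviceID', deviceID);
  return url.toString();
}

// Usage:
// const res = await fetch(buildGetEventsUrl('http://localhost:8080', 'watch-42'));
// const events = await res.json();
```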
Although AWS Lambda and MongoDB Atlas are central to this solution, I used additional technologies to achieve the aforementioned goals and to generally deliver a good developer experience:
- Docker, to containerise our lambda functions for deployment to AWS and to run them locally, providing a consistent execution environment both on your machine and in the cloud. Our functions are written in Node.js
- Docker Compose, to run our lambdas alongside local MongoDB and NGINX containers — the latter of which is used to emulate Amazon API Gateway — within a single virtual network for local development
- Terraform, to automate our entire infrastructure across both AWS and MongoDB Atlas using a common configuration language
- A plethora of AWS services, which I’ll cover in the subsequent section
It may seem counter-intuitive to provide an overview of the architecture before having even elaborated on the local developer experience, but a high-level understanding of how the system fits together will better justify the decisions I took when designing said local DX.
The entry point is an API Gateway REST API, allowing users to indirectly interact with private resources via an HTTP abstraction; this gateway integrates with our handlers to retrieve and add events across the HTTP GET and POST verbs respectively. The images for the lambdas are pushed to Elastic Container Registry and deployed to their corresponding functions when they’re first provisioned. As well as allowing us to invoke said functions locally using the same runtime and environment that will be used in the cloud, containers are a more efficient transfer mechanism than traditional zip files; given that Docker image layers are cached, one only needs to build and push the layers that have changed, rather than uploading everything each time.
To communicate with the Atlas cluster, the solution uses a VPC endpoint and accompanying PrivateLink endpoint service. As I’ll cover later, we can configure our Atlas entities to reside in an Atlas-managed VPC within the same region as our own, meaning that our lambdas can communicate with our database without any traffic leaving our network, let alone round-tripping via the internet (how cool is that?!); aside from the clear performance gains, this is a huge security win. Both the endpoint and our lambdas are attached to common subnets within our VPC, as well as having respective security groups, to allow communication between these private resources.
As shown in the architecture diagram, the solution comprises two lambdas: one for adding events to the database, and another for retrieving them. Both of these functions share a common Dockerfile, allowing the particular handler contents to be specified as a build argument:
As well as copying the handler module, the resultant image also includes the
common directory, which stores code that’s shared between the two lambdas. In essence, the lambdas interact with a MongoDB database and return some data based upon that operation, as can be seen in the
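To make that shape concrete, here is a minimal sketch of what a retrieval handler of this kind might look like. It is not the project's actual handler: the factory function, the event shape and the `deviceID` field are assumptions for illustration. The collection is injected so the logic can be exercised without a running database; `find(...).toArray()` mirrors the MongoDB Node.js driver's collection API.

```javascript
// Builds an API Gateway proxy-style response object.
const respond = (statusCode, payload) => ({
  statusCode,
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
});

// Hypothetical factory: accepts a MongoDB collection (or a stub in tests)
// and returns a Lambda handler. Event and document shapes are assumptions.
function createGetEventsHandler(collection) {
  return async (event) => {
    // queryStringParameters is null when no query string is supplied.
    const deviceID = event.queryStringParameters?.deviceID;
    if (!deviceID) {
      return respond(400, { message: 'deviceID is required' });
    }
    const events = await collection.find({ deviceID }).toArray();
    return respond(200, { events });
  };
}
```

Injecting the collection keeps the handler itself free of connection logic, which would plausibly live in the shared code.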
Local Developer Experience
A consequence of using the official AWS Lambda container image for our functions is that we can test them locally by making an HTTP POST request to the supplied /2015-03-31/functions/function/invocations integration endpoint. Calling this path directly is fine for the sake of sanity testing, but proves unwieldy when running multiple containerised lambdas, each typically being exposed on separate ports. It’s also worth noting that this invocation URL understandably responds with the data returned from the lambda verbatim; in our case, we’re returning an Amazon API Gateway integration response, so an HTTP 200 will always be surfaced despite the statusCode property potentially being a non-2xx code.
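The mismatch between the transport-level status and the handler's intended status can be seen with a small helper like the one below. This is a sketch under stated assumptions: the origin `http://localhost:9000` is a hypothetical host port mapping, and `fetchImpl` is an injectable parameter added purely so the function can be exercised without a running container.

```javascript
// Invocation path exposed by the Lambda container image.
const INVOCATION_PATH = '/2015-03-31/functions/function/invocations';

// Sends an event to a locally running lambda container and reports both the
// status of the HTTP call itself and the status the handler intended.
async function invokeLocally(
  event,
  { origin = 'http://localhost:9000', fetchImpl = fetch } = {} // assumed defaults
) {
  const res = await fetchImpl(`${origin}${INVOCATION_PATH}`, {
    method: 'POST',
    body: JSON.stringify(event),
  });
  const integrationResponse = await res.json();
  return {
    transportStatus: res.status, // status of the invocation call
    lambdaStatus: integrationResponse.statusCode, // status set by the handler
    body: integrationResponse.body,
  };
}
```

When the handler returns an error-style integration response, `transportStatus` and `lambdaStatus` will differ, which is exactly the gap a gateway layer needs to close.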
An additional pain point is that our functions will need a shared MongoDB instance, since we would want to verify that, end-to-end, we can write to and read from the database when calling the
get-event handlers respectively.
We can create the ultimate developer experience with Docker Compose, defining services for our handlers alongside a database service based upon the official MongoDB image and an API gateway service using the official NGINX image, all residing within the same network:
The api-gateway service is our entry point into the entire application, so we expose it to the host system on port 8080; the other services are not called directly, and therefore remain internal.
Since we’re acting on a single path (/events) with two separate HTTP methods (GET and POST), we provide a small NGINX config that is mounted into the service’s filesystem when it’s built:
Upon receiving a request to
JSON global, which proves invaluable for parsing the Amazon API Gateway integration response and surfacing the body and metadata, such as headers and status codes, through the appropriate HTTP response elements:
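The core of that translation can be sketched as a pure function. This is an illustrative approximation rather than the project's actual script: how it is wired into NGINX is omitted, and the 502 fallback for malformed payloads is an assumption of mine.

```javascript
// Converts the JSON text returned by a lambda invocation into the pieces of
// an HTTP response: status, headers and body.
function toHttpResponse(invocationText) {
  let parsed;
  try {
    parsed = JSON.parse(invocationText);
  } catch (err) {
    parsed = null;
  }
  // Assumed fallback: treat anything that isn't a JSON object as a bad gateway.
  if (parsed === null || typeof parsed !== 'object') {
    return { status: 502, headers: {}, body: 'Bad Gateway' };
  }
  return {
    status: parsed.statusCode || 200,
    headers: parsed.headers || {},
    body: parsed.body || '',
  };
}
```

Keeping the parsing separate from the request handling makes the mapping easy to check in isolation.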
With a single docker-compose up, we can run our entire microservice locally, sending requests to our local API gateway that are compatible with the deployed application:
Note that our local setup is missing a couple of features handled by the production app via various managed services, such as request validation; we could roll our own, but for the sake of local development I don’t deem it necessary.
Deploying to the Cloud
While we could manually provision our AWS and MongoDB Atlas entities, reproducing our infrastructure across multiple AWS accounts and Atlas organisations, a requirement that commonly emerges from the need for pre-production environments, would be time-consuming and error-prone. We should instead declare our infrastructure as code, rendering it reproducible from a single source of truth.
AWS provides its own infrastructure as code tool, CloudFormation, but naturally it only supports AWS resources; because our database infrastructure resides in Atlas, we need a solution that can automate multiple cloud platforms. With Terraform, we can integrate with multiple providers using a common configuration language. Our main module, in addition to provisioning an IAM role for our lambdas, integrates with four other custom modules:
Firstly, our VPC module (
./tf-modules/vpc) creates a dedicated virtual private cloud for our microservice that defines private subnets and security groups to which we’ll attach our AWS resources:
As all of our subnets are private, none of our resources can communicate with the wider internet, nor can they be contacted by entities outside of our VPC. However, our lambdas will need to connect to our VPC-backed Atlas cluster, thus we create individual security groups to permit this connectivity within the boundaries of our overall network.
The next module we bring into our main module defines our Atlas resources (
This module creates the requisite Atlas resources with the official
mongodbatlas provider. By specifying
"AWS" as our cluster’s
provider_name, Atlas will provision our MongoDB replica set on a managed VPC in the region specified for the
provider_region_name argument, allowing us to configure the VPC endpoint and corresponding endpoint service; these are needed to allow our VPC to unilaterally connect to the Atlas VPC and query the cluster without routing through the wider internet. I should highlight that we use the
random provider to generate a high-entropy password for the database user, which is returned as a module output and referenced by the lambda module.
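On the consuming side, a generated password has to be percent-encoded before it is embedded in a MongoDB connection string, since it may contain reserved characters such as `@`, `/` or `:`. The sketch below shows one way a lambda might do this; the function, its parameter names and the use of the `mongodb+srv` scheme are assumptions for illustration, not taken from the project.

```javascript
// Assembles a MongoDB connection string from its parts. Credentials are
// percent-encoded because generated passwords may contain reserved characters.
// All parameter names here are hypothetical.
function buildMongoUri({ host, username, password, database }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `mongodb+srv://${user}:${pass}@${host}/${database}`;
}

// Usage, with values that would typically arrive via environment variables:
// const uri = buildMongoUri({
//   host: process.env.DB_HOST,
//   username: process.env.DB_USER,
//   password: process.env.DB_PASSWORD,
//   database: 'iot',
// });
```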
What a segue. The ECR Lambda module (
./tf-modules/ecr-lambda) declares an Amazon ECR repository for the function’s image, which is subsequently deployed to AWS Lambda:
We use the
local-exec provisioner to build and push the Docker image for the given function when the repository has been created; these commands will only be invoked when it is first provisioned, so subsequent changes to the image must be built and pushed to ECR manually. Notice that we also create an
aws_lambda_permission resource so that our lambda can be invoked by our Amazon API Gateway REST API.
I really am too good at these segues. Our final custom module defines an API Gateway REST API to which HTTP requests will be sent to interact with our microservice (
Collectively, the resources declared in this module are analogous to our NGINX service and configuration within our local development setup, albeit with request validation and an API key that can be provided to Terraform at deploy time.
We’re almost ready to deploy our microservice to the cloud, but as well as exporting the
MONGODB_ATLAS_PRIVATE_KEY environment variables, we need to provide a few values for our root module’s input variables (these can be specified when running
terraform apply using the
-var option, but I’ve personally been using a root
The final step before applying our configuration is to authenticate Docker against ECR:
We can now run
terraform apply, upon which we’ll be presented with a plan of the resources Terraform will create, which will be committed to AWS and Atlas upon responding with
yes; be aware that the Atlas cluster can take up to 10 minutes to be provisioned.
Once complete, we should see the various primitives generated by Terraform in our AWS account:
We should also observe in the Network Access section of our Atlas project that the private endpoint is available and the corresponding endpoint service is ready for requests:
We can then send requests to our production API gateway in the same vein as our local NGINX gateway:
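From a Node.js client, the only differences from a local call are the base URL and the API key. The following is a hedged sketch: it assumes the key is supplied via the `x-api-key` header (API Gateway's default key source), and the invoke URL and `deviceID` parameter name are placeholders.

```javascript
// Builds a GET request against a deployed API Gateway stage.
// invokeUrl is the stage's invoke URL; the deviceID parameter name is assumed.
function buildProdRequest(invokeUrl, apiKey, deviceID) {
  return {
    url: `${invokeUrl}/events?deviceID=${encodeURIComponent(deviceID)}`,
    options: {
      method: 'GET',
      // API Gateway reads API keys from this header by default.
      headers: { 'x-api-key': apiKey },
    },
  };
}

// Usage (Node.js 18+):
// const { url, options } = buildProdRequest(process.env.INVOKE_URL, process.env.API_KEY, 'watch-42');
// const res = await fetch(url, options);
```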
We can even verify our request validation by removing query parameters and body properties, or by providing a value whose data type is not supported by a given property:
Serverless technologies have vastly reduced the cost and maintenance overhead of building software, as well as eliminating the need to manage and scale on-premises server farms. Conversely, given their dependence upon the cloud, it can be tricky to run them locally; in this regard, the official container images are a game changer, since they provide a consistent execution environment that can be integrated into Docker Compose configurations. That said, the boilerplate required to achieve a good developer experience and to subsequently integrate new functions into this setup is somewhat cumbersome.
Beyond this, AWS Lambda isn’t always the best fit for certain applications; services that are subjected to constant yet steady load might benefit from using a run-of-the-mill application server to eliminate cold starts. For microservices that face sporadic, unpredictable load, on the other hand, a serverless approach could be sufficiently performant and considerably more cost-effective.