
Serverless at Scale: Design Patterns and Optimizations


December 9, 2019

In his session, Roberto Iturralde, an AWS Solutions Architect, explained the scalability pitfalls of systems built on a serverless architecture.

He based his presentation on the typical synchronous architecture that most projects start with (API Gateway, Lambda, RDS, Secrets Manager and CloudWatch metrics). In this architecture every component is called synchronously, including any dependencies the Lambda functions have, so each call adds to the overall latency of the system. This gets even worse when calls fail and have to be retried because AWS or third-party services throttle them (in this case, because of the limits on the Secrets Manager GetSecretValue and CloudWatch PutMetricData APIs).

In his example, he pointed out several things to keep in mind to overcome AWS service limits and database connection limits. These can be categorised as configuration changes or architecture changes:

Configuration Changes

These changes address the limits of AWS services and of any other services the system relies on, such as a database:
  • Increase the Lambda concurrent executions limit. By default, an account can run up to 1,000 concurrent executions per region. This is a soft limit that can be raised on request.
  • Initialize database connections and retrieve secrets outside the Lambda handler, so that they are reused across invocations handled by the same execution environment instead of being recreated on every request and hitting limits too quickly.
  • Use an in-memory cache to further reduce calls to Secrets Manager or to any other service whose data can be cached.
  • Use the CloudWatch embedded metric format (EMF). Metrics are written as structured log entries and CloudWatch extracts them from the logs, so the function no longer needs separate code that calls the PutMetricData API, or that batches those calls to save costs. For more information
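The configuration changes above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code from the talk. `fetch_secret`, `connect_to_database`, the `MyApp` namespace, the `orders` service and the `Latency` metric are hypothetical placeholders. In a real function, `fetch_secret` would call Secrets Manager and `connect_to_database` would use a database driver. The important part is where things live. Module-level objects survive between invocations handled by the same execution environment. The secret is cached with a time-to-live so that rotations are eventually picked up. The metric is printed as an EMF log line instead of being sent with a PutMetricData call.

```python
import json
import time


# Hypothetical stand-ins: a real function would call Secrets Manager's
# GetSecretValue API and open a connection with a database driver.
def fetch_secret(secret_id):
    return {"username": "app", "password": "example"}


def connect_to_database(credentials):
    return {"connected_as": credentials["username"]}


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no call to the backing service
        value = loader(key)
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


# Module-level state is created once per execution environment (cold start)
# and reused by every later invocation handled by that environment.
SECRETS = TTLCache(ttl_seconds=300)
DB_CONNECTION = None


def get_connection():
    global DB_CONNECTION
    if DB_CONNECTION is None:
        credentials = SECRETS.get("db-credentials", fetch_secret)
        DB_CONNECTION = connect_to_database(credentials)
    return DB_CONNECTION


def emf_log_line(name, value, unit, service):
    """Build one CloudWatch embedded metric format (EMF) log entry."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MyApp",
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        "Service": service,
        name: value,
    })


def handler(event, context):
    start = time.monotonic()
    connection = get_connection()  # reused across warm invocations
    # ... query the database with `connection` here ...
    elapsed_ms = (time.monotonic() - start) * 1000
    # In Lambda, stdout goes to CloudWatch Logs, where the EMF entry is
    # turned into a metric without any PutMetricData call from the code.
    print(emf_log_line("Latency", elapsed_ms, "Milliseconds", "orders"))
    return {"statusCode": 200, "body": json.dumps(connection)}
```

Only the first invocation in each execution environment pays for fetching the secret and opening the connection. Later warm invocations reuse both objects.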

Architecture Changes

These changes mostly involve moving from a synchronous to an asynchronous architecture. The main benefit is that the system's services are decoupled. This makes them more resilient to problems in downstream dependencies, which would otherwise add to the overall latency and, in turn, degrade the user experience.
  • Turn synchronous calls into asynchronous ones by placing an SQS queue in between. The queue buffers the requests, and controlling the concurrency of the consuming Lambda function limits the number of database connections required.
  • Further reduce the load on the database by having Lambda functions consume SQS messages in batches (up to 10 messages per batch). The downside of this approach is the need to handle partial batch failures.
  • Compress the data sent to SQS queues to reduce message size. AWS does not do this automatically, so it requires some extra work from the developer.
  • Move any orchestration done inside Lambda functions to Step Functions. This saves the cost of Lambda execution time spent waiting for results from the services being called. The downside is that it can increase the number of concurrent executions, as more Lambda functions run in parallel.
  • Integrate some services directly with each other, such as API Gateway and DynamoDB, which removes the need for an intermediate Lambda function.
  • Remove database connections altogether by using services that are accessed through an HTTP API, such as DynamoDB or the Data API of Aurora Serverless.
  • Return a pre-signed URL to access content in S3 instead of sending the whole payload back in the response. This further reduces latency and the number of concurrent Lambda executions.
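The SQS-related items can be sketched in Python as follows. This is a minimal sketch, not the approach shown in the talk. The producer serializes the payload, gzip-compresses it and base64-encodes it, because SQS message bodies must be text. The consumer decompresses each record in the batch and reports only the messages that failed, so the successful ones are not processed again. The `batchItemFailures` response is the partial batch response Lambda accepts when the SQS event source mapping has `ReportBatchItemFailures` enabled. AWS added that option after this talk was given; before that, successful messages had to be deleted manually. `save_order` is a hypothetical stand-in for the database write.

```python
import base64
import gzip
import json


def compress_body(payload):
    """Serialize, gzip and base64-encode a payload for an SQS message body."""
    raw = json.dumps(payload).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_body(body):
    """Reverse compress_body()."""
    return json.loads(gzip.decompress(base64.b64decode(body)).decode("utf-8"))


def save_order(order):
    # Hypothetical database write; raises on invalid input.
    if "id" not in order:
        raise ValueError("order without id")


def handler(event, context):
    """Process an SQS batch, reporting only the messages that failed."""
    failures = []
    for record in event["Records"]:
        try:
            save_order(decompress_body(record["body"]))
        except Exception:
            # Only this message becomes visible in the queue again.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

On the producer side, the compressed string would be passed as the `MessageBody` argument of the SQS `send_message` call. Gzip plus base64 only pays off for payloads large enough to compress well. For very small messages, the encoded body can be larger than the original.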
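Removing database connections and returning pre-signed URLs can be sketched together in Python with boto3. The sketch assumes boto3 is available, as it is in the Lambda Python runtime. The cluster and secret ARNs, the `reports` database and table, and the bucket name are hypothetical placeholders. `execute_statement` on the `rds-data` client sends the SQL over HTTPS to the Aurora Serverless Data API, so the function never opens a database connection. `generate_presigned_url` returns a time-limited link that the client uses to download the object directly from S3. The clients can be passed in as parameters, so the logic can be exercised without AWS.

```python
import json

# Hypothetical resource identifiers; replace them with real ones.
CLUSTER_ARN = "arn:aws:rds:eu-west-1:123456789012:cluster:example-cluster"
SECRET_ARN = "arn:aws:secretsmanager:eu-west-1:123456789012:secret:example"
BUCKET = "example-reports-bucket"

_clients = {}


def get_client(service_name):
    """Create boto3 clients once per execution environment and reuse them."""
    if service_name not in _clients:
        import boto3  # imported lazily so the sketch can be tested without AWS
        _clients[service_name] = boto3.client(service_name)
    return _clients[service_name]


def find_report_key(rds_data, report_id):
    """Look up a report's S3 key through the Data API (no DB connection)."""
    response = rds_data.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database="reports",
        sql="SELECT s3_key FROM reports WHERE id = :id",
        parameters=[{"name": "id", "value": {"longValue": report_id}}],
    )
    # Each row is a list of typed fields, e.g. [{"stringValue": "..."}].
    return response["records"][0][0]["stringValue"]


def handler(event, context, rds_data=None, s3=None):
    rds_data = rds_data or get_client("rds-data")
    s3 = s3 or get_client("s3")
    key = find_report_key(rds_data, int(event["pathParameters"]["id"]))
    # Return a short-lived link instead of the report contents.
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=300
    )
    return {"statusCode": 200, "body": json.dumps({"url": url})}
```

The response body now carries a short URL instead of the report itself, so the function finishes as soon as the link is signed.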

As an extra point, he described other architecture changes that can help scale a serverless application but that require changes to the exposed API or extra work from the client:

  • Polling: the response is returned immediately and includes an ID that the client can then use to poll the processing status and get the results back. The downside is the need to create extra endpoints that the client uses to check the status and retrieve the results.
  • Webhooks: similar to polling, but the client is informed automatically when the result is ready, either through an SNS subscription (the client must be trusted) or through a callback URL provided by the client.
  • WebSockets: using API Gateway, once a request is received from the client, the result of the processing is pushed to the client over a WebSocket connection when it is done. There is a blog post describing the whole process at the following link:
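The polling approach can be sketched in Python with two hypothetical endpoints and a worker. This is a minimal, self-contained sketch, not the design from the talk. The in-memory `JOBS` dictionary and `QUEUE` list stand in for what would typically be a DynamoDB table and an SQS queue. Summing a list of numbers is a placeholder for the slow processing. The submit endpoint returns an ID immediately, and the status endpoint is the extra endpoint the client polls until the result is ready.

```python
import json
import uuid

# Hypothetical stand-ins: job state would typically live in a DynamoDB
# table, and the queue would be an SQS queue feeding the worker.
JOBS = {}
QUEUE = []


def submit(event, context):
    """POST /jobs: accept the request and return a job ID immediately."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "PENDING"}
    QUEUE.append({"job_id": job_id, "payload": json.loads(event["body"])})
    return {"statusCode": 202, "body": json.dumps({"jobId": job_id})}


def worker():
    """Asynchronous consumer: does the slow work and stores the result."""
    while QUEUE:
        message = QUEUE.pop(0)
        numbers = message["payload"]["numbers"]  # placeholder workload
        JOBS[message["job_id"]] = {"status": "DONE", "result": sum(numbers)}


def status(event, context):
    """GET /jobs/{id}: the extra endpoint the client polls."""
    job = JOBS.get(event["pathParameters"]["id"])
    if job is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown job"})}
    return {"statusCode": 200, "body": json.dumps(job)}
```

The client calls `submit`, then calls `status` with the returned `jobId` until the status changes from `PENDING` to `DONE`. The webhook and WebSocket variants replace this polling loop with a notification sent once `worker` stores the result.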


A serverless architecture provides many benefits to end users and developers alike. Not having to configure, patch and scale servers to meet demand speeds up development. It also eliminates underprovisioning, along with the impact that has on end users, and saves the cost of overprovisioning resources to avoid that problem. Finally, it reduces the need for an ops team responsible for configuring those resources.
But this comes with its own challenges. Some designs and solutions that were taken for granted in a “serverfull” architecture don’t work well in an ephemeral environment like Lambda, where the underlying platform is controlled by the cloud provider and servers come and go depending on demand.
Even so, once past the initial step of getting to know how the serverless world and the cloud provider's platform work, the transition to this new way of developing systems is totally worth it.

Post by Juan Manuel Carnicero Vega