Scalability and Concurrency

Understand how Lambda scales and deals with concurrency

Overview

A Lambda function’s concurrency level is the number of invocations being served simultaneously at any given point in time. Unlike many API services, Lambda doesn’t limit the number of requests per second or per minute. Developers can run as many requests per period of time as needed, provided that they don’t exceed the concurrency limits.

What is Concurrency

As stated above, concurrency is the total number of requests being served simultaneously at a given moment. The diagram below gives a visual representation of this concept to make it easier to understand.

Concurrency Visual Representation

Key takeaways from the diagram above:

  1. All requests lasted a few milliseconds, having started and finished within one second
  2. At time Point 1, the concurrency is four requests
  3. At time Point 2, concurrency dropped to only two requests
  4. Despite handling five requests in total, the maximum concurrency was four over this period of one second
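
The takeaways above can be reproduced with a small sketch. The request timings below are hypothetical (the diagram's exact values are not given in the text) and were chosen only so that five requests yield a concurrency of four at one point and two at another. The function names are illustrative as well.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of one request


def concurrency_at(requests: List[Interval], t: int) -> int:
    """Count the requests that are in flight at time t (in milliseconds)."""
    return sum(1 for start, end in requests if start <= t < end)


def max_concurrency(requests: List[Interval]) -> int:
    """Return the peak number of simultaneous requests using a sweep over events."""
    events = []
    for start, end in requests:
        events.append((start, 1))   # a request begins
        events.append((end, -1))    # a request finishes
    # Sort by time; at the same instant, process finishes (-1) before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical timings: five requests, all within one second.
requests = [(0, 300), (100, 400), (150, 700), (200, 900), (500, 800)]

point_1 = concurrency_at(requests, 250)  # four requests in flight
point_2 = concurrency_at(requests, 750)  # two requests in flight
peak = max_concurrency(requests)         # never more than four at once
```

Although five requests are handled in total, the peak never exceeds four because the fifth request starts only after two of the earlier ones have finished.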

Concurrency Limits and Scalability

Lambda concurrency limits depend on the Region where the function is deployed, ranging from 500 to 3,000 concurrent executions.

New functions are limited to this default concurrency threshold set by Lambda. After an initial burst of traffic, Lambda can scale up by an additional 500 microVMs every minute.

This scaling process continues until the concurrency limit is met. Developers can request a concurrency increase in the AWS Support Center.
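
The arithmetic of this scaling process can be sketched as follows. This is a simplified model under stated assumptions, not a description of Lambda's internals: it assumes scaling happens in whole-minute steps of 500, starting from a Region-dependent initial burst, and stops at the concurrency limit. The function names and the example figures (an initial burst of 1,000 and a limit of 5,000) are hypothetical.

```python
import math


def concurrency_after(minutes: int, initial_burst: int, limit: int,
                      step_per_minute: int = 500) -> int:
    """Approximate available concurrency after a number of whole minutes.

    Assumes a linear ramp of step_per_minute, capped at the concurrency limit.
    """
    scaled = initial_burst + step_per_minute * minutes
    return min(scaled, limit)


def minutes_to_reach(target: int, initial_burst: int,
                     step_per_minute: int = 500) -> int:
    """Whole minutes needed to ramp from the initial burst up to target."""
    if target <= initial_burst:
        return 0
    return math.ceil((target - initial_burst) / step_per_minute)


# Hypothetical figures: initial burst of 1,000 and a limit of 5,000.
after_three_minutes = concurrency_after(3, initial_burst=1000, limit=5000)
minutes_needed = minutes_to_reach(5000, initial_burst=1000)
```

Under these assumptions, traffic that ramps up faster than the model allows would hit throttling until scaling catches up.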

When Lambda is not able to cope with the number of concurrent requests an application is experiencing, requesters will receive a throttling error (429 HTTP status code).
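
One common way for a requester to handle throttling is to retry with exponential backoff and jitter. The sketch below is illustrative only: `ThrottledError` and `call_with_backoff` are hypothetical names, and `invoke` stands for whatever callable sends the request and raises `ThrottledError` when it sees a 429 response.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 throttling response."""


def call_with_backoff(invoke, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call invoke(), retrying on ThrottledError with exponential backoff.

    The delay before retry n is drawn uniformly from [0, base_delay * 2**n]
    ("full jitter"), so that many throttled clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = random.uniform(0, base_delay * 2 ** attempt)
            sleep(delay)
```

The `sleep` parameter is injectable so that the retry logic can be exercised without actually waiting.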

Reserved Concurrency

The concurrency limit discussed in the previous topic is shared across all functions in an AWS account. Developers might want to limit one or more functions, so that they don’t eat up all the concurrency capacity.

This can be done by setting the Reserved Concurrency parameter in the AWS Lambda configuration. For more information, please follow the AWS documentation about Reserving Concurrency for a Lambda Function.
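
As a minimal sketch, the same setting can be applied programmatically through the Lambda `PutFunctionConcurrency` API, which boto3 exposes as `put_function_concurrency`. The helper name, the function name `my-function`, and the value `50` are hypothetical; the client is passed in so the helper does not assume how credentials are configured.

```python
def reserve_concurrency(lambda_client, function_name: str, limit: int):
    """Set the reserved concurrency of one function.

    lambda_client is expected to be a boto3 Lambda client (or any object
    exposing the same put_function_concurrency method).
    """
    if limit < 0:
        raise ValueError("reserved concurrency cannot be negative")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


# Usage with boto3 (assumed to be installed and configured):
#   import boto3
#   reserve_concurrency(boto3.client("lambda"), "my-function", 50)
```

The reserved value caps the function at that many simultaneous invocations, which also keeps that share of the account's capacity away from other functions.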

Provisioned Concurrency

AWS Lambda allows developers to anticipate how many instances of a function should be provisioned and kept warm to serve requests. By setting a minimum provisioned concurrency level, requests served by those warm instances avoid cold starts, so their startup latency is expected to stay within double-digit milliseconds.
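
For illustration, provisioned concurrency can be configured through the Lambda `PutProvisionedConcurrencyConfig` API, available in boto3 as `put_provisioned_concurrency_config`. It applies to a published version or an alias rather than to `$LATEST`. The helper name and the example values (`my-function`, alias `live`, `10` instances) are hypothetical.

```python
def provision_concurrency(lambda_client, function_name: str,
                          qualifier: str, instances: int):
    """Keep a number of warm instances ready for a function version or alias.

    lambda_client is expected to be a boto3 Lambda client (or any object
    exposing the same put_provisioned_concurrency_config method).
    """
    if instances < 1:
        raise ValueError("at least one provisioned instance is required")
    if qualifier == "$LATEST":
        raise ValueError("use a published version or an alias, not $LATEST")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=instances,
    )


# Usage with boto3 (assumed to be installed and configured):
#   import boto3
#   provision_concurrency(boto3.client("lambda"), "my-function", "live", 10)
```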

Using this feature can be beneficial for workloads that are time-sensitive, such as customer-facing endpoints. Nevertheless, it is a step back from the serverless model and comes with several financial caveats.

Learn more about this feature and its caveats in its dedicated Knowledge Base page.

Security Considerations

Setting reserved concurrency is recommended for all Lambda functions whenever possible, since it helps prevent Low & Slow DoS attacks.

