A Lambda function’s concurrency level is the number of invocations being served simultaneously at any given point in time. Unlike many API services, Lambda doesn’t limit the number of requests per second or per minute. Developers can run as many requests per period of time as needed, provided that doing so doesn’t violate the concurrency limits.
What is Concurrency
As stated above, concurrency is the total number of simultaneous requests at a given point in time. Below is a visual representation of this concept, to make it easier to understand.
Key takeaways from the diagram above:
- All requests lasted a few milliseconds, having started and finished within one second
- At time Point 1, the concurrency is four requests
- At time Point 2, concurrency dropped to only two requests
- Despite handling five requests in total, the maximum concurrency was four over this period of one second
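The takeaways above can be reproduced with a small calculation. The following is a minimal sketch, not taken from the diagram itself: the request timings are hypothetical values (in milliseconds) chosen so that five requests overlap in the same way, and `max_concurrency` is an illustrative helper name.

```python
def max_concurrency(requests):
    """Return the peak number of overlapping (start, end) request intervals."""
    events = []
    for start, end in requests:
        events.append((start, 1))   # a request begins
        events.append((end, -1))    # a request finishes
    # Sort by time; at equal timestamps, finishes (-1) are processed before starts (+1)
    events.sort(key=lambda event: (event[0], event[1]))

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical timings in milliseconds: five requests, all within one second
requests = [(0, 400), (100, 500), (200, 450), (300, 900), (600, 950)]
print(max_concurrency(requests))  # prints 4
```

Even though five requests were handled, no more than four of them were ever in flight at the same moment, which is the number that counts against the concurrency limit.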
Concurrency Limits and Scalability
Lambda concurrency limits depend on the Region where the function is deployed, ranging from 500 to 3,000 concurrent executions.
New functions are limited to this default concurrency threshold set by Lambda. After an initial burst of traffic, Lambda can scale up by an additional 500 microVMs [1] (that is, instances of a function) every minute.
This scaling process continues until the concurrency limit is reached. Developers can request a concurrency limit increase in the AWS Support Center [2].
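To make the scaling behavior described above concrete, here is a rough sketch that models it as a simple linear ramp. It assumes only the figures quoted in this article (an initial burst, then 500 additional instances per minute, capped at the concurrency limit). The function names and the example numbers (a burst of 1,000 and a limit of 3,000) are illustrative assumptions, not values read from any AWS account.

```python
import math


def capacity_after(minutes, initial_burst, concurrency_limit, step_per_minute=500):
    """Concurrency that could be reached after a number of whole minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return min(concurrency_limit, initial_burst + step_per_minute * minutes)


def minutes_to_reach(target, initial_burst, concurrency_limit, step_per_minute=500):
    """Whole minutes needed to reach `target`, or None if it exceeds the limit."""
    if target > concurrency_limit:
        return None  # would be throttled unless the limit is increased
    if target <= initial_burst:
        return 0
    return math.ceil((target - initial_burst) / step_per_minute)


# Assumed example: burst of 1,000 and a concurrency limit of 3,000
for minute in range(6):
    print(minute, capacity_after(minute, initial_burst=1000, concurrency_limit=3000))
```

Under these assumptions, a spike that needs 2,600 concurrent executions would take about four minutes to be fully absorbed, and anything above 3,000 would not be served without a limit increase.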
When Lambda cannot keep up with the number of concurrent requests an application is receiving, requesters get a throttling error (HTTP status code 429) [3].
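A common way for callers to deal with throttling errors is to retry with exponential backoff and jitter. The sketch below is one possible approach, not something prescribed by this article: `ThrottledError` is a hypothetical stand-in for whatever exception or 429 response the client library in use surfaces, and `invoke` is any zero-argument callable that performs the request.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling (HTTP 429) response."""


def call_with_backoff(invoke, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call `invoke`, retrying on ThrottledError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the throttling error
            # Full jitter: wait a random time up to the capped exponential delay
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting. Spreading retries out with jitter avoids having all throttled callers retry at the same instant, which would keep concurrency pinned at the limit.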
Reserved Concurrency
The concurrency limit discussed in the previous topic is shared across all functions in an AWS account. Developers might want to cap one or more functions so that they don’t consume all of the account’s concurrency capacity.
This can be done by setting the Reserved Concurrency parameter in the AWS Lambda configuration. For more information, please follow the AWS documentation about Reserving Concurrency for a Lambda Function.
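Besides the console, the same setting can be applied programmatically. The following is a minimal sketch assuming the boto3 SDK, whose Lambda client exposes `put_function_concurrency`. The helper name, the function name `my-function`, and the value 50 are placeholders for illustration only.

```python
def set_reserved_concurrency(lambda_client, function_name, reserved):
    """Cap a function at `reserved` concurrent executions.

    `lambda_client` is expected to be a boto3 Lambda client,
    e.g. boto3.client("lambda").
    """
    if reserved < 0:
        raise ValueError("reserved concurrency must be zero or greater")
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved,
    )
    return response["ReservedConcurrentExecutions"]


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   set_reserved_concurrency(boto3.client("lambda"), "my-function", 50)
```

Passing the client in as an argument keeps the helper easy to test with a stub and leaves Region and credential configuration to the caller.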
Provisioned Concurrency
AWS Lambda allows developers to specify in advance how many instances of a function should be provisioned and kept warm to serve requests. With a minimum provisioned concurrency level set, requests handled by those instances are expected to respond within double-digit milliseconds.
Using this feature can be beneficial for time-sensitive workloads, such as customer-facing endpoints. Nevertheless, it is a step back from the serverless model and comes with several financial caveats.
Learn more about this feature and its caveats in its dedicated Knowledge Base page.
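As an illustration of how provisioned concurrency might be configured from code, here is a short sketch assuming the boto3 SDK, whose Lambda client exposes `put_provisioned_concurrency_config`. The call requires a qualifier (a published version or an alias). The helper name, the function name `my-function`, the alias `live`, and the value 10 are placeholders.

```python
def set_provisioned_concurrency(lambda_client, function_name, qualifier, instances):
    """Ask Lambda to keep `instances` execution environments warm.

    `lambda_client` is expected to be a boto3 Lambda client;
    `qualifier` is a version number or alias name of the function.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=instances,
    )


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   set_provisioned_concurrency(boto3.client("lambda"), "my-function", "live", 10)
```

Because provisioned instances are billed while they are kept warm, the value passed here is worth sizing against actual traffic rather than setting generously.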
Setting reserved concurrency is recommended on all Lambda functions whenever possible, since it helps prevent Low & Slow DoS attacks [4].