Provisioned Concurrency



The Lambda model [1] is based on ephemeral microVMs (or containers). Once a request is received, AWS provisions resources on demand to run the developer’s code and compute what is necessary.

Depending on the runtime [2] and on how large the codebase and its dependencies are, the start-up process of a function may take some time (from a few hundred milliseconds to several seconds). This is called a “cold start” (the function was “cold” when the request was received).

AWS allows developers to set a minimum concurrency level at which the function is kept warm, so that requests served within that level are answered in double-digit milliseconds. It is also possible to use auto-scaling to increase provisioned concurrency proactively when demand is on the rise.


Main caveats

Provisioned Concurrency is a step away from the serverless model of paying only for what is used. Enabling it on a Lambda function means going back to renting compute capacity by time. This defeats one of the main arguments in favor of serverless: not paying for idle time.

One major caveat is that, once Provisioned Concurrency is enabled, the function is ineligible for the Lambda Free Tier.

Another problem is the complexity that the pricing model for Provisioned Concurrency [3] adds to the overall Lambda financials. The amount of memory needed by a function is usually fixed and determined by the workload, so without Provisioned Concurrency developers are left with only one dimension to analyze in Lambda costs: the duration of the executions.

In addition to duration, with Provisioned Concurrency developers must also observe:

  • For how long provisioned concurrency is active
  • How many concurrent instances of the function should be available


With Provisioned Concurrency enabled, AWS Lambda charges for the following dimensions, in addition to the usual per-request charge:

  • $0.000004167 for every GB-second of provisioned concurrency
  • $0.000009722 for every GB-second of function execution time


Consider a function with 512 MB of memory allocated, running for 31 days. It receives 10 million requests, each with a duration of 2 seconds.

In the traditional, on-demand pricing model, this function would cost:

  • Invocations: $0.20 per million * 10 million = $2.00
  • Compute time:
    • $0.000000833 per 100 milliseconds (at 512 MB)
    • Total compute time: 10 million * 2 sec = 20,000,000 seconds = 20,000,000,000 milliseconds
    • Total compute cost: $0.000000833 * 20,000,000,000 ms / 100 ms = $166.60
  • Total cost: $168.60

With Provisioned Concurrency, and 50 concurrent instances of the function provisioned:

  • Invocations: $0.20 per million * 10 million = $2.00
  • Provisioning:
    • $0.000004167 for every GB-second
    • 31 days * 24 hours * 3,600 seconds = 2,678,400 seconds
    • Provisioned per instance: 512 MB / 1,024 MB per GB * 2,678,400 seconds = 1,339,200 GB-seconds
    • Concurrency provisioned: 50
    • Total provisioning cost: 50 * 1,339,200 * $0.000004167 = $279.02
  • Compute time:
    • $0.000009722 for every GB-second
    • 10 million * 2 seconds = 20,000,000 seconds
    • Total compute cost: 512 / 1,024 * 20,000,000 * $0.000009722 = $97.22
  • Total cost: $378.24

With this example, we can see that using Provisioned Concurrency can more than double the cost of running a serverless workload on AWS Lambda. In light of that, developers should plan and anticipate costs carefully before using it.
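The arithmetic in this example can be re-run for other workloads with a few lines of Python. The sketch below is only an illustration: the function and constant names are hypothetical, the prices are the ones quoted in this article (they vary by region and change over time), and the break-even helper assumes that every invocation is served by a provisioned instance.

```python
# Prices quoted in this article (USD). Treat them as assumptions: check the
# current AWS Lambda pricing for your region before relying on the results.
REQUEST_PRICE_PER_MILLION = 0.20
# $0.000000833 per 100 ms at 512 MB, converted to a price per GB-second.
ON_DEMAND_PRICE_PER_GB_SECOND = 0.000000833 * 10 / 0.5
PROVISIONING_PRICE_PER_GB_SECOND = 0.000004167
PROVISIONED_COMPUTE_PRICE_PER_GB_SECOND = 0.000009722


def request_cost(requests):
    """Per-request charge, identical in both pricing models."""
    return requests / 1_000_000 * REQUEST_PRICE_PER_MILLION


def on_demand_cost(requests, duration_s, memory_mb):
    """Total cost in the on-demand model."""
    gb_seconds = requests * duration_s * memory_mb / 1024
    return request_cost(requests) + gb_seconds * ON_DEMAND_PRICE_PER_GB_SECOND


def provisioned_cost(requests, duration_s, memory_mb, instances, period_s):
    """Total cost when `instances` are kept provisioned for `period_s` seconds."""
    memory_gb = memory_mb / 1024
    provisioning = instances * period_s * memory_gb * PROVISIONING_PRICE_PER_GB_SECOND
    compute = requests * duration_s * memory_gb * PROVISIONED_COMPUTE_PRICE_PER_GB_SECOND
    return request_cost(requests) + provisioning + compute


def break_even_utilization():
    """Fraction of provisioned time that must be busy for both models to cost the same."""
    saving_per_busy_gb_second = (
        ON_DEMAND_PRICE_PER_GB_SECOND - PROVISIONED_COMPUTE_PRICE_PER_GB_SECOND
    )
    return PROVISIONING_PRICE_PER_GB_SECOND / saving_per_busy_gb_second


period = 31 * 24 * 3600  # 31 days in seconds
print(round(on_demand_cost(10_000_000, 2, 512), 2))                # about 168.60
print(round(provisioned_cost(10_000_000, 2, 512, 50, period), 2))  # about 378.24
print(round(break_even_utilization(), 2))                          # about 0.60
```

Under these assumptions, provisioned instances need to be busy roughly 60% of the time before Provisioned Concurrency becomes cheaper than on-demand. In the example, the 50 instances are busy about 15% of the time (20,000,000 busy seconds out of 50 * 2,678,400 provisioned seconds), which is why the bill goes up.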

Impacts on Lambda Limits

The Provisioned Concurrency level counts towards the function’s Reserved Concurrency [4] limit and also towards the account’s regional limits [5].
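Before provisioning, it can help to look up the limits that the new level will count towards. The following is a minimal sketch, assuming the boto3 Lambda client (not used elsewhere in this article) and its `get_account_settings` and `get_function_concurrency` calls. The helper name and the returned keys are hypothetical, and the `fits_reserved` check encodes the assumption that a function cannot be given more provisioned concurrency than its reserved concurrency.

```python
def concurrency_headroom(client, function_name, requested):
    """Collect the limits that a provisioned concurrency level counts towards.

    `client` is expected to behave like boto3.client("lambda").
    """
    account = client.get_account_settings()["AccountLimit"]
    # The key is absent when no reserved concurrency is set on the function.
    reserved = client.get_function_concurrency(FunctionName=function_name).get(
        "ReservedConcurrentExecutions"
    )
    return {
        "account_limit": account["ConcurrentExecutions"],
        "account_unreserved": account["UnreservedConcurrentExecutions"],
        "function_reserved": reserved,
        # Assumption: the requested level must fit within reserved concurrency, if any.
        "fits_reserved": reserved is None or requested <= reserved,
    }
```

With real credentials this would be called as `concurrency_headroom(boto3.client("lambda"), "my-function", 50)`.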

Auto-scaling Provisioned Concurrency

It is possible to use Application Auto Scaling [6] to automatically scale the provisioned concurrency level up and down.

There are three ways to implement auto-scaling:

  1. Target tracking [7]: target a particular CloudWatch metric
  2. Step-scaling [8]: set metric thresholds for CloudWatch alarms to trigger the scaling process
  3. Scheduled-scaling [9]: scale the concurrency level up/down depending on time

The first two options have some similarities in the way they work and are suitable for applications with unpredictable load behavior.

The last one (Scheduled-scaling) is suitable for applications that have predictable spikes in demand, such as an e-commerce site during the Black Friday period.
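As an illustration of the target-tracking and scheduled options, the sketch below builds the request parameters for the Application Auto Scaling API as plain dictionaries. It assumes the boto3 `application-autoscaling` client and its `register_scalable_target`, `put_scaling_policy` and `put_scheduled_action` calls. The helper names, the policy name and the 0.7 utilization target are hypothetical choices, not recommendations. Step-scaling is left out because it also requires CloudWatch alarms.

```python
DIMENSION = "lambda:function:ProvisionedConcurrency"


def _resource_id(function_name, qualifier):
    # Application Auto Scaling addresses a function version or alias this way.
    return f"function:{function_name}:{qualifier}"


def scalable_target(function_name, qualifier, min_capacity, max_capacity):
    """Parameters for register_scalable_target: the range scaling may move within."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": _resource_id(function_name, qualifier),
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy(function_name, qualifier, target_utilization=0.7):
    """Parameters for put_scaling_policy, tracking provisioned concurrency utilization."""
    return {
        "PolicyName": f"{function_name}-{qualifier}-utilization",  # hypothetical name
        "ServiceNamespace": "lambda",
        "ResourceId": _resource_id(function_name, qualifier),
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    }


def scheduled_action(function_name, qualifier, name, schedule, min_capacity, max_capacity):
    """Parameters for put_scheduled_action: change the range at a given time."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": _resource_id(function_name, qualifier),
        "ScalableDimension": DIMENSION,
        "Schedule": schedule,
        "ScalableTargetAction": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
    }
```

With a client created by `boto3.client("application-autoscaling")`, these would be applied as `client.register_scalable_target(**scalable_target("my-function", "123", 5, 100))`, then `client.put_scaling_policy(**target_tracking_policy("my-function", "123"))`, and for a daily ramp-up at 08:00 UTC `client.put_scheduled_action(**scheduled_action("my-function", "123", "morning-peak", "cron(0 8 * * ? *)", 50, 100))`. Keeping the parameters as dictionaries makes them easy to review before anything is sent to AWS.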

Configuring Provisioned Concurrency

AWS Console

Provisioned concurrency can be set manually from the AWS Console. Under the Provisioned concurrency configurations option, click “Add” or “Add configuration”:

[Image: Provisioned concurrency]

It will open a new screen to select a version or alias of the function and the desired concurrency level:

[Image: Provisioned concurrency]

Command Line Interface

With the AWS CLI, we can add, list, and delete provisioned concurrency configurations for our functions. See the examples below:

Set the provisioned concurrency level to 50 for version 123 of my-function:

aws lambda put-provisioned-concurrency-config --function-name my-function --qualifier 123 --provisioned-concurrent-executions 50

List the provisioned concurrency configurations for my-function:

aws lambda list-provisioned-concurrency-configs --function-name my-function

Delete the provisioned concurrency configuration for version 123 of my-function:

aws lambda delete-provisioned-concurrency-config --function-name my-function --qualifier 123
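The same three operations can also be scripted from Python. The sketch below assumes the boto3 Lambda client, which is not part of the examples above. The wrapper function names are hypothetical, and the list helper reads only the first page of results.

```python
def set_provisioned_concurrency(client, function_name, qualifier, executions):
    """Equivalent of `aws lambda put-provisioned-concurrency-config`."""
    return client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=executions,
    )


def list_provisioned_concurrency(client, function_name):
    """Equivalent of `aws lambda list-provisioned-concurrency-configs` (first page only)."""
    response = client.list_provisioned_concurrency_configs(FunctionName=function_name)
    return response["ProvisionedConcurrencyConfigs"]


def remove_provisioned_concurrency(client, function_name, qualifier):
    """Equivalent of `aws lambda delete-provisioned-concurrency-config`."""
    client.delete_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
    )
```

With `client = boto3.client("lambda")`, calling `set_provisioned_concurrency(client, "my-function", "123", 50)` mirrors the first CLI command. Passing the client in as a parameter keeps the helpers easy to exercise with a stub in tests.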


AWS SAM

In the AWS SAM YAML template, declare the Provisioned Concurrency settings as in the example below.

Bear in mind that AWS SAM will raise an error if this feature is used when AutoPublishAlias is not set.

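HelloFunction:                  # hypothetical logical ID, under Resources
  Type: AWS::Serverless::Function
  Properties:
    Handler: hello_lambda
    AutoPublishAlias: live      # required; "live" is an arbitrary alias name
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 50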

Serverless Framework

Provisioned Concurrency can be configured in the Serverless Framework YAML file, for example:

functions:
  hello:                        # hypothetical function name
    handler: hello_handler
    provisionedConcurrency: 50


  1. Read more about the Lambda programming model and virtualization strategy
  2. Some runtimes are slower than others. Usually, Python and Node.js are the fastest ones. Compiled languages, such as .NET and Java, are the worst performers.
  3. The Provisioned Concurrency pricing model adds complexity to the Lambda financials
  4. Reserved Concurrency
  5. AWS Lambda account regional limits
  6. AWS Application Auto Scaling
  7. Target-based Application Auto-Scaling
  8. Step-scaling Application Auto-Scaling
  9. Scheduled-based Application Auto-Scaling
