Retries and Idempotency

Understand the Lambda retry mechanism and how functions should be designed

Lambda Retry Mechanics

When a function invocation fails, Lambda may retry it, depending on how the function was invoked. For asynchronous invocations, for example, the default is up to two additional attempts. A retry simply invokes the same function again with the same event payload.

This retry behavior makes it easier for developers to handle transient errors, such as network issues. When the error persists after the retries are exhausted, Lambda gives up and may send the failed invocation to a Dead Letter Queue, if one is configured.
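The retry flow described above can be sketched as a small local simulation. This is not how Lambda is implemented internally; `invoke_with_retries`, `make_flaky_handler`, the `max_retries` default, and the list standing in for a Dead Letter Queue are all hypothetical names chosen for illustration.

```python
def invoke_with_retries(handler, event, max_retries=2, dead_letter_queue=None):
    """Local simulation: call the handler, retrying with the identical event."""
    total_attempts = max_retries + 1  # the first attempt plus the retries
    for attempt in range(1, total_attempts + 1):
        try:
            result = handler(event)
            return {"ok": True, "attempts": attempt, "result": result}
        except Exception:
            # A retry re-invokes the same handler with the same payload.
            continue
    # Retries exhausted: park the failed event if a queue was provided.
    if dead_letter_queue is not None:
        dead_letter_queue.append(event)
    return {"ok": False, "attempts": total_attempts, "result": None}


def make_flaky_handler(failures):
    """Build a handler that raises `failures` times before succeeding."""
    state = {"remaining": failures}

    def handler(event):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("transient error")
        return {"processed": event["order_id"]}

    return handler
```

A handler that fails twice succeeds on the third attempt, while one that keeps failing ends up in the stand-in queue after three attempts.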

What developers must account for

Consider a function that processes checkout sales on an e-commerce website. Depending on how the function code is implemented, if something goes wrong and the request gets retried, the customer's credit card may be charged more than once.

This issue might affect hundreds or perhaps thousands of customers before it gets fixed. It can easily become a nightmare for customer support, the finance department and, obviously, the development team behind the badly architected application.

All code deployed to Lambda should follow a principle called idempotence. Wikipedia defines it as a “property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application”.

In other words: when something goes wrong on an e-commerce website and a request is retried, an idempotent application makes sure a credit card is never charged more than once for the same order.
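A minimal sketch of that guarantee, assuming each event carries an `order_id` field: the handler records which orders it has already charged and returns the earlier result on a retry. The in-memory dictionary stands in for a durable store, and `charge_card` is a hypothetical placeholder for a real payment call.

```python
processed_orders = {}  # stand-in for a durable store shared by all invocations
charges = []           # records every call made to the (fake) payment provider


def charge_card(order_id, amount_cents):
    """Hypothetical payment call; here it only records that it happened."""
    charges.append((order_id, amount_cents))
    return f"charge-{len(charges)}"


def handler(event, context=None):
    order_id = event["order_id"]
    # Retry of an order that was already charged: return the earlier result.
    if order_id in processed_orders:
        return {"order_id": order_id,
                "charge_id": processed_orders[order_id],
                "replayed": True}
    charge_id = charge_card(order_id, event["amount_cents"])
    processed_orders[order_id] = charge_id
    return {"order_id": order_id, "charge_id": charge_id, "replayed": False}
```

Invoking `handler` twice with the same event produces a single entry in `charges`. In a real deployment the check and the write would need to be atomic, for example through a conditional write in the data store, because two invocations could otherwise pass the check at the same time.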

Implementing Idempotence

Internal Services

A good practice to combine with idempotency is the separation of concerns. Besides being sound design in its own right, it makes idempotence easier to implement: idempotency needs to be analyzed from the perspective of each operation, and having each operation properly isolated simplifies that analysis.

Read operations usually do not produce any side effects and are idempotent by nature. Checking whether an item is in stock, for example, can be repeated any number of times without violating idempotence. Implement this type of operation as a dedicated Lambda function and exclude it from the idempotence analysis.

Insert and delete operations are not idempotent by nature, but they can be made idempotent by giving the resource a unique identifier (UID). In an e-commerce application, each order could have a UID. The storing operation can then be performed multiple times without creating multiple order placements, because all Lambda retries carry the same order UID.

The order UID could be, for example, a hash of the customer email or username, the purchase timestamp, and the list of items purchased. These values would be sent as parameters to the API when the site receives the order request, so retry invocations can easily regenerate the same UID from the same parameters.
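One way to derive such a UID is sketched below, assuming the event carries `customer`, `purchased_at`, and `items` fields (hypothetical names). The timestamp has to come from the original request rather than from the clock at invocation time, otherwise every retry would produce a different hash. The `orders` dictionary stands in for a real database table keyed by UID.

```python
import hashlib
import json


def order_uid(customer, purchased_at, items):
    """Derive a deterministic UID from the request parameters."""
    payload = json.dumps(
        {
            "customer": customer.strip().lower(),
            "purchased_at": purchased_at,  # taken from the request, not the clock
            "items": sorted(items),        # item order must not change the hash
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


orders = {}  # stand-in for a table whose primary key is the order UID


def store_order(event):
    """Idempotent insert: a retry with the same event maps to the same key."""
    uid = order_uid(event["customer"], event["purchased_at"], event["items"])
    orders.setdefault(uid, event)  # keeps the first write, ignores repeats
    return uid
```

Calling `store_order` repeatedly with the same event returns the same UID and leaves a single entry in `orders`.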

External APIs

Ensuring idempotency can be tricky for applications that rely on write-enabled third-party APIs.

Some services provide idempotency features out of the box. This is often the case with credit card processing platforms. Stripe, for example, provides an idempotency key that enables safe retries.
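As a rough illustration, the sketch below builds (but does not send) an HTTP request that carries the order UID in Stripe's `Idempotency-Key` header, using only the Python standard library. The endpoint, the request parameters, and the function name `build_payment_request` are assumptions made for the example, and the API key is a placeholder. Stripe's official client libraries accept the key as a request option instead.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for the example; check the provider's API reference.
STRIPE_PAYMENT_INTENTS_URL = "https://api.stripe.com/v1/payment_intents"


def build_payment_request(order_uid, amount_cents, currency, api_key):
    """Build a POST request whose idempotency key is the order UID."""
    body = urllib.parse.urlencode(
        {"amount": amount_cents, "currency": currency}
    ).encode("ascii")
    return urllib.request.Request(
        STRIPE_PAYMENT_INTENTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Same order UID on every retry, so the provider can deduplicate.
            "Idempotency-Key": order_uid,
        },
    )
```

Because a retry invocation regenerates the same order UID, it also sends the same idempotency key, which lets the provider recognize the request as a repeat of the original.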

In other cases, it might be necessary to run all operations internally first, validate that they succeeded, and only then interact with the external API. This isn't the ideal implementation, but it could be as good as one can get in some circumstances.

Tracking Retry Invocations

By default, AWS Lambda and CloudWatch Logs neither highlight retry requests nor link them to the original invocation request. Some professional serverless monitoring services, such as Dashbird, link all invocations and retries automatically, allowing easy navigation across the stack of requests.

