The Asynchronous Messaging pattern is useful for decoupling serverless functions and making systems more maintainable.
For example, a common anti-pattern in serverless projects is tight coupling between services. Having Lambda functions invoke each other directly usually leads to a project that is difficult to deploy, maintain, and extend.
This is a common outline of how serverless development roadmaps evolve:
- Multiple services are developed, each performing a particular role
- They are wired up using the Lambda Invoke API
- Perhaps there is a central function or process that orchestrates others for relatively complex job requirements
- An apparently cohesive system is ready to ship
Although this seems easy to implement, services will be tightly coupled. Developers might not feel the pain for small implementations. But as the overall system grows, it will become increasingly hard to maintain and evolve:
- I/O time is wasted when one function depends on another to keep processing, which leads to poor resource utilization
- Changes to one service are more likely to have negative and unintended consequences in other parts of the system
- When one service fails, other services have a higher potential of failing as well
- Retrying failed requests becomes difficult and riskier
- Harder to ensure idempotency across system components
Before we get into more details about the architecture itself, let’s analyze why coupling is an issue for serverless functions.
An e-commerce system is something most of us are quite familiar with. Consider when a customer submits a purchase. Several processes need to be performed:
- Temporarily set aside products in stock
- Authorize and confirm payment (charge the credit card, for example)
- If the payment succeeds:
- Decrease products purchased from stock
- Submit purchase order to the distribution center
- Send confirmation message to the customer
- Perhaps other activities go on here, depending on the company
- If payment fails:
- Send failure message to the customer
- Alert financial department to verify
- Release products reserved in stock
The following architectural ideas are only for conceptual illustration purposes. They omit aspects that would be required in a production environment and might not be the most suitable for the particular context used as an example.
Consider there is a Lambda function to perform each of the actions above.
In a highly coupled architecture, there could be a central function called Submit Order that performs all these actions imperatively in a single run.
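A minimal sketch of this tightly coupled orchestrator, with hypothetical step implementations as plain Python functions (in a real AWS setup each would be a separate Lambda function called through the Invoke API):

```python
# Hypothetical step implementations; names and payloads are illustrative.
def reserve_stock(order):
    return {"reserved": order["items"]}

def authorize_payment(order):
    return {"status": "authorized", "charge_id": "ch_001"}

def decrease_stock(order):
    return {"ok": True}

def submit_to_distribution(order):
    return {"ok": True}

def notify_customer(order, message):
    return {"sent": message}

def release_stock(order):
    return {"released": order["items"]}

def submit_order(order):
    """Tightly coupled orchestrator: every step blocks the main function,
    and every failure branch must be coordinated right here."""
    reserve_stock(order)
    payment = authorize_payment(order)
    if payment["status"] == "authorized":
        decrease_stock(order)
        submit_to_distribution(order)
        notify_customer(order, "order confirmed")
        return {"order": order["id"], "result": "confirmed"}
    else:
        notify_customer(order, "payment failed")
        release_stock(order)
        return {"order": order["id"], "result": "failed"}
```

Notice how every step, success path and failure path lives inside a single function: any change to any step means touching this orchestrator.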
Let’s analyze each disadvantage point mentioned above in light of this example:
I/O time is wasted when one function depends on another to keep processing, which leads to poor resource utilization.
Observe that each step of the process (in grey) blocks the execution of the main function Submit Order (in blue). Also, multiple I/O-bound processes are running in the background (in yellow).
As a result, the main function will be waiting idly for considerable periods of time. This leads to poor resource usage and higher costs since serverless functions are charged per execution time.
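To make the cost claim concrete, here is a back-of-the-envelope calculation with purely hypothetical latencies:

```python
# Illustrative numbers only: the orchestrator does 200 ms of real work but
# is billed for the whole wall-clock run, including time spent waiting on
# each downstream function.
own_compute_ms = 200
downstream_waits_ms = [300, 400, 250]  # hypothetical step latencies

billed_ms = own_compute_ms + sum(downstream_waits_ms)
idle_fraction = sum(downstream_waits_ms) / billed_ms

print(billed_ms)                 # 1150
print(round(idle_fraction, 2))   # 0.83
```

With these (made-up) numbers, over 80% of the billed duration is spent idle, waiting on other services.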
When one service fails, other services have a higher potential of failing as well.
What happens in the above outline if the Decrease Stock function fails? Perhaps the database was under heavy load and couldn’t process the decrement query.
Surely the request could be retried. But now there’s one more thing that needs to be coordinated by the Submit Order function, which increases complexity. As a result, the chances of experiencing even more errors only increase.
Retrying failed requests becomes difficult and riskier
To reduce complexity, let’s not handle the retrying of individual components within the Submit Order function. If anything fails, the main function will be retried from the start.
How should a failure within the Decrease Stock step be handled? Running the entire Submit Order request again will authorize payment twice.
Should the previous payment authorization be canceled? If we keep it, the main function must skip the payment step in the retry. But now there’s one more logic to handle, which also adds complexity.
We can see that the tight coupling of services is already making it difficult to manage our system and make it perform in a reliable way without many hurdles.
Harder to ensure idempotency across system components
Idempotency is a “property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application”.
In other words: the Submit Order function would be idempotent if it were possible to retry it after a failure without worrying about double charges. We dedicated an entire page in our Knowledge Base to cover Lambda Retry behavior and Idempotency.
Every single serverless function should be idempotent. That is a trait of reliable and maintainable serverless systems.
What we need to know is: accomplishing idempotency for the Submit Order function would be a lot harder and complex in the tightly coupled architecture outlined above.
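A minimal sketch of the idempotency-key technique that would be needed around the payment step, assuming a persistent store (an in-memory dict here; a database such as DynamoDB would be a typical serverless choice — all names are illustrative):

```python
# In a real system this dict would be a persistent, shared store.
processed = {}

def charge_card(order_id, amount):
    # Stand-in for the actual payment gateway call.
    return {"charge_id": f"ch_{order_id}", "amount": amount}

def authorize_payment(order_id, amount):
    # The order id doubles as an idempotency key: a retry returns the
    # stored result instead of charging the customer a second time.
    if order_id in processed:
        return processed[order_id]
    result = charge_card(order_id, amount)
    processed[order_id] = result
    return result

first = authorize_payment("123456", 99.90)
retry = authorize_payment("123456", 99.90)
# retry returns the stored result: no second charge happened
```

The point is not this particular sketch but that, in the tightly coupled design, every step of Submit Order needs an equivalent guard, all coordinated in one place.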
Let’s consider a different strategy and compare it against the first one. When a customer submits a purchase order, the Submit Order function only dispatches two messages:
- Temporarily set aside products in stock
- Request payment authorization
The main function is now streamlined to only interact with a message queue service:
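A sketch of this streamlined entry point, with an in-memory queue standing in for a managed service such as Amazon SQS (names and payloads are illustrative):

```python
import json
from queue import Queue

# In-memory stand-in for a managed message queue service.
order_queue = Queue()

def submit_order(order):
    """Streamlined entry point: dispatch two messages and return at once;
    no downstream step blocks this function."""
    for action in ("reserve_stock", "authorize_payment"):
        order_queue.put(json.dumps({"action": action, "order": order}))
    return {"order": order["id"], "status": "accepted"}

result = submit_order({"id": "123456", "items": ["book"]})
# Two messages are now queued; the main function is already done.
```

The main function's billed duration now covers only the two enqueue calls, not the downstream work.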
Each step in the process is now decoupled, having asynchronous messages as the means to make the system work cohesively.
Messages are then consumed by each step of the process independently. The “Set aside stock” function, for example, would pull pending messages from the queue and reserve products in the database.
Having each step working independently improves resource utilization and reduces costs by cutting down idle I/O time. It is also easier now to handle retries in case anything fails. When a failure occurs within one step of the process, the other functions will not be impacted.
Another benefit is being able to replace or modify one service without having to touch others. Suppose the system was only sending a confirmation e-mail to the customer, and now sending an SMS message becomes a requirement. In the tightly coupled architecture, it would be hard to introduce any change independently. With an asynchronous messaging architecture, we only need to add one message consumer.
Lastly, ensuring the scalability and reliability of a decoupled architecture is much easier. Say Service A publishes messages for Service B to get a task done. If Service B relies on a database that poses scalability difficulties, for instance, Service A can still scale up without a problem. Messages may pile up for some time, but Service B will eventually catch up.
Even if this situation is undesirable (a list of messages piling up and taking more time to process than usual), it is better than having Service B crashing due to a peak demand coming from tight coupling with Service A.
Message handling options
Below is a brief introduction to the main options for handling asynchronous messages in a distributed serverless environment.
We mentioned a “message queue” in the example for the loosely-coupled architecture above. This is exactly what the name suggests: messages are received and piled up by the messaging system in a queue. A consumer can pull messages from the queue for processing. When processing is finished, the messaging system is notified to delete the message. If it fails, the message goes back to the queue.
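The receive-process-delete contract described above can be modeled in a few lines (an in-memory toy, not a real queue client; real services like SQS implement this with visibility timeouts):

```python
from queue import Empty, Queue

task_queue = Queue()
task_queue.put("decrease stock for order 123456")

def consume(handler):
    """Pull one message; only successful processing removes it for good."""
    try:
        message = task_queue.get_nowait()
    except Empty:
        return None
    try:
        handler(message)           # success: message is effectively deleted
    except Exception:
        task_queue.put(message)    # failure: message goes back to the queue
    return message

def failing_handler(message):
    raise RuntimeError("database under heavy load")

consume(failing_handler)
# The failed message is back in the queue, ready to be retried later.
```

This retry-by-default behavior is what makes failures in one consumer invisible to the rest of the system.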
Another option is the Publish/Subscribe (Pub/Sub) pattern, in which messages are delivered through topics to any number of interested consumers. The main components of a Pub/Sub system are:
- Publisher: publishes a message to a given topic (e.g. “Send confirmation message about purchase order #123456 to customer XYZ” to the topic “customer notifications”)
- Topic: receives messages and distributes them to subscribers
- Subscribers: enlist to receive messages published to a certain topic
In the “confirmation message” example, if we had only an e-mail sending mechanism and wanted to add SMS messaging as well, it would be a matter of deploying an SMS sender service and subscribing it to the “customer notifications” topic. The rest of the system remains unaware of this change. This makes the overall project a lot more manageable and extensible.
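A toy Pub/Sub model shows why this change is cheap: subscribing the new SMS sender is one line, and nothing else is touched (all names are hypothetical; in AWS this maps roughly to SNS topics and subscriptions):

```python
from collections import defaultdict

# Topic name -> list of subscriber handlers.
subscribers = defaultdict(list)
sent = []

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    # Fan the message out to every subscriber of the topic.
    for handler in subscribers[topic]:
        handler(message)

subscribe("customer-notifications", lambda m: sent.append(("email", m)))
# Adding SMS later touches nothing else in the system:
subscribe("customer-notifications", lambda m: sent.append(("sms", m)))

publish("customer-notifications", "purchase order #123456 confirmed")
```

The publisher never learns that a new subscriber exists; the topic handles the fan-out.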
Read our detailed page about the Pub/Sub pattern.
An Event Bridge is somewhat similar to the above two, but with a difference: messages (called events) are matched to subscribers using a fine-grained pattern-matching mechanism. Events that match a certain pattern are forwarded to the particular subscriber responsible for processing them.
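A rough sketch of pattern-based routing (loosely modeled on how an event bus matches rules to events; field names and rule shapes are illustrative):

```python
# Each rule is a (pattern, handler) pair; a pattern is a dict of
# field -> required value.
rules = []
received = []

def add_rule(pattern, handler):
    rules.append((pattern, handler))

def put_event(event):
    # Forward the event to every rule whose pattern fully matches it.
    for pattern, handler in rules:
        if all(event.get(k) == v for k, v in pattern.items()):
            handler(event)

add_rule({"source": "orders", "type": "payment.failed"},
         lambda e: received.append(("finance-alerts", e)))
add_rule({"source": "orders", "type": "order.confirmed"},
         lambda e: received.append(("distribution", e)))

put_event({"source": "orders", "type": "payment.failed", "order": "123456"})
```

Only the finance-alerts handler fires; the distribution rule ignores the event because its pattern does not match.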
A Stream Processing system would be more suitable for continuous data ingestion. Examples include application or database logs, or tracking a series of clicks in a user interface for analytical purposes. It is not quite in the same category as the others in this list, but it can be helpful for batching.
Stream Processing services usually allow you to group multiple data points and invoke a processor every 10 minutes, for example, or whenever the amount of data ingested reaches 10 megabytes. This batching mechanism can help improve performance and reduce costs in serverless architectures.
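A sketch of count- and size-based batching (time windows omitted for brevity; the thresholds are illustrative):

```python
# Flush the buffer when either a record count or a byte-size threshold
# is reached; each flushed batch would trigger one processor invocation.
MAX_RECORDS = 3
MAX_BYTES = 10_000

buffer, batches = [], []

def ingest(record):
    buffer.append(record)
    size = sum(len(r) for r in buffer)
    if len(buffer) >= MAX_RECORDS or size >= MAX_BYTES:
        batches.append(list(buffer))   # invoke the processor with the batch
        buffer.clear()

for click in ["click-1", "click-2", "click-3", "click-4"]:
    ingest(click)
```

Four ingested records produce a single processor invocation (with three records batched together), instead of four separate invocations.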