Solving invisible scaling issues with Serverless and MongoDB

Don’t follow blindly, weigh your actions carefully.


Ever since software engineering became a profession, we have been trying to serve users all around the globe. With this comes the issue of scaling and how to solve it. Many times these thoughts of scaling up our software to unimaginable extents are premature and unnecessary.

This has turned into something else altogether with the rise of serverless architectures and back-end-as-a-service providers. Now we’re not facing issues of how to scale up and out, but rather how to scale our database connections without creating heavy loads.

With the reduced insight we have into the underlying infrastructure, there’s not much we can do except write sturdy, efficient code and use appropriate tools to mitigate the issue.

Or is it?

How do databases work with serverless?

With a traditional server, your app connects to the database on startup. Quite logical, right? The first thing it does is hook up to the database via a connection string, and only once that’s done does the rest of the app initialize.

Serverless handles this a bit differently. Your code actually runs for the first time only once you trigger a function. This means you have to both initialize the database connection and interact with the database during the same function call.

Going through this process every time a function runs would be incredibly inefficient and time-consuming. This is why serverless developers use a technique often called connection pooling: create the database connection only on the first function call and re-use it for every subsequent call. You may be wondering how this is even possible.

The short answer is that a Lambda function is, in essence, a tiny container. It’s created and kept warm for an extended period of time, even though it isn’t running all the time. Only after it has been inactive for a while, commonly around 15 minutes, will it be terminated, although AWS doesn’t guarantee any exact duration.

This gives us a window, often around 15 to 20 minutes, during which our database connection stays active and ready to be re-used without any performance loss.
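To make the container model concrete, here is a small simulation, not real Lambda code. The hypothetical loadNaiveModule and loadCachedModule functions play the role of a container loading your module once, and the function each returns plays the role of the handler that runs on every invocation; fakeConnect is a stand-in for opening a real database connection. The simulation counts how many connections each approach opens over three warm invocations, then simulates one cold start.

```javascript
// Simulation only: each load*Module() call stands for a container evaluating
// your module once; the returned handler runs once per invocation.
let connectCalls = 0

// Hypothetical stand-in for opening a real database connection.
async function fakeConnect () {
  connectCalls += 1
  return { id: connectCalls }
}

// Naive handler: opens a new connection on every invocation.
function loadNaiveModule () {
  return async function handler () {
    const conn = await fakeConnect()
    return conn.id
  }
}

// Cached handler: the connection lives in module scope, so it survives
// between invocations for as long as the container stays warm.
function loadCachedModule () {
  let cached = null // module-scope state, created once per container
  return async function handler () {
    if (!cached) cached = await fakeConnect()
    return cached.id
  }
}

// Invoke a handler n times in sequence, like one warm container would.
async function invokeTimes (handler, n) {
  for (let i = 0; i < n; i++) await handler()
}

async function simulate () {
  connectCalls = 0
  await invokeTimes(loadNaiveModule(), 3)
  const naive = connectCalls

  connectCalls = 0
  await invokeTimes(loadCachedModule(), 3)
  const cached = connectCalls

  // Loading the module again simulates a cold start in a fresh container.
  await invokeTimes(loadCachedModule(), 1)
  const afterColdStart = connectCalls

  return { naive, cached, afterColdStart }
}

const simulation = simulate()
simulation.then(result => console.log(result))
// { naive: 3, cached: 1, afterColdStart: 2 }
```

Keep in mind that the cache only helps within one container: every cold start, and every extra container Lambda spins up to handle concurrent requests, opens its own connection.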

Using Lambda with MongoDB Atlas

Here’s a simple code snippet for you to check out.

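// db.js
const mongoose = require('mongoose')

// Module-scope cache: it survives between invocations for as long as
// the Lambda container stays warm.
const connection = {}

module.exports = async () => {
  // A truthy value means a previous invocation already connected.
  if (connection.isConnected) {
    console.log('=> using existing database connection')
    return
  }

  console.log('=> using new database connection')
  // process.env.DB holds the MongoDB Atlas connection string.
  const db = await mongoose.connect(process.env.DB)
  // readyState is 1 once mongoose is connected.
  connection.isConnected = db.connections[0].readyState
}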

If you take a closer look at the code above, you’ll see it makes sense. At the top, we require mongoose and initialize an object called connection. There’s nothing more to it. We’ll use the connection object as a cache to store whether the database connection exists or not.

The first time the function exported from db.js is invoked, it connects mongoose to the database using the connection string. Every subsequent call re-uses the existing connection, as long as the container is still warm.
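One caveat with caching a flag: it records readyState once, right after connecting, so if the connection drops while the container sits idle, the cache still claims it’s connected. A variation, sketched here as an assumption rather than the article’s method, is to check the connection’s live readyState on every call. In real code you would pass mongoose.connection; here a plain object stands in for it, and canReuse is a hypothetical helper name. The state values match Mongoose’s documented readyState codes.

```javascript
// Mongoose's documented readyState values for a connection.
const STATES = { disconnected: 0, connected: 1, connecting: 2, disconnecting: 3 }

// Reuse only when the *current* state says connected, instead of trusting
// a flag that was stored once at connect time.
function canReuse (connection) {
  return Boolean(connection) && connection.readyState === STATES.connected
}

// Hypothetical stand-in for mongoose.connection so the sketch runs offline.
const fakeConnection = { readyState: STATES.connected }
console.log(canReuse(fakeConnection)) // true

// If the connection drops while the container is frozen, the live state
// changes and the handler knows it has to connect again.
fakeConnection.readyState = STATES.disconnected
console.log(canReuse(fakeConnection)) // false
```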

Here’s what it looks like in the handler, which represents our Lambda function.

const connectToDatabase = require('./db')
const Model = require('./model')

module.exports.create = async (event) => {
  try {
    // Connects on a cold start, re-uses the connection on warm invocations.
    await connectToDatabase()
    // Model.create returns a promise, so await it before serializing.
    const object = await Model.create(JSON.parse(event.body))
    return {
      statusCode: 200,
      body: JSON.stringify(object)
    }
  } catch (err) {
    return {
      statusCode: err.statusCode || 500,
      headers: { 'Content-Type': 'text/plain' },
      body: 'Could not create the object.'
    }
  }
}

This simple pattern makes your Lambda functions cache the database connection, which speeds up warm invocations significantly. Pretty cool, huh?

All of this is amazing, but what if we hit the cap of connections our database can handle? Well, great question! Here’s a viable answer.

What about connection limits?

If hitting your connection limit has you worried, you might consider using a back-end-as-a-service to solve the issue. Ideally, the provider maintains a pool of connections that your functions share, so you never have to worry about hitting the ceiling. In practice, this means the provider gives you an API, often a REST API, that handles the actual database interaction, and your functions only talk to that API.
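To get a feel for when the ceiling becomes a problem, remember that Lambda handles concurrent requests by running more containers side by side, and each warm container keeps its own cached client, whose pool can grow to several connections. An upper-bound estimate is therefore containers × pool size per container. Everything below, including the 500-connection limit and the hypothetical estimateConnections helper, is illustrative; check your driver’s pool settings and your database plan’s actual limit.

```javascript
// Upper-bound estimate: each warm container keeps its own client, and each
// client's pool can grow up to poolSizePerContainer connections.
function estimateConnections (concurrentContainers, poolSizePerContainer) {
  return concurrentContainers * poolSizePerContainer
}

// Hypothetical numbers for illustration only.
const connectionLimit = 500
const scenarios = [
  { containers: 50, poolSize: 5 },
  { containers: 150, poolSize: 5 },
  { containers: 150, poolSize: 1 }
]

for (const { containers, poolSize } of scenarios) {
  const total = estimateConnections(containers, poolSize)
  const status = total > connectionLimit ? 'over the limit' : 'within the limit'
  console.log(`${containers} containers x ${poolSize} = ${total} connections (${status})`)
}
// 50 containers x 5 = 250 connections (within the limit)
// 150 containers x 5 = 750 connections (over the limit)
// 150 containers x 1 = 150 connections (within the limit)
```

Shrinking the per-container pool buys headroom, but the total still grows with concurrency. A shared pool behind an API is meant to cap the number the database sees regardless of how many functions are running.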

You hardcore readers will think about creating an API yourselves to house the connection pool, or using something like GraphQL. Both of those solutions are great, depending on which use case fits you best. But I’ll focus on off-the-shelf tools that get you up and running quickly.
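If you go the do-it-yourself route, the core idea is a pool with a hard cap that lives in one long-running service instead of in every function. The sketch below uses a hypothetical BoundedPool class (not part of any library) and a stand-in factory instead of a real MongoDB connection. Real database drivers already ship their own pooling, so in practice you would configure that inside your API service rather than write one; this only shows the capping behavior.

```javascript
// Hypothetical BoundedPool: at most `max` resources exist at once, and
// extra callers wait until a resource is released.
class BoundedPool {
  constructor (create, max) {
    this.create = create // opens a new resource, e.g. a database connection
    this.max = max
    this.size = 0 // resources created so far
    this.idle = [] // resources ready to be handed out
    this.waiting = [] // wake-up functions of callers waiting for a resource
  }

  async acquire () {
    while (true) {
      if (this.idle.length > 0) return this.idle.pop()
      if (this.size < this.max) {
        this.size += 1
        try {
          return await this.create()
        } catch (err) {
          this.size -= 1 // creation failed: free the slot...
          this.wakeOne() // ...and let a waiter try to use it
          throw err
        }
      }
      // At capacity: sleep until a release wakes us, then check again.
      await new Promise(resolve => this.waiting.push(resolve))
    }
  }

  release (resource) {
    this.idle.push(resource)
    this.wakeOne()
  }

  wakeOne () {
    const wake = this.waiting.shift()
    if (wake) wake()
  }
}

// Acquire a resource, run `work` with it, and always give it back.
async function withResource (pool, work) {
  const resource = await pool.acquire()
  try {
    return await work(resource)
  } finally {
    pool.release(resource)
  }
}

// Stand-in factory instead of a real MongoDB connection.
let opened = 0
const pool = new BoundedPool(async () => ({ id: ++opened }), 2)
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Five concurrent "requests" share at most two resources.
const poolDemo = Promise.all(
  Array.from({ length: 5 }, () => withResource(pool, async conn => {
    await sleep(10)
    return conn.id
  }))
).then(ids => ({ ids, opened }))

poolDemo.then(result => console.log(result))
```

However many callers show up, the stand-in factory runs at most twice, which is exactly the guarantee you want between your functions and the database.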

Using Lambda with MongoDB Stitch

If you’re a sucker for MongoDB, like I am, you may want to check out their back-end-as-a-service solution called Stitch. It gives you a simple API to interact with the MongoDB driver. You just need to create a Stitch app and connect it to your already running Atlas cluster, and you’re set. In the Stitch app, make sure to enable anonymous login and create your database and collection.

Install the mongodb-stitch npm module, reference your Stitch app ID in your code, and then start hitting the APIs.

const { StitchClientFactory, BSON } = require('mongodb-stitch')
const { ObjectId } = BSON
const appId = 'notes-stitch-xwvtw' // your Stitch app ID
const database = 'stitch-db' // the database created in the Stitch app

// Module-scope cache, re-used while the container stays warm.
const connection = {}

module.exports = async () => {
  if (connection.isConnected) {
    console.log('[MongoDB Stitch] Using existing connection to Stitch')
    return connection
  }

  try {
    const client = await StitchClientFactory.create(appId)
    const db = client.service('mongodb', 'mongodb-atlas').db(database)
    // Anonymous login, which must be enabled in the Stitch app.
    await client.login()
    const ownerId = client.authedId()
    console.log('[MongoDB Stitch] Created connection to Stitch')

    connection.isConnected = true
    connection.db = db
    connection.ownerId = ownerId
    connection.ObjectId = ObjectId
    return connection
  } catch (err) {
    console.error(err)
    // Rethrow so the handler returns an error response instead of
    // failing later on an undefined connection.
    throw err
  }
}

As you can see, the pattern is very similar. We create a Stitch client connection and re-use it for every subsequent request.
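One refinement worth considering for either db.js module, sketched here as an assumption rather than the article’s method, is to cache the promise of the connection instead of a flag. Callers that arrive while a connect is still in flight share it, and a failed attempt is cleared so the next invocation retries instead of being stuck with a cached failure. createConnectionCache and flakyConnect are hypothetical names; flakyConnect stands in for the real Atlas or Stitch connect step and fails once on purpose.

```javascript
// Cache the *promise* of a connection rather than a flag.
// `connect` must be a function that returns a promise.
function createConnectionCache (connect) {
  let pending = null // at module scope this means one cache per container
  return function getConnection () {
    if (!pending) {
      pending = connect().catch(err => {
        pending = null // forget the failure so a later call can retry
        throw err
      })
    }
    return pending
  }
}

// Hypothetical connect function: fails on the first attempt, then succeeds.
let attempts = 0
async function flakyConnect () {
  attempts += 1
  if (attempts === 1) throw new Error('network blip')
  return { db: 'stitch-db' }
}

const getConnection = createConnectionCache(flakyConnect)

async function runRetryDemo () {
  const first = await getConnection().catch(err => err.message)
  // Two concurrent calls share one new attempt.
  const [a, b] = await Promise.all([getConnection(), getConnection()])
  return { first, same: a === b, attempts }
}

const retryDemo = runRetryDemo()
retryDemo.then(result => console.log(result))
// { first: 'network blip', same: true, attempts: 2 }
```

Since a Lambda container handles one event at a time, the sharing mostly matters when a single invocation asks for the connection from several places in parallel.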

The Lambda function itself looks almost the same as the Atlas handler earlier.

const connectToDatabase = require('./db')

module.exports.create = async (event) => {
  try {
    // Re-uses the cached Stitch connection on warm invocations.
    const { db } = await connectToDatabase()
    // insertOne resolves with the _id of the new document...
    const { insertedId } = await db.collection('notes')
      .insertOne(JSON.parse(event.body))

    // ...which is then used to read the full document back.
    const addedObject = await db.collection('notes')
      .findOne({ _id: insertedId })

    return {
      statusCode: 200,
      body: JSON.stringify(addedObject)
    }
  } catch (err) {
    return {
      statusCode: err.statusCode || 500,
      headers: { 'Content-Type': 'text/plain' },
      body: 'Could not create the object.'
    }
  }
}

Seems rather similar. I could get used to it. However, Stitch has some cool features out of the box like authentication and authorization for your client connections. This makes it really easy to secure your routes.

How do you know it works?

To make sure I know which connection is being used at any given time, I use Dashbird’s invocation view to check my Lambda logs.

In the invocation view, you can see it creating a new connection on the first invocation and re-using it on consecutive calls.
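The same signal is in the raw log text, since db.js prints a different message for new and re-used connections. As a rough sketch, assuming you have the log lines as an array of strings (for example, exported from CloudWatch), a hypothetical summarizeConnections helper can tally cold versus warm connections. The sample lines are made up for illustration.

```javascript
// Tally new vs. re-used connections using the two messages db.js logs.
function summarizeConnections (logLines) {
  let created = 0
  let reused = 0
  for (const line of logLines) {
    if (line.includes('=> using new database connection')) created += 1
    else if (line.includes('=> using existing database connection')) reused += 1
  }
  return { created, reused }
}

// Made-up sample: one cold start followed by two warm invocations.
const sampleLogs = [
  'invocation 1: => using new database connection',
  'invocation 2: => using existing database connection',
  'invocation 3: => using existing database connection'
]

console.log(summarizeConnections(sampleLogs)) // { created: 1, reused: 2 }
```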

The service is free for 14 days, so you can check it out if you want. Let me know if you want an extended trial, or just join my newsletter.

Wrapping up

In an ideal serverless world, we wouldn’t need to worry about hitting our database connection limit. That said, it takes a huge number of users hitting your APIs to run into this scaling issue in the first place. The examples above show how you can mitigate it by using a back-end-as-a-service provider. Even though Stitch is not yet mature, it is made by MongoDB, which is an amazing database. And using it with AWS Lambda is just astonishingly quick.

To check out a few projects that use both of the connection patterns shown above, jump over here:

If you want to read some of my previous serverless musings, head over to my profile or join my newsletter!

Or, take a look at a few of my other articles regarding serverless:

Hope you guys and girls enjoyed reading this as much as I enjoyed writing it. Until next time, be curious and have fun.

