Serverless is becoming a hot topic in the world of computer programming. Maybe you have heard the word a couple of times, either at conferences or when talking with other people.
We chose Google as our cloud provider, although everything presented in this article can be achieved with other cloud providers such as Amazon or Azure.
In this article, we will see how we can take advantage of serverless functions to build a data processing pipeline for analyzing and processing data.
Let’s imagine that we are working at an IT company and every couple of weeks we receive files that contain information about issues (tasks) from our projects. From time to time our managers look into our application, where they want to see statistics for all projects.
Every month the project managers check the status of the company's projects, such as the total number of issues done since the project started and the number of story points done on that project. Sometimes they also want to see all the issues that were not bugs and were finished when the file was received.
In order to fulfill their needs, we are going to build a pipeline that filters and aggregates the data they are interested in.
Why are serverless technologies good in this case?
We will implement the data processing pipeline using Google technologies, but the same concept applies to any cloud provider that offers serverless technologies.
We will first present the technologies we are going to use and then see how to build the pipeline.
Serverless functions are isolated functions that have only one purpose. Keeping this in mind, we can think about our functions as being a _black box_ with an **input** and an output.
Serverless functions shouldn’t replace a REST API; they should complement the main API, each with a single, isolated, dedicated purpose.
A good example is a file upload: when a file is uploaded into our system, we want to apply some filtering and do some calculations on that file.
There are multiple types of events that can trigger a serverless function. The types that we are going to use today are:
For communication between our functions, there are two approaches using Google services:
BigQuery is a serverless data warehouse. In BigQuery the data is organized in datasets and tables.
We are going to use this service for storing our data and analyzing it.
In our system we are going to upload CSV files that have the following structure:
Our first serverless function will get the data from the file uploaded to Cloud Storage and upload it to BigQuery.
Trigger Type – Google Cloud Storage Finalize
This is triggered when the file has been uploaded successfully into our storage. This type of function will receive as parameters
A nice thing about working with multiple services from the same cloud provider is that we do not need to authenticate against each service ourselves, because functions deployed in the cloud are authenticated automatically.
In this function we get a reference to the file that we just uploaded and load its content into a BigQuery table.
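As a rough illustration of those parameters, the sketch below assumes the Python runtime for background functions, where the handler receives an `event` dictionary describing the uploaded object and a `context` object with event metadata. The names `on_file_finalized` and `gcs_uri` are hypothetical, and the `bucket` and `name` fields follow the Cloud Storage object payload; the runtime used in this article may expose them differently.

```python
def gcs_uri(event: dict) -> str:
    """Build the gs:// URI of the object described by a Cloud Storage event."""
    return f"gs://{event['bucket']}/{event['name']}"


def on_file_finalized(event: dict, context) -> str:
    """Hypothetical entry point for a Cloud Storage finalize trigger.

    `event` carries the object metadata; `context` carries event metadata
    such as the event ID and type (unused in this sketch).
    """
    uri = gcs_uri(event)
    print(f"Finished uploading {uri}")
    return uri
```

The URI is the kind of reference a load job would need in order to read the file from the bucket.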
The function bigQuerySafeName creates a table name from the file name that respects the following conditions:
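The exact conditions are not reproduced here, so the following is only a sketch of how such a helper might look in Python. It assumes a conservative rule set (only ASCII letters, digits and underscores, at most 1024 characters, file extension dropped); `big_query_safe_name` is a hypothetical analogue of bigQuerySafeName, not the function used in this article.

```python
import re
from pathlib import PurePosixPath

MAX_TABLE_NAME_LENGTH = 1024  # assumed upper bound for a BigQuery table ID


def big_query_safe_name(file_name: str) -> str:
    """Derive a conservative table name from an uploaded file name."""
    # Drop any folder prefix and the last extension:
    # "2019/Project A.csv" -> "Project A"
    stem = PurePosixPath(file_name).stem
    # Replace every run of characters outside [A-Za-z0-9_] with one underscore.
    safe = re.sub(r"[^A-Za-z0-9_]+", "_", stem).strip("_")
    # Fall back to a placeholder if nothing usable is left, then truncate.
    return (safe or "table")[:MAX_TABLE_NAME_LENGTH]
```

For example, under these assumptions a file named `2019/Project A.csv` would be loaded into a table called `Project_A`.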
After loading the data into the table, we publish a message on the filter-uploaded-data topic to trigger the second function in our pipeline.
We just wrote our first function; now all we need to do is deploy it to the cloud. Google provides a CLI that can be used for this.
With this command we say that we want to deploy a function named upload-to-bigquery in the region us-central1. The name of the function (entry point) in our application is uploadToBigQuery. We want this function to trigger when a new file finishes uploading to the bucket projects_files. We give our function 2 GB of memory, the maximum Google allowed at the time of writing, and we set the maximum amount of time our function is allowed to run to 9 minutes.
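The deployment described above might correspond to a `gcloud functions deploy` invocation along the following lines, written here as a small Python helper that only assembles the arguments. This is a sketch under assumptions: `build_deploy_command` is a hypothetical helper, the flags shown are the ones used for first-generation Cloud Functions, and the `runtime` value is a placeholder because the text does not name one.

```python
import shlex


def build_deploy_command(runtime: str) -> list:
    """Assemble the gcloud arguments for the deployment described in the text."""
    return [
        "gcloud", "functions", "deploy", "upload-to-bigquery",
        "--region", "us-central1",
        "--entry-point", "uploadToBigQuery",
        "--runtime", runtime,  # assumption: the article does not name the runtime
        "--trigger-resource", "projects_files",
        "--trigger-event", "google.storage.object.finalize",
        "--memory", "2048MB",   # 2 GB
        "--timeout", "540s",    # 9 minutes
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_deploy_command(runtime="nodejs20")))
```

Printing rather than executing keeps the sketch free of side effects; the resulting line can be pasted into a shell once the runtime and project settings have been checked.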
We are receiving CSV files that contain a lot of data. Our analytics team wants to see all the issues that are done and are not bugs.
This brings us to the second function of our pipeline, which filters the data and saves it into another table in our data warehouse. The function listens to a Pub/Sub topic, and when a message is published, the function runs automatically.
BigQuery allows us to run queries and save the results into a table. The main thing we do here is build a query, run it as an asynchronous job and wait for the results. Once the results are returned, we publish a message on the update-final-table topic to trigger the last step of our pipeline.
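To make the hand-off between functions more concrete, here is a minimal sketch of how a message payload could be encoded by the publisher and decoded by the subscriber. It assumes the Python background-function convention in which the payload arrives base64-encoded under `event["data"]`; the JSON body with a `table` field and both helper names are hypothetical.

```python
import base64
import json


def encode_message(payload: dict) -> bytes:
    """Serialize a payload to the bytes that would be published on a topic."""
    return json.dumps(payload).encode("utf-8")


def decode_pubsub_event(event: dict) -> dict:
    """Recover the JSON payload from a Pub/Sub-triggered function event."""
    raw = base64.b64decode(event["data"])
    return json.loads(raw.decode("utf-8"))
```

With this shape, the first function could pass along the name of the table it just loaded, and the second function would know which table to filter.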
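The query-and-wait flow described above can be sketched as follows. The column names `status` and `issue_type`, the values `'Done'` and `'Bug'`, and both function names are assumptions, since the file structure is not shown here. The `client` argument is expected to behave like the BigQuery Python client (a `query` method returning a job with a `result` method), but any object with that shape works, which keeps the sketch runnable without the library.

```python
def build_filter_query(dataset: str, source_table: str) -> str:
    """Build a query selecting finished issues that are not bugs."""
    return (
        f"SELECT * FROM `{dataset}.{source_table}` "
        "WHERE status = 'Done' AND issue_type != 'Bug'"
    )


def run_filter_job(client, dataset: str, source_table: str, job_config=None):
    """Start the query job and block until its results are available."""
    sql = build_filter_query(dataset, source_table)
    # The job runs asynchronously on the BigQuery side.
    query_job = client.query(sql, job_config=job_config)
    # result() waits for the job to finish and returns its rows.
    return query_job.result()
```

In a real deployment the destination table would typically be set on the job configuration passed as `job_config`, so that the filtered rows are saved rather than only returned.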
The last step in our pipeline is to update the data in the final table, where we keep the number of issues done and the number of story points done since the project began.
This function runs a Job to update the data in the final BigQuery table.
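The pipeline performs this update inside BigQuery; the plain Python below only illustrates the arithmetic behind the running totals. The field names `project`, `story_points`, `issues_done` and `story_points_done`, as well as the function name, are hypothetical.

```python
def add_to_totals(totals: dict, done_issues: list) -> dict:
    """Return updated per-project totals after one uploaded file.

    `totals` maps a project name to its counters so far; `done_issues`
    holds the finished issues found in the newly uploaded file.
    """
    # Copy the counters so the caller's totals are left untouched.
    updated = {project: dict(values) for project, values in totals.items()}
    for issue in done_issues:
        entry = updated.setdefault(
            issue["project"], {"issues_done": 0, "story_points_done": 0}
        )
        entry["issues_done"] += 1
        entry["story_points_done"] += issue["story_points"]
    return updated
```

Each new file therefore adds to the existing counters instead of replacing them, which is what lets the final table reflect the whole history of a project.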
Serverless technologies can be used in many cases and have many benefits, from cost reduction to small, readable chunks of code. As we saw today in our implementation, serverless functions should be small, isolated functions with a single purpose. With serverless functions we can also create complex data processing pipelines.