Engineering time is a precious resource. We have to balance many tasks and often conflicting priorities. In this article, we’ll look at ten activities for which allocating more time can be beneficial.
Let’s learn from the mistakes of others.
Have you ever deleted something prematurely only to find out that there is no backup? A good rule of thumb is to check three times before deleting anything. This may involve cross-checking whether we are in the right environment, region, database schema, or S3 bucket.
Additionally, there are many ways of mitigating the impact of unintentional deletions:
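One such safeguard, sketched here in Python, is a guard that refuses to proceed unless the environment, region, and resource we intend to delete from match the session we are actually in. The names `DeletionTarget`, `DeletionBlocked`, and `confirm_deletion` are hypothetical, and the sketch assumes the caller can look up the active session's details; it only performs the checks and deletes nothing itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionTarget:
    """Hypothetical description of where a destructive operation would run."""
    environment: str  # e.g. "dev" or "prod"
    region: str
    resource: str  # e.g. a bucket name or a database schema


class DeletionBlocked(Exception):
    """Raised when a cross-check fails, before anything is deleted."""


def confirm_deletion(intended: DeletionTarget, active: DeletionTarget, typed_name: str) -> None:
    """Compare what we intend to delete with the session we are actually in."""
    problems = []
    for field in ("environment", "region", "resource"):
        want, have = getattr(intended, field), getattr(active, field)
        if want != have:
            problems.append(f"{field}: intended {want!r}, active session is {have!r}")
    # Final check: the operator retypes the resource name by hand.
    if typed_name != intended.resource:
        problems.append("typed name does not match the intended resource")
    if problems:
        raise DeletionBlocked("; ".join(problems))
```

Calling `confirm_deletion` right before the destructive call means that a session pointing at production instead of development raises `DeletionBlocked` rather than silently removing data.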
Most cloud providers offer an object storage service, and most of them (such as AWS S3 or GCP GCS) store data in buckets. Microsoft chose the name “containers” rather than “buckets” for Azure Blob Storage. Why is this confusing? Because containers are typically associated with running instances of Docker images, not with storing blobs. This example shows that even the largest technology companies in the world sometimes make confusing naming decisions.
One of my colleagues often repeats this quote which nicely summarizes it:
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
Great technology products are rarely created in a vacuum; they emerge when several smart people support each other in solving challenging problems. That’s why feedback and code reviews are so critical. They serve as a basis for constructive discussions that lead to better engineering processes and better code.
Most engineers creating pull requests genuinely want to hear your thorough feedback and learn from your experience. They rely on you to identify mistakes and issues they might not have thought about.
You have probably encountered this many times in your career. Somebody assigns you a ticket saying “do XYZ” without specifying the why behind it. It’s assumed upfront that XYZ is the right solution to the problem, and you should simply do it. But when you start diving deeper into the actual issue, you notice that XYZ may not be the optimal approach.
It’s always helpful when a ticket or user story defines the problem and the stakeholders involved. It may suggest potential solutions, but it shouldn’t prescribe exactly what must be done unless the author is really sure about it. Once the problem is defined, engineers are smart enough to figure out the best ways to tackle it.
Architecture decisions typically have far-reaching consequences. Once things are implemented, it’s expensive to “undo” them (think: time-consuming migration projects).
Still, we often don’t take the time to evaluate enough options, and we fail to ask for feedback. The most popular tool may not necessarily be the best one for the problem at hand.
Sometimes reaching an agreement on common standards is half the battle. Failing to communicate with other teams may lead to friction and conflicts.
For instance, many data teams invested heavily in cloud data warehouses, data ingestion platforms, and SQL-based transformation tools in recent years. Then, they started to advocate that everyone should, from now on, use SQL to solve all their analytical problems.
Following this advice, one may think that we no longer need distributed compute frameworks such as Dask or Spark. But by trying to standardize solely on SQL, we forget about other teams. What about data scientists and quantitative researchers? SQL is not enough to solve their problems. The same is true when you need to serve data for consumption by APIs, process automation, and many other interesting use cases that leverage data for more than reporting. Python would be a much better choice for those use cases.
Similarly, Paul Singman has suggested that agreeing on common data definitions and interfaces between software engineers and data teams can sometimes eliminate the need for ETL jobs. Note that this is feasible only in a few circumstances, such as when building tools based on a pub-sub architecture.
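To make the idea of a shared data definition more concrete, here is a minimal sketch in Python. It assumes that the producing service and the data team import the same event class, so both sides agree on field names and structure. The `OrderPlaced` event and its fields are hypothetical and are not taken from the article referenced above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event definition shared by the producer and the data team."""
    order_id: str
    customer_id: str
    amount_cents: int
    currency: str

    def to_message(self) -> str:
        """Serialize the event for publishing to a topic or queue."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_message(cls, message: str) -> "OrderPlaced":
        """Rebuild the event on the consumer side.

        Missing or unexpected fields raise TypeError, so a producer that
        drifts away from the agreed definition is noticed immediately.
        """
        return cls(**json.loads(message))
```

Because the consumer receives data that is already in the agreed shape, there is less left for a downstream transformation job to fix. Note that this sketch only checks field names, not field types.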
Giving praise or paying compliments may feel awkward sometimes, but we all crave validation to some extent. It’s fascinating to see the positive impact of a simple “Great job!” Watch out for false praise, though; people can tell when compliments are insincere.
It’s challenging to find and manage engineering talent these days. Still, hiring prematurely and then firing people can harm the company culture and team morale. Many adopt the “hell yes or no” approach: hire only when the candidate is an obvious yes. Whichever strategy you choose, it’s advisable to take time and be intentional about the team you want to build.
Some issues in engineering stem from not reading the logs properly.
A true story: in my first consulting project, we were working on a Hadoop cluster, and I had difficulties figuring out why my Spark job had failed. I asked a colleague, and he pointed me to a Java stack trace with the error message, which I had overlooked in a large directory of log files. I had to (shamefully) admit that he was right: the answer was in the logs, and I hadn’t taken enough time to read them thoroughly.
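When the answer is buried in a large directory of log files, even a small script can point to the right lines. The sketch below is a rough heuristic, not a full log parser: it assumes plain-text files ending in `.log` and looks for lines that name a Java exception or error class, or that start with `Caused by:`. The function name `find_error_lines` is hypothetical.

```python
import re
from pathlib import Path

# Heuristic: Java stack traces usually begin with a line naming an exception
# or error class ("java.lang.IllegalStateException: ...") or "Caused by: ...".
# The match is case-sensitive, so an upper-case "ERROR" log level is not caught.
ERROR_LINE = re.compile(r"(Exception|Error)\b|^Caused by:")


def find_error_lines(log_dir, pattern="*.log"):
    """Return (file name, line number, text) for every suspicious log line."""
    hits = []
    for path in sorted(Path(log_dir).rglob(pattern)):
        # errors="replace" keeps the scan going if a file has odd encoding.
        with path.open(errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if ERROR_LINE.search(line):
                    hits.append((path.name, line_number, line.rstrip()))
    return hits
```

Running `find_error_lines` on the job's log directory gives a short list of places to start reading, which is much less daunting than opening every file by hand.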
If you want to avoid similar embarrassing situations and you happen to use serverless, have a look at Dashbird. The platform allows you to filter through the logs of all your serverless AWS resources. Dashbird automatically pulls the logs directly from the CloudWatch APIs, so you don’t need to set up any custom log handlers in your code.
You can then search through all your logs in real time from the UI, including X-Ray traces.
Global log search in Dashbird.io — image by the author
Did you know that most IT projects fail due to communication issues? We have more communication tools than ever before, yet it’s still challenging to find a balance between under- and overcommunicating.
This article discussed ten challenging aspects of engineering to which we sometimes don’t dedicate enough time. These include deleting and naming things, reading logs, giving feedback and praise, defining problems, designing architectures, hiring, and finally communicating within the team and with other teams.
Thank you for reading!