5 Common Amazon Kinesis Issues

Amazon Kinesis is AWS's real-time stream processing service. Whether you have video, audio, or IoT streaming data to handle, Kinesis is the way to go.

Kinesis is a managed, serverless service that integrates nicely with other services like Lambda or S3. Often, you will use it when SQS or SNS are too limited for your use case.

But as with all the other services on AWS, Kinesis is a professional tool that comes with its share of complications. This article will discuss the most common issues and explain how to fix them. So, let’s get going!

1. What Limits Apply when AWS Lambda is Subscribed to a Kinesis Stream?

If your Kinesis stream has only one shard, Lambda won't invoke your function in parallel, even if many records are waiting in the stream. By default, Lambda processes one batch per shard at a time, so to scale up to more parallel invocations, you need to add shards to the stream.

Kinesis effectively serializes your invocations per shard. This is a nice way to control how many Lambda invocations run in parallel, but it can slow down overall processing if the function takes a long time to execute.

If your processing doesn't rely on the order of previous events, you can use more shards, and Lambda will automatically scale up to more concurrent invocations. Alternatively, the event source mapping's ParallelizationFactor setting (1 to 10) lets Lambda process up to ten batches per shard concurrently while still keeping records with the same partition key in order. But keep in mind that Lambda itself has a soft limit of 1,000 concurrent executions per account and Region. You can ask AWS to raise this limit. There isn't an explicitly documented hard limit above that, but AWS mentions increases into the tens of thousands.
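To make the arithmetic concrete, here is a minimal Python sketch that estimates the maximum number of concurrent invocations a Kinesis-triggered function can reach: one batch per shard at a time, multiplied by the event source mapping's ParallelizationFactor (a real Lambda setting that defaults to 1 and accepts 1 to 10), capped by the account's concurrency limit. The helper name is hypothetical, and the 1,000 default only mirrors the soft limit above; your account's limit may differ, and all functions in the Region share it.

```python
def max_kinesis_concurrency(shards: int, parallelization_factor: int = 1,
                            account_limit: int = 1000) -> int:
    """Upper bound on concurrent invocations for one Kinesis event source mapping."""
    if shards < 1:
        raise ValueError("a stream needs at least one shard")
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    # One batch per shard at a time, times the number of concurrent batches per shard
    per_stream = shards * parallelization_factor
    # The account-level concurrency limit is shared by all functions, so it caps the result
    return min(per_stream, account_limit)


print(max_kinesis_concurrency(1))        # 1: a single shard means no parallelism
print(max_kinesis_concurrency(4))        # 4
print(max_kinesis_concurrency(4, 10))    # 40
print(max_kinesis_concurrency(200, 10))  # 1000: capped by the account limit
```

Adding shards and raising the parallelization factor both raise the per-stream bound, but neither helps once the account limit is the bottleneck.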

2. Data Loss with Kinesis Streams and Lambda

If you call put_record in a loop to publish records from a Lambda function to a Kinesis stream, the loop can fail midway. To fix this, catch any errors that put_record raises; otherwise, your function will crash after publishing only part of the list.
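A minimal sketch of that pattern in Python, assuming a boto3 Kinesis client (boto3.client("kinesis")) is passed in. The function name and the "key"/"payload" record fields are hypothetical; the point is that an exception from put_record is caught per record, so the loop keeps going and reports what it could not publish instead of crashing halfway.

```python
import json
from typing import Any, Iterable


def publish_each(kinesis_client: Any, stream_name: str,
                 records: Iterable[dict]) -> list[dict]:
    """Publish records one at a time and return the ones that could not be sent.

    kinesis_client is expected to behave like boto3.client("kinesis").
    Each record is a dict with hypothetical "key" and "payload" fields.
    """
    failed = []
    for record in records:
        try:
            kinesis_client.put_record(
                StreamName=stream_name,
                Data=json.dumps(record["payload"]).encode("utf-8"),
                PartitionKey=record["key"],
            )
        except Exception as exc:  # in production, narrow this to the botocore exceptions you expect
            # Keep going instead of crashing, and remember what was not published
            failed.append({"record": record, "error": str(exc)})
    return failed
```

For example, publish_each(boto3.client("kinesis"), "my-stream", records) (stream name hypothetical) returns the failed records, which you can then retry or park somewhere safe.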

If one Lambda invocation is responsible for publishing multiple records to a Kinesis stream, you have to make sure a crash of the Lambda function doesn't lose data. Depending on your use case, this could mean retrying failed records or putting another queue in front of your Lambda function.

You can also catch errors instead of crashing and then put the failed records somewhere else to make sure they don't get lost.
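One way to combine retries with a fallback, sketched in Python under a few assumptions: it uses the batch PutRecords API (up to 500 records per request) instead of a put_record loop, retries only the entries whose result carries an ErrorCode, backs off exponentially between attempts, and finally hands anything still failing to a fallback callable. That fallback is a hypothetical hook; in practice it might write to an SQS queue or an S3 bucket. Errors that make the whole put_records call fail (for example, a network error) are not handled here.

```python
import time
from typing import Any, Callable


def put_records_with_retry(kinesis_client: Any, stream_name: str,
                           entries: list[dict],
                           fallback: Callable[[list[dict]], None],
                           max_attempts: int = 3,
                           base_delay: float = 0.1) -> None:
    """Send entries with PutRecords, retry the failed ones, then pass leftovers to fallback.

    Each entry is {"Data": bytes, "PartitionKey": str}, the shape PutRecords expects.
    fallback is a hypothetical hook, e.g. a function that writes to SQS or S3.
    """
    pending = entries
    for attempt in range(max_attempts):
        if not pending:
            return
        response = kinesis_client.put_records(StreamName=stream_name, Records=pending)
        if response.get("FailedRecordCount", 0) == 0:
            return
        # Results come back in request order; failed entries carry an ErrorCode
        pending = [entry for entry, result in zip(pending, response["Records"])
                   if "ErrorCode" in result]
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff before retrying
    if pending:
        fallback(pending)
```

Retrying only the failed entries matters because PutRecords is not all-or-nothing: a single call can succeed for some records and be throttled for others.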

3. InvokeAccessDenied Error When Pushing Records from Firehose to Lambda

You're trying to push a record from Kinesis Firehose to a Lambda function but get an error. This is usually a permission issue with IAM roles. To fix it, make sure your Firehose delivery stream uses an IAM role that is allowed to invoke the function.

In the Resource section of your policy document, make sure the ARNs of all your Lambda functions are listed, either with a wildcard in the ARN or as an array of ARNs.
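For reference, here is a sketch of such a policy document, built in Python so it can be serialized with json.dumps. The account ID, Region, and function name are hypothetical placeholders. Besides lambda:InvokeFunction, the data-transformation example in the Firehose documentation also grants lambda:GetFunctionConfiguration, so the sketch includes both; the ":*" suffix is the wildcard form that additionally matches versions and aliases of the function.

```python
import json


def firehose_lambda_policy(function_arns: list[str]) -> dict:
    """Build an IAM policy that lets a Firehose role invoke the given Lambda functions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "lambda:InvokeFunction",
                    # The Firehose data-transformation example also grants this action
                    "lambda:GetFunctionConfiguration",
                ],
                # Every function the delivery stream calls must be covered here
                "Resource": function_arns,
            }
        ],
    }


# Hypothetical account ID, Region, and function name
arns = [
    "arn:aws:lambda:us-east-1:123456789012:function:transform-records",
    # ":*" also matches qualified ARNs, i.e. versions and aliases
    "arn:aws:lambda:us-east-1:123456789012:function:transform-records:*",
]
print(json.dumps(firehose_lambda_policy(arns), indent=2))
```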

But many other permission problems can prevent the invocation. Some of them are:

  • Missing "lambda:InvokeFunction" in the policy's "Action" list
  • Having an "Effect": "Deny" statement somewhere that matches the function
  • Assigning the wrong role to the delivery stream
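If you want a quick local sanity check for the first two problems, the hypothetical helper below scans a policy document for a statement that allows lambda:InvokeFunction on a given function and lets any matching Deny win, as IAM does. It is a deliberately simplified model: it uses Python's fnmatch to approximate IAM's * and ? wildcards, ignores Condition, NotAction, and NotResource, and cannot tell you whether the right role is attached to the delivery stream. The IAM policy simulator remains the authoritative tool.

```python
from fnmatch import fnmatchcase


def _as_list(value) -> list:
    # IAM accepts either a single string or a list for Action and Resource
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def can_invoke(policy: dict, function_arn: str) -> bool:
    """Rough check: does this policy allow lambda:InvokeFunction on function_arn?"""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given without a list
        statements = [statements]
    allowed = False
    for stmt in statements:
        # Action names are case-insensitive in IAM, so compare in lowercase
        actions = [a.lower() for a in _as_list(stmt.get("Action"))]
        resources = _as_list(stmt.get("Resource"))
        if not any(fnmatchcase("lambda:invokefunction", a) for a in actions):
            continue
        if not any(fnmatchcase(function_arn, r) for r in resources):
            continue
        if stmt.get("Effect") == "Deny":
            return False  # an explicit Deny overrides every Allow
        if stmt.get("Effect") == "Allow":
            allowed = True
    return allowed


# Hypothetical function ARN and policy
arn = "arn:aws:lambda:us-east-1:123456789012:function:transform-records"
policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "lambda:InvokeFunction",
                   "Resource": "arn:aws:lambda:us-east-1:123456789012:function:*"}],
}
print(can_invoke(policy, arn))  # True
```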

4. Error When Trying to Update the Shard Count

You tried to update the shard count too often in a given period. The UpdateShardCount operation has rather tight limits. To get around them, you can call other operations like SplitShard and MergeShards, which have more generous quotas.

Often, you don't know how many shards are sufficient to handle your load, so you have to adjust their number over time. AWS limits how you meddle with the shard count. To quote the docs, you can't:

  • Scale more than ten times per rolling 24-hour period per stream
  • Scale up to more than double your current shard count for a stream
  • Scale down below half your current shard count for a stream
  • Scale up to more than 10000 shards in a stream
  • Scale a stream with more than 10000 shards down unless the result is less than 10000 shards
  • Scale up to more than the shard limit for your account
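Because each call can at most double or halve the shard count, reaching a distant target takes several calls, and each one counts toward the ten-per-day limit. The hypothetical Python helper below plans those intermediate steps under the doubling, halving, and 10,000-shard rules listed above; it doesn't check your account's shard limit or the rolling 24-hour window, so treat it as a planning sketch.

```python
def plan_uniform_scaling(current: int, target: int, max_shards: int = 10000) -> list[int]:
    """Intermediate shard counts that respect UpdateShardCount's doubling/halving rules.

    Each step at most doubles, or at least halves (rounded up), the current count.
    The length of the returned list is the number of UpdateShardCount calls needed.
    """
    if current < 1 or target < 1:
        raise ValueError("shard counts must be positive")
    if target > max_shards:
        raise ValueError(f"target exceeds the {max_shards}-shard limit")
    steps = []
    count = current
    while count != target:
        if target > count:
            count = min(target, count * 2)         # can't more than double
        else:
            count = max(target, (count + 1) // 2)  # can't go below half
        steps.append(count)
    return steps


print(plan_uniform_scaling(4, 20))  # [8, 16, 20]
print(plan_uniform_scaling(20, 4))  # [10, 5, 4]
```

Each step would then be one update_shard_count(StreamName=..., TargetShardCount=n, ScalingType="UNIFORM_SCALING") call on a boto3 Kinesis client, waiting for the stream to become ACTIVE again in between. Going from 1 to 1,000 shards already takes 10 steps, which uses up the whole daily allowance.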

Using SplitShard and MergeShards instead lets you get around some of these limitations and gives you more flexibility with sharding.
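SplitShard, for example, lets you divide one hot shard at a hash key of your choice. A minimal Python sketch, assuming you want an even split: shard hash key ranges are 128-bit integers that the API returns as decimal strings, and the value you pass as NewStartingHashKey becomes the first hash key of one child shard, with all lower keys going to the other. The helper name is hypothetical.

```python
def even_split_key(starting_hash_key: str, ending_hash_key: str) -> str:
    """NewStartingHashKey that splits a shard's hash key range into two equal halves.

    Kinesis returns HashKeyRange values as decimal strings of 128-bit integers.
    Keys >= the returned value go to one child shard, lower keys to the other.
    """
    start, end = int(starting_hash_key), int(ending_hash_key)
    if end <= start:
        raise ValueError("a shard covering a single hash key cannot be split")
    # Rounding up keeps the new key strictly above the start, so both children are non-empty
    return str((start + end + 1) // 2)


# The full hash key range, as owned by the only shard of a single-shard stream
print(even_split_key("0", str(2**128 - 1)))
```

With a boto3 Kinesis client, you could read a shard's HashKeyRange from list_shards(StreamName=...) and pass the result to split_shard(StreamName=..., ShardToSplit=shard_id, NewStartingHashKey=key).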

5. Shard is Not Closed

You interacted with the stream too soon after creating it. Creating a new stream can take up to 10 minutes to complete. To fix this, wait after creating a stream, or retry a few times with a delay.

Creating new streams or shards isn’t an instant action. It happens very quickly, but you might have to wait for minutes in the worst case. As with any distributed system, you have to keep latencies in mind. Otherwise, your logs will be littered with errors.
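Rather than sprinkling fixed sleeps around your code, you can poll until the stream is ready. A minimal Python sketch, assuming a boto3 Kinesis client is passed in: it calls describe_stream_summary until StreamStatus is ACTIVE, and the default 600-second timeout matches the 10-minute worst case mentioned above. The function name and defaults are illustrative.

```python
import time
from typing import Any


def wait_until_active(kinesis_client: Any, stream_name: str,
                      timeout: float = 600.0, poll_interval: float = 5.0) -> None:
    """Poll DescribeStreamSummary until the stream reports ACTIVE or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        summary = kinesis_client.describe_stream_summary(StreamName=stream_name)
        status = summary["StreamDescriptionSummary"]["StreamStatus"]
        if status == "ACTIVE":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{stream_name} is still {status} after {timeout} seconds")
        # CREATING or UPDATING: wait a bit before asking again
        time.sleep(poll_interval)
```

boto3 also ships a built-in waiter for this, client.get_waiter("stream_exists").wait(StreamName=...), which polls the stream in a similar way.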

Summary 

If you have to process your data or media in real time, it's best to go for Kinesis on AWS.

Sadly, it’s not as straightforward as SQS and SNS, but it’s also more flexible than those services.

Your best course of action is to learn about the service's limitations so your logs aren't littered with avoidable error messages. Also, make sure to program your Lambda functions robustly so they don't crash with half of your data still unprocessed.

Monitoring Kinesis with Dashbird

Dashbird will monitor all your Kinesis streams out of the box. Additionally, Dashbird will evaluate all your Kinesis logs according to the Well-Architected Framework. So, it’s not just metrics and errors en masse, but actionable information to improve your architecture with AWS best practices. 

Try Dashbird now for free, or check out our product tour!

At Dashbird, we understand that the core idea and value of serverless is to focus on the customer and avoid heavy lifting. That's precisely what we provide. We let developers think about the end user again instead of being distracted by debugging, alarm management, or worrying about whether something is working.

