AWS Certified Machine Learning - Specialty: Troubleshooting Kinesis Data Firehose and S3 Data Lake Integration Failure


Question

You are deploying your data streaming pipeline for your machine learning environment.

Your CloudFormation stack contains a Kinesis Data Firehose delivery stream that uses the Data Transformation feature, and Firehose is configured to write to your S3 data lake.

When you stream data through your Kinesis Data Firehose, you notice that no data is arriving in your S3 bucket.

What might be the problem that is causing the failure?

Answers

Explanations


A. Your lambda memory setting is set to the maximum value allowed.

B. Your S3 bucket is in the same region as your Kinesis Data Firehose.

C. Your Kinesis Data Firehose buffer setting is set to the default value.

D. Your lambda timeout value is set to the default value.

Answer: D.

Option A is incorrect.

The maximum memory setting for Lambda is 3 GB.

Using the maximum memory would not cause Firehose to fail to write to S3.

It will increase the cost of your solution. However, per the AWS documentation, "Lambda allocates CPU power linearly in proportion to the amount of memory configured," so maximum memory would, if anything, help the transformation function finish sooner.

Option B is incorrect.

The S3 bucket that Kinesis Data Firehose uses as its output destination must be in the same region as your Firehose delivery stream.

Since they are in the same region, this would not cause a failure to write to the S3 bucket.

Option C is incorrect.

The Kinesis Data Firehose documentation states that "Kinesis Data Firehose buffers incoming data before delivering it to Amazon S3. You can choose a buffer size (1-128 MBs) or buffer interval (60-900 seconds). The condition that is satisfied first triggers data delivery to Amazon S3."

Using the default setting would not prevent Firehose from writing to S3; it only determines when delivery is triggered.
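The buffering behavior quoted above is configured on the delivery stream's BufferingHints. A minimal CloudFormation fragment is sketched below; the resource name, bucket ARN, and role ARN are placeholders, not values from the question.

```yaml
# Illustrative fragment only -- ARNs and names are placeholders.
FirehoseDeliveryStream:
  Type: AWS::KinesisFirehose::DeliveryStream
  Properties:
    DeliveryStreamType: DirectPut
    ExtendedS3DestinationConfiguration:
      BucketARN: arn:aws:s3:::my-data-lake-bucket            # placeholder bucket
      RoleARN: arn:aws:iam::123456789012:role/firehose-role  # placeholder role
      BufferingHints:
        SizeInMBs: 5            # default buffer size
        IntervalInSeconds: 300  # default buffer interval
```

Whichever hint is satisfied first (5 MB of data or 300 seconds elapsed) triggers delivery to S3.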

Option D is correct.

The Lambda timeout value defaults to 3 seconds.

For many Kinesis Data Firehose implementations, 3 seconds is not enough time to execute the transformation function. The function times out, the transformation fails, and no data is delivered to the S3 bucket.
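If the transformation function is defined in the same CloudFormation stack, the fix is to set its Timeout property explicitly rather than relying on the 3-second default. The fragment below is a sketch; the handler, role ARN, and timeout value are illustrative.

```yaml
# Illustrative fragment only -- role ARN and timeout value are placeholders.
TransformFunction:
  Type: AWS::Lambda::Function
  Properties:
    Handler: index.handler
    Runtime: python3.9
    Role: arn:aws:iam::123456789012:role/lambda-role  # placeholder role
    Timeout: 60   # seconds; the default is 3, which is often too short
    Code:
      ZipFile: |
        # transformation code goes here
```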

Reference:

Please see the Amazon Kinesis Data Firehose developer guide documentation titled Configure Settings, the Amazon Kinesis Data Firehose developer guide documentation titled Amazon Kinesis Data Firehose Data Transformation, and the AWS Lambda developer guide documentation titled AWS Lambda Function Configuration.

The most likely reason for the failure is option D - "Your lambda timeout value is set to the default value."

When the Data Transformation feature is enabled, Kinesis Data Firehose invokes your Lambda function on each buffered batch of records before delivering the transformed data to S3. The default Lambda timeout is only 3 seconds, which is frequently too short for the function to process a batch. When the function times out, the transformation fails and the transformed data never reaches your S3 data lake.

To address this issue, increase the timeout value of the transformation Lambda function to match your processing needs; Firehose allows the transformation function up to 5 minutes per invocation.

Option A - "Your lambda memory setting is set to the maximum value allowed" - would raise cost but not cause a delivery failure; because Lambda allocates CPU in proportion to memory, maximum memory would, if anything, help the function finish within its timeout.

Option B - "Your S3 bucket is in the same region as your Kinesis Data Firehose" - describes the required configuration, not a problem: the destination bucket must be in the same region as the delivery stream.

Option C - "Your Kinesis Data Firehose buffer setting is set to the default value" - would not prevent delivery. The default buffer settings (5 MB or 300 seconds, whichever condition is met first) only determine when data is delivered, not whether it is delivered.
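For context on why the 3-second default is so easy to exceed: a Firehose transformation Lambda receives a batch of base64-encoded records and must decode, process, and re-encode every one of them before returning. The sketch below shows the expected input/output shape; the JSON-enrichment step is purely illustrative, not part of the question.

```python
import base64
import json


def handler(event, context):
    """Minimal Kinesis Data Firehose transformation handler (illustrative).

    Firehose invokes the function with a batch of base64-encoded records and
    expects each record back with its recordId, a result status, and the
    re-encoded data.
    """
    output = []
    for record in event["records"]:
        # Decode the incoming record payload.
        payload = json.loads(base64.b64decode(record["data"]))

        # Illustrative enrichment step; real transformations do more work,
        # which is why large batches can blow past a 3-second timeout.
        payload["processed"] = True

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # "Dropped" and "ProcessingFailed" are also valid
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```

Because the function runs once per buffered batch, per-record work multiplies quickly; sizing the timeout to the batch, not to a single record, is what option D is testing.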