AWS Certified Big Data - Specialty: Implementing Kinesis Data Firehose for Log Data

Implementing Kinesis Data Firehose for Log Data

Question

A company is planning to use Kinesis Data Firehose to stream log data from several web servers running the Apache web server.

An application will then read the data, which needs to be in JSON format, from the underlying destination bucket.

Which of the following ideally needs to be in place to ensure that this flow can be implemented?

Answers

Explanations


A. Ensure that a Lambda transformation is used along with Kinesis Firehose.

B. Ensure that the KPL library is used to parse the records in JSON format.

C. Ensure that the KCL library is used to parse the records in JSON format.

D. Change the configuration of the underlying Kinesis Data Firehose stream to store JSON formatted data.

Answer - A.

The AWS Documentation mentions the following.

#######

Lambda Blueprints.

Kinesis Data Firehose provides the following Lambda blueprints that you can use to create a Lambda function for data transformation.

General Firehose Processing - Contains the data transformation and status model described in the previous section.

Use this blueprint for any custom transformation logic.

Apache Log to JSON - Parses and converts Apache log lines to JSON objects, using predefined JSON field names.

Apache Log to CSV - Parses and converts Apache log lines to CSV format.

Syslog to JSON - Parses and converts Syslog lines to JSON objects, using predefined JSON field names.

Syslog to CSV - Parses and converts Syslog lines to CSV format.

Kinesis Data Firehose Process Record Streams as source - Accesses the Kinesis Data Streams records in the input and returns them with a processing status.

Kinesis Data Firehose CloudWatch Logs Processor - Parses and extracts individual log events from records sent by CloudWatch Logs subscription filters.

#######
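In this scenario, the "Apache Log to JSON" blueprint is the relevant one. Below is a minimal sketch of such a transformation Lambda; the regular expression, JSON field names, and function name are illustrative assumptions rather than the blueprint's exact code, but the records/recordId/result/data contract is the one Firehose expects from a transformation function.

import base64
import json
import re

# Apache common log format: host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Firehose hands each record's payload to the Lambda base64-encoded.
        line = base64.b64decode(record['data']).decode('utf-8').strip()
        match = LOG_PATTERN.match(line)
        if match:
            host, ident, user, timestamp, request, status, size = match.groups()
            doc = {
                'host': host,
                'user': user,
                'datetime': timestamp,
                'request': request,
                'response': int(status),
                'bytes': None if size == '-' else int(size),
            }
            payload = (json.dumps(doc) + '\n').encode('utf-8')
            output.append({
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(payload).decode('utf-8'),
            })
        else:
            # Lines that cannot be parsed are flagged so Firehose can route
            # them to its error output instead of the destination bucket.
            output.append({
                'recordId': record['recordId'],
                'result': 'ProcessingFailed',
                'data': record['data'],
            })
    return {'records': output}

Firehose then delivers the transformed JSON records to the configured S3 destination, where the application can read them.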

Options B and C are incorrect since the transformation needs to be done by an AWS Lambda function.

Option D is incorrect since there is no such configuration option.

For more information on data transformation, please refer to the below URL.

https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html

To implement the given flow, the following components must be in place:

  1. Kinesis Firehose to stream the log data
  2. An application to read the JSON data from the underlying destination bucket (a sketch of this read path appears after this list)
  3. Data should be in JSON format
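For the second component, here is a minimal sketch of the read path, assuming a placeholder bucket name and prefix: the consuming application lists and reads the delivered objects directly from S3, with no Kinesis client library involved.

import json
import boto3

s3 = boto3.client('s3')
BUCKET = 'example-log-destination-bucket'  # hypothetical bucket name

def read_log_events(prefix='logs/'):
    """Read all JSON log events delivered by Firehose under the given prefix."""
    events = []
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get('Contents', []):
            body = s3.get_object(Bucket=BUCKET, Key=obj['Key'])['Body'].read()
            # Firehose concatenates records into each object; the transformation
            # Lambda appended a newline after every JSON document.
            for line in body.decode('utf-8').splitlines():
                if line:
                    events.append(json.loads(line))
    return events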

Option A: Ensure that a Lambda transformation is used along with Kinesis Firehose. This is the correct option. Apache access logs are plain text, not JSON, so the records must be transformed before they reach the destination bucket. Kinesis Data Firehose supports data transformation through an AWS Lambda function, and the "Apache Log to JSON" blueprint does exactly what this scenario needs: it parses Apache log lines and converts them into JSON objects with predefined field names.
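As a rough sketch of how this looks in practice (using boto3; the stream name, bucket, IAM role, and Lambda ARN below are placeholders, not values from the question):

import boto3

firehose = boto3.client('firehose')

firehose.create_delivery_stream(
    DeliveryStreamName='apache-logs-to-json',  # hypothetical stream name
    DeliveryStreamType='DirectPut',
    ExtendedS3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-delivery-role',
        'BucketARN': 'arn:aws:s3:::example-log-destination-bucket',
        # Attach the transformation Lambda so every record is converted
        # to JSON before it is written to the destination bucket.
        'ProcessingConfiguration': {
            'Enabled': True,
            'Processors': [{
                'Type': 'Lambda',
                'Parameters': [{
                    'ParameterName': 'LambdaArn',
                    'ParameterValue': 'arn:aws:lambda:us-east-1:123456789012:function:apache-log-to-json',
                }],
            }],
        },
    },
)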

Option B: Ensure that the KPL library is used to parse the records in JSON format. The KPL library (Kinesis Producer Library) is used to send data to Kinesis streams. It is not used for parsing data in JSON format. Therefore, this option is incorrect.

Option C: Ensure that the KCL library is used to parse the records in JSON format. The KCL library (Kinesis Client Library) is used to consume data from Kinesis streams. It is not used for parsing data in JSON format. Therefore, this option is incorrect.

Option D: Change the configuration of the underlying Kinesis Data Firehose stream to store JSON formatted data. This option is incorrect because no such configuration setting exists. Firehose's built-in record format conversion can convert records that are already JSON into columnar formats such as Apache Parquet or ORC, but it cannot turn raw Apache log lines into JSON. That conversion has to be performed by a transformation Lambda function.

Therefore, the correct answer is Option A - ensure that a Lambda transformation is used along with Kinesis Data Firehose.