AWS Certified Big Data - Specialty Exam: Lambda Blueprints for Data Transformation

Available Lambda Blueprints for Data Transformation

Question

HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock climbing, ice climbing, skiing, avalanche protection, snowboarding, fly fishing, kayaking, rafting, road and trail running, and more. HH runs its entire online infrastructure on Java-based web applications running on AWS.

HH captures clickstream data and uses a custom-built recommendation engine to recommend products, which improves sales and helps it understand customer preferences. HH already uses the Amazon Kinesis Streams API and the Kinesis Agent to collect events and transaction logs and to process the stream.

The event/log size is around 12 bytes. HH uses Kinesis Data Firehose with data transformation enabled to store the data in a standardized format in S3, and uses Lambda blueprints to implement the transformation.

What kind of Lambda blueprints are available to process transformations? Select 3 options.

Answers

Explanations


A. General Firehose Processing

B. Parses and converts Apache log lines to JSON or CSV format

C. Parses and converts Syslog lines to JSON or CSV format

D. Parses and converts Apache log lines to Apache Parquet or Apache ORC format

E. Parses and converts Syslog lines to Apache Parquet or Apache ORC format

Answer: A,B,C.

Kinesis Data Firehose provides the following Lambda blueprints that you can use to create a Lambda function for data transformation.

General Firehose Processing - Contains the data transformation and status model described in the previous section.

Use this blueprint for any custom transformation logic.

Apache Log to JSON - Parses and converts Apache log lines to JSON objects, using predefined JSON field names.

Apache Log to CSV - Parses and converts Apache log lines to CSV format.

Syslog to JSON - Parses and converts Syslog lines to JSON objects, using predefined JSON field names.

Syslog to CSV - Parses and converts Syslog lines to CSV format.

Kinesis Data Firehose Process Record Streams as source - Accesses the Kinesis Data Streams records in the input and returns them with a processing status.

Kinesis Data Firehose CloudWatch Logs Processor - Parses and extracts individual log events from records sent by CloudWatch Logs subscription filters.

https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html#lambda-blueprints

AWS Lambda is a compute service that enables you to run code without provisioning or managing servers. Lambda functions can be used to process data and perform transformations on data streaming through AWS Kinesis Firehose.

Kinesis Firehose is a fully managed service that delivers real-time streaming data to destinations such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. Firehose provides a set of predefined Lambda blueprints that can be used to transform incoming data before it is stored in the destination.

The available Lambda blueprints for processing transformations in Kinesis Firehose are as follows:

A. General Firehose Processing: This blueprint contains the data transformation and status model used for any custom transformation logic. This can be used to implement custom transformation logic in Lambda, which can be applied to data before it is stored in S3.
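The transformation and status model that the General Firehose Processing blueprint implements can be sketched as follows. This is a minimal Python sketch, not the blueprint's exact code; the uppercase transform is a hypothetical stand-in for custom logic. Each output record must echo the incoming recordId, report a result of 'Ok', 'Dropped', or 'ProcessingFailed', and carry the re-encoded payload:

```python
import base64

def lambda_handler(event, context):
    """Transform each Firehose record and return it with a status.

    Firehose delivers records with base64-encoded data; the function
    must return them base64-encoded as well.
    """
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        # Hypothetical transformation: uppercase the payload and append
        # a newline so records are line-delimited when written to S3.
        transformed = payload.upper() + "\n"
        output.append({
            "recordId": record["recordId"],  # must match the input recordId
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

A record marked "Dropped" is intentionally discarded, while "ProcessingFailed" records are delivered to the configured S3 error prefix.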

B. Parses and converts Apache log lines to JSON or CSV format: This blueprint can be used to parse Apache log lines and convert them into JSON or CSV format before storing them in S3. This is useful for organizations that use Apache logs for monitoring and troubleshooting.
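The parsing step of such a blueprint might look like the following sketch, assuming the Apache Common Log Format. The regex and field names here are illustrative, not the blueprint's exact predefined JSON field names:

```python
import json
import re

# Regex for the Apache Common Log Format, e.g.:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
APACHE_COMMON = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+)$'
)

def apache_line_to_json(line):
    """Convert one Apache common-log line to a JSON string.

    Returns None when the line does not match the format, which a real
    transform would report as a ProcessingFailed record.
    """
    match = APACHE_COMMON.match(line.strip())
    if match is None:
        return None
    return json.dumps(match.groupdict())
```

The CSV variant of the blueprint would emit the same captured fields joined as a delimited row instead of a JSON object.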

C. Parses and converts Syslog lines to JSON or CSV format: This blueprint can be used to parse Syslog lines and convert them into JSON or CSV format before storing them in S3. This is useful for organizations that use Syslog for collecting system logs.

D. Parses and converts Apache log lines to Apache Parquet or Apache ORC format: No such Lambda blueprint exists. Converting incoming data to Apache Parquet or Apache ORC before it is stored in S3 is handled by Kinesis Data Firehose's record format conversion feature, not by a Lambda blueprint, so this option is incorrect.

E. Parses and converts Syslog lines to Apache Parquet or Apache ORC format: No such Lambda blueprint exists either. As with option D, Parquet and ORC output is provided by Firehose's record format conversion feature rather than a Lambda blueprint, so this option is also incorrect.

In summary, AWS Kinesis Data Firehose provides a set of Lambda blueprints that can be used to process and transform data before it is delivered to S3: General Firehose Processing for custom transformation logic, and specific blueprints for parsing and converting Apache log and Syslog lines into JSON or CSV. Blueprints that convert to Apache Parquet or Apache ORC are not offered, which is why only options A, B, and C are correct.