AWS Big Data Specialty Exam: Processing Clickstream Data with AWS Kinesis and Redshift

Process Clickstream Data with AWS Kinesis and Redshift

Question

HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock climbing, ice climbing, skiing, avalanche safety, snowboarding, fly fishing, kayaking, rafting, road and trail running, and more. HH runs its entire online infrastructure on Java-based web applications hosted on AWS.

HH captures clickstream data and feeds it into a custom-built recommendation engine that recommends products, which improves sales and helps HH understand customer preferences. HH already uses the AWS Kinesis Producer Library to collect events and transaction logs and to process the stream.

The event/log size is around 12 bytes. HH has the following requirements for processing the ingested data:

- Transform the syslog data into CSV format
- Load the captured data, along with the other transformations, into Redshift
- Capture transformation failures
- Capture delivery failures
- Back up the syslog streaming data into a separate S3 bucket

Select 3 options.
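All of these requirements can be met by a single Kinesis Data Firehose delivery stream with a Redshift destination, a Lambda data-transformation step, and source-record S3 backup. A minimal sketch of the configuration using boto3's `create_delivery_stream` follows; every ARN, bucket, cluster endpoint, table, and function name is a hypothetical placeholder, not something from the question:

```python
# Sketch of a Firehose delivery stream meeting HH's requirements:
# Lambda transformation (syslog -> CSV), Redshift COPY load,
# failure capture, and source-record backup to a separate bucket.
# All ARNs, names, and endpoints below are hypothetical placeholders.

def build_delivery_stream_config():
    """Assemble the arguments for firehose.create_delivery_stream."""
    role = "arn:aws:iam::123456789012:role/firehose-role"
    return {
        "DeliveryStreamName": "hh-clickstream",
        "RedshiftDestinationConfiguration": {
            "RoleARN": role,
            "ClusterJDBCURL": (
                "jdbc:redshift://hh-cluster.example"
                ".us-east-1.redshift.amazonaws.com:5439/hh"
            ),
            "CopyCommand": {
                "DataTableName": "clickstream_events",
                "CopyOptions": "CSV",  # load the transformed CSV records
            },
            "Username": "firehose_user",
            "Password": "REPLACE_ME",
            # Intermediate bucket: Firehose stages data here, then issues COPY.
            # Transformation failures land under processing-failed/ and
            # COPY failures under errors/ in this bucket.
            "S3Configuration": {
                "RoleARN": role,
                "BucketARN": "arn:aws:s3:::hh-firehose-staging",
            },
            # Lambda function that converts syslog records to CSV.
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [{
                    "Type": "Lambda",
                    "Parameters": [{
                        "ParameterName": "LambdaArn",
                        "ParameterValue": (
                            "arn:aws:lambda:us-east-1:123456789012"
                            ":function:syslog-to-csv"
                        ),
                    }],
                }],
            },
            # Back up the untransformed source records to a separate bucket.
            "S3BackupMode": "Enabled",
            "S3BackupConfiguration": {
                "RoleARN": role,
                "BucketARN": "arn:aws:s3:::hh-syslog-backup",
            },
        },
    }

# Usage (requires AWS credentials and a live account):
#   import boto3
#   boto3.client("firehose").create_delivery_stream(
#       **build_delivery_stream_config())
```

Note that the backup bucket is distinct from the staging bucket, which is what satisfies the "separate S3 bucket" requirement.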

Answers

Explanations


A. Streaming data can be directly loaded into Redshift from Kinesis Firehose.

B. Streaming data is delivered to your S3 bucket first. Kinesis Data Firehose then issues an Amazon Redshift COPY command to load data from your S3 bucket to your Amazon Redshift cluster.

C. Streaming data is delivered to your S3 bucket first. Kinesis Data Firehose then issues an Amazon Redshift Export command to load data from your S3 bucket to your Amazon Redshift cluster.

D. The transformation failures and delivery failures are loaded into processing-failed and errors folders in the same S3 bucket.

E. The transformation failures and delivery failures are loaded into transform-failed and delivery-failed folders in the same S3 bucket.

F. When Redshift is selected as the destination, Source record S3 backup is enabled, and a Backup S3 bucket is defined, untransformed incoming data can be delivered to a separate S3 bucket.

G. S3 backups can be managed through bucket policies.

Answer: B, D, F.

Option A is incorrect - Kinesis Data Firehose does not load streaming data directly into Amazon Redshift. For Amazon Redshift destinations, streaming data is delivered to your S3 bucket first; Kinesis Data Firehose then issues an Amazon Redshift COPY command to load the data from your S3 bucket into your Amazon Redshift cluster.

https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html#data-flow-diagrams

Option B is correct - For Amazon Redshift destinations, streaming data is delivered to your S3 bucket first.

Kinesis Data Firehose then issues an Amazon Redshift COPY command to load data from your S3 bucket to your Amazon Redshift cluster.

If data transformation is enabled, you can optionally back up source data to another Amazon S3 bucket.

https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html#data-flow-diagrams

Option C is incorrect - Kinesis Data Firehose issues an Amazon Redshift COPY command, not an Export command. For Amazon Redshift destinations, streaming data is delivered to your S3 bucket first, and Firehose then issues a COPY command to load it from your S3 bucket into your Amazon Redshift cluster.

https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html#data-flow-diagrams

Option D is correct - when data transformation is enabled, records that fail transformation are delivered to a processing-failed folder, and records that fail delivery to Redshift are delivered to an errors folder, in the same S3 bucket.

https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html https://docs.aws.amazon.com/firehose/latest/dev/basic-deliver.html#retry
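To see why a record ended up in the processing-failed folder, you can download those objects and decode them: each line is a JSON error document whose rawData field holds the base64-encoded original record. A hedged sketch follows (the field names follow the Firehose error-record layout; the sample input is synthetic, and real objects carry additional fields such as errorMessage and lambdaArn):

```python
import base64
import json


def decode_failed_records(error_object_body: str):
    """Parse a Firehose processing-failed object (one JSON document per
    line) and return (errorCode, original record text) pairs."""
    failures = []
    for line in error_object_body.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)
        raw = base64.b64decode(doc["rawData"]).decode("utf-8")
        failures.append((doc.get("errorCode", ""), raw))
    return failures


# Synthetic example of an error document written when the transformation
# Lambda rejects a record.
sample = json.dumps({
    "errorCode": "Lambda.ProcessingFailed",
    "errorMessage": "Transformation returned ProcessingFailed status",
    "rawData": base64.b64encode(
        b"<14>Oct 11 22:14:15 web1 app: bad event").decode(),
})

for code, record in decode_failed_records(sample):
    print(code, "->", record)
```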

Option E is incorrect - the failure folders Firehose uses are named processing-failed and errors, not transform-failed and delivery-failed.

https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html https://docs.aws.amazon.com/firehose/latest/dev/basic-deliver.html#retry

Option F is correct - when Redshift is the destination and Source record S3 backup is enabled with a backup bucket defined, untransformed incoming data is delivered to a separate S3 bucket.

https://docs.aws.amazon.com/firehose/latest/dev/create-destination.html#create-destination-s3

Option G is incorrect - S3 bucket policies control access to a bucket; they do not manage backups. Backing up untransformed source records is configured on the delivery stream itself (Source record S3 backup), not through bucket policies.

https://docs.aws.amazon.com/firehose/latest/dev/create-destination.html#create-destination-s3

Each option in detail:

A. Streaming data can be directly loaded into Redshift from Kinesis Firehose. This option is incorrect. Kinesis Data Firehose never writes directly to Amazon Redshift; it always stages the data in an S3 bucket first and then issues a COPY command to load it into the cluster.

B. Streaming data is delivered to your S3 bucket first. Kinesis Data Firehose then issues an Amazon Redshift COPY command to load data from your S3 bucket to your Amazon Redshift cluster. This option is correct. With data transformation enabled, a Lambda function invoked by Firehose converts each syslog record to CSV before delivery to the S3 bucket, and Firehose then issues the COPY command to load the data into Amazon Redshift. Firehose also captures transformation failures and delivery failures by writing them to error folders in the S3 bucket.
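The transformation step described in option B is implemented as a Firehose data-transformation Lambda, which receives base64-encoded records and must return each one marked Ok, Dropped, or ProcessingFailed. A minimal sketch converting a simple syslog line to CSV (the syslog pattern and the chosen output fields are illustrative assumptions, not part of the question):

```python
import base64
import csv
import io
import re

# Illustrative syslog pattern: "<PRI>MMM DD HH:MM:SS host tag: message"
SYSLOG_RE = re.compile(
    r"<(?P<pri>\d+)>(?P<ts>\w{3} +\d+ [\d:]+) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)"
)


def handler(event, context):
    """Firehose transformation Lambda: syslog record in, CSV record out."""
    out = []
    for rec in event["records"]:
        line = base64.b64decode(rec["data"]).decode("utf-8").strip()
        m = SYSLOG_RE.match(line)
        if m is None:
            # Unparseable records are reported as ProcessingFailed and end
            # up under processing-failed/ in the S3 bucket.
            out.append({"recordId": rec["recordId"],
                        "result": "ProcessingFailed",
                        "data": rec["data"]})
            continue
        buf = io.StringIO()
        csv.writer(buf).writerow(
            [m["pri"], m["ts"], m["host"], m["tag"], m["msg"]])
        out.append({"recordId": rec["recordId"],
                    "result": "Ok",
                    "data": base64.b64encode(
                        buf.getvalue().encode()).decode()})
    return {"records": out}
```

Returning the record with its original recordId and a result status is what lets Firehose route successes to the delivery path and failures to the processing-failed folder.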

C. Streaming data is delivered to your S3 bucket first. Kinesis Data Firehose then issues an Amazon Redshift Export command to load data from your S3 bucket to your Amazon Redshift cluster. This option is incorrect. Amazon Redshift loads data from S3 with the COPY command and exports data to S3 with UNLOAD; there is no "Export" command, and Firehose issues COPY, not Export.

D. The transformation failures and delivery failures are loaded into processing-failed and errors folders in the same S3 bucket. This option is correct. When data transformation is enabled, records that cannot be transformed are delivered to a processing-failed folder, and records that fail the COPY into Redshift are delivered to an errors folder, in the same S3 bucket.

E. The transformation failures and delivery failures are loaded into transform-failed and delivery-failed folders in the same S3 bucket. This option is incorrect. The folders Firehose actually uses are named processing-failed and errors; there are no transform-failed or delivery-failed folders.

F. When Redshift is selected as the destination, Source record S3 backup is enabled, and a Backup S3 bucket is defined, untransformed incoming data can be delivered to a separate S3 bucket. This option is correct. With Redshift as the destination, enabling Source record S3 backup and defining a backup bucket causes Firehose to deliver the untransformed incoming data to that separate bucket, which satisfies the backup requirement.

G. S3 backups can be managed through bucket policies. This option is incorrect. Bucket policies control access to an S3 bucket; they do not configure or manage Firehose source-record backups.

Therefore, the correct options are B, D, and F.