Kinesis Stream Data Processing for AWS Certified Big Data - Specialty Exam

Ensure All Data Processing from Stream

Question

A company is making use of Kinesis streams for transferring data from various sources.

The consumers will run at different times depending on the priority of data retrieval.

Most consumers run within the hour, but some run only once every 2 days.

Which of the following must be implemented on the stream to ensure all data gets processed from within the stream?

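Answers

A. Ensure that encryption is enabled on the stream.

B. Ensure that the data retention is changed for the stream.

C. Attach Kinesis Firehose to the stream.

D. Ensure that the consumer runs on an EC2 instance.

Explanations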

Answer - B.

The AWS Documentation mentions the following.

Amazon Kinesis Data Streams supports changes to the data record retention period of your stream.

A Kinesis data stream is an ordered sequence of data records meant to be written to and read from in real time.

Data records are therefore stored in shards in your stream temporarily.

The time period from when a record is added to when it is no longer accessible is called the retention period.

A Kinesis data stream stores records for 24 hours by default, up to 168 hours (7 days) at the time of this question and up to 8,760 hours (365 days) in current AWS documentation.
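As a rough illustration of the retention period described above, the sketch below checks whether a record is still readable when a consumer wakes up. The function name and the sample timings are assumptions made for this example and are not part of any AWS API.

```python
from datetime import datetime, timedelta


def is_record_accessible(written_at, read_at, retention_hours=24):
    """Return True if a record written at `written_at` can still be read at `read_at`."""
    return read_at - written_at < timedelta(hours=retention_hours)


# Hypothetical timings chosen to mirror the scenario in the question.
written = datetime(2024, 1, 1, 0, 0)

# A consumer that runs within the hour reads the record comfortably.
hourly_read = written + timedelta(minutes=45)
# A consumer that runs once in 2 days may arrive almost 48 hours later.
two_day_read = written + timedelta(hours=47)

print(is_record_accessible(written, hourly_read))                       # True
print(is_record_accessible(written, two_day_read))                      # False with 24-hour retention
print(is_record_accessible(written, two_day_read, retention_hours=72))  # True
```

Under the default retention, only the slow consumer misses the record, which is why the retention setting rather than the consumer itself is the thing to change.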

All of the other options are incorrect since these are not key requirements for ensuring that data gets processed from the stream.

For more information on the retention period, refer to the URL below.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-extended-retention.html

The correct answer to the question is B. Ensure that the data retention is changed for the stream.

Explanation:

Kinesis is a streaming data platform provided by AWS that enables organizations to collect, process, and analyze data in real time. Kinesis streams collect large volumes of data from multiple sources, and a single stream can have multiple consumers that read from it at different times.

In the given scenario, most consumers run within the hour, but some run only once every 2 days. With the default 24-hour retention, records written shortly after a 2-day consumer finishes would expire before its next run. To ensure that all data gets processed, the retention period of the stream must be raised to cover at least that 48-hour gap.

Data retention is the amount of time that a record remains accessible in the stream after it has been added. By default, the retention period is 24 hours. It could be increased up to 7 days (168 hours) when this question was written, and AWS has since raised the maximum to 365 days. Increasing the retention period keeps records in the stream longer, so consumers can read at their own pace without records expiring first.
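The retention change could be sketched roughly as follows. This is a minimal example, not a definitive implementation: the helper names, the 12-hour safety margin, and the stream name in the usage note are assumptions for illustration. The client is passed in so the logic can be tried without AWS access; the two client methods used, `describe_stream_summary` and `increase_stream_retention_period`, are the boto3 Kinesis client calls.

```python
import math


def required_retention_hours(longest_consumer_gap_hours, safety_margin_hours=12):
    """Retention needed to cover the slowest consumer plus a margin (assumed), never below 24 hours."""
    return max(24, math.ceil(longest_consumer_gap_hours + safety_margin_hours))


def extend_retention(kinesis_client, stream_name, longest_consumer_gap_hours):
    """Raise the stream's retention period only if the current value is too short.

    Returns the retention period, in hours, that is in effect afterwards.
    """
    needed = required_retention_hours(longest_consumer_gap_hours)
    summary = kinesis_client.describe_stream_summary(StreamName=stream_name)
    current = summary["StreamDescriptionSummary"]["RetentionPeriodHours"]
    if current < needed:
        kinesis_client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=needed
        )
    return max(current, needed)
```

With boto3 installed and credentials configured, `kinesis_client` would be `boto3.client("kinesis")`, and for this scenario the call would look like `extend_retention(client, "example-stream", 48)`, where `example-stream` is a placeholder name. With the assumed margin this requests 60 hours of retention.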

The other options in the answer are incorrect because:

A. Enabling encryption on the stream protects the data but does not change how long records stay available, so it will not ensure that all data gets processed from within the stream.

C. Kinesis Firehose delivers data from Kinesis streams to other AWS services such as S3, Redshift, and Elasticsearch. Attaching it does not ensure that all data gets processed from within the stream.

D. Running the consumer on an EC2 instance does not change how long records stay available, so it will not ensure that all data gets processed from within the stream.