AWS Kinesis Stream Shards Calculation

Question

You are working on a system that will use AWS Kinesis, and it is getting data from various log sources.

You are looking at creating an initial number of shards for the Kinesis stream.

Which of the following can be used to calculate the initial number of shards for the Kinesis stream? Choose 2 answers from the options given below.

Answers

Explanations

A. B. C. D.

Answer - A and D.

This is given in the AWS Documentation.

You can calculate the initial number of shards (number_of_shards) that your stream needs by using the input values in the following formula:

number_of_shards = max(incoming_write_bandwidth_in_KiB/1024, outgoing_read_bandwidth_in_KiB/2048)

For more information on Amazon Kinesis streams, see:

https://docs.aws.amazon.com/streams/latest/dev/amazon-kinesis-streams.html
To determine the initial size of a stream, you need the following input values:

• The average size of the data record written to the stream in kilobytes (KB), rounded up to the nearest 1 KB, the data size (average_data_size_in_KB).

• The number of data records written to and read from the stream per second (records_per_second).

• The number of Kinesis Data Streams applications that consume data concurrently and independently from the stream, that is, the consumers (number_of_consumers).

• The incoming write bandwidth in KB (incoming_write_bandwidth_in_KB), which is equal to the average_data_size_in_KB multiplied by the records_per_second.

• The outgoing read bandwidth in KB (outgoing_read_bandwidth_in_KB), which is equal to the incoming_write_bandwidth_in_KB multiplied by the number_of_consumers.
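
The input values and the formula above can be combined into a short calculation. The following is a minimal sketch, not an official AWS tool: the function name estimate_initial_shards is hypothetical, the KB and KiB figures are treated as the same unit, and rounding the result up to a whole shard (with a floor of one) is an assumption, since the formula itself yields a fractional value.

```python
import math

# Per-shard divisors taken from the formula above.
WRITE_KB_PER_SHARD = 1024   # write capacity per shard, per second
READ_KB_PER_SHARD = 2048    # read capacity per shard, per second


def estimate_initial_shards(average_data_size_in_kb, records_per_second, number_of_consumers):
    """Estimate the initial shard count from the three measured inputs."""
    # The record size is rounded up to the nearest 1 KB, as the input list specifies.
    record_kb = math.ceil(average_data_size_in_kb)
    incoming_write_bandwidth_in_kb = record_kb * records_per_second
    outgoing_read_bandwidth_in_kb = incoming_write_bandwidth_in_kb * number_of_consumers
    raw = max(
        incoming_write_bandwidth_in_kb / WRITE_KB_PER_SHARD,
        outgoing_read_bandwidth_in_kb / READ_KB_PER_SHARD,
    )
    # Assumption: round up to a whole shard and never return fewer than one.
    return max(1, math.ceil(raw))


# 2 KB records, 1,000 records per second, 3 independent consumers.
print(estimate_initial_shards(2, 1000, 3))  # prints 3
```

In this example the read side decides the result: 6,000 KB/s of outgoing reads needs about 2.93 shards, while 2,000 KB/s of incoming writes needs about 1.95.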

When designing an Amazon Kinesis data stream, one of the most important factors to consider is the number of shards. A shard is the unit of capacity in a Kinesis stream: it accepts writes of up to 1 MB per second or 1,000 records per second, and supports reads of up to 2 MB per second. The number of shards therefore determines the maximum throughput that the stream can handle.
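
Because a shard has both a byte limit and a record-count limit on writes, a stream of many small records can need more shards than its bandwidth alone suggests. The sketch below illustrates this under the two write limits stated above; the function name shards_for_writes is hypothetical, and rounding up to whole shards is an assumption.

```python
import math

MAX_WRITE_BYTES_PER_SHARD = 1024 * 1024   # 1 MB per second per shard
MAX_WRITE_RECORDS_PER_SHARD = 1000        # 1,000 records per second per shard


def shards_for_writes(bytes_per_second, records_per_second):
    """Shards needed so that neither write limit is exceeded."""
    by_bytes = math.ceil(bytes_per_second / MAX_WRITE_BYTES_PER_SHARD)
    by_records = math.ceil(records_per_second / MAX_WRITE_RECORDS_PER_SHARD)
    # Whichever limit is hit first determines the count; at least one shard.
    return max(1, by_bytes, by_records)


# 5,000 records per second of 100 bytes each is only about 0.5 MB/s,
# but the record-count limit still requires five shards.
print(shards_for_writes(5000 * 100, 5000))  # prints 5
```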

When deciding on the number of shards to create for a Kinesis stream, the formula above uses two quantities: the incoming write bandwidth and the outgoing read bandwidth. Each one gives a shard count, and the initial number of shards is the larger of the two:

  1. Incoming write bandwidth: Estimate the total amount of data that the stream is expected to receive per second and divide it by the maximum amount of data that a single shard can accept (1 MB/s). For example, if you expect the stream to receive 5 MB/s of data, you need at least five shards to handle the incoming write bandwidth.

  2. Outgoing read bandwidth: Estimate the total amount of data that consumers are expected to read per second, which is the incoming write bandwidth multiplied by the number of consumers, and divide it by the maximum amount of data that can be read from a single shard (2 MB/s). For example, if three consumers each read the full 5 MB/s, the outgoing read bandwidth is 15 MB/s, and you need at least eight shards (15 / 2 = 7.5, rounded up). Each shard also supports up to 5 read transactions per second, but that limit is not an input to the sizing formula.
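
To see which side of the formula given earlier decides the shard count, it can help to compute the write-driven and read-driven counts separately. The following is an illustrative sketch only: shards_by_side is a hypothetical helper, bandwidth is expressed in MB/s for readability, and rounding up to whole shards is an assumption.

```python
import math


def shards_by_side(incoming_write_mb_per_s, number_of_consumers):
    """Return (write_shards, read_shards, chosen) for a given ingest rate."""
    # Every consumer reads the full stream, so reads scale with consumer count.
    outgoing_read_mb_per_s = incoming_write_mb_per_s * number_of_consumers
    write_shards = math.ceil(incoming_write_mb_per_s / 1)   # 1 MB/s written per shard
    read_shards = math.ceil(outgoing_read_mb_per_s / 2)     # 2 MB/s read per shard
    return write_shards, read_shards, max(1, write_shards, read_shards)


# A 5 MB/s stream with one to four independent consumers.
for consumers in (1, 2, 3, 4):
    print(consumers, shards_by_side(5, consumers))
```

With one or two consumers the write side sets the count (five shards); from three consumers onward the read side takes over, giving eight and then ten shards.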

Outgoing write bandwidth and incoming read bandwidth are not used to calculate the initial number of shards for a Kinesis stream: data is only written into the stream by producers and read out of it by consumers, so neither quantity appears in the formula.

In summary, when determining the initial number of shards for a Kinesis stream, you should consider the expected incoming write bandwidth and the expected outgoing read bandwidth, and size the stream for whichever requires more shards.