Removing Unused Capacity: Resharding Strategy for AWS Kinesis Stream


Question

HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock hiking, ice mountaineering, skiing, avalanche protection, snowboarding, fly fishing, kayaking, rafting, road and trail running, and more. HH runs its entire online infrastructure on Java-based web applications running on AWS.

HH captures clickstream data and uses a custom-built recommendation engine to recommend products, which improves sales and helps it understand customer preferences. It already uses the AWS Kinesis Producer Library (KPL) to collect events and transaction logs and to process the stream. The HH IT team found a number of performance issues with the Kinesis stream and, based on the metrics captured, identified hot and cold shards. The IT team wants to effectively remove the unused capacity.

There are 2 shards: SHARD 1 with a hash key range of 276...381 and SHARD 2 with a hash key range of 382...454.

What resharding strategy needs to be applied, and how can it be applied? Select 1 option.

Answers

Explanations


A. B. C. D.

Answer: D.

Merging the two adjacent shards removes the shard that is not needed and improves utilization, thereby reducing total cost.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-merge.html

The correct answer is D.

To understand why, let's first review what a Kinesis shard is and how resharding works in AWS Kinesis.

A Kinesis stream is made up of one or more shards, each of which provides a fixed unit of capacity: for writes, up to 1 MB of data per second or 1,000 records per second, whichever limit is reached first. Each shard owns a hash key range, which is not the same thing as the sequence numbers assigned to individual records. The hash key range determines which data records are routed to that shard. As data is ingested into the stream, the partition key of each record is hashed with MD5 to a 128-bit integer, and the record goes to the shard whose hash key range contains that value.
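The routing described above can be sketched in a few lines of Python. This is an illustration only, not the service's implementation: the function names are hypothetical, the big-endian interpretation of the MD5 digest is an assumption, and the shard table reuses the toy ranges from the question (real ranges span the full 128-bit space).

```python
import hashlib
from typing import Dict, Optional, Tuple

MAX_HASH_KEY = 2**128 - 1  # hash keys are 128-bit unsigned integers


def hash_key_for(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5.

    Assumption: the digest is read as a big-endian unsigned integer.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_for(hash_key: int, shards: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the ID of the shard whose inclusive range contains hash_key."""
    for shard_id, (start, end) in shards.items():
        if start <= hash_key <= end:
            return shard_id
    return None  # no shard in this table covers the value


# Toy ranges taken from the question, not realistic 128-bit ranges.
TOY_SHARDS = {"SHARD 1": (276, 381), "SHARD 2": (382, 454)}
```

With the toy table, a hash key of 300 lands in SHARD 1 and a hash key of 400 lands in SHARD 2. A real partition key hashed with `hash_key_for` would fall far outside these illustrative ranges.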

Resharding is the process of adjusting the number of shards in a Kinesis stream. Each resharding operation is pairwise: a split divides one shard into two, and a merge combines two adjacent shards into one. Splitting adds capacity (and cost), while merging removes it. The goal of resharding is to match the number of shards to the incoming data rate, so that the stream can handle the load efficiently without incurring costs for unused capacity.
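To make the two operations concrete, here is a minimal model of how they act on inclusive hash key ranges. It only mimics the range arithmetic; the helper names are hypothetical and nothing here calls AWS.

```python
from typing import Tuple

HashRange = Tuple[int, int]  # inclusive (start, end)


def split_range(parent: HashRange, new_starting_hash_key: int) -> Tuple[HashRange, HashRange]:
    """Split one range into two disjoint children at new_starting_hash_key."""
    start, end = parent
    if not start < new_starting_hash_key <= end:
        raise ValueError("split point must fall inside the parent range")
    # The first child ends just before the split point, so the children never overlap.
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)


def merge_ranges(a: HashRange, b: HashRange) -> HashRange:
    """Merge two ranges, which must be adjacent (contiguous with no gap)."""
    low, high = sorted([a, b])
    if low[1] + 1 != high[0]:
        raise ValueError("shards are not adjacent and cannot be merged")
    return (low[0], high[1])
```

For example, `split_range((276, 381), 333)` yields `(276, 332)` and `(333, 381)`, and `merge_ranges((276, 381), (382, 454))` yields `(276, 454)`.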

In this scenario, the IT team at HH has identified performance issues with the Kinesis stream and has determined that there are hot and cold shards. Hot shards, which receive more data than expected, are candidates for splitting; cold shards, which receive less, are candidates for merging. Because the stated goal is to remove unused capacity, the relevant operation here is a merge.

Based on the provided information, there are currently two shards in the stream: SHARD 1 with a hash key range of 276...381 and SHARD 2 with a hash key range of 382...454. The two ranges are contiguous (SHARD 1 ends at 381 and SHARD 2 begins at 382), so the shards are adjacent and can be merged. To remove the unused capacity, the team should merge them into a single shard.

Option A suggests splitting SHARD 1 into two shards with hash key ranges of 276...332 and 332...381. Splitting increases the shard count and therefore adds capacity instead of removing it. The two proposed ranges also overlap at 332, which is not a valid split: the child shards of a split must cover disjoint ranges.

Option B splits SHARD 1 into two shards with hash key ranges of 276...332 and 333...381, and splits SHARD 2 into two shards with hash key ranges of 382...410 and 411...454. The ranges do not overlap, so the splits themselves are valid, but the result is four shards instead of two. That doubles the provisioned capacity and cost, which is the opposite of what the team wants.

Option C suggests merging and splitting both shards into three smaller shards. This leaves the stream with more shards than it has today, so it adds capacity and complexity without removing any unused capacity.

Option D suggests merging the two existing shards into a single shard with a hash key range of 276...454. This removes the unused capacity and reduces cost, which is what the question asks for. If the data rate increases later, the merged shard can be split again.

In summary, the most appropriate resharding strategy for HH's Kinesis stream is option D, which merges SHARD 1 (276...381) and SHARD 2 (382...454) into a single shard with a hash key range of 276...454.
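As a rough sketch of how such a merge might be issued from Python, the helper below checks adjacency and then calls the Kinesis `MergeShards` operation through a boto3-style client. The helper name, stream name, and shard IDs are hypothetical, and the sketch assumes a single unpaginated `list_shards` response; it is not taken from the linked Java guide.

```python
def merge_adjacent_shards(client, stream_name: str, shard_id: str, adjacent_shard_id: str) -> None:
    """Merge two shards after confirming their hash key ranges are contiguous.

    `client` is expected to behave like boto3.client("kinesis").
    Assumption: the stream is small enough that list_shards returns every
    shard in one response (no NextToken handling here).
    """
    response = client.list_shards(StreamName=stream_name)
    ranges = {s["ShardId"]: s["HashKeyRange"] for s in response["Shards"]}

    # Hash keys are returned as decimal strings, so convert before comparing.
    pair = [ranges[shard_id], ranges[adjacent_shard_id]]
    low, high = sorted(pair, key=lambda r: int(r["StartingHashKey"]))
    if int(low["EndingHashKey"]) + 1 != int(high["StartingHashKey"]):
        raise ValueError("shards are not adjacent and cannot be merged")

    client.merge_shards(
        StreamName=stream_name,
        ShardToMerge=shard_id,
        AdjacentShardToMerge=adjacent_shard_id,
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   merge_adjacent_shards(boto3.client("kinesis"), "hh-clickstream",
#                         "shardId-000000000000", "shardId-000000000001")
```

Taking the client as a parameter keeps the helper easy to exercise with a stub before pointing it at a real stream.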