Processing Data Across Multiple Shards | AWS Kinesis | DVA-C01 Exam

Question

You are developing an application that is going to make use of Amazon Kinesis.

Due to the high throughput, you decide to have multiple shards for the streams.

Which of the following is TRUE when it comes to processing data across multiple shards?

Answers

Explanations

A. B. C. D.

Answer - A.

Kinesis Data Streams lets you order records and read and replay records in the same order to many Kinesis Data Streams applications.

To enable write ordering, Kinesis Data Streams expects you to call the PutRecord API to write serially to a shard while using the sequenceNumberForOrdering parameter.

Setting this parameter guarantees strictly increasing sequence numbers for puts from the same client and to the same partition key.
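
As a minimal sketch of the serial-write pattern described above, the helper below writes payloads one at a time and passes the sequence number returned by each put as SequenceNumberForOrdering on the next one. The function name put_records_in_order and its arguments are hypothetical. The client is assumed to expose a put_record method shaped like the boto3 Kinesis client's, which returns a dict containing "SequenceNumber".

```python
def put_records_in_order(client, stream_name, partition_key, payloads):
    """Write payloads serially, chaining each put to the previous one.

    `client` is assumed to behave like a boto3 Kinesis client
    (hypothetical helper, not part of any AWS SDK).
    """
    previous_sequence_number = None
    sequence_numbers = []
    for payload in payloads:
        kwargs = {
            "StreamName": stream_name,
            "Data": payload,
            "PartitionKey": partition_key,
        }
        # The first put has no predecessor, so the parameter is omitted.
        if previous_sequence_number is not None:
            kwargs["SequenceNumberForOrdering"] = previous_sequence_number
        response = client.put_record(**kwargs)
        previous_sequence_number = response["SequenceNumber"]
        sequence_numbers.append(previous_sequence_number)
    return sequence_numbers
```

With boto3 installed and credentials configured, this could be called with boto3.client("kinesis") as the client. Because every record uses the same partition key, the ordering applies to that key only.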

Option A is correct because Kinesis Data Streams cannot guarantee the ordering of records across multiple shards.

Options B, C, and D are incorrect because Kinesis Data Streams orders records only within a single shard.

Each data record has a sequence number that is unique within its shard.

Kinesis Data Streams assigns the sequence number after you write to the stream with client.putRecords or client.putRecord.

For more information, please refer to:

https://aws.amazon.com/blogs/database/how-to-perform-ordered-data-replication-between-applications-by-using-amazon-dynamodb-streams/

https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html

When using Amazon Kinesis, data is ingested into a stream, which is composed of one or more shards. Each shard is an independent sequence of data records, and Kinesis assigns each record a sequence number that is unique within its shard.

When processing data across multiple shards in Kinesis, it is important to keep in mind that there is no inherent ordering of data across shards. Each shard operates independently, and data may be processed in a different order than it was ingested.

Therefore, the correct answer is A: You cannot guarantee the order of data across multiple shards. It's possible only within a shard.
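
To illustrate why ordering holds only within a shard, here is a small local simulation of how records are routed. Kinesis maps each partition key to a shard by taking the MD5 hash of the key as a 128-bit integer. The sketch assumes the hash space is split evenly across shards, which is a simplification (a real stream assigns each shard an explicit hash key range), and the names shard_for_key and route_records are hypothetical.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard a partition key maps to.

    Assumption: shards cover equal-sized, contiguous hash ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def route_records(records, shard_count):
    """Group (partition_key, payload) pairs by shard, keeping arrival order."""
    shards = defaultdict(list)
    for partition_key, payload in records:
        shards[shard_for_key(partition_key, shard_count)].append(
            (partition_key, payload)
        )
    return dict(shards)


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
shards = route_records(events, shard_count=4)
```

All records for "user-1" end up in one shard in the order they were written, while nothing relates their position to records held in any other shard.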

However, sequence numbers are not a reliable way to restore order across shards, because each sequence number is meaningful only within its own shard. If you need records from different shards in order, include a timestamp (or an application-level counter) in the record data and reorder by it during processing. If related records must stay strictly ordered, write them with the same partition key so that they land on the same shard.

In summary, when processing data across multiple shards in Kinesis, you cannot rely on inherent ordering; ordering is guaranteed only within a shard. To reorder data from several shards, use a timestamp carried in the record data. It is not necessary to use Kinesis Firehose to guarantee the order of data.
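
The timestamp-based reordering described above could be sketched as follows. This is an illustration under stated assumptions rather than a Kinesis API: each record is modeled as a hypothetical (timestamp, payload) tuple whose timestamp was set by the producer, and each shard's list is assumed to already be in timestamp order. The function name merge_shards_by_timestamp is also hypothetical.

```python
import heapq


def merge_shards_by_timestamp(shards):
    """Merge per-shard record lists into one list ordered by timestamp.

    `shards` maps a shard id to a list of (timestamp, payload) tuples.
    Assumption: every per-shard list is already sorted by timestamp.
    """
    return list(heapq.merge(*shards.values(), key=lambda record: record[0]))


records_by_shard = {
    "shard-0": [(1.0, "a"), (4.0, "d")],
    "shard-1": [(2.0, "b"), (3.0, "c")],
}
merged = merge_shards_by_timestamp(records_by_shard)
# merged == [(1.0, "a"), (2.0, "b"), (3.0, "c"), (4.0, "d")]
```

The result is only as accurate as the producers' clocks, so with several producers this gives an approximate order rather than a strict one.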