Mitigating Concerns with Kinesis Data Streams API | BDS-C00 Exam Solution


Question

RetailEcst, an online retail company that recently moved its business application to AWS, uses the Kinesis Data Streams API for both ingestion and consumption of transaction data through a multi-shard stream, integrating data in real time into an RDS database and enhancing search through ES.

The team identified a couple of issues pertaining to the following:

Increased latency and diminished throughput at heavy workloads during business hours

Reliability of the ingested data

Concerns around protection of data at rest to meet regulatory requirements

Which of the following 3 options would help mitigate the above concerns?

Answers

Explanations


A. Protection through server-side encryption using an AWS KMS customer master key (CMK)

B. Batching, Collection

C. Built-in Retry mechanism

D. Send multiple records to multiple shards of the stream per HTTP request

E. Failure Handlers, Track Errors, Reprocess

F. Protection through client-side encryption using an AWS KMS-managed customer master key (CMK)

Answer: A, D, and E.

Option A is correct - Server-side encryption using AWS Key Management Service (AWS KMS) keys can be enabled to meet strict data management requirements by encrypting your data at rest within Amazon Kinesis Data Streams.

This encrypts data before it's at rest by using an AWS KMS customer master key (CMK) you specify.

Data is encrypted before it's written to the Kinesis stream storage layer, and decrypted after it's retrieved from storage.

As a result, your data is encrypted at rest within the Kinesis Data Streams service.

This allows you to meet strict regulatory requirements and enhance the security of your data.

https://docs.aws.amazon.com/streams/latest/dev/what-is-sse.html
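As a rough sketch of how this looks in practice, the helper below builds the parameter dict for the Kinesis StartStreamEncryption operation; `build_sse_request` is a name invented here for illustration, and the stream name and key alias are assumptions.

```python
# Hypothetical helper: assemble the parameters for enabling server-side
# encryption on an existing stream. The parameter names follow the Kinesis
# StartStreamEncryption API; the helper itself is illustrative.
def build_sse_request(stream_name, kms_key_id):
    """Build the parameter dict for kinesis.start_stream_encryption()."""
    return {
        "StreamName": stream_name,
        "EncryptionType": "KMS",   # server-side encryption with AWS KMS
        "KeyId": kms_key_id,       # a CMK ARN, alias, or the AWS-managed
                                   # key alias "alias/aws/kinesis"
    }

# With boto3 this would be applied roughly as:
#   import boto3
#   kinesis = boto3.client("kinesis")
#   kinesis.start_stream_encryption(**build_sse_request("transactions", "alias/aws/kinesis"))
params = build_sse_request("transactions", "alias/aws/kinesis")
print(params["EncryptionType"])  # KMS
```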

Option B is incorrect - Batching is not a feature of the plain Streams API; it can only be used with the KPL for data ingestion.

Note that collection, in particular, reduces the overhead of making many separate HTTP requests for a multi-shard stream.

Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item.

Collection refers to batching multiple Kinesis Data Streams records and sending them in a single HTTP request with a call to the API operation PutRecords, instead of sending each Kinesis Data Streams record in its own HTTP request.

This increases throughput compared to using no collection because it reduces the overhead of making many separate HTTP requests.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
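To make the collection idea concrete, here is a minimal sketch that groups individual records into PutRecords-sized batches (the documented per-request cap is 500 records); the `collect` helper and the sample record shape are assumptions for illustration.

```python
# Minimal sketch of "collection": group individual records into batches
# sized for a single PutRecords call, instead of one HTTP request each.
def collect(records, max_per_request=500):
    """Yield slices of `records` sized for one PutRecords request."""
    for i in range(0, len(records), max_per_request):
        yield records[i:i + max_per_request]

# 1200 transaction records collapse into 3 HTTP requests instead of 1200.
records = [{"Data": b"txn-%d" % i, "PartitionKey": str(i)} for i in range(1200)]
batches = list(collect(records))
print(len(batches))  # 3
```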

Option C is incorrect - The built-in retry mechanism can only be enabled with the KPL.

When Kinesis Producer Library (KPL) user records are added to a stream, each record is given a time stamp and added to a buffer with a deadline set by the RecordMaxBufferedTime configuration parameter.

This time stamp/deadline combination sets the buffer priority.

Records are flushed from the buffer based on the following criteria:

Buffer priority.

Aggregation configuration.

Collection configuration.

Records flushed are then sent to your Kinesis data stream as Amazon Kinesis Data Streams records.

The PutRecords operation sends requests to your stream that occasionally exhibit full or partial failures.

Records that fail are automatically added back to the KPL buffer.

The new deadline is set based on the minimum of these two values:

Half the current RecordMaxBufferedTime configuration.

The record's time-to-live value.

This strategy allows retried KPL user records to be included in subsequent Kinesis Data Streams API calls, to improve throughput and reduce complexity while enforcing the Kinesis Data Streams record's time-to-live value.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-retries-rate-limiting.html
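The re-buffering rule above can be sketched as a one-line calculation: the retried record's new deadline is the minimum of half the RecordMaxBufferedTime setting and the record's remaining time-to-live. The function name and sample values (100 ms is the KPL's default RecordMaxBufferedTime) are illustrative, not KPL internals.

```python
# Sketch of the KPL re-buffering rule for failed records: the new flush
# deadline is min(RecordMaxBufferedTime / 2, record time-to-live).
def new_deadline_ms(record_max_buffered_time_ms, record_ttl_ms):
    return min(record_max_buffered_time_ms // 2, record_ttl_ms)

# RecordMaxBufferedTime = 100 ms (the KPL default), record TTL = 30 000 ms:
print(new_deadline_ms(100, 30_000))  # 50
# A record close to expiry is capped by its remaining TTL instead:
print(new_deadline_ms(100, 20))      # 20
```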

Option D is correct - The PutRecords operation sends multiple records to Kinesis Data Streams in a single request.

By using PutRecords, producers can achieve higher throughput when sending data to their Kinesis data stream.

A PutRecords request can include records with different partition keys.

The scope of the request is a stream; each request may include any combination of partition keys and records up to the request limits.

Requests made with many different partition keys to streams with many different shards are generally faster than requests with a small number of partition keys to a small number of shards.

The number of partition keys should be much larger than the number of shards to reduce latency and maximize throughput.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-sdk.html
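The reason many partition keys spread load well is that Kinesis routes each record by MD5-hashing its partition key to a 128-bit integer, which falls into exactly one shard's hash-key range. The sketch below assumes four shards that evenly split the keyspace (real ranges would come from DescribeStream), and the key names are made up.

```python
import hashlib

# Sketch of Kinesis record routing: MD5 of the partition key gives a
# 128-bit integer; each shard owns a contiguous slice of that keyspace.
NUM_SHARDS = 4          # assumption: 4 shards evenly splitting the range
SPACE = 2 ** 128

def shard_for(partition_key):
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * NUM_SHARDS // SPACE   # shard index in [0, NUM_SHARDS)

# Many distinct partition keys spread records across all shards, which is
# why "many more partition keys than shards" reduces latency:
keys = [f"customer-{i}" for i in range(1000)]
used = {shard_for(k) for k in keys}
print(sorted(used))  # with 1000 keys, all 4 shards are almost surely hit
```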

Option E is correct - Failures with PutRecords are handled through failure handlers written in the application code.

By default, failure of individual records within a request does not stop the processing of subsequent records in a PutRecords request.

This means that a response Records array includes both successfully and unsuccessfully processed records.

You must detect unsuccessfully processed records and include them in a subsequent call.

Records that were unsuccessfully processed can be included in subsequent PutRecords requests.

First, check the FailedRecordCount parameter in the PutRecordsResult to confirm whether there are failed records in the request.

If so, each record entry in the result whose ErrorCode is not null should be added to a subsequent request.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-sdk.html
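A minimal failure handler following this pattern might look like the sketch below: check FailedRecordCount, then re-queue every request record whose matching result entry carries an ErrorCode. The response shape mirrors the PutRecords API, but the sample response values are fabricated for illustration.

```python
# Sketch of client-side failure handling for a PutRecords response:
# result entries line up positionally with the request records, and a
# non-null ErrorCode marks a failed entry that should be retried.
def records_to_retry(request_records, response):
    if response.get("FailedRecordCount", 0) == 0:
        return []
    return [
        rec
        for rec, result in zip(request_records, response["Records"])
        if result.get("ErrorCode")   # failed entry -> include in next request
    ]

request_records = [
    {"Data": b"a", "PartitionKey": "1"},
    {"Data": b"b", "PartitionKey": "2"},
    {"Data": b"c", "PartitionKey": "3"},
]
response = {  # fabricated sample response
    "FailedRecordCount": 1,
    "Records": [
        {"SequenceNumber": "495...", "ShardId": "shardId-000000000000"},
        {"ErrorCode": "ProvisionedThroughputExceededException",
         "ErrorMessage": "Rate exceeded"},
        {"SequenceNumber": "496...", "ShardId": "shardId-000000000001"},
    ],
}
retry = records_to_retry(request_records, response)
print([r["PartitionKey"] for r in retry])  # ['2']
```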

Option F is incorrect - Client-side encryption is not a managed data-protection feature of Kinesis Data Streams.

At-rest protection for the stream is enabled through server-side encryption with AWS KMS, as described under Option A.

https://docs.aws.amazon.com/streams/latest/dev/what-is-sse.html

RetailEcst is facing several challenges related to increased latency, diminished throughput, data reliability, and data protection. The options break down as follows:

A. Protection through server-side encryption using AWS KMS customer master key (CMK)

AWS Key Management Service (KMS) provides secure and easy-to-use key management features. By using AWS KMS, you can create and control customer master keys (CMKs) that encrypt your data. With server-side encryption, Amazon Kinesis Data Streams encrypts data before it is written to the stream storage layer and decrypts it after retrieval. In this case, RetailEcst can use server-side encryption to encrypt the data at rest to meet regulatory requirements. AWS KMS provides a high level of control over the CMK and protection against unauthorized access to the key.

B. Batching, Collection

Batching and collection can improve the throughput of data ingestion into Kinesis Data Streams: instead of sending individual records, a producer groups multiple records together and sends them in a single HTTP request, reducing per-request overhead. As noted above, however, these features belong to the KPL and the PutRecords operation rather than single-record Streams API calls, which is why this option on its own does not address RetailEcst's concerns.

C. Built-in Retry mechanism

A built-in retry mechanism helps mitigate reliability issues: when an error occurs during ingestion, failed records are automatically retried before an error is reported, reducing the need for manual intervention. As explained above, though, this mechanism is provided by the Kinesis Producer Library, not the plain Streams API, so it is not one of the correct options here.

D. Sends multiple records to multiple shards of the stream per HTTP request

To improve throughput, RetailEcst can send multiple records to multiple shards of the stream per HTTP request. This can help increase the parallelism of data ingestion and reduce the time it takes to ingest large volumes of data.

E. Failure Handlers, Track Errors, Reprocess

RetailEcst can implement failure handlers, track errors, and reprocess the data to improve data reliability. If an error occurs during ingestion or consumption, RetailEcst can capture the error and implement a failure handler that can reprocess the data or take corrective action to resolve the issue.

F. Protection through client-side encryption using AWS KMS-managed customer master key (CMK)

Client-side encryption using AWS KMS-managed CMKs would encrypt data before it is sent to Kinesis Data Streams. However, this is not a managed feature of the service; at-rest protection for the stream is provided through server-side encryption, which is why this option is incorrect.

In summary, RetailEcst should rely on server-side encryption (A) to protect data at rest, PutRecords requests that send multiple records across shards per HTTP request (D) to reduce latency and raise throughput, and application-level failure handlers that track errors and reprocess failed records (E) to improve reliability. Batching/collection and the built-in retry mechanism (B, C) are KPL features rather than Streams API features, and client-side encryption (F) is not a managed Kinesis data-protection mechanism.