FundsLawn - Addressing Performance and Cost Issues with Streaming Platform

Question

FundsLawn, a financial services company, provides fully automated funding to small businesses in minutes. It leverages data generated through business activity to understand the performance and processing of funding requests.

The company uses multi-shard Kinesis data streams as its data integration backbone. The KPL ingests data generated by various business segments, and KCL-based consumers load that data into applications such as Redshift, ES, and DynamoDB for invoicing and into S3 for long-term storage. After a recent successful campaign resulted in a heavy inflow of requests from existing and new customers in different business segments, FundsLawn needs a strategy to address the following issues with the existing platform:

  1. The TCO for maintaining the streaming platform is too high.

  2. The performance of the streaming platform does not meet SLAs.

  3. The performance of ingestion needs to be understood in order to improve throughput.

Please select 3 options.

Answers

Explanations

A. B. C. D. E. F.

Answer : A, C, F.

Option A is correct - Aggregation improves per-shard throughput.

This also optimizes the overall TCO of the stream.

Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item.

Aggregation refers to the storage of multiple records in a Kinesis Data Streams record.

Aggregation allows customers to increase the number of records sent per API call, which effectively increases producer throughput.

Each Kinesis Data Streams shard supports writes of up to 1,000 Kinesis Data Streams records per second or 1 MB per second, whichever limit is reached first.

For customers whose records are smaller than 1 KB, the records-per-second limit is the binding constraint.

Record aggregation allows customers to combine multiple records into a single Kinesis Data Streams record.

This allows customers to improve their per shard throughput.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
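
The effect of aggregation on shard count can be illustrated with a little arithmetic. The sketch below is only an estimate under stated assumptions: the workload figures are hypothetical, 1 MB is treated as 10**6 bytes, and the small size overhead of the aggregated record format is ignored. The function name `shards_needed` is made up for this illustration.

```python
import math

# Per-shard write limits quoted in the explanation above.
RECORDS_PER_SEC_PER_SHARD = 1_000
BYTES_PER_SEC_PER_SHARD = 1_000_000  # assumption: 1 MB taken as 10**6 bytes


def shards_needed(user_records_per_sec, record_bytes, records_per_aggregate=1):
    """Estimate the shards required to respect both per-shard write limits."""
    # Aggregation packs several user records into one Kinesis record,
    # so only the record-count limit is relaxed, not the byte limit.
    kinesis_records_per_sec = math.ceil(user_records_per_sec / records_per_aggregate)
    bytes_per_sec = user_records_per_sec * record_bytes
    by_count = math.ceil(kinesis_records_per_sec / RECORDS_PER_SEC_PER_SHARD)
    by_bytes = math.ceil(bytes_per_sec / BYTES_PER_SEC_PER_SHARD)
    return max(by_count, by_bytes)


# Hypothetical workload: 50,000 user records per second at 100 bytes each.
without_aggregation = shards_needed(50_000, 100)
with_aggregation = shards_needed(50_000, 100, records_per_aggregate=10)
print(without_aggregation, with_aggregation)  # prints "50 5"
```

With 100-byte records the record-count limit dominates, so packing ten user records into each Kinesis record cuts the estimate from 50 shards to 5, at which point the byte limit becomes the constraint.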

Option B is incorrect - This may not be a viable option because we are still working on the strategy to redesign our sharding mechanism.

We first need metrics to identify hot and cold shards, and only then can we proceed with redesigning the sharding mechanism.

Besides, the purpose of resharding in Amazon Kinesis Data Streams is to enable your stream to adapt to changes in the rate of data flow.

You split shards to increase the capacity (and cost) of your stream.

You merge shards to reduce the cost (and capacity) of your stream.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-strategies.html
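
Once a hot shard has been identified, a split needs a new starting hash key for the upper child shard (the `NewStartingHashKey` parameter of the SplitShard API). The sketch below shows one way to compute an even split point. It assumes the usual 128-bit hash key space, and the helper name `even_split_point` is hypothetical.

```python
# Kinesis maps partition keys to 128-bit hash keys, so the full key space
# runs from 0 to 2**128 - 1.
MAX_HASH_KEY = 2**128 - 1


def even_split_point(starting_hash_key, ending_hash_key):
    """Return the first hash key of the upper child for an even split."""
    if ending_hash_key <= starting_hash_key:
        raise ValueError("range must contain at least two hash keys")
    # Lower child keeps [start, point - 1]; upper child gets [point, end].
    return (starting_hash_key + ending_hash_key) // 2 + 1


# A single shard covering the whole key space splits at 2**127.
print(even_split_point(0, MAX_HASH_KEY) == 2**127)  # prints "True"
```

An even split only balances load if partition keys are spread evenly across the range, which is another reason to collect metrics before resharding.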

Option C is correct - Collection reduces the overhead of making many separate HTTP requests for a multi-shard stream.

Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item.

Collection refers to batching multiple Kinesis Data Streams records and sending them in a single HTTP request with a call to the API operation PutRecords, instead of sending each Kinesis Data Streams record in its own HTTP request.

This increases throughput compared to using no collection because it reduces the overhead of making many separate HTTP requests.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
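
Collection can be pictured as simple chunking on the producer side. The sketch below is not the KPL's implementation. It only groups hypothetical records into request-sized lists, assuming a limit of 500 records per PutRecords request (exposed as a parameter in case the limit differs). The helper name `collect` is made up for this illustration.

```python
def collect(records, max_records_per_request=500):
    """Group records into lists, one list per PutRecords-style request."""
    if max_records_per_request < 1:
        raise ValueError("max_records_per_request must be positive")
    return [
        records[i:i + max_records_per_request]
        for i in range(0, len(records), max_records_per_request)
    ]


# Hypothetical payloads: 1,200 small records waiting to be sent.
records = [
    {"Data": f"event-{n}".encode(), "PartitionKey": str(n)}
    for n in range(1200)
]
batches = collect(records)
print(len(records), "records ->", len(batches), "HTTP requests")
```

Without collection the same 1,200 records would need 1,200 separate HTTP requests, so the per-request overhead is paid 400 times less often in this example.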

Option D is incorrect - This may not be a viable option because we are still working on the strategy to redesign our sharding mechanism.

We first need metrics to identify hot and cold shards, and only then can we proceed with redesigning the sharding mechanism.

Besides, the purpose of resharding in Amazon Kinesis Data Streams is to enable your stream to adapt to changes in the rate of data flow.

You split shards to increase the capacity (and cost) of your stream.

You merge shards to reduce the cost (and capacity) of your stream.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-strategies.html
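
Identifying hot and cold shards, as mentioned above, amounts to comparing each shard's observed write rate with the per-shard limit. The sketch below is a minimal illustration. The 80% and 20% thresholds, the shard names, and the observed rates are all assumptions chosen for the example, not values recommended by AWS.

```python
SHARD_BYTES_PER_SEC = 1_000_000  # per-shard write limit, taking 1 MB as 10**6 bytes


def classify(shard_bytes_per_sec, hot=0.8, cold=0.2):
    """Label each shard by its share of the per-shard write limit."""
    labels = {}
    for shard_id, rate in shard_bytes_per_sec.items():
        utilization = rate / SHARD_BYTES_PER_SEC
        if utilization >= hot:
            labels[shard_id] = "hot"    # candidate for a split
        elif utilization <= cold:
            labels[shard_id] = "cold"   # candidate for a merge
        else:
            labels[shard_id] = "ok"
    return labels


# Hypothetical per-shard averages (bytes per second) taken from metrics.
observed = {
    "shard-0": 950_000,
    "shard-1": 120_000,
    "shard-2": 90_000,
    "shard-3": 500_000,
}
print(classify(observed))
```

In this made-up data, shard-0 would be a split candidate and shard-1 and shard-2 would be merge candidates, provided their hash key ranges are adjacent.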

Option E is incorrect - Enhanced (shard-level) Kinesis Data Streams metrics provide information about the stream at the shard level.

These are stream-side metrics and do not show how the producer is performing during data ingestion.

Kinesis sends shard-level metrics to CloudWatch every minute.

These metrics are not enabled by default.

There is a charge for enhanced metrics emitted from Kinesis.

Shard-level metrics are for specific monitoring tasks, usually related to troubleshooting.

https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-cloudwatch.html#kinesis-metrics

Option F is correct - The Kinesis Producer Library (KPL) for Amazon Kinesis Data Streams publishes custom Amazon CloudWatch metrics.

You can specify an application name when launching the KPL, which is then used as part of the namespace when uploading metrics.

You can also configure the KPL to add arbitrary additional dimensions to the metrics.

This is useful if you want finer-grained data in your CloudWatch metrics.

Two settings control which metrics are uploaded: level and granularity.

The levels are NONE, SUMMARY, and DETAILED.

The granularities are GLOBAL, STREAM, and SHARD.

When SHARD is chosen, metrics are emitted with the stream name and shard ID as dimensions.

Metrics for the current KPL instance are available locally in real time; you can query the KPL at any time to get them.

The KPL locally computes the sum, average, minimum, maximum, and count of every metric, as in CloudWatch.

https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-kpl.html
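
The five statistics the KPL computes locally are easy to picture with a small example. The sketch below is not KPL code. It only shows what those statistics look like for one window of samples, and both the helper name `summarize` and the sample values are hypothetical.

```python
def summarize(samples):
    """Return the five statistics listed above for a window of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    total = sum(samples)
    return {
        "sum": total,
        "average": total / len(samples),
        "minimum": min(samples),
        "maximum": max(samples),
        "count": len(samples),
    }


# Hypothetical buffering times in milliseconds observed for one shard.
window = [12, 48, 30, 30]
print(summarize(window))
```

Comparing such summaries across shards is what makes SHARD granularity useful for spotting uneven ingestion.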

FundsLawn, a financial services company, uses Kinesis Data Streams as a data integration backbone to ingest data generated from various business segments, and then loads the data into applications like Redshift, ES, DynamoDB, and S3 using the KCL library. However, the company has observed a need for a strategy to address some issues with the existing platform, such as high TCO for maintaining the streaming platform, poor performance that does not meet SLAs, and the need to understand the performance of ingestion to improve throughput.

To address these issues, the company should consider the following options:

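  1. Batching: Batching is a technique that performs a single action on multiple items instead of repeating the action for each item. In the KPL it takes two forms: aggregation, which stores multiple records in a single Kinesis Data Streams record, and collection, which sends multiple Kinesis Data Streams records in a single PutRecords request. Both reduce the number of requests to the Kinesis stream, which can reduce costs and improve performance.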

  2. Resharding: Resharding is a technique used to increase or decrease the number of shards in a Kinesis data stream to improve performance or reduce costs. When the current number of shards is not sufficient to handle the incoming data load, resharding can help increase throughput. Similarly, if the current number of shards is too high and not fully utilized, resharding can help reduce costs.

  3. Enhanced Kinesis Data Streams Monitoring Level Metrics: Shard-level metrics such as "IncomingBytes" and "IncomingRecords" show how much data each shard receives. They are not enabled by default, incur an additional charge, and describe the stream rather than the behavior of the producer during ingestion.

  4. Shard Split Operation: A split is one of the two resharding operations and divides a single shard into two shards. This increases the capacity (and cost) of the stream and allows more parallel consumers.

  5. Shard Merge Operation: This option is the opposite of shard splitting and involves merging two shards into one. This can help reduce costs by reducing the number of shards and parallel consumers.

  6. KPL Metrics at SHARD granularity: The Kinesis Producer Library (KPL) is used to ingest data into Kinesis data streams. Monitoring KPL metrics such as "UserRecordsPut" and "BufferingTime" at SHARD granularity can help identify the performance of data ingestion at the shard level.

From the given options, the best three options for FundsLawn to address the issues with their existing platform are:

A. Batching, Aggregation: Storing multiple records in a single Kinesis Data Streams record improves per-shard throughput for small records, so the same load can be carried with fewer shards, which reduces TCO.

C. Batching, Collection: Sending multiple Kinesis Data Streams records in a single PutRecords request reduces the overhead of many separate HTTP requests, which improves producer throughput.

F. KPL Metrics at SHARD granularity: Producer-side metrics emitted with the stream name and shard ID as dimensions show how ingestion performs for each shard, which is the information needed to improve throughput.

Options B and D (shard split and shard merge) are premature until metrics have identified hot and cold shards, and option E describes the stream's shards rather than producer-side ingestion.

Therefore, options A, C, and F are the correct answers to the given question.