Tick-Bank Stream Data Consumption: Best Approach for S3, Redshift, and Elasticsearch


Question

Tick-Bank is a privately held Internet retailer of both physical and digital products, founded in 2008.

The company has more than six million clients worldwide.

Tick-Bank aims to serve as a connection between digital content makers and affiliate dealers, who then promote the content to clients.

Tick-Bank's technology aids in payments, tax calculations and a variety of customer service tasks.

Tick-Bank helps entrepreneurs build visibility and revenue-making opportunities. To serve various business functions, Tick-Bank runs multiple Java-based web applications on Windows-based EC2 machines in AWS, managed by the internal IT Java team.

Tick-Bank is looking to enable website traffic analytics, thereby understanding user navigational behavior, preferences, and other click-related information.

Tick-Bank uses event-based streaming built on Kinesis Stream for data integration and uses the producer library to publish events. Tick-Bank wants to use the captured data for multiple functions: storing the data in S3, where it is later processed by Lambda; loading the data to support enterprise search built on ES Service; and integrating it into a data warehouse built on Redshift in near real time.

What is the best approach to consume the data captured in the stream so that it is shared with all the applications mentioned above, which include S3, Redshift, and Elasticsearch (ES)?

Select 1 option.

Answers

Explanations



Answer: D.

If you want to send stream records directly to services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), or Splunk, you can use a Kinesis Data Firehose delivery stream instead of creating a consumer application.

https://docs.aws.amazon.com/streams/latest/dev/amazon-kinesis-consumers.html

Tick-Bank has multiple Java-based web applications running on Windows-based EC2 machines in AWS. They are looking to enable website traffic analytics to understand user navigational behavior, preferences, and other click-related information. To achieve this, Tick-Bank uses an event-based streaming approach based on Kinesis Stream. The producer library is used to integrate events. Tick-Bank wants to use the data captured for multiple functions, including storing data into S3, processing with Lambda, loading data to support enterprise search built on ES Service, and integrating into a data warehouse built on Redshift in near real-time.

The question asks for the best approach to consume all the data captured in the stream, which is shared with all the applications mentioned above, including S3, Redshift, and Elasticsearch. Let's consider the available options:

Option A: Use enhanced Fan out consumers to integrate with above-mentioned downstream applications like S3, ES, and Redshift.

Enhanced Fan-Out is a feature of Amazon Kinesis Data Streams that gives each registered consumer its own read throughput of up to 2 MB per second per shard and pushes records to the consumer, which lowers end-to-end latency. It is useful when several consumer applications read the same stream at once. Each consumer is still a custom application, however: code must be written and operated to deliver the records to S3, ES, and Redshift. Enhanced Fan-Out improves read performance but does not remove that integration work, so it is not the best approach here.
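The throughput difference between shared and enhanced fan-out consumers can be sketched with a little arithmetic. This is a simplified model: the function name is hypothetical, and it assumes shared consumers split a shard's 2 MB per second read capacity evenly, which real workloads will only approximate.

```python
SHARD_READ_MB_PER_SEC = 2.0  # read capacity of a single shard


def per_consumer_read_mbps(num_consumers: int, enhanced_fan_out: bool) -> float:
    """Approximate per-shard read throughput available to each consumer."""
    if num_consumers < 1:
        raise ValueError("num_consumers must be at least 1")
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated pipe per shard.
        return SHARD_READ_MB_PER_SEC
    # Shared consumers compete for the same per-shard capacity
    # (assumed here to be divided evenly).
    return SHARD_READ_MB_PER_SEC / num_consumers
```

With three consumers (one each for S3, ES, and Redshift), the shared model leaves each roughly a third of the shard capacity, while enhanced fan-out keeps the full rate for each. Either way, the delivery code for every destination still has to be written.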

Option B: Use KCL to capture, process using Lambda blueprints and integrate with downstream applications like S3, ES, and Redshift.

KCL (Kinesis Client Library) is an open-source library that helps developers build applications that consume data from Kinesis Data Streams, taking care of tasks such as distributing shards across workers and checkpointing. Lambda blueprints are pre-built code templates that help developers get started with Lambda functions quickly. This approach can work, but it means building, deploying, and maintaining custom consumer code that writes to S3, ES, and Redshift, which is more effort than using a managed delivery service.
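To give a sense of the custom code this option implies, here is a minimal sketch of a Lambda-style handler for a batch of Kinesis records. It assumes each record payload is a JSON click event, which the question does not state, and it stops at decoding. If the producer library aggregates records, they would also need to be de-aggregated first.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch into a click-event dict."""
    clicks = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        clicks.append(json.loads(payload))  # assumption: JSON payloads
    # Writing to S3, ES and Redshift would still have to be coded here.
    return clicks
```

Everything after the decoding step (batching, retries, and a writer per destination) is work the team would own.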

Option C: Use API to integrate, process using Lambda blueprints with downstream applications like S3, ES, and Redshift.

An API (Application Programming Interface) is a set of protocols, routines, and tools that lets software applications communicate with each other. This option suggests reading the stream through direct API calls and processing the records with Lambda functions before passing them to S3, ES, and Redshift. The approach is feasible, but it is the lowest-level one: iterating over shards, handling retries, and delivering to each destination all have to be coded and maintained by the team.
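As an illustration of that low-level work, the sketch below polls a single shard. It assumes `client` is a boto3 Kinesis client (or any object with the same `get_shard_iterator` and `get_records` methods); the function name and the `max_batches` cap are hypothetical, and error handling, throttling back-off, and resharding are left out.

```python
def read_shard(client, stream_name, shard_id, max_batches=3):
    """Poll one shard from its oldest record and return the raw payloads."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
    )["ShardIterator"]
    payloads = []
    for _ in range(max_batches):
        if iterator is None:
            break  # the shard has been closed and fully read
        response = client.get_records(ShardIterator=iterator, Limit=100)
        payloads.extend(record["Data"] for record in response["Records"])
        iterator = response.get("NextShardIterator")
    return payloads
```

A production consumer would repeat this for every shard and then add the delivery logic for each destination, which is the effort a managed service avoids.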

Option D: Create separate Kinesis Firehose for different downstream applications like S3, ES, and Redshift.

Amazon Kinesis Data Firehose is a fully managed service that delivers streaming data in near real time to destinations such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. A delivery stream can read from an existing Kinesis data stream and writes to a single destination, so this option creates one delivery stream per downstream application. Running several delivery streams may add some cost, but no consumer application has to be written or operated, which matches the AWS guidance quoted above.
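A rough sketch of what option D looks like in practice: one request per destination, all sharing the same Kinesis stream as source. The function only builds the request dictionaries. The parameter names follow the Firehose `create_delivery_stream` API as I understand it and should be checked against the current documentation, and the stream names, index name, and helper name are hypothetical.

```python
def delivery_stream_requests(stream_arn, role_arn, bucket_arn, es_domain_arn,
                             jdbc_url, table, db_user, db_password):
    """Build one Firehose request per destination, all fed by one stream."""
    common = {
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
    }
    # ES and Redshift delivery streams also need an S3 location
    # (backup for ES, staging for the Redshift COPY command).
    s3 = {"RoleARN": role_arn, "BucketARN": bucket_arn}
    return [
        {**common, "DeliveryStreamName": "clicks-to-s3",
         "ExtendedS3DestinationConfiguration": s3},
        {**common, "DeliveryStreamName": "clicks-to-es",
         "ElasticsearchDestinationConfiguration": {
             "RoleARN": role_arn, "DomainARN": es_domain_arn,
             "IndexName": "clicks", "S3Configuration": s3}},
        {**common, "DeliveryStreamName": "clicks-to-redshift",
         "RedshiftDestinationConfiguration": {
             "RoleARN": role_arn, "ClusterJDBCURL": jdbc_url,
             "CopyCommand": {"DataTableName": table},
             "Username": db_user, "Password": db_password,
             "S3Configuration": s3}},
    ]
```

Each dictionary could then be passed to `boto3.client("firehose").create_delivery_stream(**request)`. In a real setup the IAM role, bucket, and credentials would come from configuration or a secrets store rather than function arguments.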

In summary, option D (Create separate Kinesis Firehose for different downstream applications like S3, ES, and Redshift) is the best approach to consume all the data captured in the stream, which is shared with all the applications mentioned above, including S3, Redshift, and Elasticsearch. Options A, B, and C can all be made to work, but each requires building and operating custom consumer code, whereas Firehose delivers to these destinations as a managed service.