Amazon BDS-C00: AWS Certified Big Data - Specialty Exam: Capture Aggregates with Amazon Kinesis Data Analytics

Question

HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock climbing, ice mountaineering, skiing, avalanche protection, snowboarding, fly fishing, kayaking, rafting, road and trail running, and more. HH runs its entire online infrastructure on multiple Java-based web applications and other web framework applications running on AWS.

HH captures clickstream data and uses a custom-built recommendation engine to recommend products, which improves sales and helps HH understand customer preferences. HH already uses Amazon Kinesis Data Streams (KDS) to collect events and transaction logs and to process the stream.

Multiple departments at HH use different streams for real-time integration and to add analytics to their applications, with Kinesis as the backbone of real-time data integration across the enterprise. HH hosts all its applications in a VPC and is looking at integrating Kinesis into its web application.

To understand network flow behavior at 15-minute intervals, HH wants to aggregate data from the VPC Flow Logs for analytics.

VPC Flow Logs have a capture window of approximately 10 minutes.

What kind of query can be used to capture aggregates for each client every 15 minutes using Amazon Kinesis Data Analytics?

Select 1 option.

Answers

Explanations

A. Stagger windows query
B. Tumbling windows query
C. Sliding windows query
D. Continuous query

Answer: A.

Option A is correct. A stagger windows query aggregates data using keyed time-based windows that open as data arrives. The keys allow for multiple overlapping windows. This is the recommended way to aggregate data using time-based windows, because stagger windows reduce late or out-of-order data compared to tumbling windows.

VPC Flow Logs have a capture window of approximately 10 minutes, but they can have a capture window of up to 15 minutes if you're aggregating data on the client. Stagger windows are ideal for aggregating these logs for analysis.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/stagger-window-concepts.html
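
The keyed, open-on-arrival behavior described for Option A can be sketched in plain Python. This is a simplified model for illustration only, not how Kinesis Data Analytics is implemented: in the service the window is declared in SQL, and the partition key typically also includes a truncated event time. The function name, the record layout, and the sample addresses are all hypothetical.

```python
WINDOW_SECONDS = 15 * 60  # 15-minute window, as in the question


def stagger_aggregate(records, window_seconds=WINDOW_SECONDS):
    """Sum bytes per client in keyed windows that open on first arrival.

    records: iterable of (arrival_seconds, client_id, byte_count),
    assumed to be sorted by arrival time (an assumption of this sketch).
    Returns a list of (client_id, window_open_time, total_bytes).
    """
    open_windows = {}  # client_id -> [open_time, running_total]
    results = []
    for arrival, client, nbytes in records:
        window = open_windows.get(client)
        if window is not None and arrival - window[0] >= window_seconds:
            # The window for this key has aged out: emit it and start over.
            results.append((client, window[0], window[1]))
            window = None
        if window is None:
            # The first record for this key opens a new window at its arrival time.
            window = [arrival, 0]
            open_windows[client] = window
        window[1] += nbytes
    # Flush windows still open when the input ends.
    for client, (open_time, total) in sorted(open_windows.items()):
        results.append((client, open_time, total))
    return results


# Hypothetical flow records: (arrival time in seconds, client address, bytes)
flow_records = [
    (840, "10.0.0.5", 100),
    (900, "10.0.0.5", 200),
    (960, "10.0.0.7", 50),
    (1000, "10.0.0.5", 300),
]
print(stagger_aggregate(flow_records))
# [('10.0.0.5', 840, 600), ('10.0.0.7', 960, 50)]
```

Each client gets its own window, opened by that client's first record, so the two windows overlap in time while all three records for 10.0.0.5 land in a single aggregate.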

Option B is incorrect. A tumbling windows query aggregates data using distinct time-based windows that open and close at regular intervals. Related records that arrive on either side of a window boundary are split into separate windows.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html
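
As a rough illustration of fixed, non-overlapping windows, the following Python sketch assigns each record to the 15-minute bucket that contains its timestamp. It is a simplified model rather than the service's implementation; the function name, record layout, and sample data are hypothetical.

```python
from collections import defaultdict


def tumbling_aggregate(records, window_seconds=15 * 60):
    """Sum bytes per (window_start, client) over fixed, non-overlapping windows.

    records: iterable of (time_seconds, client_id, byte_count).
    """
    totals = defaultdict(int)
    for time_seconds, client, nbytes in records:
        # Every window starts on a multiple of window_seconds.
        window_start = (time_seconds // window_seconds) * window_seconds
        totals[(window_start, client)] += nbytes
    return dict(sorted(totals.items()))


# Hypothetical flow records: (time in seconds, client address, bytes)
flow_records = [
    (840, "10.0.0.5", 100),
    (900, "10.0.0.5", 200),
    (960, "10.0.0.7", 50),
    (1000, "10.0.0.5", 300),
]
print(tumbling_aggregate(flow_records))
# {(0, '10.0.0.5'): 100, (900, '10.0.0.5'): 500, (900, '10.0.0.7'): 50}
```

The records for 10.0.0.5 straddle the boundary at 900 seconds, so they are reported as two partial aggregates.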

Option C is incorrect. A sliding windows query aggregates data continuously, using a fixed time or rowcount interval.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html
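
A minimal Python sketch of the sliding behavior, assuming a time-based interval: every incoming record produces a result that covers that client's records from the preceding 15 minutes. The function name and record layout are hypothetical, and the boundary is treated as inclusive here as an assumption of the sketch.

```python
from collections import defaultdict, deque


def sliding_aggregate(records, window_seconds=15 * 60):
    """Emit, for each record, the client's byte total over the trailing window.

    records: iterable of (time_seconds, client_id, byte_count), sorted by time.
    Returns a list of (time_seconds, client_id, trailing_total).
    """
    recent = defaultdict(deque)  # client -> deque of (time, bytes) in the window
    running = defaultdict(int)   # client -> current sum of bytes in the window
    output = []
    for time_seconds, client, nbytes in records:
        window = recent[client]
        window.append((time_seconds, nbytes))
        running[client] += nbytes
        # Evict records older than the trailing interval.
        while window[0][0] < time_seconds - window_seconds:
            _, old_bytes = window.popleft()
            running[client] -= old_bytes
        output.append((time_seconds, client, running[client]))
    return output


# Hypothetical flow records: (time in seconds, client address, bytes)
flow_records = [
    (0, "10.0.0.5", 100),
    (600, "10.0.0.5", 200),
    (1000, "10.0.0.5", 300),
]
print(sliding_aggregate(flow_records))
# [(0, '10.0.0.5', 100), (600, '10.0.0.5', 300), (1000, '10.0.0.5', 500)]
```

One result is produced per input record rather than one per 15-minute period, which is why this query type does not match the requirement in the question.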

Option D is incorrect. A continuous query is a query over a stream that executes continuously over streaming data. This continuous execution enables scenarios such as applications that continuously query a stream and generate alerts.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/continuous-queries-concepts.html
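
The alerting scenario can be sketched as a Python generator that evaluates each record as it arrives and never groups records into time-bounded windows. The field names srcaddr, bytes, and action follow the VPC Flow Logs record format; the function name and the threshold value are hypothetical.

```python
def rejected_traffic_alerts(records, threshold_bytes=1_000_000):
    """Yield an alert for each rejected flow at or above the byte threshold.

    records: iterable of dicts with 'srcaddr', 'bytes', and 'action' keys.
    Records are evaluated one at a time; nothing is aggregated over time.
    """
    for record in records:
        if record["action"] == "REJECT" and record["bytes"] >= threshold_bytes:
            yield f"ALERT: {record['srcaddr']} had {record['bytes']} rejected bytes"


# Hypothetical flow records
flow_records = [
    {"srcaddr": "10.0.0.5", "bytes": 2_000_000, "action": "REJECT"},
    {"srcaddr": "10.0.0.7", "bytes": 500, "action": "REJECT"},
    {"srcaddr": "10.0.0.9", "bytes": 3_000_000, "action": "ACCEPT"},
]
for alert in rejected_traffic_alerts(flow_records):
    print(alert)
# ALERT: 10.0.0.5 had 2000000 rejected bytes
```

Because each record is handled independently, this pattern suits filtering and alerting but does not by itself yield a per-client total for each 15-minute period.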

To capture aggregates for each client every 15 minutes using Amazon Kinesis Data Analytics, we need a windowed query that groups the data into time intervals. The query types available in Kinesis Data Analytics are:

  1. Tumbling Windows queries: Tumbling windows divide the data stream into non-overlapping, fixed-size windows. For example, if we want to capture aggregates for every 15 minutes, we can define a tumbling window of 15 minutes. Each window opens and closes at a regular interval, and all records that fall within a particular window are aggregated. This approach is useful when we want to capture data at a fixed interval and the records arrive at consistent times.

  2. Sliding Windows queries: Sliding windows aggregate data continuously over a fixed time or rowcount interval. For example, with a sliding window of 15 minutes, each new record produces an aggregate over the records from the preceding 15 minutes. The windows overlap, so one record can contribute to many results. This approach is useful when we want a continuously updated aggregate rather than one result per interval.

  3. Stagger Windows queries: Stagger windows are keyed, time-based windows that open when the first record matching the partition key arrives, rather than at a fixed interval. For example, a 15-minute stagger window keyed on the client opens when that client's first record arrives and closes 15 minutes later. Because each key has its own window, windows for different keys can overlap. This approach is useful when related records arrive at inconsistent times.

  4. Continuous queries: A continuous query executes continuously over the stream and evaluates data as it arrives. On its own it does not group records into time-bounded windows. This approach is useful when we want to continuously process and analyze data, for example to generate alerts, without dividing it into windows.

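In this scenario, we want to capture aggregates for each client every 15 minutes, and VPC Flow Logs records for a client can arrive at inconsistent times. The most appropriate choice is therefore a Stagger Windows query, consistent with answer A. A tumbling window of 15 minutes would also produce one result per interval, but records for the same client that arrive on either side of a window boundary would be split into separate partial aggregates. A stagger window keyed on the client opens when that client's first record arrives and stays open for 15 minutes, so those records are aggregated together.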
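
To compare how fixed and keyed windows treat one client's records that straddle a 15-minute boundary, the following Python sketch counts how many result rows each strategy would emit. Both functions are simplified, hypothetical models rather than the service's implementation, and the sample times are made up.

```python
def tumbling_rows(arrival_times, size=15 * 60):
    """Count the fixed windows touched by one client's records."""
    return len({t // size for t in arrival_times})


def stagger_rows(arrival_times, size=15 * 60):
    """Count the keyed windows opened for one client's records."""
    rows, open_time = 0, None
    for t in sorted(arrival_times):
        if open_time is None or t - open_time >= size:
            # No window is open for this client, so this record opens one.
            rows += 1
            open_time = t
    return rows


# One client's records arriving around the 900-second boundary
burst = [840, 900, 1000]
print(tumbling_rows(burst))  # 2
print(stagger_rows(burst))   # 1
```

The fixed windows report the burst as two partial aggregates, while the keyed window reports it as one.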