Aggregating Data Continuously with Windowed Queries on AWS for Improved Sales

AWS Streaming and Kinesis Analytics for Windowed Queries

Question

HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock climbing, ice mountaineering, skiing, avalanche protection, snowboarding, fly fishing, kayaking, rafting, road and trail running, and much more. HH runs its entire online infrastructure on Java-based web applications running on AWS.

HH captures clickstream data and uses a custom-built recommendation engine to recommend products, which improves sales and helps HH understand customer preferences. HH already uses AWS streaming capabilities to collect events and transaction logs and to process the stream. HH uses Kinesis Analytics to build SQL querying capability on the streaming data and plans to use windowed queries to process it.

What kind of windowed query aggregates data continuously, using a fixed time or rowcount interval, for example after 1 minute or after 2000 rows?

Select 1 option.

Answers

A. Stagger windows query
B. Tumbling windows query
C. Sliding windows query
D. Continuous query

Answer: C.

Option A is incorrect. A stagger windows query aggregates data using keyed time-based windows that open as data arrives.

The keys allow for multiple overlapping windows.

This is the recommended way to aggregate data using time-based windows, because stagger windows reduce late or out-of-order data compared to tumbling windows.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/stagger-window-concepts.html
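
To make the stagger-window behavior concrete, here is a minimal Python sketch that simulates it. It is not Kinesis Analytics SQL: the function name stagger_window_counts, the (timestamp, key) event tuples, and the 60-second window length are assumptions for illustration, and the sketch ignores late-arriving data.

```python
from typing import Iterable, List, Tuple


def stagger_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float = 60.0
) -> List[Tuple[str, float, int]]:
    """Count events per key in windows that open when a key's first event arrives.

    events: (event_time_seconds, key) tuples, assumed sorted by time.
    Returns (key, window_open_time, count) for each window, in the order
    the windows were opened.
    """
    open_windows = {}  # key -> index in `results` of that key's current window
    results: List[list] = []
    for ts, key in events:
        idx = open_windows.get(key)
        # A key's window closes once it is window_seconds old; the next
        # event for that key then opens a fresh window at its own timestamp.
        if idx is None or ts >= results[idx][1] + window_seconds:
            results.append([key, ts, 0])
            idx = len(results) - 1
            open_windows[key] = idx
        results[idx][2] += 1
    return [tuple(r) for r in results]
```

With events at 5 s and 20 s for one key and 50 s for another, the two windows open at 5 s and 50 s respectively and overlap in time, which a fixed-interval window would not allow.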

Option B is incorrect. A tumbling windows query aggregates data using distinct time-based windows that open and close at regular intervals.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html
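
As a rough illustration of tumbling windows, the following Python sketch assigns each event to exactly one fixed, non-overlapping window by flooring its timestamp. The function name tumbling_window_counts and the (timestamp, key) event shape are hypothetical; this simulates the idea and is not the service's SQL syntax.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float = 60.0
) -> Dict[Tuple[float, str], int]:
    """Count events per (window_start, key) using non-overlapping windows."""
    counts: Counter = Counter()
    for ts, key in events:
        # Floor the timestamp to the start of its window; every event
        # falls into exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

Because the window boundaries are fixed by the clock rather than by the data, one result per key is produced for each interval, not one result per incoming row.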

Option C is correct. A sliding windows query aggregates data continuously, using a fixed time or rowcount interval.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html
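
The sketch below simulates a sliding window in Python to show why it matches the question: the aggregate is recomputed for every incoming row over a fixed time interval, a fixed row count, or both. The function name sliding_window_sums and its parameters range_seconds and max_rows are assumptions made for this example, not part of any AWS API.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple


def sliding_window_sums(
    events: Iterable[Tuple[float, float]],
    range_seconds: Optional[float] = None,
    max_rows: Optional[int] = None,
) -> Iterator[Tuple[float, float]]:
    """Yield (event_time, running_sum) for every incoming event.

    The sum covers the current row plus the preceding rows that fall within
    range_seconds (time-based) and/or within max_rows preceding rows (row-based).
    events are (event_time_seconds, value) tuples, assumed sorted by time.
    """
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict rows older than the time interval.
        while range_seconds is not None and window[0][0] < ts - range_seconds:
            total -= window.popleft()[1]
        # Evict rows beyond the row-count interval (current row + max_rows preceding).
        while max_rows is not None and len(window) > max_rows + 1:
            total -= window.popleft()[1]
        yield ts, total
```

Calling list(sliding_window_sums(events, range_seconds=60)) corresponds to the "after 1 minute" case in the question, and max_rows=2000 to the "after 2000 rows" case. A single row contributes to several consecutive outputs, which is the overlap that distinguishes sliding windows from tumbling windows.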

Option D is incorrect. A continuous query is a query over a stream that executes continuously over streaming data.

This continuous execution enables scenarios, such as the ability for applications to continuously query a stream and generate alerts.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/continuous-queries-concepts.html

The correct answer is C. Sliding Windows queries.

Explanation: In AWS Kinesis Analytics, windowing functions are used to group streaming data records together based on a specific time period or record count. There are three types of windowing functions: Tumbling Windows, Sliding Windows, and Stagger Windows.

  • Tumbling Windows queries: Tumbling windows are distinct, non-overlapping, time-based windows that open and close at regular intervals. Each record belongs to exactly one window. For example, a Tumbling Window query might group incoming streaming data into consecutive 1-minute intervals and emit one result per interval.

  • Sliding Windows queries: A sliding window is defined by a fixed time or rowcount interval, such as the preceding 1 minute or the preceding 2000 rows. As the window slides, Kinesis Analytics emits an output when new records appear on the stream, so the aggregation is continuous. Windows can overlap, and a record can be part of multiple windows.

  • Stagger Windows queries: Stagger windows are keyed, time-based windows that open when the first event matching the partition key arrives, not on a fixed clock interval. Because each key opens its own window, multiple windows can overlap. For example, a Stagger Window query might use 1-minute windows, where each key's window starts at the time that key's first record arrives.

A continuous query is not a window type. The term describes any query that executes continuously over streaming data, so it does not by itself specify a fixed time or rowcount interval.

In the given scenario, HH needs to use Sliding Windows queries, which aggregate data continuously over a fixed time or rowcount interval, such as the preceding 1 minute or the preceding 2000 rows.