Azure Stream Analytics: Windowing Functions for Counting Tweets | Exam DP-200

Question

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.

You need to output the count of tweets from the last five minutes every minute.

Which windowing function should you use?

Answers

A. Sliding
B. Session
C. Tumbling
D. Hopping

Explanations

Correct answer: D

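Hopping window functions hop forward in time by a fixed period. A hopping window with a size of five minutes and a hop of one minute, written HoppingWindow(minute, 5, 1) in the Stream Analytics query language, emits a result every minute that covers the preceding five minutes, which is what the requirement asks for.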

Incorrect Answers:

A: Sliding windows, unlike Tumbling or Hopping windows, output events only for points in time when the content of the window actually changes, that is, when an event enters or exits the window. The output therefore follows the arrival of tweets rather than a fixed one-minute schedule.

B: Session window functions group events that arrive at similar times, filtering out periods of time where there is no data. A session window begins when the first event occurs. If another event occurs within the specified timeout from the last ingested event, the window extends to include the new event. If no event occurs within the timeout, the window is closed at the timeout.

C: Tumbling window functions are used to segment a data stream into distinct time segments. Tumbling windows do not overlap, and an event cannot belong to more than one tumbling window.

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
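The session-window behaviour described under B can be sketched in a few lines of Python. This is a simplified model for illustration, not Stream Analytics code: the function name, the event times, and the 60-second timeout are all hypothetical, and the maximum-duration setting of a real session window is left out. It shows why session windows close at irregular times that depend on activity, not on a one-minute schedule.

```python
def session_windows(event_times, timeout):
    """Group event times (in seconds) into sessions.

    A new event joins the current session if it arrives within `timeout`
    seconds of the previous event; otherwise it starts a new session.
    Simplified model: no maximum session duration is enforced.
    """
    sessions = []
    for t in sorted(event_times):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)   # gap is small enough: extend the session
        else:
            sessions.append([t])     # gap exceeded the timeout: new session
    return sessions


# Hypothetical tweet arrival times, in seconds from an arbitrary start.
tweet_times = [0, 20, 50, 200, 210, 600]

for session in session_windows(tweet_times, timeout=60):
    print(f"session from {session[0]}s to {session[-1]}s: {len(session)} tweets")
```

The three sessions in this example last 50, 10, and 0 seconds, so neither the window length nor the output interval is fixed.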

In this scenario, the requirement is to output the count of tweets from the last five minutes every minute using Azure Stream Analytics. To achieve this, we need to use a windowing function.

Windowing functions in Stream Analytics group the events of an input data stream into finite windows of time. An aggregate, such as a count, is computed over the events in each window, and one result is produced per window.

The four windowing functions offered as answers are Tumbling, Hopping, Sliding, and Session. Let's see what each one does and which one is suitable for this scenario:

  • Tumbling Window: A tumbling window is a fixed-size, non-overlapping window. The stream is divided into contiguous, equal-sized segments, and each input event belongs to exactly one segment. A tumbling window has no separate step: a five-minute tumbling window produces one result every five minutes, not every minute, and a one-minute tumbling window would count only the last minute. So, we can rule out tumbling windows.

  • Hopping Window: A hopping window is a fixed-size window that moves forward in time by a fixed hop. When the hop is smaller than the window size, consecutive windows overlap and each input event belongs to more than one window. In this case, we need a result every minute that covers the last five minutes, so overlapping windows are exactly what we need: a window size of 5 minutes with a hop of 1 minute, written HoppingWindow(minute, 5, 1). This matches the requirement.

  • Sliding Window: A sliding window has a fixed duration, but it does not advance on a fixed schedule. Stream Analytics produces output only at points in time when the content of the window changes, that is, when an event enters or exits it. A five-minute sliding window therefore emits counts at times driven by the tweets themselves, not once every minute, and SlidingWindow takes no hop or slide parameter. So, we can rule out sliding windows.

  • Session Window: A session window groups events that occur within a certain period of time, with the end of a session triggered by a period of inactivity. Session windows are useful when you want to group events that are related to a single session. Their length varies with the activity in the stream, so they cannot guarantee a five-minute span or an output every minute. So, we can rule out session windows.

The suitable windowing function for this scenario is therefore a Hopping Window (answer D): a five-minute window that hops forward every minute outputs the count of tweets from the last five minutes once per minute.
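To make answer D concrete, the sketch below simulates in Python roughly what a count grouped by HoppingWindow(minute, 5, 1) computes. It is an illustrative model rather than Stream Analytics code: the function name and the tweet timestamps are hypothetical, and each window is assumed to cover the interval (window end minus window size, window end], with the start excluded and the end included.

```python
from datetime import datetime, timedelta


def hopping_window_counts(event_times, window_size, hop_size, start, end):
    """Return (window_end, count) for every hopping window ending in (start, end].

    Assumption: a window covers (window_end - window_size, window_end].
    """
    results = []
    window_end = start + hop_size
    while window_end <= end:
        window_start = window_end - window_size
        count = sum(1 for t in event_times if window_start < t <= window_end)
        results.append((window_end, count))
        window_end += hop_size  # the next window ends one hop later
    return results


# Hypothetical tweet timestamps: 12:00:30, 12:01:10, 12:01:50, 12:04:20, 12:06:05.
base = datetime(2024, 1, 1, 12, 0, 0)
tweets = [base + timedelta(seconds=s) for s in (30, 70, 110, 260, 365)]

five_min = timedelta(minutes=5)
one_min = timedelta(minutes=1)

# Five-minute window, one-minute hop: one count per minute.
for window_end, count in hopping_window_counts(
    tweets, five_min, one_min, base, base + timedelta(minutes=7)
):
    print(window_end.time(), count)
```

With these sample timestamps the counts for the windows ending at 12:01 through 12:07 are 1, 3, 3, 3, 4, 3, 2. Setting the hop equal to the window size turns the same function into a tumbling window, which yields only one count every five minutes.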