Azure Synapse Data Distribution: Round-Robin, Hashing, and Replication | DP-203 Exam | Microsoft

Data Distribution Types in Azure Synapse: Round-Robin, Hashing, Replication

Question

In Azure Synapse, data can be distributed in three ways: Round-Robin, Hashing, and Replication.

Which distribution type to use depends on the scenario and the requirements.

Which of the following statement(s) is/are true about these distribution types? (Select all that are applicable)

Answers


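A. Choose Hash distribution when you can't identify a single key for distributing your data.
B. Choose Round Robin distribution when you can't identify a single key for distributing your data.
C. Choose Round Robin distribution if the table holds temporary data.
D. Choose Replicated distribution if the table holds temporary data.
E. Choose Hash distribution if the table holds temporary data.
F. Hash distribution is ideal for dimension tables that are very frequently joined with other big tables.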

Correct Answers: B and C.

The following summary describes when to use, and when not to use, each distribution type.

Replicated

  • Best fit for:
    • Small dimension tables in a star schema with less than 2 GB of storage after compression (Synapse compresses data roughly 5x).
    • Small lookup tables.
    • Dimension tables that are frequently joined with other big tables.
  • Do not use when:
    • The table receives many write transactions (for example, inserts, deletes, and updates).
    • You change the Data Warehouse Units (DWUs) frequently.
    • You use only 2-3 columns out of the many columns in your table.
    • You are indexing a replicated table.

Round Robin (default)

  • Best fit for:
    • Temporary or staging tables, for example a table that holds temporary data or a staging table used for faster loads.
    • Tables with no obvious joining key candidate, where you cannot identify a single key to distribute the data, or whose data doesn't frequently join with data from other tables.
    • Getting started when you are unsure of the query patterns and data: you can start with all tables round-robin distributed and, as you learn the patterns, redistribute the data on a hash key.
    • Small dimension tables.
  • Do not use when:
    • Performance is slow due to data movement.

Hash

  • Best fit for:
    • Large fact tables or historical transaction tables.
    • Large dimension tables.
  • Do not use when:
    • The distribution key needs to change: the distribution key cannot be updated.
    • The only candidate column is nullable. A nullable column is a bad candidate for the distribution column of any hash-distributed table, because all rows with NULL land in the same distribution.
    • The candidate column of a fact table often holds a default value. Such a column is also a poor choice of distribution column, for the same reason: every row with the default value lands in the same distribution.
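To see why a nullable or default-valued column makes a poor distribution column, the following Python sketch hashes two hypothetical candidate columns into 60 buckets, one per distribution, and compares the largest bucket with the average. It is an illustration under stated assumptions, not Synapse's implementation: MD5 stands in for the service's internal hash function, and the helper names (`hash_distribution`, `rows_per_distribution`, `skew_ratio`) and the sample data are made up for this example.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool splits every table into 60 distributions


def hash_distribution(value, num_distributions=NUM_DISTRIBUTIONS):
    """Map a distribution-column value to a distribution number.

    MD5 is only a stand-in for Synapse's internal hash function; like any hash,
    it sends equal values (including every NULL) to the same distribution.
    """
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions


def rows_per_distribution(values):
    """Count how many rows land in each of the 60 distributions."""
    counts = Counter(hash_distribution(v) for v in values)
    return [counts.get(d, 0) for d in range(NUM_DISTRIBUTIONS)]


def skew_ratio(counts):
    """Size of the largest distribution divided by the average (1.0 = perfectly even)."""
    average = sum(counts) / len(counts)
    return max(counts) / average


# Hypothetical fact table with 60,000 rows and a well-spread candidate key.
good_key = list(range(60_000))

# Same row count, but half the rows are NULL (None) in the candidate column and
# a quarter carry a default placeholder value of -1.
bad_key = [None] * 30_000 + [-1] * 15_000 + list(range(15_000))

print(round(skew_ratio(rows_per_distribution(good_key)), 2))  # close to 1.0
print(round(skew_ratio(rows_per_distribution(bad_key)), 2))   # far above 1.0
```

Every NULL hashes to the same distribution, as does every -1, so one distribution ends up holding dozens of times the average number of rows, while the well-spread key stays close to 1.0. The slowest distribution then determines query time.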

Option A is incorrect.

Round Robin Distribution, not Hash distribution, is ideal when you can't identify a single key for distributing your data.

Option B is correct.

Round Robin Distribution is recommended when you can't identify a single key for distributing your data.

Option C is correct.

Round Robin distribution is the right choice if the table holds temporary data.

Option D is incorrect.

Replicated distribution is not the ideal choice if the table holds temporary data, because a replicated table is a poor fit for frequent write transactions.

Option E is incorrect.

Hash distribution is not the ideal choice if the table holds temporary data; Round Robin is the recommended distribution for temporary and staging tables.

Option F is incorrect.

Replicated distribution, not Hash distribution, is ideal for dimension tables that are very frequently joined with other big tables.


In an Azure Synapse dedicated SQL pool, the rows of every table are spread across 60 distributions, which are in turn spread across the compute nodes. How the data is distributed affects the performance and efficiency of queries. Azure Synapse provides three distribution types: Round-Robin, Hashing, and Replication.
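The relationship between distributions, compute nodes, and replicated copies can be sketched in a few lines of Python. This is a simplified model, assuming the 60 distributions divide evenly among the compute nodes and that a distributed table spreads its rows perfectly evenly; the contiguous distribution-to-node layout, the 4-node pool, and the helper names `distributions_by_node` and `rows_stored_per_node` are hypothetical. In practice the number of compute nodes depends on the Data Warehouse Units (DWU) service level.

```python
NUM_DISTRIBUTIONS = 60  # fixed for a dedicated SQL pool, regardless of size


def distributions_by_node(node_count):
    """Spread the 60 distributions evenly across the compute nodes.

    Simplification: contiguous blocks of distributions per node. The real
    placement is managed by the service; only the even split matters here.
    """
    if node_count < 1 or NUM_DISTRIBUTIONS % node_count != 0:
        raise ValueError("node_count must be a positive divisor of 60")
    per_node = NUM_DISTRIBUTIONS // node_count
    return {
        node: list(range(node * per_node, (node + 1) * per_node))
        for node in range(node_count)
    }


def rows_stored_per_node(table_rows, node_count, replicated):
    """Rows each compute node stores for one table.

    A distributed (round-robin or hash) table is split across the nodes,
    assuming perfectly even distributions; a replicated table keeps a full
    copy on every node.
    """
    return table_rows if replicated else table_rows / node_count


layout = distributions_by_node(4)                   # hypothetical 4-node pool
print(len(layout[0]))                               # → 15
print(rows_stored_per_node(1_000_000, 4, False))    # → 250000.0
print(rows_stored_per_node(1_000_000, 4, True))     # → 1000000
```

Because a replicated table stores a full copy per node, its footprint grows with the node count, which is one reason replication is reserved for small tables. Scaling the pool to a different service level also changes the node count, and the replicated copies have to be rebuilt afterward.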

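  • Round-Robin distribution: Rows are spread evenly across all distributions without regard to their values; the assignment of each row is effectively random. It is typically used when there is no obvious key for distributing the data. Round-Robin is a good fit when the data simply needs to be spread evenly, or when the data is temporary (for example, a staging table), because loading into a round-robin table is fast. The trade-off is that rows with the same key can land in different distributions, so joins and aggregations on that key may require data movement.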

  • Hashing distribution: Each row is assigned to a distribution based on the hash value of a column or a set of columns, so rows with the same value always land in the same distribution. Hashing is useful when there is a natural key for distributing the data, especially when queries join or aggregate on that key. It is generally not the best choice for temporary or staging data, because loading into a round-robin table is faster. Adding or removing rows does not redistribute existing data, but changing the distribution column requires recreating the table, for example with CREATE TABLE AS SELECT (CTAS).

  • Replication distribution: A full copy of the table is cached on each compute node. This removes data movement when the table is joined with other tables, which makes it useful for small dimension and lookup tables in read-heavy workloads; it is a performance feature, not a high-availability feature. Replication is not suitable for tables with frequently changing data or temporary data, because the replicated copies must be rebuilt after the data is modified.
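The difference between Round-Robin and Hash placement can be illustrated with a small Python simulation. It rests on stated assumptions: a simple cycle stands in for Synapse's round-robin assignment (which is effectively random), MD5 stands in for its internal hash function, and the `orders` table and the helpers `round_robin`, `hash_distribute`, and `distributions_holding` are hypothetical. The point is where one customer's rows end up.

```python
import hashlib
from itertools import cycle

NUM_DISTRIBUTIONS = 60  # every dedicated SQL pool table is split into 60 distributions


def round_robin(rows, num_distributions=NUM_DISTRIBUTIONS):
    """Deal rows out evenly, ignoring their contents.

    Synapse assigns round-robin rows without looking at any column; a simple
    cycle is used here as a deterministic stand-in.
    """
    targets = cycle(range(num_distributions))
    return [(next(targets), row) for row in rows]


def hash_distribute(rows, key, num_distributions=NUM_DISTRIBUTIONS):
    """Place each row by hashing its distribution column.

    MD5 is only a stand-in for Synapse's internal hash function; what matters
    is that equal values always map to the same distribution.
    """
    placed = []
    for row in rows:
        digest = hashlib.md5(repr(row[key]).encode("utf-8")).digest()
        placed.append((int.from_bytes(digest[:8], "big") % num_distributions, row))
    return placed


def distributions_holding(placed, key, value):
    """Return the set of distributions that contain rows with row[key] == value."""
    return {dist for dist, row in placed if row[key] == value}


# Hypothetical orders table: 600 orders, 60 per customer (customer_id 0..9).
orders = [{"order_id": i, "customer_id": i // 60} for i in range(600)]

rr = round_robin(orders)
hashed = hash_distribute(orders, "customer_id")

print(len(distributions_holding(rr, "customer_id", 3)))      # → 60
print(len(distributions_holding(hashed, "customer_id", 3)))  # → 1
```

Round Robin spreads the rows evenly, but a join on `customer_id` has to gather one customer's rows from all 60 distributions, which is the data movement mentioned above. Hashing on `customer_id` keeps them in a single distribution, which is why hash distribution suits large tables that are joined on a stable key.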

Based on the above, here is how each option holds up:

  • A. Choose Hash distribution when you can't identify a single key for distributing your data.

    • This is not true. Hash distribution needs a good distribution key; when there is no natural key, Round Robin is the better choice.
  • B. Choose Round Robin distribution when you can't identify a single key for distributing your data.

    • This is true. Round Robin distribution is a good choice when there is no natural key for distributing data.
  • C. Choose Round Robin distribution if the table holds temporary data.

    • This is true. Round Robin distribution is a good choice for temporary or staging data, because rows are loaded without being routed to a distribution by key, which makes loads fast.
  • D. Choose Replicated distribution if the table holds temporary data.

    • This is not true. Replicated distribution is a poor choice for temporary data, because the replicated copies must be rebuilt every time the data changes.
  • E. Choose Hash distribution if the table holds temporary data.

    • This is not true. For temporary data, Round Robin is the better choice because it loads faster.
  • F. Hash distribution is ideal for dimension tables that are very frequently joined with other big tables.

    • This is not true. Small dimension tables that are very frequently joined with other big tables are best replicated, because every compute node then holds a full copy and the join avoids data movement.