Implementing Azure Data Lake Gen2 Storage Account: Minimum Cost Replication for Data Accessibility

Question

You are implementing an Azure Data Lake Gen2 storage account.

You need to ensure that data remains accessible for both read and write operations even if an entire data center (zonal or non-zonal) becomes unavailable.

Which type of replication should you use for the storage account? (Choose the solution with the minimum cost.)

Answers

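A. Locally-redundant storage (LRS)
B. Zone-redundant storage (ZRS)
C. Geo-redundant storage (GRS)
D. Geo-zone-redundant storage (GZRS)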

Correct Answer: B

Zone-redundant storage (ZRS) replicates Azure Storage data synchronously across three Azure availability zones in the primary region.

With ZRS, the data remains accessible for both read and write operations even if a zone becomes unavailable.

The following table describes the durability and availability by outage scenario:

| Outage scenario | LRS | ZRS | GRS/RA-GRS | GZRS/RA-GZRS |
|---|---|---|---|---|
| A node within a data center becomes unavailable | Yes | Yes | Yes | Yes |
| An entire data center (zonal or non-zonal) becomes unavailable | No | Yes | Yes¹ | Yes |
| A region-wide outage occurs in the primary region | No | No | Yes¹ | Yes¹ |
| Read access to the secondary region is available if the primary region becomes unavailable | No | No | Yes (with RA-GRS) | Yes (with RA-GZRS) |

¹ Account failover is required to restore write availability if the primary region becomes unavailable.
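
The table above can also be read as a lookup. The following is a minimal sketch, not an official tool: it encodes the first three outage scenarios as a Python dictionary and returns the cheapest option that keeps data available. The cost ordering LRS < ZRS < GRS < GZRS is an assumption that matches the explanations in this answer, not a price list, and the names `AVAILABILITY`, `COST_ORDER`, and `cheapest_option` are hypothetical.

```python
# Availability per outage scenario, transcribed from the table above.
# "yes"      = reads and writes stay available
# "failover" = available, but writes need an account failover first
# "no"       = data is not available
AVAILABILITY = {
    "LRS":  {"node": "yes", "data_center": "no",       "region": "no"},
    "ZRS":  {"node": "yes", "data_center": "yes",      "region": "no"},
    "GRS":  {"node": "yes", "data_center": "failover", "region": "failover"},
    "GZRS": {"node": "yes", "data_center": "yes",      "region": "failover"},
}

# ASSUMPTION: relative cost from cheapest to most expensive.
COST_ORDER = ["LRS", "ZRS", "GRS", "GZRS"]


def cheapest_option(scenario, allow_failover=False):
    """Return the cheapest redundancy option that survives `scenario`.

    With allow_failover=False, only options that keep reads and writes
    available without an account failover are accepted.
    """
    accepted = {"yes", "failover"} if allow_failover else {"yes"}
    for name in COST_ORDER:
        if AVAILABILITY[name][scenario] in accepted:
            return name
    return None  # no option satisfies the requirement
```

For the scenario in the question, `cheapest_option("data_center")` returns `"ZRS"`. For a region-wide outage no option qualifies without failover, and `cheapest_option("region", allow_failover=True)` returns `"GRS"`.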

Option A is incorrect.

LRS keeps data available only if a node within a data center becomes unavailable. It does not protect against the loss of an entire data center.

Option B is correct.

ZRS replicates Azure Storage data synchronously across three Azure availability zones within the primary region, so it is the lowest-cost option that tolerates the loss of an entire data center.

Option C is incorrect.

GRS costs more than ZRS, and if the primary region becomes unavailable it needs an account failover to restore write availability.

ZRS is the more suitable option in the given scenario.

Option D is incorrect.

GZRS also ensures availability, but it is not the redundancy option with the minimum cost.

ZRS will achieve the goal in the given scenario.
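
In practice, the redundancy choice appears as the SKU name in the account definition. The following is a minimal sketch, assuming the account is created through an Azure Resource Manager request body for `Microsoft.Storage/storageAccounts`: `Standard_ZRS` selects zone-redundant storage and `isHnsEnabled` turns on the hierarchical namespace that makes the account a Data Lake Gen2 account. The function name `storage_account_body` and the region `westeurope` are placeholders for illustration, and ZRS must be available in the region you pick.

```python
import json

# SKU names for standard accounts, keyed by redundancy option.
SKU_BY_REDUNDANCY = {
    "LRS": "Standard_LRS",
    "ZRS": "Standard_ZRS",
    "GRS": "Standard_GRS",
    "GZRS": "Standard_GZRS",
}


def storage_account_body(redundancy, location):
    """Build the request body for a Data Lake Gen2 storage account."""
    return {
        "sku": {"name": SKU_BY_REDUNDANCY[redundancy]},
        "kind": "StorageV2",                    # general-purpose v2 account
        "location": location,                   # placeholder region
        "properties": {"isHnsEnabled": True},   # hierarchical namespace (Gen2)
    }


body = storage_account_body("ZRS", "westeurope")
print(json.dumps(body, indent=2))
```

Keeping the mapping in one dictionary makes it easy to switch the redundancy option later without touching the rest of the definition.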

To ensure high availability and durability of data, an Azure Data Lake Gen2 storage account offers several types of replication. Each type provides a different level of redundancy and therefore protects against a different class of outage, from a single failed node up to the loss of a whole region.

In this scenario, the requirement is to ensure that data will be accessible for both read and write operations, even if an entire data center (zonal or non-zonal) becomes unavailable. This requirement implies that the storage account must have a replication strategy that provides redundancy across different data centers.

The replication types available for Azure Data Lake Gen2 storage accounts are as follows:

  1. Locally-redundant storage (LRS): This is the cheapest replication option. LRS replicates data synchronously three times within a single data center in the primary region. It protects against node failures such as a failed server rack or drive, but not against the loss of the data center itself.

  2. Zone-redundant storage (ZRS): This replication option provides redundancy across multiple availability zones (AZs) within a single region. ZRS replicates data synchronously across three availability zones in the primary region, so data stays available for both reads and writes if one zone, and therefore an entire data center, becomes unavailable. Because replication is synchronous, no acknowledged write is lost in that case.

  3. Geo-redundant storage (GRS): This replication option adds a copy in a second region. GRS replicates data synchronously three times within a single data center in the primary region, as LRS does, and then asynchronously to a secondary region. This protects against regional disasters, but write availability returns only after an account failover, and because geo-replication is asynchronous the most recent writes may be lost. The recovery point objective (RPO) is typically less than 15 minutes.

  4. Geo-zone-redundant storage (GZRS): This replication option combines ZRS and GRS. GZRS replicates data synchronously across three availability zones in the primary region and asynchronously to a secondary region, providing the highest level of resiliency against both data center and regional outages. It is also the most expensive of the four options. As with GRS, the RPO for the secondary copy is typically less than 15 minutes.

Therefore, to keep data accessible for both read and write operations even if an entire data center (zonal or non-zonal) becomes unavailable, and to do so at minimum cost, the recommended replication type is zone-redundant storage (ZRS). LRS is cheaper but does not survive the loss of a data center. GRS and GZRS add a copy in a secondary region, which this scenario does not require and which raises the cost.