Retrieving 1 PB of Data from Amazon S3 Glacier: Cost-Effective Steps

Question

A large engineering firm has uploaded all its project documents to Amazon S3 Glacier.

This 1 PB of data must be audited by a team of auditors as part of an annual IT audit.

You have been assigned the task of providing this data to the auditors, who plan to start working in one week. Which of the following steps can be used to retrieve this data at the lowest cost?

Answers

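A. Initiate an archive retrieval job specifying the archive ID and using the Standard retrieval option.

B. Initiate an archive retrieval job specifying the archive ID and using the Bulk retrieval option.

C. Initiate an archive retrieval job specifying the job ID and using the Bulk retrieval option.

D. Initiate an archive retrieval job specifying the job ID and using the Standard retrieval option.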

Correct Answer: B.

To initiate an archive retrieval job, you need the archive ID, which can be obtained from the vault inventory.
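As a minimal sketch of this step, assuming the boto3 Glacier client (created with `boto3.client("glacier")`) and a vault inventory that has already been downloaded as JSON, the archive IDs can be read from the inventory's `ArchiveList` and passed to `initiate_job`. The helper names, the vault name, and the injected `client` argument are illustrative assumptions, not part of the original question.

```python
import json
from typing import List


def archive_ids_from_inventory(inventory_json: str) -> List[str]:
    """Return the archive IDs listed in a vault inventory document."""
    inventory = json.loads(inventory_json)
    return [entry["ArchiveId"] for entry in inventory.get("ArchiveList", [])]


def start_archive_retrieval(client, vault_name: str, archive_id: str,
                            tier: str = "Bulk") -> str:
    """Initiate an archive retrieval job and return the job ID Glacier assigns.

    `client` is assumed to behave like boto3.client("glacier").
    """
    response = client.initiate_job(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,  # the archive ID goes in; the job ID comes out
            "Tier": tier,
        },
    )
    return response["jobId"]
```

With a real client, the call would look like `start_archive_retrieval(boto3.client("glacier"), "project-docs", archive_id)`, where `project-docs` is a hypothetical vault name.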

One of the following three retrieval options can also be specified:

1) Expedited - For quick retrieval of data, typically within 1-5 minutes.

2) Standard - The default retrieval option; data is typically retrieved within 3-5 hours.

3) Bulk - The lowest-cost option, intended for large amounts of data; retrieval typically completes within 5-12 hours.
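The choice among these options can be sketched as picking the cheapest tier whose worst-case retrieval time still fits the deadline. The snippet below is only an illustration: the worst-case hours come from the list above, the cost ordering (Bulk cheapest, Expedited most expensive) is an assumption consistent with that list, and the function name is hypothetical. It does not model download time or throughput for a dataset as large as 1 PB.

```python
from typing import Optional

# (tier name, worst-case retrieval time in hours), ordered cheapest first.
# The cost ordering is an assumption; the durations come from the list above.
TIERS_CHEAPEST_FIRST = [
    ("Bulk", 12.0),
    ("Standard", 5.0),
    ("Expedited", 5.0 / 60.0),
]


def cheapest_tier_within(deadline_hours: float) -> Optional[str]:
    """Return the cheapest tier whose worst-case time fits the deadline."""
    for name, worst_case_hours in TIERS_CHEAPEST_FIRST:
        if worst_case_hours <= deadline_hours:
            return name
    return None  # no tier is fast enough


# One week is 7 * 24 = 168 hours, far more than Bulk's 12-hour worst case.
print(cheapest_tier_within(7 * 24))
```

For the one-week deadline in the question, the lookup returns Bulk, because every tier fits and Bulk is first in the cost ordering.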

In this case, the data is not needed for a week, so the Bulk retrieval option can be used to minimize cost.

Option A is incorrect because a large amount of data must be retrieved and the client wants the lowest cost, so the Bulk retrieval option is a better choice than the Standard retrieval option.

Options C & D are incorrect because the job ID is returned by S3 Glacier only after a retrieval job has been initiated; the archive ID is what is required to initiate an archive retrieval job.
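To illustrate where the job ID fits, the following sketch uses it only after a job has been initiated: first to poll the job status, then to download the output. It assumes a client that behaves like `boto3.client("glacier")`; the function name, the polling interval, and the injectable `sleep` argument are illustrative choices. Reading the whole body into memory is reasonable only for small archives.

```python
import time


def wait_for_job_output(client, vault_name, job_id, poll_seconds=900,
                        sleep=time.sleep) -> bytes:
    """Poll a retrieval job by job ID, then return its output as bytes.

    `client` is assumed to behave like boto3.client("glacier").
    """
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        sleep(poll_seconds)  # retrieval takes hours, so poll sparingly

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError("retrieval job %s ended as %s"
                           % (job_id, status["StatusCode"]))

    output = client.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    # Whole-body read: fine for a sketch, not for very large archives.
    return output["body"].read()
```

Polling is used here for simplicity; the point is that the job ID is an output of job initiation and an input to every later step, never an input to initiation itself.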

For more information on retrieving data from Amazon S3 Glacier, refer to the following URL:

https://docs.aws.amazon.com/amazonglacier/latest/dev/downloading-an-archive-two-steps.html

The correct answer for the given scenario is option B, which is to initiate an archive retrieval job specifying archive ID and using the Bulk retrieval option.

Explanation:

Amazon S3 Glacier is a low-cost, long-term storage service designed for data archival, backup, and disaster recovery. It is optimized for data that is infrequently accessed and for which retrieval times of several hours are acceptable. S3 Glacier provides three retrieval options (Expedited, Standard, and Bulk) with different retrieval times and costs.

In the given scenario, the large engineering firm has uploaded 1 PB of project documents to Amazon S3 Glacier, and the data needs to be audited by a team of auditors as part of an annual IT audit. The objective is to retrieve the data at the lowest cost.

Option A - Initiate an archive retrieval job specifying archive ID and using the Standard retrieval option: This option is not recommended because it would result in higher retrieval costs. The Standard retrieval option typically returns data within 3-5 hours, which is faster than the one-week deadline requires, and it costs more than Bulk retrieval. Because the dataset is very large, retrieving the entire 1 PB with Standard retrieval would cost more than necessary.

Option B - Initiate an archive retrieval job specifying archive ID and using the Bulk retrieval option: This option is the most cost-effective solution for the given scenario. Bulk retrieval is optimized for large data sets and has the lowest retrieval cost among the three retrieval options. Although its retrieval time (typically 5-12 hours) is longer than that of the other options, it is acceptable because the auditors plan to start working in one week.

Option C - Initiate an archive retrieval job specifying Job ID and using the Bulk retrieval option: This option is incorrect. A job ID cannot be used to initiate a retrieval job; S3 Glacier returns the job ID only after the job has been initiated, and it is then used to check the job status and download the job output. The archive ID is what must be specified when initiating the job.

Option D - Initiate an archive retrieval job specifying Job ID and using the Standard retrieval option: This option is incorrect because the job ID is returned by S3 Glacier only after a retrieval job has been initiated, so it cannot be used to initiate one. In addition, Standard retrieval costs more than Bulk retrieval, and its 3-5 hour retrieval time is not needed when the deadline is one week away.

In conclusion, the recommended option to retrieve the 1 PB of project documents from Amazon S3 Glacier is to initiate an archive retrieval job specifying archive ID and using the Bulk retrieval option.