Minimizing Costs with AWS Archival Storage and Messaging Services for Aerial Image Data Processing

Leveraging AWS Archival Storage and Messaging Services for Aerial Image Data Processing


Question

Your firm has uploaded a large amount of aerial image data to S3.

In the past, in your on-premises environment, you used a dedicated group of servers to process this critical data.

You used RabbitMQ, an open-source messaging system, to get job information to the servers.

Once processed, the data would go to tape and be shipped offsite.

Your manager told you to stay with the current design and leverage AWS archival storage and messaging services to minimize cost.

Which is correct?

Answers

Explanations


A. B. C. D.

Answer - B.

The best option for reducing costs is Glacier, since everything was stored on tape anyway in the on-premises environment.

Hence option A is out.

Next, SQS should be used, since RabbitMQ was used internally.

Hence option D is out.

The first step is to leave the objects in S3 as they are, without changing them.

Hence option B is the best fit.

The following diagram shows how SQS is used with a fleet of worker instances.

For more information on SQS queues, see the URL below.

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-how-it-works.html

[Figure: Amazon SQS Queue. Labels: Request Queue, Processing Server (Auto Scaling Group), Response Queue.]
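The request/response pattern named in the figure can be sketched as a small worker loop. This is a minimal illustration, not the page's own implementation: the queue URLs, the `handle_job` callback, and the JSON message format are assumptions. The `sqs` argument is expected to behave like a boto3 SQS client (`receive_message`, `send_message`, `delete_message`).

```python
import json


def process_jobs(sqs, request_queue_url, response_queue_url, handle_job, max_polls=1):
    """Poll the request queue, process each job, and post results to the response queue.

    All names here are illustrative; `sqs` is any object exposing the
    boto3 SQS client methods used below.
    """
    processed = 0
    for _ in range(max_polls):
        reply = sqs.receive_message(
            QueueUrl=request_queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
            WaitTimeSeconds=20,      # long polling cuts down on empty responses
        )
        for message in reply.get("Messages", []):
            job = json.loads(message["Body"])  # assumed: job description is JSON
            result = handle_job(job)
            sqs.send_message(
                QueueUrl=response_queue_url,
                MessageBody=json.dumps(result),
            )
            # Delete only after the job succeeds, so a job held by an
            # interrupted worker reappears once its visibility timeout expires.
            sqs.delete_message(
                QueueUrl=request_queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            processed += 1
    return processed
```

With real AWS credentials, the client would typically be created with `boto3.client("sqs")`. Deleting the message last is what makes the pattern tolerant of Spot Instance interruptions.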

The scenario involves processing large amounts of aerial image data that were previously processed on premises by a group of dedicated servers, with RabbitMQ delivering job information. The goal is to keep that design on AWS while using archival storage and messaging services to minimize cost. Among the given options, option B is the most appropriate solution.

Option A is incorrect because, although it uses SQS to pass job messages, it does not mention worker instances. In addition, changing the storage class of the S3 objects to Reduced Redundancy Storage (RRS) does not provide the durability this critical data needs: RRS is designed for 99.99% durability and is meant for non-critical, reproducible data that can be easily recreated.

Option C is incorrect because changing the storage class of the S3 objects to Reduced Redundancy Storage before processing the data could result in data loss if any of the S3 objects become corrupted or lost. Additionally, using Spot Instances for worker nodes may lead to processing delays or job failures, as Spot Instances can be terminated if the Spot price rises above the maximum bid price.

Option D is incorrect because SNS is a publish/subscribe notification service, not a message queue. It pushes each message to its subscribers and does not hold jobs for worker instances to pull when they have capacity, which is the role RabbitMQ played on premises. Additionally, terminating Spot worker instances when they become idle may not be cost-effective, as the Spot price can fluctuate frequently.

Option B is the correct solution as it suggests setting up Auto Scaled workers, triggered by queue depth, that use Spot Instances to process messages in SQS. This ensures that processing capacity scales up or down automatically with the number of messages in the queue. Once the data is processed, a lifecycle policy can be set up to move the objects from S3 to Glacier. This ensures that the processed data is stored cost-effectively while still providing the necessary level of durability and availability.
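Scaling on queue depth can be illustrated with a small calculation. The sketch below is an assumption-laden example, not AWS's scaling algorithm: `jobs_per_worker`, `min_workers`, and `max_workers` are hypothetical tuning values, and in practice an Auto Scaling policy driven by a CloudWatch alarm on the queue's visible-message count would do this work. The `queue_depth` helper expects a boto3-style SQS client.

```python
import math


def queue_depth(sqs, queue_url):
    """Read the approximate number of visible messages from a boto3-style SQS client."""
    reply = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(reply["Attributes"]["ApproximateNumberOfMessages"])


def desired_workers(depth, jobs_per_worker, min_workers=0, max_workers=20):
    """Return how many workers to run for a given backlog.

    jobs_per_worker, min_workers, and max_workers are hypothetical
    tuning values chosen for illustration.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(depth / jobs_per_worker)
    # Clamp to the allowed fleet size; a minimum of 0 lets the fleet
    # shrink to nothing when the queue is empty.
    return max(min_workers, min(max_workers, needed))
```

Allowing the minimum to reach zero is what keeps cost down when no images are waiting to be processed.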

In summary, option B is the best solution as it offers the most cost-effective and scalable approach while ensuring that the processed data is stored in a secure and durable manner.
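The archival step in option B can be sketched as an S3 lifecycle rule. This is a minimal example under stated assumptions: the rule ID, the `processed/` prefix (which assumes workers write their output under a common key prefix), and the one-day delay are all hypothetical. The `s3` argument is expected to behave like a boto3 S3 client.

```python
def glacier_lifecycle(prefix, days=1):
    """Build a lifecycle configuration that transitions matching objects to Glacier.

    The rule ID and default delay are illustrative choices, not values
    taken from the question.
    """
    return {
        "Rules": [
            {
                "ID": "archive-processed-images",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},      # only objects under this prefix
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(s3, bucket, prefix, days=1):
    """Attach the rule to a bucket using a boto3-style S3 client."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle(prefix, days),
    )
```

For example, `apply_lifecycle(boto3.client("s3"), "my-bucket", "processed/")` would apply the rule to a bucket named `my-bucket`, which is itself a placeholder.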