Minimizing Costs with SQL Access to Amazon S3 for Data Lakes

The Best AWS Service for a Cost-Effective Data Lake Solution

Question

A retail organization is developing a data lake solution that uses Amazon S3 to store a large amount of data. The solution must be accessible via SQL queries. The organization wants to minimize infrastructure costs. Which AWS service should be part of their solution?

Answers

A. Amazon DynamoDB
B. Amazon Redshift Spectrum
C. Amazon Aurora
D. Amazon Athena

Explanations

Answer: D.

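Option A is incorrect because DynamoDB is a NoSQL key-value and document database. It does not query data stored in S3, and although it supports PartiQL, a SQL-compatible query language, it is not a SQL analytics engine for a data lake.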

Option B is incorrect because Amazon Redshift Spectrum requires an Amazon Redshift cluster, which adds infrastructure costs. It can query data in S3 with SQL, but it is not the lowest-cost option.

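Option C is incorrect because Amazon Aurora is a relational database that requires its own database infrastructure, and the data would have to be loaded into it rather than queried in place in S3.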

Option D is CORRECT because Amazon Athena can query data in S3 directly using SQL syntax. It is also a serverless service, so there is no infrastructure to provision or manage.

Reference:

https://docs.aws.amazon.com/athena/latest/ug/what-is.html

The retail organization needs a data lake on Amazon S3 that holds a large amount of data, can be queried with SQL, and keeps infrastructure costs to a minimum. The AWS service that best meets these criteria is Amazon Athena (Option D).

Amazon Athena is a serverless interactive query service for analyzing data directly in Amazon S3 using standard SQL. Users can query data in S3 without creating or maintaining any infrastructure or managing any servers, which helps to reduce costs. Athena scales automatically, executing queries in parallel, and is built on Presto, an open-source distributed SQL engine (newer Athena engine versions are based on Trino, a fork of Presto).
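As a rough illustration of what querying S3 data through Athena can look like from code, the sketch below builds the arguments for the `start_query_execution` call in boto3, the AWS SDK for Python. The database name `sales_lake`, the table `orders`, and the results bucket are hypothetical placeholders, and the sketch assumes the table has already been defined over the S3 data. Actually submitting the query requires boto3 and AWS credentials, so that step is kept in a separate function that is not called here.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS access

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: sales_lake, orders, and the results bucket are placeholders.
request = build_athena_request(
    sql="SELECT store_id, SUM(amount) AS revenue FROM orders GROUP BY store_id",
    database="sales_lake",
    output_location="s3://example-athena-results/queries/",
)
print(request["QueryString"])
```

No cluster or server appears anywhere in the request: the caller supplies only the SQL text, the database to resolve table names against, and an S3 location for the results.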

Option A, Amazon DynamoDB, is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability, but it is not suitable for this scenario because the organization needs to run SQL queries over data stored in S3. DynamoDB is designed for document and key-value data models. Although it supports PartiQL, a SQL-compatible query language, it operates on DynamoDB tables rather than on files in a data lake.

Option B, Amazon Redshift Spectrum, allows users to run SQL queries against data stored in Amazon S3, but it requires a Redshift cluster to be set up and maintained, which can be expensive.

Option C, Amazon Aurora, is a relational database engine that is compatible with MySQL and PostgreSQL, but it requires infrastructure and management, and it is not a good fit for a data lake solution focused on storing large amounts of data in S3 and analyzing it in place.

Therefore, Amazon Athena (Option D) is the most appropriate and cost-effective solution for the retail organization's requirement.
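The cost argument can be made concrete with a rough sketch that contrasts a pay-per-query model, where cost depends on the data scanned, with a provisioned model, where cost accrues for every hour a cluster runs. The function names and every figure below are hypothetical placeholders chosen for illustration, not AWS list prices, and the outcome depends heavily on the actual workload.

```python
def pay_per_query_monthly_cost(queries_per_month: int,
                               tb_scanned_per_query: float,
                               price_per_tb: float) -> float:
    """Pay-per-query model: cost grows with data scanned and is zero when idle."""
    return queries_per_month * tb_scanned_per_query * price_per_tb


def cluster_monthly_cost(nodes: int,
                         price_per_node_hour: float,
                         hours_per_month: float = 730.0) -> float:
    """Provisioned model: cost accrues for every hour the cluster runs, used or not."""
    return nodes * price_per_node_hour * hours_per_month


# All figures below are hypothetical placeholders, not AWS list prices.
serverless = pay_per_query_monthly_cost(
    queries_per_month=200, tb_scanned_per_query=0.05, price_per_tb=5.0
)
cluster = cluster_monthly_cost(nodes=2, price_per_node_hour=0.25)
print(f"pay-per-query: ${serverless:.2f}, provisioned cluster: ${cluster:.2f}")
```

With a light or bursty query load, the pay-per-query figure stays small because nothing is billed while no queries run. A heavy, constant workload that scans a lot of data could shift the balance, so the comparison should be redone with real prices and usage numbers.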