AWS Data Lake Solution for Retail Organizations | Querying and Reporting on Amazon S3 Data | Exam DBS-C01

AWS Services for Data Lake Solution and Reporting on Amazon S3 Data

Question

A retail organization is developing a data lake solution utilizing Amazon S3 to store a large amount of data.

They would like to perform data exploration and discovery activities by running SQL queries on the data. Based on the output of those activities, they would like to produce complex reports accessible to a large number of users via BI applications. What AWS services should be part of their solution (SELECT TWO)?

Answers

Explanations


A. Amazon Redshift Spectrum
B. AWS Lambda
C. AWS Glue
D. Amazon Athena
E. Amazon QuickSight

Answer: A and D.

Option A is CORRECT because Amazon Redshift Spectrum can be used to query data from files in Amazon S3 without loading the data into Amazon Redshift tables.

Redshift Spectrum uses massive parallelism to run compute-intensive queries quickly against large datasets.
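
As a rough illustration of how Redshift Spectrum reaches S3 data without loading it, the sketch below builds the DDL that maps a Glue Data Catalog database into Redshift as an external schema. The schema name, catalog database name, and IAM role ARN are hypothetical placeholders, not values from the question, and the helper only produces SQL text; it does not connect to a cluster.

```python
def spectrum_external_schema_ddl(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Return DDL that exposes a Glue Data Catalog database as an external schema.

    All three arguments are illustrative assumptions supplied by the caller.
    """
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


if __name__ == "__main__":
    # Hypothetical names; replace with real ones before running against a cluster.
    print(
        spectrum_external_schema_ddl(
            "spectrum",
            "retail_lake",
            "arn:aws:iam::123456789012:role/ExampleSpectrumRole",
        )
    )
```

Once such a schema exists, tables registered in the catalog can be queried from Redshift by their schema-qualified name while the files stay in S3.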

Option B is incorrect because AWS Lambda would require developing and deploying custom code, which is not the optimal way to query the data.

Option C is incorrect because AWS Glue is an ETL service used to catalog and transform data. It is not a service for running SQL queries on the data.

Option D is CORRECT because Amazon Athena can be used to perform ad-hoc queries on data in S3 directly using SQL syntax.
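
A minimal sketch of an ad-hoc Athena query from Python, assuming a boto3 Athena client is passed in by the caller. The database name, SQL text, and S3 output location are hypothetical; the function starts the query and polls until it reaches a terminal state.

```python
import time


def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query and wait for a terminal state.

    `client` is assumed to be boto3.client("athena") or an object with the
    same start_query_execution / get_query_execution methods.
    Returns (query_execution_id, final_state).
    """
    request = athena_query_request(sql, database, output_location)
    query_id = client.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

With a real client, the results of a succeeded query can then be fetched with `client.get_query_results(QueryExecutionId=query_id)` or read from the output location in S3.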

Option E is incorrect because Amazon QuickSight is a business analytics service used to build visualizations and business insights reports.

It is not used for data exploration activities.

Reference:

https://docs.aws.amazon.com/athena/latest/ug/when-should-i-use-ate.html
https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html

To build a data lake solution on Amazon S3, organizations can perform data exploration and discovery activities by running SQL queries on the data using various AWS services. Additionally, they can produce complex reports accessible to a large number of users via BI applications.

The two AWS services that can be a part of their solution are:

  1. Amazon Athena for Data Discovery Activities: Amazon Athena is a serverless interactive query service for querying data stored in S3 using standard SQL. Users can run ad-hoc queries for data exploration and discovery without provisioning or managing any infrastructure. Athena integrates with the AWS Glue Data Catalog, so Glue crawlers can discover the schema of data stored in S3 and create the corresponding table definitions that Athena queries.

  2. Amazon Redshift Spectrum for Complex Reporting: Amazon Redshift Spectrum is a feature of Amazon Redshift that queries data stored in S3 using standard SQL, without loading it into Redshift tables. Because it runs from Amazon Redshift, a single query can combine data held in Redshift tables with data in S3. This makes it suitable for producing complex reports that a large number of users access via BI applications, including popular BI tools such as Tableau, Power BI, and QuickSight.
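
To make the idea of combining Redshift and S3 data concrete, here is a small sketch that builds a reporting query joining an external (S3-backed) table with a local Redshift table, plus the keyword arguments for submitting it through the Redshift Data API `execute_statement` call. The table names, column names, cluster identifier, and database user are all hypothetical.

```python
def store_revenue_report_sql(
    external_table: str = "spectrum.sales",
    local_table: str = "public.stores",
) -> str:
    """Join S3-backed sales rows with a local dimension table (names are assumed)."""
    return (
        "SELECT st.region, SUM(s.amount) AS revenue\n"
        f"FROM {external_table} AS s\n"
        f"JOIN {local_table} AS st ON st.store_id = s.store_id\n"
        "GROUP BY st.region\n"
        "ORDER BY revenue DESC;"
    )


def redshift_data_request(sql: str, cluster_id: str, database: str, db_user: str) -> dict:
    """Build keyword arguments for boto3.client("redshift-data").execute_statement."""
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "DbUser": db_user,
        "Sql": sql,
    }


if __name__ == "__main__":
    request = redshift_data_request(
        store_revenue_report_sql(), "example-cluster", "dev", "report_user"
    )
    print(request["Sql"])
```

A BI tool connected to the cluster would issue queries of this shape directly; the Data API path is shown only as one possible way to run it from code.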

Hence, the correct answers are A, Amazon Redshift Spectrum, for complex reporting and D, Amazon Athena, for data discovery activities. Option C, AWS Glue, can catalog the data and discover its schema but does not itself run SQL queries, so it is not one of the two required services. Option B, AWS Lambda, is a serverless compute service and is not relevant in this scenario. Option E, Amazon QuickSight, is a BI tool that can be used to access the reports but is not required in this scenario.