AWS Data Lake Solution with Amazon S3 | Data Exploration and Reporting | DBS-C01 Exam

AWS Data Lake Solution

Question

A retail organization is developing a data lake solution utilizing Amazon S3 to store a large amount of data.

They would like to perform data exploration and discovery activities by running SQL queries on the data. Based on the output of those activities, they would like to produce complex reports accessible to a large number of users via BI applications. What AWS services should be part of their solution (SELECT TWO)?

Answers

Explanations

A. Amazon Redshift Spectrum
B. AWS Lambda
C. AWS Glue
D. Amazon Athena
E. Amazon QuickSight

Answer: A and D.

Option A is CORRECT because Amazon Redshift Spectrum can be used to query data from files in Amazon S3 without loading the data into Amazon Redshift tables.

Redshift Spectrum uses massive parallelism to run compute-intensive queries quickly against large datasets.
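As a rough illustration of querying S3 files without loading them, the sketch below builds the two DDL statements Redshift Spectrum typically needs: an external schema backed by a data catalog, and an external table pointing at an S3 prefix. The helper name `spectrum_setup_sql`, the schema, table, bucket, and IAM role values are all hypothetical, and the identifiers are interpolated without escaping, so treat this as a sketch rather than production code.

```python
def spectrum_setup_sql(schema, catalog_db, iam_role_arn, table, columns, s3_location):
    """Return (create_schema, create_table) DDL strings for Redshift Spectrum.

    All argument values are illustrative assumptions; nothing is executed here.
    """
    # Column list such as "sale_id BIGINT, amount DECIMAL(10,2)"
    column_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)

    # External schema: maps a Redshift schema name to a data catalog database.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )

    # External table: the data stays in S3; only metadata is registered.
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"    {column_defs}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )
    return create_schema, create_table


# Hypothetical values for a retail sales data set.
schema_sql, table_sql = spectrum_setup_sql(
    schema="spectrum_retail",
    catalog_db="retail_lake",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    table="sales",
    columns=[("sale_id", "BIGINT"), ("store_id", "INTEGER"), ("amount", "DECIMAL(10,2)")],
    s3_location="s3://example-retail-lake/sales/",
)
print(schema_sql)
print(table_sql)
```

Once both statements have been run in Redshift, `spectrum_retail.sales` can be queried like any other table while the files remain in S3.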

Option B is incorrect because it requires custom code development and deployment with AWS Lambda, which makes it a suboptimal solution.

Option C is incorrect because AWS Glue is an ETL service used to categorize and transform data. It cannot be used for querying data.

Option D is CORRECT because Amazon Athena can be used to perform ad-hoc queries on data in S3 directly using SQL syntax.
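An ad-hoc Athena query might be submitted along the following lines. This is a minimal sketch: it assumes a boto3 Athena client is passed in as `client`, and it relies on the `start_query_execution` and `get_query_execution` operations. The helper names, database name, and S3 output location are hypothetical.

```python
import time


def athena_query_request(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query and poll until it finishes; returns (query_id, final_state).

    `client` is assumed to behave like boto3.client("athena").
    """
    request = athena_query_request(sql, database, output_location)
    query_id = client.start_query_execution(**request)["QueryExecutionId"]
    while True:
        response = client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        # QUEUED and RUNNING are the non-terminal states.
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

With real credentials, usage would look roughly like `run_athena_query(boto3.client("athena"), "SELECT store_id, SUM(amount) FROM sales GROUP BY store_id", "retail_lake", "s3://example-athena-results/")`, where the table, database, and bucket are placeholders.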

Option E is incorrect because Amazon QuickSight is a business analytics service used to build visualizations and business insights reports.

It is not used for data exploration activities.

Reference:

https://docs.aws.amazon.com/athena/latest/ug/when-should-i-use-ate.html
https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html

The retail organization wants to develop a data lake solution utilizing Amazon S3 to store a large amount of data and perform data exploration and discovery activities by running SQL queries on the data. They would also like to produce complex reports accessible to a large number of users via BI applications. To achieve this goal, they need to choose the right set of AWS services that can meet their requirements.

Here are the two AWS services that should be part of their solution:

  1. Amazon Athena for Data Discovery Activities: Amazon Athena is an interactive query service that lets users analyze data stored in Amazon S3 using standard SQL. Athena can handle large-scale data sets and lets users search, filter, and analyze data without setting up any infrastructure. Because users can query the data in Amazon S3 directly, without first running an ETL process, Athena is a good choice for the retail organization's data exploration and discovery activities.

  2. Amazon Redshift Spectrum for Complex Reporting: Amazon Redshift is a fully managed, petabyte-scale data warehouse service that can be used to store and analyze large amounts of structured and semi-structured data. Amazon Redshift Spectrum is a feature of Amazon Redshift that allows users to extend their queries to data stored in Amazon S3. This means that users can query data in Amazon S3 using the same SQL syntax as they do for data stored in Redshift. By using Redshift Spectrum, the retail organization can easily run complex queries on the data stored in Amazon S3 and produce complex reports accessible to a large number of users via BI applications.
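To make the "same SQL syntax" point concrete, the sketch below builds a reporting query that joins an external (S3-backed) table with a local Redshift table in ordinary SQL. The function name, table names, and columns (`region`, `store_id`, `amount`) are hypothetical and stand in for whatever the organization's schema actually contains.

```python
def store_revenue_report_sql(external_table, local_table):
    """Return a report query joining an S3-backed table with a local table.

    Both table names are illustrative; the query text itself is plain SQL,
    with no special syntax for the external side of the join.
    """
    return (
        "SELECT s.region, SUM(f.amount) AS total_revenue\n"
        f"FROM {external_table} AS f\n"
        f"JOIN {local_table} AS s ON s.store_id = f.store_id\n"
        "GROUP BY s.region\n"
        "ORDER BY total_revenue DESC;"
    )


# External fact table in S3 (via Spectrum) joined to a local dimension table.
report_sql = store_revenue_report_sql("spectrum_retail.sales", "public.stores")
print(report_sql)
```

A BI application connected to Redshift would issue a query like this without needing to know which of the two tables lives in S3.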

While AWS Glue can support data discovery by cataloging data, it is primarily used for ETL (extract, transform, and load) processes, which may not be necessary for the retail organization's use case. Amazon QuickSight is a BI service for visualizing data, but it is not well suited to data discovery activities. AWS Lambda is a general-purpose compute service and is not a good fit for this use case.

In summary, the retail organization should use Amazon Athena for data discovery activities and Amazon Redshift Spectrum for complex reporting.