Benefits of Using Amazon Athena for Querying User-Activity Data Captured by Amazon EMR

Boosting Performance and Reducing Costs | Simplifying Management Complexity

Question

A company uses Amazon Elastic MapReduce (Amazon EMR) clusters to capture data about user actions and push it to Amazon Simple Storage Service (Amazon S3).

The data grows by up to 50 GB per day.

The company then uses Apache Hive to query the user-activity data.

However, the DevOps lead is unsatisfied with its performance, cost, and management complexity.

You have proposed using Amazon Athena to query the data instead.

Which benefits does this new solution bring? (Select TWO.)

Explanations

Correct Answer - B, D.

Amazon Athena is a query service that uses standard SQL to analyze data in Amazon S3.

Check https://aws.amazon.com/athena/ for its various features.

Option A is incorrect because, with Amazon Athena, you pay only for the queries you run.

You are charged based on the amount of data scanned by each query.

There is no hourly charge.
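The per-query pricing described above can be illustrated with a little arithmetic. The sketch below is only an estimate under stated assumptions: the price of 5 USD per TB scanned and the use of binary terabytes are assumptions for illustration (check the current pricing for your region), and `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
# Assumed price in USD per TB scanned; verify against current regional pricing.
ASSUMED_PRICE_PER_TB_USD = 5.0
# Assumption: 1 TB is treated as 1024**4 bytes for this estimate.
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned, price_per_tb=ASSUMED_PRICE_PER_TB_USD):
    """Return the estimated charge in USD for one query scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    one_day = 50 * 1024 ** 3  # the scenario's 50 GB of daily data
    print(f"Scanning one day of data: ${estimate_query_cost(one_day):.4f}")
    print(f"Scanning 30 days of data: ${estimate_query_cost(30 * one_day):.4f}")
```

Because the charge depends on bytes scanned rather than on time, a query that reads less data costs less, and an idle period with no queries costs nothing.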

Option B is CORRECT because Amazon Athena is serverless, meaning that users only need to consider how to query the data and do not manage any infrastructure.
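To show what "no infrastructure to manage" looks like from the caller's side, here is a minimal sketch of submitting a query and waiting for it to finish. It assumes a `client` object that behaves like `boto3.client("athena")`; the function name `run_athena_query`, the database name, and the S3 output location in the usage comment are hypothetical placeholders.

```python
import time

# States in which Athena will not change the query status any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit `sql` and poll until the query reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")


# Usage sketch (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("athena")
#   run_athena_query(client, "SELECT COUNT(*) FROM user_activity",
#                    "activity_db", "s3://example-results-bucket/athena/")
```

Nothing in the sketch provisions, sizes, or shuts down a cluster; the caller only supplies the SQL, a database name, and a location for the results.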

Option C is incorrect because various standard data formats are supported, including CSV, JSON, ORC, Avro, and Parquet.
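As a rough illustration of how the data format shows up in practice, the sketch below builds a `CREATE EXTERNAL TABLE` statement for data that already sits in S3. The helper `build_create_table`, the table and column names, and the bucket path are hypothetical. Only four of the formats are covered, Avro is left out because its table definition also needs a schema, and the helper does no SQL escaping.

```python
# Format-specific clauses used in Athena (Hive-style) table definitions.
# The mapping is illustrative and covers only some of the supported formats.
FORMAT_CLAUSES = {
    "parquet": "STORED AS PARQUET",
    "orc": "STORED AS ORC",
    "csv": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
    "json": "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'",
}


def build_create_table(table, columns, data_format, location):
    """Return a CREATE EXTERNAL TABLE statement for data stored at `location`.

    `columns` is a list of (name, sql_type) pairs.
    """
    clause = FORMAT_CLAUSES[data_format.lower()]
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"{clause}\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(build_create_table(
        "user_activity",
        [("user_id", "string"), ("action", "string"), ("event_time", "timestamp")],
        "parquet",
        "s3://example-bucket/user-activity/",
    ))
```

Only the format clause changes between formats; the table name, columns, and S3 location stay the same.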

Option D is CORRECT because Amazon Athena automatically allocates resources for queries and executes them in parallel.

As a result, query performance improves compared with a self-managed cluster.

Option E is incorrect because the source data must be located in Amazon S3.

Amazon RDS, Amazon EFS, and Amazon S3 Glacier are not valid source locations.

[Figure: Amazon Athena workflow]

The company uses Amazon EMR clusters to capture data about user actions and store it in Amazon S3. The data grows by up to 50 GB per day, and Apache Hive is used to query it. However, the DevOps lead is not satisfied with the performance, cost, and management complexity of the current solution.

You have proposed using Amazon Athena to query the data instead. Each option is assessed below:

A. Cost model: Amazon Athena charges per query, based on the amount of data each query scans. It does not charge by the hour, so an option that describes hourly billing is incorrect. The pay-per-query model itself is still cost-effective: unlike an Amazon EMR cluster, which is billed for as long as its instances run, you pay only for the queries you run.

B. Serverless: Amazon Athena is a serverless service, so there is no infrastructure to set up and no compute capacity to plan. You can focus on querying the data and analyzing the results. This option is correct.

C. Standard SQL and data formats: Amazon Athena uses standard SQL to query data in S3 and supports various standard formats, including CSV, JSON, ORC, Avro, and Parquet. An option that claims limited format support is therefore incorrect.

D. Parallel execution of queries: Amazon Athena automatically allocates resources and executes queries in parallel, which improves query performance and reduces response times. Most query results come back within seconds. This option is correct.

E. Data sources: In this scenario, Amazon Athena queries the data where it already resides, in Amazon S3. Amazon RDS, Amazon EFS, and Amazon S3 Glacier are not valid source locations for these queries, so this option is incorrect.

In summary, the two benefits are that Amazon Athena is serverless (B) and that it allocates resources automatically and executes queries in parallel (D). It bills per query by the amount of data scanned rather than by the hour, supports standard SQL and common file formats, and reads its source data from Amazon S3.