Log File Streaming and Storage for Analysis | AWS Certified Big Data - Specialty Exam

Data Store for Log Files from EC2 Instances

Question

A company wants to have a data store for their log files from various EC2 Instances.

These log files need to be streamed from the various servers and then stored for analysis at a later stage.

Which of the following can be used for this requirement?

Answers

Explanations

A. B. C. D.

Answer - C.

An example of this pattern is described on the AWS Big Data Blog.

The architecture is given below.

Remember that you can connect your Kinesis stream to Kinesis Firehose, which delivers the data directly to S3.
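As a rough sketch of that stream-to-Firehose-to-S3 wiring, the Python below builds the request for the Firehose CreateDeliveryStream call through boto3. The helper names, the delivery stream name, the "logs/" prefix, the buffering values and the ARNs are all placeholders assumed for illustration; none of them come from the question or the blog post.

```python
import json


def firehose_stream_to_s3_config(stream_arn, bucket_arn, role_arn,
                                 name="ec2-logs-to-s3", prefix="logs/"):
    """Build keyword arguments for Firehose's CreateDeliveryStream call.

    All default values here are illustrative assumptions.
    """
    return {
        "DeliveryStreamName": name,
        # Read from an existing Kinesis stream instead of direct PUTs.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": prefix,
            # Firehose flushes to S3 when either threshold is reached.
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
        },
    }


def create_delivery_stream(config):
    """Send the request. Needs boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so the builder works without boto3
    return boto3.client("firehose").create_delivery_stream(**config)


if __name__ == "__main__":
    # Placeholder ARNs, for illustration only.
    cfg = firehose_stream_to_s3_config(
        "arn:aws:kinesis:us-east-1:111122223333:stream/ec2-logs",
        "arn:aws:s3:::example-log-bucket",
        "arn:aws:iam::111122223333:role/example-firehose-role",
    )
    print(json.dumps(cfg, indent=2))
```

Keeping the request builder separate from the network call makes the configuration easy to inspect before anything is created in an account.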

Options A and B are incorrect because Amazon Kinesis is the better choice for ingesting streaming data.

Option D is incorrect because S3 is the better choice for storing log files.

For more information on this use case, see the following URL.

https://aws.amazon.com/blogs/big-data/persist-streaming-data-to-amazon-s3-using-amazon-kinesis-firehose-and-aws-lambda/

[Figure: architecture diagram showing site data, Amazon Kinesis Streams, Spark on EMR, a Kinesis application (KCL) with a DynamoDB checkpointer, and S3 holding the site data and processed data.]

The best solution for this requirement is to ingest the data with Amazon Kinesis and store it in S3. Kinesis is a real-time streaming service designed to handle large volumes of data, whereas SQS is a message queuing service better suited to messaging between the distributed components of a cloud application. Storing the log data in S3 also makes it easy to access for later analysis.

Therefore, option C is the correct answer.
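To make the ingestion side concrete, here is a minimal producer sketch in Python. It groups log lines into batches shaped for the Kinesis PutRecords call, which accepts at most 500 records per request. The function names and the choice of the instance ID as partition key are assumptions made for illustration, not something the question specifies.

```python
def build_put_records_batches(log_lines, host_id, batch_size=500):
    """Group log lines into batches shaped for Kinesis PutRecords.

    host_id is used as the partition key (an illustrative choice), so all
    records from one instance map to the same shard.
    """
    if not 1 <= batch_size <= 500:
        raise ValueError("PutRecords accepts between 1 and 500 records per call")
    records = [
        # Normalise each line to end with exactly one newline.
        {"Data": (line.rstrip("\n") + "\n").encode("utf-8"),
         "PartitionKey": host_id}
        for line in log_lines
        if line.strip()  # skip blank lines
    ]
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]


def send_batches(stream_name, batches):
    """Send the batches. Needs boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so the batching works without boto3
    client = boto3.client("kinesis")
    failed = 0
    for batch in batches:
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords can partially fail; a real producer would retry these.
        failed += response["FailedRecordCount"]
    return failed
```

A single hot instance would push all of its traffic to one shard under this partition key; a more random key spreads load at the cost of per-host ordering.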

Option A is not the best solution. SQS is a message queue, not a data store: it retains messages only for a limited period (14 days at most), so it cannot hold logs for later analysis. DynamoDB is a NoSQL database suited to structured data, and it is a poor fit for large volumes of unstructured log data.

Option B is a possible solution but not the most efficient one. S3 is a reliable, durable service that suits log storage, but it is not a streaming service, so the ingestion side still needs a service built for real-time streams, and Kinesis fits that role better than SQS.

Option D is not the best solution because DynamoDB is a poor fit for unstructured log data. It is better suited to structured data that requires low-latency access.