AWS Certified Big Data - Specialty Exam: Streaming Log Files from NGINX and Apache Web Servers


Question

A company has a set of EC2 Instances that host web applications.

The web servers used are NGINX and Apache.

The IT Security team needs to stream the log files from these servers and perform real-time analytics on them to check for any abnormal behavior.

Which of the following would be the easiest way to collect the log file data, and which is the right storage platform for the streaming data?

Answers

Explanations


A. B. C. D.

Answer - C.

The AWS documentation states the following.

Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams.

The agent continuously monitors a set of files and sends new data to your stream.

The agent handles file rotation, checkpointing, and retry upon failures.

It delivers all of your data in a reliable, timely, and simple manner.

It also emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process.
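To make "checkpointing" and "file rotation" concrete, here is a minimal Python sketch of the general idea. It is not the Kinesis Agent's actual implementation (the agent is a Java application). The function name `read_new_lines` and the JSON checkpoint file are hypothetical, and rotation is detected only by the simple assumption that a file shorter than the saved offset has been replaced.

```python
import json
import os


def read_new_lines(log_path, checkpoint_path):
    """Return complete lines appended to log_path since the last call.

    Illustrative only: the byte offset is persisted in a small JSON
    checkpoint file so that a restart resumes where it left off.
    """
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]

    # Assumption: if the file is shorter than the saved offset, it was
    # rotated or truncated, so start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    # Persist the new position (the checkpoint).
    with open(checkpoint_path, "w") as f:
        json.dump({"offset": offset + end}, f)
    return lines
```

Calling this function in a loop and forwarding the returned lines to a stream gives roughly the behavior described above. A production agent additionally has to handle retries, multiple files, and rotation schemes that this sketch ignores.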

Option A is incorrect because the Kinesis Agent is a simpler and more efficient way to collect the logs than writing Lambda functions.

Options B and D are incorrect because Amazon Redshift is a data warehouse, not a platform for ingesting streaming data.

For more information on working with agents, see the following URL.

https://docs.aws.amazon.com/streams/latest/dev/writing-with-agents.html

The best solution for this scenario is to stream the log files from the EC2 instances into Amazon Kinesis and then run real-time analytics on that data to check for abnormal behavior.

Option A suggests using a Lambda function to poll the servers. This can work, but it takes more effort to set up than the Kinesis Agent, and constant polling of the servers can be resource-intensive and costly.

Option B suggests sending the data to Amazon Redshift, which is a data warehousing service. Redshift is better suited to batch analysis of large volumes of data than to ingesting a stream, and it can be more costly and complex to set up than Kinesis.

Option C suggests using the Kinesis Agent, which runs on the EC2 instances and collects log data as it is written. This is the most efficient and cost-effective option: the agent is easy to install on the instances, and it continuously streams new data to Kinesis with minimal effort.
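As a rough illustration of how little setup the agent needs, the Python sketch below generates an agent configuration for both web servers. The configuration keys (`flows`, `filePattern`, `kinesisStream`, `dataProcessingOptions`, `LOGTOJSON`, `COMBINEDAPACHELOG`) come from the agent documentation linked above. The stream name and log paths are hypothetical, and the sketch assumes both servers write the combined log format, so check these against your own installation.

```python
import json


def build_agent_config(stream_name, flows):
    """Build a Kinesis Agent configuration as a plain dict.

    flows: list of (file_pattern, log_format) pairs, one per log source.
    """
    return {
        # Ask the agent to publish its own metrics to CloudWatch.
        "cloudwatch.emitMetrics": True,
        "flows": [
            {
                "filePattern": pattern,
                "kinesisStream": stream_name,
                # Convert each log line to JSON before sending it.
                "dataProcessingOptions": [
                    {"optionName": "LOGTOJSON", "logFormat": log_format}
                ],
            }
            for pattern, log_format in flows
        ],
    }


config = build_agent_config(
    "web-logs-stream",  # hypothetical stream name
    [
        ("/var/log/nginx/access.log*", "COMBINEDAPACHELOG"),  # assumed NGINX path
        ("/var/log/httpd/access_log*", "COMBINEDAPACHELOG"),  # assumed Apache path
    ],
)
print(json.dumps(config, indent=2))
```

The printed JSON would typically be saved as the agent's configuration file (by default `/etc/aws-kinesis/agent.json`) before starting the agent service.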

Option D also suggests sending the data to Amazon Redshift, which, as noted for Option B, is not suitable for streaming data.

Therefore, the correct answer is Option C: use the Kinesis Agent to stream the log files from the EC2 instances to Kinesis, where real-time analytics can check the data for abnormal behavior.
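The question does not say what "abnormal behavior" means, so the following Python sketch picks one plausible rule purely for illustration: flag client IPs that send many requests and receive mostly error responses. The function name, the thresholds, and the assumption that records arrive as raw combined-format log lines are all hypothetical. In a real deployment the lines would be read from the stream by a consumer application.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def suspicious_clients(lines, min_requests=5, max_error_ratio=0.5):
    """Return sorted client IPs whose share of 4xx/5xx responses is too high.

    Clients with fewer than min_requests requests are ignored so that a
    single failed request does not trigger an alert.
    """
    total = Counter()
    errors = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip = match.group("ip")
        total[ip] += 1
        if int(match.group("status")) >= 400:
            errors[ip] += 1
    return sorted(
        ip for ip, count in total.items()
        if count >= min_requests and errors[ip] / count > max_error_ratio
    )
```

Running this over a short window of recent records (for example, the last minute) and repeating it as new records arrive approximates a real-time check.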