Automating Repair of Unhealthy System Status Check | AWS DevOps Exam



Question

One of your EC2 instances is reporting an unhealthy system status check.

However, this is not something you should have to monitor and repair on your own.

How might you automate the repair of the system status check failure in an AWS environment? Choose the correct answer from the options given below.

Answers

Explanations


A. B. C. D.

Answer - A.

Using Amazon CloudWatch alarm actions, you can create alarms that automatically stop, terminate, reboot, or recover your EC2 instances.

You can use the stop or terminate actions to help you save money when you no longer need an instance to be running.

You can use the reboot and recover actions to automatically reboot those instances or recover them onto new hardware if a system impairment occurs.
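An alarm refers to each of these actions by an ARN. As a small sketch, assuming the arn:aws:automate:<region>:ec2:<action> form used for EC2 alarm actions, the helper below builds that string. The function name ec2_action_arn and the region value are illustrative only and do not come from the question.

```python
# Hypothetical helper (not an AWS API): builds the ARN that a CloudWatch
# alarm uses to trigger one of the built-in EC2 actions.
VALID_ACTIONS = ("stop", "terminate", "reboot", "recover")


def ec2_action_arn(region: str, action: str) -> str:
    """Return the alarm-action ARN for the given region and EC2 action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported EC2 alarm action: {action!r}")
    return f"arn:aws:automate:{region}:ec2:{action}"


# Example region chosen for illustration.
print(ec2_action_arn("us-east-1", "recover"))
```

The resulting string is what goes into the alarm's list of actions; for a system status check failure, recover is the relevant one.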

For more information on using alarm actions, refer to the link below:

http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html

The correct answer is A. Create CloudWatch alarms for the StatusCheckFailed_System metric and select the EC2 action to recover the instance.

Explanation:

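Amazon EC2 instances undergo two kinds of status checks: system status checks and instance status checks. System status checks verify that the underlying host system is healthy and capable of providing service; failures here, such as loss of power or network connectivity on the physical host, require AWS involvement to repair. Instance status checks, on the other hand, verify the software and network configuration of the individual instance; failures here are usually yours to fix, for example by rebooting or reconfiguring the instance.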
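To make the distinction concrete, here is a minimal sketch that reads a response shaped like the output of the EC2 DescribeInstanceStatus API and reports which of the two checks is impaired. The function name failed_checks and the sample data are hypothetical and hand-written for illustration; the sample is not real output.

```python
def failed_checks(response: dict) -> dict:
    """Map each instance ID to the list of impaired checks, if any."""
    result = {}
    for status in response.get("InstanceStatuses", []):
        failed = []
        # SystemStatus reflects the underlying host; InstanceStatus
        # reflects the instance's own software and network configuration.
        if status["SystemStatus"]["Status"] == "impaired":
            failed.append("system")
        if status["InstanceStatus"]["Status"] == "impaired":
            failed.append("instance")
        if failed:
            result[status["InstanceId"]] = failed
    return result


# Hand-written sample in the DescribeInstanceStatus response shape
# (assumed instance IDs, for illustration only).
sample = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0aaaaaaaaaaaaaaaa",
            "SystemStatus": {"Status": "impaired"},
            "InstanceStatus": {"Status": "ok"},
        },
        {
            "InstanceId": "i-0bbbbbbbbbbbbbbbb",
            "SystemStatus": {"Status": "ok"},
            "InstanceStatus": {"Status": "ok"},
        },
    ]
}

print(failed_checks(sample))
```

In this sample only the first instance has a system-level problem, which is the case the question asks about.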

AWS recommends that you use Amazon CloudWatch to monitor your EC2 instances' status checks. CloudWatch can be configured to send alerts if a specific metric crosses a threshold that you define.

To automate the repair of the system status check failure in an AWS environment, you can create CloudWatch alarms for StatusCheckFailed_System metrics and choose the EC2 action to recover the instance.
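As a rough sketch of what such an alarm could look like with boto3, the code below builds the arguments for the CloudWatch put_metric_alarm call and passes them to the client. The function names, the alarm name, and the period, evaluation-period, and threshold values are assumptions chosen for illustration, not settings given by the question.

```python
def recover_alarm_params(instance_id: str, region: str) -> dict:
    """Build put_metric_alarm arguments for a recover-on-failure alarm."""
    return {
        "AlarmName": f"recover-{instance_id}",  # illustrative naming scheme
        "AlarmDescription": "Recover the instance when the system status check fails",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        # Assumed sensitivity: the check fails in 2 consecutive 60-second periods.
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 recover action for the alarm's region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id: str, region: str) -> None:
    """Create the alarm; needs boto3 and AWS credentials at call time."""
    import boto3  # imported here so the parameter builder runs without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**recover_alarm_params(instance_id, region))
```

Calling create_recover_alarm("i-0123456789abcdef0", "us-east-1") with credentials that allow cloudwatch:PutMetricAlarm would create the alarm for that (made-up) instance ID. Keeping the parameters in a separate function makes them easy to review or reuse for each instance that needs the alarm.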

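When the alarm fires and triggers the recover action, the instance is rebooted and migrated to a different physical host. The recovered instance keeps its instance ID, private IP addresses, Elastic IP addresses, and instance metadata, but any data held in memory is lost. Moving to healthy hardware typically resolves problems caused by the underlying host system.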

Option B is incorrect because it is time-consuming and manual. Writing and maintaining a script that queries the EC2 API for each instance's status check duplicates monitoring that CloudWatch already provides, and it does not scale in a dynamic environment where instances may change regularly.

Option C is also incorrect because it is risky and could lead to data loss. Shutting down and starting EC2 instances periodically can cause applications to fail or lose data if they were not correctly configured for a graceful shutdown.

Option D is incorrect because it does not provide a specific solution to automating the repair of the system status check failure. A third-party monitoring tool may be used in conjunction with CloudWatch to enhance the overall monitoring capabilities of your environment, but it is not a direct solution to automate the repair of system status check failures.