Mitigating Latency Impact on Users: Best Practices for Cloud DevOps Engineers

Understanding and Resolving Latency Issues for User Login

Question

You support a web application that is hosted on Compute Engine.

The application provides a booking service for thousands of users.

Shortly after the release of a new feature, your monitoring dashboard shows that all users are experiencing latency at login.

You want to mitigate the impact of the incident on the users of your service.

What should you do first?

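Answers

A. Roll back the recent release.
B. Review the Stackdriver monitoring.
C. Upsize the virtual machines running the login services.
D. Deploy a new release to see whether it fixes the problem.

Explanations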

In this scenario, the web application hosted on Compute Engine is experiencing latency at login for all users after a new feature was released. The first step in mitigating the impact of this incident on users is to identify the root cause of the problem.

Option B, reviewing the Stackdriver monitoring, is the most appropriate initial step for identifying the cause of the issue. Stackdriver (now part of Google Cloud Observability as Cloud Monitoring and Cloud Logging) is a monitoring, logging, and diagnostics service integrated with Compute Engine. It provides near-real-time visibility into the performance and health of your applications, infrastructure, and services. By reviewing the monitoring data, you can check whether the login latency began when the new feature was deployed and look for spikes in CPU usage, network traffic, or error rates that could explain it.
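Checking whether the latency started with the release amounts to comparing login latency before and after the deployment time. The following Python sketch assumes you have already exported login-latency samples (for example, from Cloud Monitoring) as timestamp/latency pairs; the `LatencySample` type, the 95th-percentile comparison, and the 1.5x threshold are illustrative assumptions, not part of any Google Cloud API.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Sequence


@dataclass
class LatencySample:
    timestamp: float   # seconds since the epoch (assumed export format)
    latency_ms: float  # observed login latency in milliseconds


def p95(values: Sequence[float]) -> float:
    """Return the 95th percentile of the given latencies."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(values, n=100)[94]


def latency_regressed(
    samples: Sequence[LatencySample],
    release_time: float,
    threshold: float = 1.5,  # hypothetical regression threshold
) -> bool:
    """Return True if p95 login latency after the release exceeds
    `threshold` times the p95 latency before it."""
    before = [s.latency_ms for s in samples if s.timestamp < release_time]
    after = [s.latency_ms for s in samples if s.timestamp >= release_time]
    # statistics.quantiles needs at least two data points on each side.
    if len(before) < 2 or len(after) < 2:
        raise ValueError("need at least two samples before and after the release")
    return p95(after) > threshold * p95(before)
```

A `True` result only shows that latency rose around the release time; it points to the release as a likely trigger but does not by itself prove causation.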

Once you have identified the root cause, you can take appropriate action to address it. If the latency is tied to the newly released feature, option A, rolling back the recent release, is the appropriate action. Rolling back reverts the application to the previous version, which did not show the latency, and should therefore resolve the issue if the release introduced it.
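Rolling back means redeploying the most recent version that was known to be healthy. The minimal Python sketch below only chooses that version from a hypothetical release history in which each entry records a version label, a deployment time, and whether it ran cleanly in production. It does not perform the rollback itself; if the application runs in a managed instance group, that step would typically mean pointing the group back at the previous instance template.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Release:
    version: str        # hypothetical version label, e.g. "v2"
    deployed_at: float  # deployment time, seconds since the epoch
    healthy: bool       # whether this release ran without incidents in production


def rollback_target(history: List[Release], current_version: str) -> Optional[Release]:
    """Return the newest healthy release deployed before `current_version`,
    or None if there is no earlier healthy release."""
    ordered = sorted(history, key=lambda r: r.deployed_at)
    versions = [r.version for r in ordered]
    if current_version not in versions:
        raise ValueError(f"unknown version: {current_version}")
    idx = versions.index(current_version)
    # Walk backwards from the release just before the current one.
    for release in reversed(ordered[:idx]):
        if release.healthy:
            return release
    return None
```

Skipping unhealthy entries matters: rolling back to a release that had its own problems would trade one incident for another.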

Option C, upsizing the virtual machines running the login services, is appropriate only if the latency is caused by resource constraints on those virtual machines, such as sustained high CPU or memory usage. It should be done only after the monitoring data confirms that cause, because increasing the machine size without that evidence adds cost and may not resolve the issue.
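A quick way to confirm or rule out CPU constraints is to look at per-VM CPU utilization. The Python sketch below assumes the values have already been exported per VM as fractions of allocated CPU between 0.0 and 1.0 (the form in which Compute Engine's `compute.googleapis.com/instance/cpu/utilization` metric is reported); the VM names and the 0.85 threshold are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def saturated_vms(
    cpu_utilization: Dict[str, List[float]],
    threshold: float = 0.85,  # hypothetical saturation threshold
) -> List[str]:
    """Return the names of VMs whose mean CPU utilization exceeds `threshold`.

    Utilization values are fractions of allocated CPU (0.0 to 1.0).
    VMs with no samples are skipped rather than treated as saturated.
    """
    return sorted(
        name
        for name, samples in cpu_utilization.items()
        if samples and mean(samples) > threshold
    )
```

An empty result suggests the login VMs are not CPU-bound, so upsizing them is unlikely to help; memory, disk, and network would still need to be checked separately.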

Option D, deploying a new release to see whether it fixes the problem, should be avoided until the root cause of the issue has been identified. Shipping an untested change in the middle of an incident may introduce additional issues or worsen the existing one, further affecting the users of the service.

In summary, the first step in mitigating an incident such as login latency for all users of a web application hosted on Compute Engine is to identify the root cause of the issue, which can be done by reviewing the Stackdriver monitoring. Once the root cause has been identified, the appropriate action can be taken: rolling back the recent release, upsizing the virtual machines running the login services, or deploying a new release that contains a fix.