Troubleshooting Linux Kernel Module Failure on Google Compute Engine (GCE) | PCA Exam

Collecting Details on a Batch Server Failure

Question

Your development team has installed a new Linux kernel module on the batch servers in Google Compute Engine (GCE) virtual machines (VMs) to speed up the nightly batch process.

Two days after the installation, 50% of the batch servers failed the nightly batch run.

You want to collect details on the failure to pass back to the development team.

Which three actions should you take? (Choose three.)

Answers

A. Use Stackdriver Logging to search for the module log entries.
B. Read the debug GCE Activity log using the API or Cloud Console.
C. Use gcloud or Cloud Console to connect to the serial console and observe the logs.
D. Identify whether a live migration event of the failed server occurred, using the activity log.
E. Adjust the Google Stackdriver timeline to match the failure time, and observe the batch server metrics.
F. Export a debug VM into an image, and run the image on a local server where kernel log messages will be displayed on the native screen.

Explanations

The correct options are A, C, and E.

A. Use Stackdriver Logging to search for the module log entries: Stackdriver Logging (now Cloud Logging) lets you search the VMs' logs for entries produced by the new kernel module. By filtering the logs on the failure time window and module-related keywords, you can find errors the module logged while loading or running, which gives you concrete evidence of the cause of the failure to pass back to the development team.
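This kind of search can also be scripted against the Logging API. The following is a minimal sketch, assuming the google-cloud-logging Python client library; the project ID, timestamp, and module name are hypothetical placeholders to replace with your own values.

    # Minimal sketch: search GCE instance logs for kernel-module-related entries.
    # "my-project" and "my_batch_module" are hypothetical placeholders.
    from google.cloud import logging

    client = logging.Client(project="my-project")

    log_filter = (
        'resource.type="gce_instance" '
        'AND timestamp>="2024-01-01T00:00:00Z" '   # start of the failure window
        'AND textPayload:"my_batch_module"'        # keyword for the new module
    )

    for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
        print(entry.timestamp, entry.payload)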

C. Use gcloud or Cloud Console to connect to the serial console and observe the logs: Using the gcloud command-line tool or the Cloud Console, you can connect to the serial console of the affected VMs. Kernel messages, including module load errors, oops, and panic output, are written to the serial console, so they remain visible even when the guest is unresponsive or unreachable over SSH. This gives you direct, real-time access to the kernel-level evidence the development team needs.
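The serial port output can also be fetched non-interactively with the Compute Engine client library, as sketched below (the project, zone, and instance names are hypothetical placeholders); for a live session, gcloud compute connect-to-serial-port provides the interactive equivalent.

    # Minimal sketch: dump serial port 1 output (kernel and system console) for a VM.
    # Project, zone, and instance names are hypothetical placeholders.
    from google.cloud import compute_v1

    client = compute_v1.InstancesClient()

    request = compute_v1.GetSerialPortOutputInstanceRequest(
        project="my-project",
        zone="us-central1-a",
        instance="batch-server-01",
        port=1,  # port 1 carries the kernel and system console output
    )
    output = client.get_serial_port_output(request=request)
    print(output.contents)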

E. Adjust the Google Stackdriver timeline to match the failure time, and observe the batch server metrics: By aligning the Stackdriver (Cloud Monitoring) timeline with the time of the failed nightly run, you can observe the batch server metrics during the failure, such as CPU utilization, memory usage, and disk I/O. This exposes resource-related symptoms, for example CPU saturation or memory exhaustion triggered by the new module, and helps pinpoint when and where the failure began.
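The same metrics can be pulled programmatically with the Cloud Monitoring client library. This sketch assumes the google-cloud-monitoring package; the project ID is a hypothetical placeholder, and the six-hour window should be replaced with the actual failure time.

    # Minimal sketch: read per-instance CPU utilization over the failure window.
    # "my-project" is a hypothetical placeholder; adjust the time window and metric.
    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    project_name = "projects/my-project"

    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": now},
            "start_time": {"seconds": now - 6 * 3600},  # window covering the batch run
        }
    )

    results = client.list_time_series(
        request={
            "name": project_name,
            "filter": 'metric.type="compute.googleapis.com/instance/cpu/utilization"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for series in results:
        instance = series.resource.labels.get("instance_id", "unknown")
        latest = series.points[0].value.double_value if series.points else None
        print(instance, latest)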

B. Read the debug GCE Activity log using the API or Cloud Console: The GCE Activity log (the Admin Activity audit log) records administrative events for VMs, such as instance creation, deletion, and configuration changes. It is worth checking for unexpected events around the failure, but it does not capture what happens inside the guest, so it is unlikely to show the cause of a kernel module failure and is less useful here than options A, C, and E.
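For reference, the activity log is exposed through the same Logging API as the Admin Activity audit log. A minimal sketch, again with a hypothetical project ID:

    # Minimal sketch: list recent Admin Activity audit log entries for GCE VMs.
    # "my-project" is a hypothetical placeholder.
    from google.cloud import logging

    client = logging.Client(project="my-project")

    log_filter = (
        'log_id("cloudaudit.googleapis.com/activity") '
        'AND resource.type="gce_instance"'
    )

    for entry in client.list_entries(
        filter_=log_filter, order_by=logging.DESCENDING, max_results=20
    ):
        print(entry.timestamp, entry.payload)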

D. Identify whether a live migration event of the failed server occurred, using the activity log: Checking for live migration events around the failed batch run can rule out host maintenance as a factor, but it does not reveal the root cause of the failure. Live migration is also designed to be transparent to the guest, so it is an unlikely explanation compared with the newly installed kernel module.
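Live migrations are recorded as system events rather than admin activity. The sketch below assumes they appear in the system event audit log under the compute.instances.migrateOnHostMaintenance method; verify the method name against your project's entries, and note that the instance name is a hypothetical placeholder.

    # Minimal sketch: look for live-migration system events for one instance.
    # The method name and instance name are assumptions to verify or replace.
    from google.cloud import logging

    client = logging.Client(project="my-project")

    log_filter = (
        'log_id("cloudaudit.googleapis.com/system_event") '
        'AND resource.type="gce_instance" '
        'AND protoPayload.methodName="compute.instances.migrateOnHostMaintenance" '
        'AND protoPayload.resourceName:"batch-server-01"'
    )

    for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
        print(entry.timestamp, entry.payload)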

F. Export a debug VM into an image, and run the image on a local server where kernel log messages will be displayed on the native screen: Exporting a VM to an image and booting it on local hardware to watch the kernel console is slow and unnecessary, because the same kernel messages are already available through the serial console (option C). A local server also cannot reproduce the GCE virtual hardware and hypervisor environment, so the failure may not even occur there, making this the least effective option.