Expand Your Local Community App Globally - HipLocal Case Study

Troubleshooting API Failures and Collecting Metrics | PCD Exam Answer

Question

Case Study - Company Overview - HipLocal is a community application designed to facilitate communication between people in close proximity.

It is used for event planning and organizing sporting events, and for businesses to connect with their local communities.

HipLocal launched recently in a few neighborhoods in Dallas and is rapidly growing into a global phenomenon.

Its unique style of hyper-local community communication and business outreach is in demand around the world.

Executive Statement - We are the number one local community app; it's time to take our local community services global.

Our venture capital investors want to see rapid growth and the same great experience for new local and virtual communities that come online, whether their members are 10 or 10000 miles away from each other.

Solution Concept - HipLocal wants to expand their existing service, with updated functionality, in new regions to better serve their global customers.

They want to hire and train a new team to support these regions in their time zones.

They will need to ensure that the application scales smoothly and provides clear uptime data.

Existing Technical Environment - HipLocal's environment is a mix of on-premises hardware and infrastructure running in Google Cloud Platform.

The HipLocal team understands their application well, but has limited experience in global scale applications.

Their existing technical environment is as follows:

• Existing APIs run on Compute Engine virtual machine instances hosted in GCP.

• State is stored in a single instance MySQL database in GCP.

• Data is exported to an on-premises Teradata/Vertica data warehouse.

• Data analytics is performed in an on-premises Hadoop environment.

• The application has no logging.

• There are basic indicators of uptime; alerts are frequently fired when the APIs are unresponsive.

Business Requirements - HipLocal's investors want to expand their footprint and support the increase in demand they are seeing.

Their requirements are:

• Expand availability of the application to new regions.

• Increase the number of concurrent users that can be supported.

• Ensure a consistent experience for users when they travel to different regions.

• Obtain user activity metrics to better understand how to monetize their product.

• Ensure compliance with regulations in the new regions (for example, GDPR).

• Reduce infrastructure management time and cost.

• Adopt the Google-recommended practices for cloud computing.

Technical Requirements -

• The application and backend must provide usage metrics and monitoring.

• APIs require strong authentication and authorization.

• Logging must be increased, and data should be stored in a cloud analytics platform.

• Move to serverless architecture to facilitate elastic scaling.

• Provide authorized access to internal apps in a secure manner.

HipLocal's APIs are showing occasional failures, but they cannot find a pattern.

They want to collect some metrics to help them troubleshoot.

What should they do?

Answers

Explanations

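A. Take frequent snapshots of all of the VMs.

B. Install the Stackdriver Logging agent on the VMs.

C. Install the Stackdriver Monitoring agent on the VMs.

D. Use Stackdriver Trace to look for performance bottlenecks.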

Correct answer: C.

Option C, installing the Stackdriver Monitoring agent on the VMs, is the most appropriate way to collect metrics for troubleshooting HipLocal's API failures.

The Stackdriver Monitoring agent is a daemon provided by Google Cloud Platform (GCP) that runs on a VM, collects system metrics such as CPU, memory, disk, and network usage, and sends them to Stackdriver Monitoring. It can also be configured to collect metrics from supported third-party applications running on the VM. With the agent installed on the Compute Engine VMs that host the APIs, HipLocal can track the performance and health of those VMs over time and compare that data with the times at which the APIs fail, which helps narrow down the root cause of the occasional failures.
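As a rough illustration of why metrics help when failures show no obvious pattern, the sketch below compares metric values in the minutes where the API failed against the minutes where it was healthy. All of the data, the names `SAMPLES` and `FAILURE_MINUTES`, and the helper `compare_metric` are hypothetical; real samples would come from the time series collected by the agent, not from a hard-coded list.

```python
from statistics import mean

# Hypothetical per-minute samples: (minute, cpu_utilization, memory_used_fraction).
# Assumption: in practice these would be read from collected time series.
SAMPLES = [
    (0, 0.35, 0.52),
    (1, 0.40, 0.55),
    (2, 0.38, 0.93),
    (3, 0.42, 0.56),
    (4, 0.37, 0.95),
    (5, 0.41, 0.54),
]

# Hypothetical minutes in which the uptime check reported an API failure.
FAILURE_MINUTES = {2, 4}


def compare_metric(samples, failure_minutes, index):
    """Return (mean during failures, mean otherwise) for the metric at `index`."""
    failing = [s[index] for s in samples if s[0] in failure_minutes]
    healthy = [s[index] for s in samples if s[0] not in failure_minutes]
    return mean(failing), mean(healthy)


cpu_fail, cpu_ok = compare_metric(SAMPLES, FAILURE_MINUTES, 1)
mem_fail, mem_ok = compare_metric(SAMPLES, FAILURE_MINUTES, 2)

print(f"cpu:    failing={cpu_fail:.2f} healthy={cpu_ok:.2f}")
print(f"memory: failing={mem_fail:.2f} healthy={mem_ok:.2f}")
```

In this made-up data set, CPU looks about the same in both groups while memory usage is much higher during the failing minutes, which is the kind of signal that would point an investigation toward memory pressure.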

Option A, taking frequent snapshots of all VMs, does not collect metrics at all. A snapshot captures the point-in-time contents of a VM's persistent disks and is useful for backups or for restoring a disk to an earlier state. It provides no ongoing performance or health data, which is what HipLocal needs to troubleshoot the API failures.

Option B, installing the Stackdriver Logging agent on the VMs, is not the best fit because the question asks for metrics, and the Logging agent collects logs rather than metrics. Logs are useful for debugging, but the case study states that the application currently has no logging, so the agent alone would have little application data to forward, and logs by themselves may not provide enough performance data to identify the root cause of the API failures.

Option D, using Stackdriver Trace to look for performance bottlenecks, is a valid way to find where time is spent in an application. However, Trace follows individual requests and reports their latency; it does not collect ongoing performance and health data from the VMs. Because the problem is occasional failures with no known pattern, not a known latency bottleneck, Trace is not the most appropriate choice here.
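To make the distinction concrete, the following minimal sketch shows the kind of question a trace can answer: given the timed spans of one request, which step took longest. The span names, timings, and the helper `slowest_child` are hypothetical and do not represent the Stackdriver Trace API or its data format.

```python
# Hypothetical spans for a single traced request: (span name, start_ms, end_ms).
# Assumption: the first entry is the root span covering the whole request.
SPANS = [
    ("api.handle_request", 0, 480),
    ("auth.check_token", 5, 25),
    ("mysql.query", 30, 450),
    ("render.response", 452, 478),
]


def slowest_child(spans):
    """Return (name, duration_ms) of the longest span, excluding the root span."""
    children = spans[1:]
    name, start, end = max(children, key=lambda s: s[2] - s[1])
    return name, end - start


name, duration = slowest_child(SPANS)
print(f"slowest child span: {name} ({duration} ms)")
```

This view explains where one request spent its time, but it says nothing about the VM's CPU, memory, or disk at the moment a failure occurred, which is the gap that agent-collected metrics fill.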