HipLocal Case Study - Data Preparation for Data Science Team

Preparing Data for HipLocal's Data Science Team

Question

Case study

This is a case study.

Case studies are not timed separately.

You can use as much exam time as you would like to complete each case.

However, there may be additional case studies and sections on this exam.

You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.

To answer the questions included in a case study, you will need to reference information that is provided in the case study.

Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study.

Each question is independent of the other questions in this case study.

At the end of this case study, a review screen will appear.

This screen allows you to review your answers and to make changes before you move to the next section of the exam.

After you begin a new section, you cannot return to this section.

To start the case study

To display the first question in this case study, click the Next button.

Use the buttons in the left pane to explore the content of the case study before you answer the questions.

Clicking these buttons displays information such as business requirements, existing environment, and problem statements.

If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs.

When you are ready to answer a question, click the Question button to return to the question.

Company Overview

HipLocal is a community application designed to facilitate communication between people in close proximity.

It is used for event planning and organizing sporting events, and for businesses to connect with their local communities.

HipLocal launched recently in a few neighborhoods in Dallas and is rapidly growing into a global phenomenon.

Its unique style of hyper-local community communication and business outreach is in demand around the world.

Executive Statement

We are the number one local community app; it's time to take our local community services global.

Our venture capital investors want to see rapid growth and the same great experience for new local and virtual communities that come online, whether their members are 10 or 10,000 miles away from each other.

Solution Concept

HipLocal wants to expand their existing service, with updated functionality, in new regions to better serve their global customers.

They want to hire and train a new team to support these regions in their time zones.

They will need to ensure that the application scales smoothly and provides clear uptime data.

Existing Technical Environment

HipLocal's environment is a mix of on-premises hardware and infrastructure running in Google Cloud Platform.

The HipLocal team understands their application well, but has limited experience in global-scale applications.

Their existing technical environment is as follows:

• Existing APIs run on Compute Engine virtual machine instances hosted in GCP.
• State is stored in a single-instance MySQL database in GCP.
• Data is exported to an on-premises Teradata/Vertica data warehouse.
• Data analytics is performed in an on-premises Hadoop environment.
• The application has no logging.
• There are basic indicators of uptime; alerts are frequently fired when the APIs are unresponsive.

Business Requirements

HipLocal's investors want to expand their footprint and support the increase in demand they are seeing.

Their requirements are:

• Expand availability of the application to new regions.
• Increase the number of concurrent users that can be supported.
• Ensure a consistent experience for users when they travel to different regions.
• Obtain user activity metrics to better understand how to monetize their product.
• Ensure compliance with regulations in the new regions (for example, GDPR).
• Reduce infrastructure management time and cost.
• Adopt the Google-recommended practices for cloud computing.

Technical Requirements

• The application and backend must provide usage metrics and monitoring.
• APIs require strong authentication and authorization.
• Logging must be increased, and data should be stored in a cloud analytics platform.
• Move to serverless architecture to facilitate elastic scaling.
• Provide authorized access to internal apps in a secure manner.

HipLocal's data science team wants to analyze user reviews.

How should they prepare the data?

Answers

Explanations


Answer: Use the Cloud Data Loss Prevention API for de-identification of the review dataset.

The HipLocal data science team wants to analyze user reviews. Before the team works with that data, the reviews should be de-identified with the Cloud Data Loss Prevention (DLP) API.

De-identification is the process of removing or obfuscating personally identifiable information (PII) in a dataset so that individuals cannot be identified. User reviews are free text and can contain PII such as names, email addresses, or phone numbers. To protect users' privacy, and to meet the case study's requirement to comply with regulations such as GDPR, the data should be de-identified before it is analyzed.

The Cloud DLP API is built for this task. It inspects content with built-in infoType detectors (for example, PERSON_NAME, EMAIL_ADDRESS, and PHONE_NUMBER) and then transforms each finding. Its de-identification transformations include masking, replacing a value with its infoType name, and pseudonymization with cryptographic methods, so the reviews remain useful for analysis after the identifying details are removed. Plain redaction, which simply deletes the matched values, is only one of these transformations; de-identification as a whole lets the team choose how much analytic value to preserve.

In contrast, the Cloud Natural Language API provides entity analysis, sentiment analysis, syntax analysis, and content classification. It can recognize entities such as people in text, but it does not offer a de-identification or redaction feature, so it is not the right tool for preparing this dataset.

Therefore, the data science team should use the Cloud Data Loss Prevention API to de-identify the review dataset.
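The de-identification step can be sketched in Python with the `google-cloud-dlp` client library. This is a minimal sketch under stated assumptions, not HipLocal's actual pipeline: the infoType list, the function names, and any project ID you pass in are hypothetical. It uses the "replace with infoType" transformation, which swaps each finding for a label such as `[EMAIL_ADDRESS]`. The request builder runs offline; only `deidentify_review` needs the library installed and Google Cloud credentials.

```python
"""Sketch: de-identify a review with the Cloud DLP API (assumptions noted)."""

# ASSUMPTION: a hypothetical choice of built-in infoTypes for review text;
# a real deployment would pick detectors based on what reviews contain.
REVIEW_INFO_TYPES = ["PERSON_NAME", "EMAIL_ADDRESS", "PHONE_NUMBER"]


def build_deidentify_request(project_id: str, review_text: str) -> dict:
    """Build a deidentify_content request that replaces each finding
    with the name of its infoType."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        # What to look for: DLP's built-in infoType detectors.
        "inspect_config": {
            "info_types": [{"name": name} for name in REVIEW_INFO_TYPES],
        },
        # How to transform each finding: replace it with its infoType name.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {"primitive_transformation": {"replace_with_info_type_config": {}}}
                ]
            }
        },
        # The content item to scan: a single review string.
        "item": {"value": review_text},
    }


def deidentify_review(project_id: str, review_text: str) -> str:
    """Send one review to the DLP API and return the transformed text.

    Requires the google-cloud-dlp package and Google Cloud credentials.
    """
    # Imported lazily so build_deidentify_request works without the library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_deidentify_request(project_id, review_text)
    )
    return response.item.value
```

For example, calling `deidentify_review("my-hiplocal-project", "Great game! Email me at jane@example.com")` (with a hypothetical project ID) would send the review to the API and return it with the email address replaced by an infoType label. Replacing findings with their infoType name, rather than deleting them, keeps each review readable for downstream work such as sentiment analysis; if the team needs to group reviews by author without seeing identities, a cryptographic pseudonymization transformation could be placed in `deidentify_config` instead.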