Detecting Anomalies in Company Data

Question

Your company has multiple on-premises systems that serve as sources for reporting.

The data has not been maintained well and has become degraded over time.

You want to use Google-recommended practices to detect anomalies in your company data.

What should you do?

Answer

B.

Explanation

The Google-recommended way to detect anomalies in degraded company data is to upload your files into Cloud Storage and use Cloud Dataprep to explore and clean them. Therefore, the correct answer is B.

Here is a detailed explanation of why:

  1. Upload your files into Cloud Storage: Cloud Storage is a durable and highly available object storage service that lets you store and access your data from anywhere in the world. Create a bucket to hold the data, then upload the files exported from your on-premises systems into it.

  2. Use Cloud Dataprep to explore and clean your data: Cloud Dataprep is a data preparation service that provides a visual interface for cleaning, transforming, and enriching data. Its built-in profiling highlights anomalies in your data, such as missing or inconsistent values, duplicates, and outliers. You can use Cloud Dataprep to create a flow that reads data from Cloud Storage, cleans the data, and writes the results back to Cloud Storage; Dataprep can execute that flow as a Cloud Dataflow job.
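To make the anomaly categories above concrete, here is a minimal Python sketch of the kinds of checks a profiling tool performs: missing values, values that do not match the expected type, duplicate rows, and outliers. It is not Cloud Dataprep's implementation or API. The sample data, the `profile` function, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration.

```python
import csv
import io
from collections import Counter
from statistics import quantiles

# Hypothetical sample standing in for a report exported from an on-premises system.
SAMPLE_CSV = """order_id,region,amount
1001,EMEA,120.50
1002,,98.00
1003,APAC,101.25
1003,APAC,101.25
1004,EMEA,n/a
1005,AMER,9999.00
1006,AMER,110.00
1007,EMEA,95.75
"""


def profile(rows, numeric_column):
    """Count missing, mismatched, duplicate, and outlier values in the rows."""
    missing = Counter()
    mismatched = 0
    values = []
    for row in rows:
        # Missing values: empty or absent cells, counted per column.
        for column, cell in row.items():
            if cell is None or cell.strip() == "":
                missing[column] += 1
        # Mismatched values: present in the numeric column but not a number.
        cell = (row.get(numeric_column) or "").strip()
        if cell:
            try:
                values.append(float(cell))
            except ValueError:
                mismatched += 1

    # Duplicates: rows that repeat an earlier row in every column.
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())

    # Outliers: values outside 1.5 * IQR of the quartiles (an assumed rule).
    outliers = []
    if len(values) >= 4:
        q1, _, q3 = quantiles(values, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        outliers = [v for v in values if v < low or v > high]

    return {
        "missing": dict(missing),
        "mismatched": mismatched,
        "duplicates": duplicates,
        "outliers": outliers,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile(rows, "amount")
print(report)
```

Writing and maintaining checks like these by hand for every source system is the effort that a visual profiling tool is meant to remove.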

Cloud Dataprep has several advantages over Cloud Datalab:

  • Cloud Dataprep is a purpose-built service for data preparation, whereas Cloud Datalab is a notebook environment for data exploration and analysis.
  • Cloud Dataprep has a visual interface that allows you to interact with your data, whereas Cloud Datalab requires you to write code to manipulate your data.
  • Cloud Dataprep has built-in data profiling and anomaly detection features, whereas Cloud Datalab requires you to write custom code or use external libraries to perform these tasks.

  1. Why not use Cloud Datalab? Cloud Datalab is a powerful tool for exploring, analyzing, and visualizing data. However, it is not the best choice for cleaning and preparing degraded data, as it requires more manual effort and custom coding than Cloud Dataprep.

  2. Why not connect Cloud Datalab or Cloud Dataprep to on-premises systems? While it is possible to connect Cloud Datalab or Cloud Dataprep to on-premises systems, it requires additional setup and configuration, such as setting up a VPN or configuring firewalls. It is simpler and more efficient to upload the data to Cloud Storage and process it from there.
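The upload step can be sketched in Python as follows, assuming the on-premises reports have already been exported as CSV files to a local directory. The function names, the `raw` prefix, and the directory layout are hypothetical. The upload itself uses the `google-cloud-storage` client library, which must be installed separately and needs credentials for a project that owns the bucket.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix="raw"):
    """Map each CSV file under local_dir to a Cloud Storage object name."""
    root = Path(local_dir)
    return {
        path: f"{prefix}/{path.relative_to(root).as_posix()}"
        for path in sorted(root.rglob("*.csv"))
    }


def upload_exports(local_dir, bucket_name, prefix="raw"):
    """Upload every planned file to an existing bucket (bucket_name is an assumption)."""
    # Imported lazily so plan_uploads works without the client library installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # relies on Application Default Credentials
    bucket = client.bucket(bucket_name)
    for path, object_name in plan_uploads(local_dir, prefix).items():
        bucket.blob(object_name).upload_from_filename(str(path))
        print(f"uploaded {path} -> gs://{bucket_name}/{object_name}")
```

Keeping the planning step separate from the upload makes it possible to review which objects will be created before any data leaves the on-premises environment.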

In conclusion, uploading your files into Cloud Storage and using Cloud Dataprep to explore and clean your data is the Google-recommended approach to detecting anomalies in your company data.