Cleaning Data from Missing Values Using ML Designer in Azure

Cleaning Data from Missing Values

Question

Your company operates a fleet of IoT devices that collect several environmental parameters at many locations.

The devices produce a huge amount of data, but for various reasons the incoming data is regularly “contaminated” with missing values in several numerical columns.

Your task is to clean the data of missing values by using a predefined transformation in the ML Designer.

What is the recommended practice to achieve the goal?

Answers

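A. Drop a saved transformation as a module from the Transforms list; connect it to the Apply Transformations module.
B. In the Clean Missing Data module, set the Custom substitution value to the saved cleaning transformation.
C. In the Clean Missing Data module, set the Custom substitution value to the saved cleaning transformation; select the columns to be cleaned.
D. Drop a saved transformation from the Transforms list; connect it to the Apply Transformations module; select the columns to be cleaned.

Explanations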

Answer: A.

Option A is CORRECT because the most convenient approach is to write the cleansing rules once and reuse them many times in Designer pipelines.

Rules for cleaning missing values can be defined and saved with the Clean Missing Data module and then reused with the Apply Transformations module.
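The define-once, apply-many pattern can be illustrated outside the Designer with a minimal plain-Python sketch. It assumes rows are dicts with None marking a missing reading and uses mean substitution as the example rule. `define_cleaning` and `apply_cleaning` are hypothetical stand-ins for the Clean Missing Data and Apply Transformations modules, not Azure ML APIs, and a JSON string stands in for the saved transformation.

```python
import json
import statistics
from typing import Optional

Row = dict[str, Optional[float]]  # one reading; None marks a missing value


def define_cleaning(rows: list[Row], columns: list[str]) -> dict:
    """Learn a mean-substitution rule per selected column (hypothetical stand-in for Clean Missing Data)."""
    substitutes = {
        col: statistics.mean(r[col] for r in rows if r[col] is not None)
        for col in columns
    }
    return {"columns": columns, "substitutes": substitutes}


def apply_cleaning(rows: list[Row], transform: dict) -> list[Row]:
    """Replay a saved rule on new rows (hypothetical stand-in for Apply Transformations)."""
    fill = transform["substitutes"]
    return [
        {col: (fill[col] if cell is None and col in fill else cell) for col, cell in row.items()}
        for row in rows
    ]


# Define and "save" the rule once, from a first batch of readings.
first_batch = [
    {"temperature": 20.0, "humidity": 40.0},
    {"temperature": None, "humidity": 60.0},
    {"temperature": 22.0, "humidity": None},
]
saved = json.dumps(define_cleaning(first_batch, ["temperature", "humidity"]))

# Reuse the saved rule on a later batch with the same schema.
later_batch = [{"temperature": None, "humidity": None}]
print(apply_cleaning(later_batch, json.loads(saved)))
# prints [{'temperature': 21.0, 'humidity': 50.0}]
```

The substitution values are computed once, from the first batch, and then replayed unchanged on every later batch; that is the point of saving the rule instead of rebuilding it in each pipeline.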

Saved transformations can be applied to datasets with the same schema.
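The same-schema requirement can be pictured as a guard that runs before a saved rule is replayed. In this hypothetical sketch, a schema is reduced to the set of column names recorded when the transformation was saved. Real schemas also carry column types, which the sketch ignores, and the function names are illustrative only.

```python
from typing import Optional

Row = dict[str, Optional[float]]  # one reading; None marks a missing value


def has_same_schema(rows: list[Row], saved_columns: list[str]) -> bool:
    """True if every row has exactly the column names recorded with the saved transformation."""
    expected = set(saved_columns)
    return all(set(row) == expected for row in rows)


def require_same_schema(rows: list[Row], saved_columns: list[str]) -> None:
    """Refuse to replay a saved transformation on data with a different schema."""
    if not has_same_schema(rows, saved_columns):
        raise ValueError("dataset schema differs from the schema the transformation was saved with")


saved_columns = ["temperature", "humidity"]  # hypothetical columns recorded at save time
require_same_schema([{"temperature": None, "humidity": 55.0}], saved_columns)  # passes
print(has_same_schema([{"temperature": 19.5}], saved_columns))  # prints False: humidity is absent
```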

Saved transformations appear in the Designer's Transforms list as modules that can be dropped onto the pipeline canvas.

Option B is incorrect because the Clean Missing Data module is used to define a cleansing transformation (e.g., by setting a custom substitution rule), which can then be saved for future reuse.

This module cannot use saved transformations.
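A custom substitution rule comes down to a single replacement constant applied to the selected columns, so the setting expects a value rather than a saved transformation. The sketch below is a hypothetical plain-Python illustration, assuming rows are dicts with None for missing cells; the function name and the constant 0.0 are arbitrary choices for the example.

```python
from typing import Optional

Row = dict[str, Optional[float]]  # one reading; None marks a missing value


def substitute_custom_value(rows: list[Row], columns: list[str], value: float) -> list[Row]:
    """Replace missing cells in the selected columns with one constant; other columns are untouched."""
    selected = set(columns)
    return [
        {col: (value if cell is None and col in selected else cell) for col, cell in row.items()}
        for row in rows
    ]


readings = [
    {"temperature": None, "humidity": 48.0},
    {"temperature": 21.5, "humidity": None},
]
print(substitute_custom_value(readings, ["temperature"], 0.0))
# prints [{'temperature': 0.0, 'humidity': 48.0}, {'temperature': 21.5, 'humidity': None}]
```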

Option C is incorrect because you cannot select the columns to which a saved transformation is applied.

The transformation applies to exactly the columns that were defined when it was created.

Option D is incorrect because you cannot select the columns to which a saved transformation is applied.

The transformation applies to exactly the columns that were defined when it was created.
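Why column selection is unavailable can be modeled with an object whose columns are fixed when it is saved: its apply step accepts only data, so there is nothing left to select. `SavedCleaningTransform` is a hypothetical class for illustration, not an Azure ML type, and the replacement value 21.0 is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional

Row = dict[str, Optional[float]]  # one reading; None marks a missing value


@dataclass(frozen=True)
class SavedCleaningTransform:
    """Hypothetical saved transformation: the (column, replacement) pairs are fixed at save time."""
    substitutes: tuple[tuple[str, float], ...]

    def apply(self, rows: list[Row]) -> list[Row]:
        # No 'columns' parameter: only the columns recorded at save time are cleaned.
        fill = dict(self.substitutes)
        return [
            {col: (fill[col] if cell is None and col in fill else cell) for col, cell in row.items()}
            for row in rows
        ]


saved = SavedCleaningTransform(substitutes=(("temperature", 21.0),))
print(saved.apply([{"temperature": None, "humidity": None}]))
# prints [{'temperature': 21.0, 'humidity': None}]
```

The humidity cell stays missing because humidity was not part of the rule when it was saved; changing the cleaned columns means defining and saving a new transformation.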

Reference:

The recommended practice to clean the data of missing values in the given scenario is option C: in the Clean Missing Data module, set the Custom substitution value to the saved cleaning transformation and select the columns to be cleaned.

The Clean Missing Data module handles missing values in a dataset either by filling them with substitute values or by removing the affected rows or columns. In this case, the data is contaminated with missing values, so this module will be used to clean the data.
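Two of the behaviors described here, filling missing cells with a computed value and dropping the affected rows, can be sketched in plain Python. This is a hypothetical illustration that assumes rows are dicts with None for missing cells; it mirrors the idea behind the module's cleaning modes, not the module's implementation.

```python
import statistics
from typing import Optional

Row = dict[str, Optional[float]]  # one reading; None marks a missing value


def remove_incomplete_rows(rows: list[Row], columns: list[str]) -> list[Row]:
    """Drop every row that has a missing value in any selected column."""
    return [row for row in rows if all(row[col] is not None for col in columns)]


def replace_with_mean(rows: list[Row], columns: list[str]) -> list[Row]:
    """Fill missing cells in each selected column with the mean of its observed values.

    statistics.mean raises StatisticsError if a selected column has no observed values.
    """
    means = {col: statistics.mean(r[col] for r in rows if r[col] is not None) for col in columns}
    return [
        {col: (means[col] if cell is None and col in means else cell) for col, cell in row.items()}
        for row in rows
    ]


readings = [
    {"temperature": 20.0, "humidity": 40.0},
    {"temperature": None, "humidity": 50.0},
    {"temperature": 24.0, "humidity": None},
]
print(remove_incomplete_rows(readings, ["temperature", "humidity"]))
# prints [{'temperature': 20.0, 'humidity': 40.0}]
print(replace_with_mean(readings, ["temperature", "humidity"]))
# prints [{'temperature': 20.0, 'humidity': 40.0}, {'temperature': 22.0, 'humidity': 50.0}, {'temperature': 24.0, 'humidity': 45.0}]
```

Removing rows keeps only fully observed readings but discards data; substitution keeps every row at the cost of inserting estimated values.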

Option C suggests setting the custom substitution value to the saved cleaning transformation and selecting the columns to be cleaned. This means that a predefined transformation for handling the missing values is created, saved, and then used in the Clean Missing Data module as a custom substitution value.

Selecting the columns to be cleaned is important because not all columns necessarily contain missing values, and applying the transformation to every column would waste resources. Selecting only the columns with missing values makes the cleaning process more efficient.
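Finding which columns actually need cleaning can be sketched as a single pass that records every column containing at least one missing cell. The helper name is hypothetical, and rows are again assumed to be dicts with None for missing values.

```python
from typing import Optional

Row = dict[str, Optional[float]]  # one reading; None marks a missing value


def columns_with_missing(rows: list[Row]) -> list[str]:
    """Return, in first-seen order, the columns that contain at least one missing cell."""
    found: list[str] = []
    for row in rows:
        for col, cell in row.items():
            if cell is None and col not in found:
                found.append(col)
    return found


readings = [
    {"temperature": 20.0, "humidity": None, "pm25": 12.0},
    {"temperature": None, "humidity": 41.0, "pm25": 9.0},
]
print(columns_with_missing(readings))
# prints ['humidity', 'temperature']
```

Only `humidity` and `temperature` would then be passed to the cleaning step; `pm25` has no gaps and is left alone.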

Option A suggests dropping a saved transformation as a module from the Transforms list and connecting it to the Apply Transformations module. This option does not specify what the transformation does or how it handles missing values. It is therefore not recommended in this scenario, because it does not explicitly address the issue of missing values.

Option B suggests setting the custom substitution value to the saved cleaning transformation in the Clean Missing Data module. However, this option does not specify how the transformation is created or how the columns to be cleaned are selected. It is therefore less specific than option C and may not fully address the issue of missing values.

Option D suggests dropping a saved transformation from the Transforms list, connecting it to the Apply Transformations module, and selecting the columns to be cleaned. This option is similar to option A but adds the step of selecting the columns to be cleaned. However, like option A, it does not specify what the transformation does or how it handles missing values, so it is not recommended in this scenario.