Build an ML Pipeline for Regression Model Comparison | Azure ML Designer

Compare Boosted Decision Tree and Decision Forest Algorithms for Car Price Prediction

Question

Your task is to build an ML pipeline for training a regression model to predict a car's price based on its technical features.

Since you can't decide in advance which ML algorithm to use, you decide to train two regression algorithms (Boosted Decision Tree and Decision Forest) in parallel and compare their performance in the simplest way, so that execution requires the least amount of time.

You are working with Azure ML Designer.

Which of the following Designer modules do you need to duplicate in the pipeline in order to compare the two algorithms?

Get data (Import, Dataset)
Select Columns in Dataset
Split Data
Clean Missing Data
Train Model
Evaluate Model
Score Model

Answers

Explanations

A. B. C. D.

Answer: D.

Option A is incorrect because getting the data and preparing it are the same regardless of the number of algorithms used, so those modules are not duplicated.

Only the Train Model part of this option is correct.

Option B is incorrect because getting the data and preparing it are the same regardless of the number of algorithms used, so those modules are not duplicated.

Only the Score Model part of this option is correct.

Option C is incorrect because Split Data produces the same training and test sets for both algorithms, so it doesn't need to be duplicated.

A single Evaluate Model compares the performance metrics of the two algorithms, so it is not duplicated either.

Only the Train Model part of this option is correct.

Option D is CORRECT because the data preparation process, up to and including splitting the data, is the same for the two algorithms.

Only two steps need to be added for each algorithm: each must be trained (Train Model) and scored (Score Model) separately.

A single Evaluate Model step then compares the outputs of the two Score Model steps.

Diagram (pipeline layout recovered from the original image):

Clean Missing Data (remove missing value rows)
Split Data (split the data into training set (0.7) and ...)
Branch 1: Boosted Decision Tree Regression -> Train Model -> Score Model (use the test set to get the predicted price)
Branch 2: Decision Forest Regression -> Train Model -> Score Model (use the test set to get the predicted price)
Evaluate Model (receives both Score Model outputs)
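
The layout above can also be written down as a small graph to check which module types occur more than once. The following is only a sketch in plain Python, not Designer code: the edge list, the "#1"/"#2" instance suffixes, and the placement of Select Columns in Dataset before Clean Missing Data are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical edge list for the pipeline described above.
# Node names are illustrative labels, not Designer identifiers.
EDGES = [
    ("Get data", "Select Columns in Dataset"),          # assumed ordering
    ("Select Columns in Dataset", "Clean Missing Data"),
    ("Clean Missing Data", "Split Data"),
    ("Boosted Decision Tree Regression", "Train Model #1"),
    ("Split Data", "Train Model #1"),                   # training set
    ("Decision Forest Regression", "Train Model #2"),
    ("Split Data", "Train Model #2"),                   # same training set
    ("Train Model #1", "Score Model #1"),
    ("Split Data", "Score Model #1"),                   # test set
    ("Train Model #2", "Score Model #2"),
    ("Split Data", "Score Model #2"),                   # same test set
    ("Score Model #1", "Evaluate Model"),
    ("Score Model #2", "Evaluate Model"),
]


def module_type(node: str) -> str:
    """Strip the ' #n' instance suffix to get the module type."""
    return node.split(" #")[0]


def duplicated_modules(edges):
    """Return the module types that appear as more than one node."""
    nodes = {node for edge in edges for node in edge}
    counts = Counter(module_type(node) for node in nodes)
    return sorted(name for name, count in counts.items() if count > 1)


print(duplicated_modules(EDGES))
```

Only the two per-algorithm steps show up as duplicated; every other module, including Evaluate Model, appears once.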

The correct answer is D: Train Model and Score Model.

When building an ML pipeline to train a regression model to predict a car's price based on its technical features, the first step is to import the data. This can be done using the Get Data module, which allows you to import data from various sources including files, URLs, and Azure data services. Once the data is imported, you will need to select the relevant columns using the Select Columns in Dataset module. This step is important as it allows you to filter out unnecessary columns and keep only those that are relevant to the task at hand.

After selecting the relevant columns, the next step is to split the data into training and testing sets. This can be done using the Split Data module, which randomly splits the data into two sets. The training set is used to train the model, while the testing set is used to evaluate the model's performance.
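
What a random split does can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the Split Data module itself: the function name `split_rows`, the seed value, and the sample rows are hypothetical, and the 0.7 training fraction is taken from the diagram.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for ten rows of car data.
rows = list(range(10))
train, test = split_rows(rows)
print(len(train), len(test))
```

Because the split happens once, both algorithms receive exactly the same training rows and are later tested on exactly the same held-out rows, which is what makes the comparison fair.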

Once the data is split, the next step is to train the models. In this case, you have decided to train two regression algorithms (Boosted Decision Tree and Decision Forest) in parallel. Training is done with the Train Model module, which takes an untrained algorithm and the training set as inputs and produces a trained model. Because each Train Model instance trains a single algorithm, you need to duplicate this module: one instance for Boosted Decision Tree Regression and one for Decision Forest Regression.

After training, each model must generate predictions on the test set. This is done with the Score Model module, which takes a trained model and a dataset and outputs the dataset with the predicted values added. Because each trained model needs its own scoring step, Score Model is duplicated as well.

The final step is to compare performance with the Evaluate Model module. It accepts two scored datasets, so a single instance evaluates the Boosted Decision Tree and the Decision Forest models side by side and does not need to be duplicated. Scoring cannot be skipped, because Evaluate Model works on scored data rather than on trained models.

Therefore, the correct answer is D: Train Model and Score Model. Everything up to and including Split Data is shared, and one Evaluate Model compares the two models on the test set, which keeps the pipeline as simple as possible and its run time as short as possible.
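
Comparing two regression models on the same test set comes down to computing error metrics on each model's predictions. The sketch below shows that idea in plain Python with two common regression metrics, mean absolute error and root mean squared error. All prices and predictions are made-up numbers chosen for easy arithmetic; they say nothing about which algorithm actually performs better, and the helper names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def compare(actual, scored):
    """Compute both metrics for every model's predictions."""
    return {
        name: {"MAE": mae(actual, preds), "RMSE": rmse(actual, preds)}
        for name, preds in scored.items()
    }


# Made-up test-set prices and made-up predictions from the two branches.
actual = [10000, 15000, 20000, 25000]
scored = {
    "Boosted Decision Tree": [11000, 14000, 21000, 24000],
    "Decision Forest": [12000, 15000, 18000, 27000],
}

report = compare(actual, scored)
best = min(report, key=lambda name: report[name]["RMSE"])
for name, metrics in report.items():
    print(name, metrics)
print("Lower RMSE on this toy data:", best)
```

Both sets of predictions are measured against the same actual prices, which mirrors how one evaluation step can serve both branches of the pipeline.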