AWS Certified Machine Learning - Specialty Exam: K-Means Hyperparameter for Scoring and Available Metric Options

Question

You work in the machine learning department of a major retail company.

Your team is working on a model to classify customers by purchase history.

Your marketing department wants to use the results of your model predictions to determine which customers should receive a new campaign offer.

You have selected your observations and cleaned your data.

You have also split your data into training and evaluation datasets.

You are now training your k-means model in Amazon SageMaker, and you are trying to select the model hyperparameters that give your marketing team the best predictions.

You have set the feature_dim hyperparameter to the number of features in your input data.

You have set the k hyperparameter to 10, the number of clusters you estimate is appropriate for your model.

You have set the epochs hyperparameter to 1 so that the model performs one pass over your data.

You need to report a score for your model.

Which k-means hyperparameter allows you to select the metric types to report this scoring, and what are the available metric options?

Answers

Explanations

Answer: D.

Option A is incorrect.

The hyperparameter you use to report a score for your model is eval_metrics, not extra_center_factor.

The eval_metrics hyperparameter accepts msd for Mean Square Distance, ssd for Sum of Square Distance, or both msd and ssd.

The extra_center_factor hyperparameter does not select metrics. It controls how many extra centers the algorithm creates while it trains (k * extra_center_factor) before it reduces them to k when finalizing the model.

Option B is incorrect.

The hyperparameter you use to report a score for your model is eval_metrics.

The eval_metrics hyperparameter accepts msd for Mean Square Distance, ssd for Sum of Square Distance, or both msd and ssd.

The Amazon SageMaker k-means algorithm does not have a score_metrics hyperparameter, and mse is not one of the allowed metric values.

Option C is incorrect.

The hyperparameter you use to report a score for your model is eval_metrics.

The eval_metrics hyperparameter accepts msd for Mean Square Distance, ssd for Sum of Square Distance, or both msd and ssd.

The Amazon SageMaker k-means algorithm does not have an eval_method hyperparameter, and mse is not one of the allowed metric values.

Option D is correct.

The hyperparameter you use to report a score for your model is eval_metrics.

The eval_metrics hyperparameter accepts msd for Mean Square Distance, ssd for Sum of Square Distance, or both msd and ssd.

Reference:

Please see the Amazon SageMaker developer guide titled K-Means Hyperparameters.

The k-means algorithm is an unsupervised learning algorithm that groups data points into k clusters based on their similarity. When training a k-means model, you evaluate the quality of the clustering so that you can compare hyperparameter settings and keep the one that performs best.
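As a rough illustration of what grouping points into k clusters means, the sketch below runs the textbook Lloyd's iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is not SageMaker's implementation, which the source does not describe; the function names, the toy data, and the fixed iteration count are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def lloyd_kmeans(points, k, iterations=10, seed=0):
    """Textbook Lloyd's k-means on a list of tuples (illustrative only)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Keep the previous centroid if a cluster ends up empty.
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Final assignment against the final centroids.
    return centroids, assign(points, centroids)


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_centroids, toy_labels = lloyd_kmeans(toy_points, k=2)
print(toy_labels)
```

On this toy data the first three points should share one label and the last three the other; which group gets label 0 depends on the random starting centroids.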

Amazon SageMaker is a cloud-based machine learning platform that provides tools and services for building, training, and deploying machine learning models. In SageMaker, you can train a k-means model with the built-in k-means algorithm, which you can also use through the Amazon SageMaker Python SDK.

To report a score for a k-means model, SageMaker provides a hyperparameter that selects which metric types are reported. The question offers four candidate answers:

A. extra_center_factor with msd, ssd, or [msd, ssd] as the available metric type values
B. score_metrics with mse, ssd, or [mse, ssd] as the available metric type values
C. eval_method with mse, ssd, or [mse, ssd] as the available metric type values
D. eval_metrics with msd, ssd, or [msd, ssd] as the available metric type values

Of these options, only eval_metrics selects the metric types used to report a score. Its available metric type values are msd, ssd, or [msd, ssd].
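A minimal sketch of how the hyperparameters from the question might be assembled follows. The helper name kmeans_hyperparameters and the feature_dim value of 20 are hypothetical (the question does not give the feature count), and encoding eval_metrics as a JSON list string is an assumption about how the value is passed. The resulting string-to-string map is the kind of dictionary one would hand to a SageMaker training job; that step is not shown here.

```python
import json

# Metric names accepted by eval_metrics, per the explanation above.
ALLOWED_EVAL_METRICS = {"msd", "ssd"}


def kmeans_hyperparameters(feature_dim, k, epochs, eval_metrics):
    """Build a string-to-string hyperparameter map (hypothetical helper)."""
    unknown = set(eval_metrics) - ALLOWED_EVAL_METRICS
    if unknown:
        # For example "mse" from options B and C is rejected here.
        raise ValueError(f"unsupported eval_metrics values: {sorted(unknown)}")
    return {
        "feature_dim": str(feature_dim),
        "k": str(k),
        "epochs": str(epochs),
        # Assumption: the metric list is passed as a JSON list string.
        "eval_metrics": json.dumps(list(eval_metrics)),
    }


hyperparameters = kmeans_hyperparameters(
    feature_dim=20,  # hypothetical feature count
    k=10,            # from the question
    epochs=1,        # from the question
    eval_metrics=["msd", "ssd"],
)
print(hyperparameters["eval_metrics"])  # ["msd", "ssd"]
```

Validating the metric names up front makes the difference between the options concrete: msd and ssd pass, while mse raises an error.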

MSD (Mean Square Distance) is the average of the squared distances between each data point and its assigned centroid. SSD (Sum of Squared Distances) is the sum of the squared distances between each data point and its assigned centroid.
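The two definitions above can be checked with a few lines of plain Python. This is only a sketch of the arithmetic, assuming each point is assigned to its nearest centroid; the function names and the sample points are made up for the example and are not part of SageMaker.

```python
def squared_distance(point, centroid):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((p - c) ** 2 for p, c in zip(point, centroid))


def ssd_and_msd(points, centroids):
    """Return (SSD, MSD) with every point assigned to its nearest centroid."""
    closest = [min(squared_distance(p, c) for c in centroids) for p in points]
    ssd = sum(closest)       # sum of squared distances
    msd = ssd / len(points)  # mean of squared distances
    return ssd, msd


# Hypothetical data: every point lies exactly 1 unit from its centroid.
sample_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
sample_centroids = [(0.0, 1.0), (10.0, 11.0)]
ssd, msd = ssd_and_msd(sample_points, sample_centroids)
print(ssd, msd)  # 4.0 1.0
```

MSD is SSD divided by the number of points, so the two metrics rank models identically on the same dataset; MSD is simply easier to compare across datasets of different sizes.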

Therefore, the answer to the question is D: eval_metrics, with msd, ssd, or [msd, ssd] as the available metric type values.