AWS Certified Machine Learning - Specialty Exam: Handling Outliers in SageMaker Image Classification

Question

You work as a machine learning specialist for your state's highway toll collection division.

The toll collection division uses cameras to identify car license plates as the cars pass through the various toll gates on the state highways.

You are on the team that uses the SageMaker Image Classification algorithm to read license plates, classify them by state, and then identify the actual license plate number. Very rarely, cars pass through the toll gates with plates from foreign countries, for example, Great Britain or Mexico.

These outliers must not adversely affect your model's predictions. Which hyperparameter should you set, and to what value, to ensure these outliers do not adversely impact your model?

Answers

Explanations

A. B. C. D. E. F.

Answer: E.

Option A is incorrect.

The feature_dim hyperparameter is a setting on the K-Means and K-Nearest Neighbors algorithms, not the Image Classification algorithm.

Option B is incorrect.

The feature_dim hyperparameter is a setting on the K-Means and K-Nearest Neighbors algorithms, not the Image Classification algorithm.

Option C is incorrect.

The sample_size hyperparameter is a setting on the K-Nearest Neighbors algorithm, not the Image Classification algorithm.

Option D is incorrect.

The sample_size hyperparameter is a setting on the K-Nearest Neighbors algorithm, not the Image Classification algorithm.

Option E is correct.

The learning_rate hyperparameter sets the initial learning rate, which controls how large each weight update is during training and therefore how strongly any single batch of data shifts the model.

Valid values range from 0.0 to 1.0.

Setting this hyperparameter to a low value, such as 0.1, makes the model learn more slowly, so a rare outlier moves the weights only slightly.

This is what you want: a model that is not adversely impacted by outlier data.
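The effect of the step size can be illustrated with a minimal sketch of a single gradient-descent update on a one-parameter squared-error model. This is not the Image Classification training loop; the function name `sgd_step` and all numbers other than the 0.1 and 0.75 learning rates are illustrative assumptions.

```python
def sgd_step(weight: float, x: float, y: float, learning_rate: float) -> float:
    """Return the weight after one gradient-descent step on the loss (weight * x - y) ** 2."""
    gradient = 2.0 * (weight * x - y) * x
    return weight - learning_rate * gradient


# Hypothetical starting weight that already fits the common data (prediction = 1.0).
weight = 1.0

# Hypothetical outlier sample whose target (5.0) is far from the current prediction.
outlier_x, outlier_y = 1.0, 5.0

low = sgd_step(weight, outlier_x, outlier_y, learning_rate=0.1)
high = sgd_step(weight, outlier_x, outlier_y, learning_rate=0.75)

# The same outlier moves the weight much further at the higher learning rate.
print(f"shift at 0.1:  {abs(low - weight):.2f}")
print(f"shift at 0.75: {abs(high - weight):.2f}")
```

In this toy setting the shift caused by the outlier scales linearly with the learning rate, which is the intuition behind preferring the lower value.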

Option F is incorrect.

The learning_rate hyperparameter sets the initial learning rate, which controls how large each weight update is during training and therefore how strongly any single batch of data shifts the model.

Valid values range from 0.0 to 1.0.

Setting this hyperparameter to a high value, such as 0.75, makes the model learn more quickly, but a rare outlier then moves the weights substantially.

This is not what you want: the model must not be adversely impacted by outlier data.
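As a rough sketch of how the two learning_rate options might be written down, the snippet below builds a hyperparameter dictionary of the kind supplied to a training job. The helper name `image_classification_hyperparameters` is hypothetical, and the `num_classes` and `num_training_samples` values are placeholders; only the 0.0 to 1.0 range and the values 0.1 and 0.75 come from the explanations above.

```python
def image_classification_hyperparameters(learning_rate: float) -> dict:
    """Build a hyperparameter dict, rejecting learning rates outside 0.0 to 1.0."""
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError(f"learning_rate must be in [0.0, 1.0], got {learning_rate}")
    return {
        "num_classes": 50,              # placeholder, e.g. one class per state
        "num_training_samples": 10000,  # placeholder training set size
        "learning_rate": learning_rate,
    }


# Option E: low learning rate, less sensitive to rare outliers.
option_e = image_classification_hyperparameters(0.1)

# Option F: high learning rate, more sensitive to rare outliers.
option_f = image_classification_hyperparameters(0.75)

print(option_e)
print(option_f)
```

Validating the range up front is a design choice for this sketch, so that a value outside the range stated above fails before any training job is started.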

Reference:

Please see the Amazon SageMaker developer guide titled Image Classification Hyperparameters, and the Amazon SageMaker developer guide titled Use Amazon SageMaker Built-in Algorithms.

The hyperparameter that should be set to ensure that outliers do not adversely impact the model's predictions is the learning_rate hyperparameter. The value for this hyperparameter should be low, such as 0.1.

Explanation:

The learning_rate hyperparameter controls how much the model weights are updated during training. With a low learning rate each update is small, so a rare outlier image, such as a foreign license plate, moves the weights only slightly and the model stays dominated by the patterns in the common data. With a high learning rate, a single unusual batch can shift the weights substantially.

The sample_size hyperparameter specifies the number of data points sampled from the training data set. It is a setting on the K-Nearest Neighbors algorithm, not the Image Classification algorithm, so it cannot be used here.

Likewise, the feature_dim hyperparameter specifies the number of features or input dimensions in the dataset. It is a setting on the K-Means and K-Nearest Neighbors algorithms, not the Image Classification algorithm.

Therefore, the correct answer is E: learning_rate set to a low value such as 0.1.