AWS Certified Machine Learning - Specialty Exam: Building a Model for Breast Mass Image Diagnosis

Question

You work as a machine learning specialist for a medical imaging company.

You and your machine learning team have been assigned the task of building a model that predicts whether a breast mass image indicates a benign or malignant tumor.

Your model will be used to help physicians quickly decide how to treat their patients using a verified diagnosis.

Which option gives the appropriate machine learning services and features to train your model for your image diagnosis problem?

Explanations

Correct Answer: C.

Option A is incorrect.

You obtain the SageMaker role ARN that grants training and hosting access to your data with the statement role = sagemaker.get_execution_role(), not role = sagemaker.get_role().

Also, because the goal is to predict whether a breast mass is benign or malignant, you want a regression algorithm that returns the probability that the mass is malignant rather than a binary yes/no.

Therefore, you should set the predictor_type hyperparameter to regressor, not binary_classifier.

Option B is incorrect.

Because the goal is to predict whether a breast mass is benign or malignant, you want a regression algorithm that returns the probability that the mass is malignant rather than a binary yes/no.

Therefore, you should set the predictor_type hyperparameter to regressor, not binary_classifier.

Option C is correct.

Because the goal is to predict whether a breast mass is benign or malignant, you want a regression algorithm that returns the probability that the mass is malignant rather than a binary yes/no.

Therefore, you should set the predictor_type hyperparameter to regressor.
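As a minimal sketch of this reasoning, the snippet below shows how a continuous score from a regressor could be kept as a probability-like value and, when needed, converted to a benign/malignant label afterward. The helper name scores_to_labels, the 0.5 threshold, and the sample scores are illustrative assumptions; none of them come from the exam question or from the SageMaker API.

```python
def scores_to_labels(scores, threshold=0.5):
    """Map continuous model scores to labels (hypothetical helper).

    A score at or above the threshold is labeled malignant (1),
    otherwise benign (0). The 0.5 default is an assumed cut-off.
    """
    return [1 if score >= threshold else 0 for score in scores]


# Made-up scores such as a regressor might return for four images.
example_scores = [0.05, 0.48, 0.50, 0.93]
example_labels = scores_to_labels(example_scores)
```

Keeping the raw score alongside the label lets a reviewer see how close a case was to the cut-off, which is the advantage this explanation attributes to the regressor setting.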

It is also correct to obtain the SageMaker role ARN that grants training and hosting access to your data with the statement role = sagemaker.get_execution_role().

Option D is incorrect.

Because the goal is to predict whether a breast mass is benign or malignant, you want a regression algorithm that returns the probability that the mass is malignant rather than a classification across multiple classes.

Therefore, you should set the predictor_type hyperparameter to regressor, not multiclass_classifier.

References:

Amazon SageMaker Examples Jupyter notebook, Breast Cancer Prediction (https://github.com/aws/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/breast_cancer_prediction/Breast%20Cancer%20Prediction.ipynb)

Amazon SageMaker documentation, Linear Learner Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html)

A second explanation treats the task as binary classification and selects option B: obtain the SageMaker role ARN that grants training and hosting access to your data with the role = sagemaker.get_execution_role() statement in your Jupyter notebook, load your data into a pandas DataFrame, split it into 80% training, 10% validation, and 10% testing, set the predictor_type hyperparameter to binary_classifier, and then run your training job using the sagemaker.create_training_job statement in your Jupyter notebook.

Here's why:

First, let's break down the different parts of the option:

  • Specify the SageMaker role ARN used to give learning and hosting access to your data by using the role = sagemaker.get_execution_role() statement in your Jupyter notebook: This retrieves the execution role that SageMaker assumes, which grants training and hosting permission to read your data and use other AWS resources.
  • Load your data into a pandas DataFrame: This brings the data from its source into the notebook in a tabular structure that is convenient for inspection, cleaning, and splitting before training.
  • Split your data into 80% training, 10% validation, and 10% testing: The training set fits the model, the validation set guides tuning, and the held-out test set estimates performance on unseen data. Keeping them separate helps detect overfitting, where the model fits the training data closely but performs poorly on new data.
  • Set the predictor_type hyperparameter to binary_classifier: This tells the algorithm what type of target variable to predict, which in this reading is one of two classes. The setting also determines the format of the model's output.
  • Run your training job using the sagemaker.create_training_job statement in your Jupyter notebook: This step starts the training job on SageMaker using the specified hyperparameters and resources.
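The 80/10/10 split described in the steps above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper name split_indices, the seed, and the row count of 1000 are hypothetical, and a real notebook would more likely split the pandas DataFrame directly.

```python
import random


def split_indices(n_rows, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle row indices and cut them into train/validation/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remaining rows, roughly 10%
    return train, val, test


# Hypothetical dataset of 1000 rows.
train_idx, val_idx, test_idx = split_indices(1000)
```

With the data in a pandas DataFrame named df (an assumed name), df.iloc[train_idx] would select the training rows, and likewise for the other two lists.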

Now let's look at why the other options are not correct:

Option A is incorrect because it uses the "sagemaker.get_role()" statement, which is not a valid method to retrieve the role arn. The correct method is "sagemaker.get_execution_role()". Additionally, the hyperparameter "predictor_type" should be set to binary_classifier for a binary classification problem.

Option C is incorrect because it sets the predictor_type to regressor, which is not appropriate for a binary classification problem. A regressor is used for predicting continuous values, not binary classes.

Option D is incorrect because it sets the predictor_type to multiclass_classifier, which is not appropriate for a binary classification problem. A multiclass classifier is used for problems with more than two classes, whereas this problem has only two classes (benign or malignant).
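The three predictor_type values compared across the options can be sketched as a small validation helper. The function name linear_learner_hyperparameters and the feature_dim value of 30 are assumptions made for illustration; this is not a SageMaker API, only a plain dictionary builder of the kind a notebook might pass to a training job.

```python
# The three predictor_type values named in the options above.
VALID_PREDICTOR_TYPES = ("binary_classifier", "multiclass_classifier", "regressor")


def linear_learner_hyperparameters(predictor_type, feature_dim):
    """Build a hyperparameter dict (hypothetical helper, not a SageMaker API)."""
    if predictor_type not in VALID_PREDICTOR_TYPES:
        raise ValueError(f"unknown predictor_type: {predictor_type!r}")
    # Hyperparameter values are passed to the training job as strings.
    return {"predictor_type": predictor_type, "feature_dim": str(feature_dim)}


# Assumed feature count of 30, chosen only for the example.
params = linear_learner_hyperparameters("binary_classifier", feature_dim=30)
```

Validating the value up front catches a typo such as "classifier" in the notebook instead of after a training job has been submitted.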