Model Evaluation Metrics for Cancer Screening | AWS Certified Machine Learning Exam

Choosing the Right Metric for Cancer Screening Models

Question

You are a data scientist working for a cancer screening center.

The center has gathered data on many patients who have been screened over the years.

The data is obviously skewed toward true negative results, as most screened patients don't have cancer.

You evaluate several machine learning models to decide which model best predicts true positives when using your cancer screening data.

You have split your data into training and test sets at a 70/30 ratio.

You now need to decide which metric to use to evaluate your models. Which metric will most accurately determine the model best suited to solve your classification problem?

Answers

A. Accuracy
B. Precision
C. Recall
D. PR Curve

Answer: D.

Explanations

Option A (accuracy) is incorrect because accuracy is best used when both outcomes are of equal importance.

Because true negatives carry significant weight in the accuracy calculation, accuracy will not differentiate models well for the cancer screening problem, since this data set is skewed toward true negatives.

The heavily weighted true negative cases amplify the impact of the class imbalance.

Option B (precision) is incorrect because it only takes into account the proportion of true positives out of all predicted positives, ignoring false negatives.

Option C (recall) is incorrect because it only takes into account the proportion of true positives out of all actual positives, ignoring false positives.

Option D is correct because the PR Curve is best suited to evaluating models on data sets where most of the cases are negative, as in the cancer screening data set.

True negative cases do not appear in either the precision or the recall calculation, which reduces the impact of the imbalance.

Reference:

Please see the article "Various ways to evaluate a machine learning model's performance".

In this scenario, the goal is to find the model that best predicts true positives, meaning correctly identifying patients who have cancer. As the data is skewed towards true negatives, which means patients who do not have cancer, accuracy is not an appropriate metric to use as a performance indicator. This is because even if the model predicts all cases as negative, it will still have a high accuracy due to the high proportion of true negatives.
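To make this concrete, here is a minimal sketch using scikit-learn and a hypothetical, imbalanced label set (95 negatives, 5 positives): a model that predicts "negative" for every patient still reaches 95% accuracy while detecting none of the cancer cases.

```python
# Minimal sketch with hypothetical data: 5 positives out of 100 patients.
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 5 + [0] * 95   # 1 = has cancer, 0 = does not
y_pred = [0] * 100            # a "model" that predicts "no cancer" for everyone

print(accuracy_score(y_true, y_pred))  # 0.95 -- high accuracy despite being useless
print(recall_score(y_true, y_pred))    # 0.0  -- no true positives are found
```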

There are several metrics available to evaluate the performance of a classification model. However, the most suitable metrics for this scenario are precision, recall, ROC curve, and PR curve.

Precision measures the proportion of true positives among all positive predictions made by the model. It is given by the formula:

Precision = TP / (TP + FP)

where TP is the number of true positives and FP is the number of false positives. Precision is a useful metric in situations where false positives are costly, such as in the case of cancer screening. However, precision does not consider false negatives, which means it may not be suitable for this scenario where missing a true positive is also costly.
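As an illustration, the sketch below computes precision with scikit-learn on a small set of hypothetical labels and predictions (4 true positives and 2 false positives):

```python
from sklearn.metrics import precision_score

# Hypothetical labels and predictions: TP = 4, FP = 2, so precision = 4 / (4 + 2)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

print(precision_score(y_true, y_pred))  # approximately 0.67
```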

Recall measures the proportion of true positives among all actual positive cases in the dataset. It is given by the formula:

Recall = TP / (TP + FN)

where FN is the number of false negatives. Recall is a useful metric in situations where false negatives are costly, such as in the case of cancer screening. However, recall does not consider false positives, which means it may not be suitable for this scenario where false positives are also costly.
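Using the same hypothetical labels and predictions as above (4 true positives and 1 false negative), recall can be computed as follows:

```python
from sklearn.metrics import recall_score

# Same hypothetical data: TP = 4, FN = 1, so recall = 4 / (4 + 1)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

print(recall_score(y_true, y_pred))  # 0.8
```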

The ROC curve (Receiver Operating Characteristic curve) is a graphical representation of the performance of a binary classifier model. It plots the true positive rate (TPR) against the false positive rate (FPR) for different threshold values. The area under the ROC curve (AUC-ROC) is a measure of the overall performance of the model. A higher AUC-ROC indicates a better-performing model.
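A minimal sketch of computing the ROC curve and AUC-ROC with scikit-learn, assuming the model outputs a predicted probability (or score) for each patient; the labels and scores below are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and predicted probabilities from a screening model
y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.20, 0.15, 0.30, 0.20, 0.40, 0.80, 0.60, 0.65, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR and FPR at each threshold
print(roc_auc_score(y_true, y_score))              # area under the ROC curve
```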

The PR curve (Precision-Recall curve) is another graphical representation of the performance of a binary classifier model. It plots the precision against the recall for different threshold values. The area under the PR curve (AUC-PR) is a measure of the overall performance of the model. A higher AUC-PR indicates a better-performing model.
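The analogous sketch for the PR curve and AUC-PR, using the same hypothetical labels and scores:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.20, 0.15, 0.30, 0.20, 0.40, 0.80, 0.60, 0.65, 0.90])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(auc(recall, precision))                    # area under the PR curve
print(average_precision_score(y_true, y_score))  # a closely related summary metric
```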

In this scenario, both the ROC and PR curves can be used to evaluate the models. However, since the goal is to find the model that best predicts true positives, the PR curve is more appropriate. The PR curve focuses on the trade-off between precision and recall, which are the quantities that matter most here, and neither of those metrics involves true negatives. As a result, AUC-PR is far more sensitive to how well a model handles the rare positive class, whereas AUC-ROC can remain high on a heavily imbalanced data set simply because false positives are diluted by the large number of negatives.
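The difference matters most when the classes are heavily imbalanced. The sketch below uses synthetic, hypothetical scores for a mediocre model on a set with about 2% positives; AUC-ROC tends to look respectable while AUC-PR (average precision) comes out much lower, reflecting the weak precision/recall trade-off.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Hypothetical, heavily imbalanced screening set: 20 positives, 980 negatives
y_true = np.concatenate([np.ones(20), np.zeros(980)])
# A mediocre model: positive scores are only somewhat higher, with heavy overlap
y_score = np.concatenate([rng.normal(0.6, 0.2, 20), rng.normal(0.4, 0.2, 980)])

print(roc_auc_score(y_true, y_score))            # tends to look respectable
print(average_precision_score(y_true, y_score))  # typically much lower on imbalanced data
```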

Therefore, the most suitable metric to determine the model best suited to solve this classification problem is D. PR Curve.