Random Cut Forest - Training Job Failure Causes

Possible Causes of Training Job Failure

Question

You work for a computer peripheral manufacturer that builds printers, external hard drives, etc.

You are on the machine learning team, where you are building a model to find anomalies in the functional behavior of your company's line of printers.

The printers generate IoT device messages that are streamed to your model's S3 bucket using Amazon Kinesis Data Streams.

You have performed your data cleansing and data engineering of your IoT printer data.

You are now ready to start training your model.

You have chosen the Random Cut Forest SageMaker built-in algorithm for your model.

You hope to find anomalies in your customers' printer activity by looking for outlier observations using your Random Cut Forest-based model.

Finding these anomalies will help your company provide better customer service. You have started your first training job, but you see that your training job is failing.

What may be the cause of this failure?

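Answers

A. You have selected compute resources of the GPU compute instance class.

B. You have selected compute resources of the CPU compute instance class.

C. You have built your training data files using the CSV file type.

D. You have built your training data files using the recordio-protobuf file type.

Explanations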

Answer: A.

Option A is correct.

SageMaker supports only the CPU instance class for the Random Cut Forest algorithm, so selecting a GPU instance class can cause the training job to fail.

Option B is incorrect.

SageMaker only supports the CPU instance class for the Random Cut Forest algorithm.

So selecting the instance class of CPU would not cause your training job to fail.
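The instance-class rule above can be checked before a training job is submitted. The following is a minimal sketch and not part of any SageMaker SDK: the helper names and the list of GPU family prefixes are assumptions made for illustration, and the prefix list is not exhaustive.

```python
# Hypothetical pre-flight check; these helpers are not a SageMaker API.
# Assumption: GPU-backed training instance families start with "ml.p" or "ml.g"
# (for example ml.p3.2xlarge or ml.g4dn.xlarge). The list is illustrative only.
GPU_FAMILY_PREFIXES = ("ml.p", "ml.g")


def is_gpu_instance_type(instance_type: str) -> bool:
    """Return True if the instance type name looks like a GPU instance class."""
    return instance_type.lower().startswith(GPU_FAMILY_PREFIXES)


def check_rcf_instance_type(instance_type: str) -> None:
    """Raise ValueError if a GPU instance class is chosen for Random Cut Forest."""
    if is_gpu_instance_type(instance_type):
        raise ValueError(
            f"{instance_type} is a GPU instance type; "
            "Random Cut Forest training supports the CPU instance class only."
        )
```

Calling `check_rcf_instance_type("ml.c5.xlarge")` returns without error, while a name such as `"ml.p3.2xlarge"` raises `ValueError` before any job is created.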

Option C is incorrect.

SageMaker supports both the CSV and recordio-protobuf file types for Random Cut Forest training data files.

So using the CSV file type for your training data would not cause your training job to fail.

Option D is incorrect.

SageMaker supports both the CSV and recordio-protobuf file types for Random Cut Forest training data files.

So using the recordio-protobuf file type for your training data would not cause your training job to fail.
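Since either file type is acceptable, the choice mainly shows up in the content type declared on the training channel. The sketch below builds one `InputDataConfig` entry in the shape used by the CreateTrainingJob API, as a plain dictionary. The function name and the file-type keys are hypothetical, and the `text/csv;label_size=0` value assumes unlabeled CSV training data, so it should be checked against the Random Cut Forest documentation before use.

```python
# Sketch only: builds a request fragment, it does not call AWS.
# Assumption: unlabeled CSV training data is declared as "text/csv;label_size=0".
RCF_TRAIN_CONTENT_TYPES = {
    "csv": "text/csv;label_size=0",
    "recordio-protobuf": "application/x-recordio-protobuf",
}


def rcf_train_channel(s3_uri: str, file_type: str) -> dict:
    """Build the 'train' channel entry for either supported file type."""
    if file_type not in RCF_TRAIN_CONTENT_TYPES:
        raise ValueError(f"unsupported file type for Random Cut Forest: {file_type}")
    return {
        "ChannelName": "train",
        "ContentType": RCF_TRAIN_CONTENT_TYPES[file_type],
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,  # caller-supplied S3 prefix holding the training files
                "S3DataDistributionType": "ShardedByS3Key",
            }
        },
    }
```

Both `rcf_train_channel(uri, "csv")` and `rcf_train_channel(uri, "recordio-protobuf")` produce a valid channel entry, which mirrors why options C and D are not failure causes.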

Reference:

Please see the Amazon SageMaker developer guide pages titled Train a Model with Amazon SageMaker and Common Parameters for Built-In Algorithms.

Each option is examined in more detail below:

A. You have selected compute resources of the GPU compute instance class: SageMaker supports only the CPU instance class for training the Random Cut Forest built-in algorithm. The algorithm does not take advantage of GPU hardware, so a GPU instance adds cost without benefit, and selecting one is the likely cause of the training job failure.

B. You have selected compute resources of the CPU compute instance class: Random Cut Forest trains on CPU instances, which is the instance class SageMaker supports for this algorithm. Selecting a CPU instance class is therefore the correct choice and would not cause the training job to fail.

C. You have built your training data files using the CSV file type: The Random Cut Forest algorithm accepts training data in both the CSV and recordio-protobuf file types. Using CSV files for training data is therefore not a reason for the failure of the training job.

D. You have built your training data files using the recordio-protobuf file type: recordio-protobuf is one of the two training data file types that the Random Cut Forest SageMaker built-in algorithm accepts, the other being CSV. Building the training data files in the recordio-protobuf file type would therefore not cause the training job to fail.

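In summary, the most likely cause of the failure of the training job in this scenario is the selection of a GPU compute instance class, because SageMaker supports only the CPU instance class for Random Cut Forest. Both the CSV and recordio-protobuf file types are supported for training data, so neither would cause the failure.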
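In practice, the cause of a failed job can be confirmed from the job description rather than inferred. The sketch below summarizes a DescribeTrainingJob-style response using its `TrainingJobStatus`, `FailureReason`, and `ResourceConfig` fields. The function name is hypothetical, and the job name in the commented boto3 call is a placeholder.

```python
def summarize_training_failure(description: dict) -> str:
    """Summarize a DescribeTrainingJob-style response dictionary."""
    status = description.get("TrainingJobStatus", "Unknown")
    if status != "Failed":
        return f"status={status}"
    # FailureReason is reported by SageMaker when a job has failed.
    reason = description.get("FailureReason", "no FailureReason reported")
    instance_type = description.get("ResourceConfig", {}).get("InstanceType", "unknown")
    return f"status=Failed instance_type={instance_type} reason={reason}"


# With AWS credentials configured, the dictionary would come from boto3, e.g.:
#   import boto3
#   description = boto3.client("sagemaker").describe_training_job(
#       TrainingJobName="printer-rcf-job"  # hypothetical job name
#   )
#   print(summarize_training_failure(description))
```

Seeing a GPU instance type next to the reported reason is a quick way to connect a real failure to the explanation for option A.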