AWS Machine Learning Specialty: Input Data Channel Specifications for Manufacturing Plant Image Recognition

Question

You work as a machine learning specialist for a manufacturing plant, where you are using supervised learning to train an assembly line image recognition model that categorizes malformed parts.

You have engineered your data and produced a CSV file and placed it on S3. Which of the following input data channel specifications are correct for your data? (Select TWO)

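Answers

A. Metadata Content-Type is identified as text/csv

B. Metadata Content-Type is identified as text/csv;label_size=0

C. Target value should be in the first column with no header

D. Target value should be in the last column with no header

E. Target value should be in the last column with a header

F. Target value should be in the first column with a header

Explanations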

Answers: A and C.

Option A is correct.

A Content-Type of text/csv with no label_size specified is used when the data includes a target, which is expected in the first column. The default value for label_size is 1, meaning there is one target column.

(See the Amazon SageMaker developer guide titled Common Data Formats for Training)
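As a rough illustration of where this Content-Type is set, the sketch below builds one input data channel entry shaped like the InputDataConfig list of a SageMaker training job request. The helper name `build_train_channel`, the channel name, and the bucket path are hypothetical and chosen only for this example; nothing here calls AWS.

```python
def build_train_channel(s3_uri: str, content_type: str = "text/csv") -> dict:
    """Return one channel entry for the InputDataConfig list of a training job request.

    Leaving the content type as plain "text/csv" relies on the default
    label_size of 1, i.e. one target column in the CSV.
    """
    return {
        "ChannelName": "train",  # hypothetical channel name
        "ContentType": content_type,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Hypothetical bucket and prefix, for illustration only.
channel = build_train_channel("s3://example-bucket/parts/train/")
print(channel["ContentType"])
```

The resulting dictionary would be passed as one element of the channel list when creating the training job.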

Option B is incorrect.

The Content-Type of text/csv specifying a label_size of 0 is used when you do not have target data.

You usually choose this setting when using unsupervised learning.

(See the Amazon SageMaker developer guide titled Common Data Formats for Training)
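The difference between options A and B comes down to whether the file carries a target column. A minimal sketch of that choice, using a hypothetical helper named `csv_content_type`:

```python
def csv_content_type(has_target: bool) -> str:
    """Pick the CSV content type string for a training channel.

    With a target column (supervised learning), plain "text/csv" is enough.
    Without one (unsupervised learning), label_size=0 states that no labels are present.
    """
    return "text/csv" if has_target else "text/csv;label_size=0"


print(csv_content_type(has_target=True))
print(csv_content_type(has_target=False))
```

For the supervised problem in this question, the first form applies.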

Option C is correct.

From the Amazon SageMaker developer guide titled Common Data Formats for Training, “Amazon SageMaker requires that a CSV file doesn't have a header record and that the target variable is in the first column”.
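If the engineered data was exported with a header row and the target somewhere other than the first column, it can be rearranged before upload. The following is a minimal sketch using only the Python standard library; the function name `to_training_csv` and the column names `width`, `height`, and `defect` are assumptions made for the example.

```python
import csv
import io


def to_training_csv(raw_csv: str, target_column: str) -> str:
    """Move the target column to the front and drop the header row."""
    reader = csv.reader(io.StringIO(raw_csv))
    header = next(reader)  # consume the header so it is not written out
    target_idx = header.index(target_column)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        features = row[:target_idx] + row[target_idx + 1:]
        writer.writerow([row[target_idx]] + features)
    return out.getvalue()


# Hypothetical sample: two feature columns followed by the label.
raw = "width,height,defect\n10.1,4.2,0\n9.7,4.9,1\n"
print(to_training_csv(raw, "defect"))
```

The output has the label first on every line and no header record, which matches the layout described in the quoted requirement.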

Option D is incorrect.

From the Amazon SageMaker developer guide titled Common Data Formats for Training, “Amazon SageMaker requires that a CSV file doesn't have a header record and that the target variable is in the first column”. Option D places the target in the last column.

Option E is incorrect.

It places the target in the last column and includes a header, and both conflict with the requirement quoted for option D.

Option F is incorrect.

It places the target in the first column, which is right, but it includes a header, which the same requirement rules out.
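A quick sanity check before upload can catch a stray header row. The sketch below is a heuristic only: it assumes every field in the training file is numeric, so any non-numeric value in the first row is treated as a sign of a header. The function name `first_row_looks_like_header` is hypothetical.

```python
import csv
import io


def _is_number(text: str) -> bool:
    """Return True if the field parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def first_row_looks_like_header(csv_text: str) -> bool:
    """Heuristic: a first row containing any non-numeric field is probably a header."""
    first_row = next(csv.reader(io.StringIO(csv_text)), [])
    return any(not _is_number(field) for field in first_row)


print(first_row_looks_like_header("defect,width,height\n0,10.1,4.2\n"))
print(first_row_looks_like_header("0,10.1,4.2\n1,9.7,4.9\n"))
```

A file with categorical string features would need a different check, since its data rows would also contain non-numeric fields.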

Reference:

Please see the Amazon SageMaker developer guide, specifically Common Data Formats for Built-in Algorithms and Common Data Formats for Training.

As a machine learning specialist, you need to ensure that your input data is correctly formatted before training a model. In this case, you have engineered your data and produced a CSV file, and you have placed it on S3. Let's go through each answer option to see which input data channel specifications are correct for your data.

A. Metadata Content-Type is identified as text/csv. This answer option is correct. The Content-Type for the input data channel should be text/csv, which identifies the data as comma-separated values. With no label_size given, the default of 1 applies, so one column is treated as the target, which is what a supervised learning problem needs.

B. Metadata Content-Type is identified as text/csv;label_size=0. This answer option is incorrect. A label_size of 0 states that the file has no target column, a setting normally used for unsupervised learning. This is a supervised learning problem, so the labels are required.

C. Target value should be in the first column with no header. This answer option is correct. For CSV training data, Amazon SageMaker requires that the file has no header record and that the target variable is in the first column.

D. Target value should be in the last column with no header. This answer option is incorrect. Omitting the header is right, but the target must be in the first column, not the last.

E. Target value should be in the last column with a header. This answer option is incorrect. Both parts conflict with the SageMaker requirement: the target must be in the first column, and the file must not have a header record.

F. Target value should be in the first column with a header. This answer option is incorrect. Placing the target in the first column is right, but the file must not have a header record.

Therefore, the correct input data channel specifications for your data are:

  • Metadata Content-Type is identified as text/csv
  • Target value should be in the first column with no header