AWS Certified Machine Learning - Specialty Exam: Troubleshooting Error in SageMaker Jupyter Notebook

Question

You work for a large healthcare diagnostics company.

You are on the machine learning team responsible for predicting various anomalies in blood samples.

You have data samples from all of the corporation's many testing facilities across the country.

You have performed feature engineering and data cleaning on your dataset.

You have also written the Python code to split your dataset into training and test datasets.

You are now ready to train your model for the first time. You have written the following Python code in your SageMaker Jupyter notebook:

    import sagemaker
    from sagemaker.amazon.amazon_estimator import get_image_uri
    from sagemaker import get_execution_role

    container = get_image_uri(boto3.Session().region_name, 'xgboost')
    role = get_execution_role()

    s3_train = 's3://{}/{}/{}'.format(bucket, prefix, 'train')
    s3_validation = 's3://{}/{}/{}'.format(bucket, prefix, 'validation')
    s3_output = 's3://{}/{}/{}'.format(bucket, prefix, xgb_output)

    xgb_model = sagemaker.estimator.Estimator(container,
                                              role,
                                              train_instance_count=1,
                                              train_instance_type='ml.m4.xlarge',
                                              train_volume_size = 5,
                                              output_path=s3_output,
                                              sagemaker_session=sagemaker.Session())

    xgb_model.set_hyperparameters(max_depth = 2,
                                  eta = 2,
                                  gamma = 2,
                                  min_child_weight = 2,
                                  silent = 0,
                                  objective = "multi:softmax",
                                  num_class = 10,
                                  num_round = 10)

    train_channel = sagemaker.session.s3_input(s3_train, content_type='text/csv')
    valid_channel = sagemaker.session.s3_input(s3_validation, content_type='text/csv')
    data_channels = {'train': train_channel, 'validation': valid_channel}

    xgb_model.fit(inputs=data_channels, logs=True)

When you attempt to run this code in your SageMaker Jupyter notebook, it fails.

You check the CloudWatch logs and find this error message:

    AlgorithmError: u'2' is not valid under any of the given schemas

    Failed validating u'oneOf' in schema[u'properties'][u'feature_dim']:
    {u'oneOf': [{u'pattern': u'^([0]\.[0-9])$', u'type': u'string'},
    {u'minimum': 0, u'type': u'integer'}]}

What is the cause of your error?

Answers

Explanations

A. B. C. D.

Answer: B.

Option A is incorrect.

If you had specified an invalid hyperparameter, you would get an error such as:

    ERROR 139623806805824 train.py:48] Additional properties are not allowed (u'min_child_weigh' was unexpected)
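A misspelled name like this can be caught in the notebook before a training job is launched. The following is a minimal sketch, not part of the SageMaker SDK: the `KNOWN` list is a hypothetical subset of hyperparameter names taken from the question's code, and `unknown_names` is an illustrative helper.

```python
import difflib

# Hypothetical subset of XGBoost hyperparameter names, copied from the
# question's code. A real check would use the full list from the documentation.
KNOWN = ["max_depth", "eta", "gamma", "min_child_weight",
         "silent", "objective", "num_class", "num_round"]


def unknown_names(hyperparameters):
    """Map each unrecognized name to its closest known name, or None."""
    suggestions = {}
    for name in hyperparameters:
        if name not in KNOWN:
            close = difflib.get_close_matches(name, KNOWN, n=1)
            suggestions[name] = close[0] if close else None
    return suggestions


print(unknown_names({"eta": 0.2, "min_child_weigh": 2}))
# {'min_child_weigh': 'min_child_weight'}
```

An empty result means every name was recognized, which is the case for the code in the question, so a misspelled name is not the cause here.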

Option B is correct.

You specified a value of 2 for the eta hyperparameter, but the valid range for eta in the XGBoost algorithm is a float in [0, 1].
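The range check can be sketched locally as follows. This is an illustration only: the `RANGES` table and the `out_of_range` helper are hypothetical, the eta range comes from the explanation above, and the other ranges are assumed from the XGBoost hyperparameter documentation.

```python
# Assumed valid ranges (inclusive) for a few numeric hyperparameters.
# Only the eta range is stated in the explanation above; the rest are
# assumptions for illustration.
RANGES = {
    "eta": (0.0, 1.0),
    "gamma": (0.0, float("inf")),
    "max_depth": (0, float("inf")),
    "min_child_weight": (0.0, float("inf")),
}


def out_of_range(hyperparameters):
    """Return {name: value} for every hyperparameter outside its range."""
    bad = {}
    for name, value in hyperparameters.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not (low <= value <= high):
                bad[name] = value
    return bad


# Numeric hyperparameters exactly as set in the question's code.
question_params = {"max_depth": 2, "eta": 2, "gamma": 2, "min_child_weight": 2}
print(out_of_range(question_params))  # {'eta': 2}
```

Changing eta to a value inside the range, for example 0.2, makes the check return an empty dictionary.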

Option C is incorrect.

The valid content types for the XGBoost algorithm are text/libsvm (default) or text/csv.

You have used text/csv, so your content type is valid.

Option D is incorrect.

The objective multi:softmax is a valid setting for the XGBoost algorithm.

Reference:

Please see the Amazon SageMaker developer guide titled Logs for Built-in Algorithms, the Amazon SageMaker developer guide titled XGBoost Hyperparameters, and the XGBoost Parameters GitHub page (especially the Learning Task Parameters section).

The error message "AlgorithmError: u'2' is not valid under any of the given schemas\n\nFailed validating u'oneOf' in schema[u'properties'][u'feature_dim']:\n{u'oneOf': [{u'pattern': u'^([0]\.[0-9])$', u'type': u'string'},\n{u'minimum': 0, u'type': u'integer'}]}" suggests that there is an issue with one of the hyperparameters used in the code.

Looking at the code, we can see that the hyperparameters are being set in the following line:

xgb_model.set_hyperparameters(max_depth = 2, eta = 2, gamma = 2, min_child_weight = 2, silent = 0, objective = "multi:softmax", num_class = 10, num_round = 10)

The schema path in the error message names "feature_dim", which the code does not set explicitly, so the property name alone does not identify the offending hyperparameter. The rejected value and the schema itself are more informative.

The rejected value is the string '2'. The schema accepts either a string matching the pattern ^([0]\.[0-9])$, that is, a decimal such as 0.3, or an integer greater than or equal to zero. Because the value reaches the validator as a string (u'2'), it fails the integer branch, and it does not match the decimal pattern either.

Of the hyperparameters set to 2 in the code (max_depth, eta, gamma, and min_child_weight), eta is the only one whose valid range is [0, 1]. This is consistent with answer B: the value of 2 for eta is out of range, and a value inside the range, such as 0.2, should pass validation.
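The schema printed in the error message can be replayed locally to see why the value '2' is rejected. The following is a minimal sketch, not SageMaker's actual validator: it uses Python's `re` module to approximate the JSON Schema pattern, and the function names are illustrative.

```python
import re

# Pattern copied from the error message: a zero, a dot, and one digit.
PATTERN = re.compile(r"^([0]\.[0-9])$")


def matches_string_branch(value):
    """First oneOf branch: a string that matches the pattern."""
    return isinstance(value, str) and PATTERN.search(value) is not None


def matches_integer_branch(value):
    """Second oneOf branch: an integer greater than or equal to zero."""
    is_int = isinstance(value, int) and not isinstance(value, bool)
    return is_int and value >= 0


def is_valid(value):
    """oneOf semantics: exactly one branch must accept the value."""
    return sum([matches_string_branch(value), matches_integer_branch(value)]) == 1


print(is_valid("2"))    # False: a string, but not of the form 0.x
print(is_valid("0.2"))  # True: matches the string pattern
```

The string "2" fails both branches, which matches the "not valid under any of the given schemas" wording in the log.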