Direct Marketing Campaign for Receptive Customers

Prepare Data for XGBoost Algorithm

Question

You work for a retail clothing manufacturer that has a very active online web store.

You have been assigned the task of building a model to contact customers for a direct marketing campaign based on their predicted receptiveness to the campaign.

Some of your customers have been contacted in the past for other marketing campaigns.

You don't want to contact these previously contacted customers for this latest campaign. Before training the model, you need to clean your data and prepare it for the XGBoost algorithm you are going to use.

You have written your cleaning/preparation code in your SageMaker notebook.

Based on the following code, what happens on lines 19, 21, and 22? (Select THREE)

1  import sagemaker
2  import boto3
3  from sagemaker.predictor import csv_serializer
4  import numpy as np
5  import pandas as pd
6  from time import gmtime, strftime
7  import os
8  region = boto3.Session().region_name
9  smclient = boto3.Session().client('sagemaker')
10 from sagemaker import get_execution_role
11 role = get_execution_role()
12 bucket = 'sagemakerS3Bucket'
13 prefix = 'sagemaker/xgboost'
14 !wget -N https://.../bank.zip
15 !unzip -o bank.zip
16 data = pd.read_csv('./bank/bank-full.csv', sep=';')
17 pd.set_option('display.max_columns', 500)
18 pd.set_option('display.max_rows', 5)
19 data['no_previous_campaign'] = np.where(data['contacted'] == 999, 1, 0)
20 data['not_employed'] = np.where(np.in1d(data['job'], ['student', 'retired', 'unempl']), 1, 0)
21 model_data = pd.get_dummies(data)
22 model_data = model_data.drop(['duration', 'employee.rate', 'construction.price.idex', 'construction.confidence.idx', 'lifetime.rate', 'region'], axis=1)
23 train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data)), int(0.9 * len(model_data))])
24 pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('train.csv', index=False, header=False)
25 pd.concat([validation_data['y_yes'], validation_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('validation.csv', index=False, header=False)
26 pd.concat([test_data['y_yes'], test_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('test.csv', index=False, header=False)
27 boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
28 boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('validation.csv')

Answers

Explanations



Answers: C, D, F.

Option A is incorrect.

This option describes what happens on line 23, not what happens on lines 19, 21, or 22.

Option B is incorrect.

Line 19 does not set the attribute no_previous_campaign to 999.

It sets the attribute no_previous_campaign to 1 or 0 depending on whether the customer in the observation has been contacted in a previous campaign; a value of 999 in the contacted column indicates that there was no previous contact.

Option C is correct.

Line 19 sets the attribute no_previous_campaign to 1 when the contacted column equals 999, which indicates the customer has not been contacted via a previous campaign, and to 0 otherwise.

Option D is correct.

Line 21 uses the pandas library get_dummies method to convert the categorical attributes in the dataframe to dummy (or indicator) variables.

Option E is incorrect.

Line 21 does not convert empty attributes to dummy variables.

It uses the pandas library get_dummies method to convert the categorical attributes in the dataframe to dummy (or indicator) variables.

Option F is correct.

Line 22 removes (or drops) several features, presumably because you have deemed the features inconsequential to the training of your model.

Option G is incorrect.

Line 22, in this usage, calls the pandas drop method with axis=1, which removes features (columns), not observations (rows).

Reference:

Please see the NumPy numpy.where documentation (for line 19), the pandas get_dummies documentation (for line 21), and the pandas DataFrame.drop documentation (for line 22).

The code provided prepares the data for the XGBoost algorithm by cleaning, transforming, and splitting the dataset into training, validation, and test datasets. The following are the explanations for lines 19, 21, and 22:
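Although the question asks only about lines 19, 21, and 22, line 23 is where the shuffled dataset is cut into the three subsets. As a rough, illustrative sketch (the toy data and sizes below are invented; only the split pattern mirrors line 23):

import numpy as np
import pandas as pd

# Toy stand-in for model_data (illustrative values only)
model_data = pd.DataFrame({'x': range(10), 'y_yes': [0, 1] * 5})

# Shuffle the rows, then cut at the 70% and 90% marks -> 70/20/10 split
train_data, validation_data, test_data = np.split(
    model_data.sample(frac=1, random_state=1729),
    [int(0.7 * len(model_data)), int(0.9 * len(model_data))])

print(len(train_data), len(validation_data), len(test_data))  # 7 2 1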

Line 19:

data['no_previous_campaign'] = np.where(data['contacted'] == 999, 1, 0)

This line creates a new column called no_previous_campaign in the data DataFrame. The np.where() function sets the value of this column to 1 if the customer in the observation has not been contacted via a previous campaign (i.e., the value in the contacted column is 999), and to 0 otherwise.
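As a minimal, self-contained sketch of the same np.where() pattern (the toy contacted values below are invented for illustration, not taken from the real dataset):

import numpy as np
import pandas as pd

# 999 in 'contacted' stands for "never reached by a previous campaign"
data = pd.DataFrame({'contacted': [999, 3, 999, 12]})

# 1 where contacted == 999 (no previous campaign), 0 otherwise
data['no_previous_campaign'] = np.where(data['contacted'] == 999, 1, 0)

print(data['no_previous_campaign'].tolist())  # [1, 0, 1, 0]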

Line 21:

model_data = pd.get_dummies(data)

This line converts the categorical data in the data DataFrame to a set of indicator variables, also known as dummy variables. Dummy variables are used to represent categorical variables in regression analysis. Each category in a categorical variable is represented by a binary variable (0 or 1). For example, if the job column has three categories (student, retired, and unempl), three new columns (job_student, job_retired, and job_unempl) will be created in the model_data DataFrame. The value of each new column will be 1 if the original column had that category in that row, and 0 otherwise.
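For illustration only, here is a minimal sketch of pd.get_dummies() on a toy DataFrame (the column values are invented; the real bank dataset has many more columns and categories):

import pandas as pd

# Toy data with one categorical and one numeric column
data = pd.DataFrame({'job': ['student', 'retired', 'unempl', 'student'],
                     'age': [23, 67, 41, 25]})

# Numeric columns pass through; each 'job' category becomes its own indicator column
model_data = pd.get_dummies(data)

print(model_data.columns.tolist())
# ['age', 'job_retired', 'job_student', 'job_unempl']

Depending on the pandas version, the indicator columns hold 0/1 integers or True/False booleans, but either way each column flags membership in a single category.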

Line 22:

model_data = model_data.drop(['duration', 'employee.rate', 'construction.price.idex', 'construction.confidence.idx','lifetime.rate', 'region'], axis=1)

This line removes the features that are deemed inconsequential from the model_data DataFrame. The features that are removed are duration, employee.rate, construction.price.idex, construction.confidence.idx, lifetime.rate, and region. The drop() function is used to remove the columns from the DataFrame. The axis=1 parameter is used to indicate that the columns are being dropped.
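A minimal sketch of the same drop() pattern on a toy DataFrame (column names chosen only for illustration):

import pandas as pd

# Toy frame with one feature we want to discard
model_data = pd.DataFrame({'duration': [10, 20],
                           'age': [23, 67],
                           'y_yes': [1, 0]})

# axis=1 drops columns (features); axis=0 would drop rows (observations)
model_data = model_data.drop(['duration'], axis=1)

print(model_data.columns.tolist())  # ['age', 'y_yes']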