Defining Data Sources for Allianz Financial Services (AFS) on AWS

Identifying Tasks for Defining Data Sources

Question

Allianz Financial Services (AFS) is a banking group that has served the financial community for the past five decades. It offers end-to-end banking and financial solutions in Southeast Asia through its consumer banking, business banking, Islamic banking, investment finance, and stockbroking businesses, as well as unit trust and asset administration. AFS has built its entire infrastructure on AWS: web applications on EC2, files and logs on S3, databases on Amazon RDS and DynamoDB, and a data warehouse (DWH) on Redshift.

AFS is defining data sources.

Please help identify the tasks. Select 3 options.

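A. Data of files in S3, tables, views, and collections in databases are the data sources

B. Amazon ML data sources can be created only for RDS, Redshift, and S3

C. Amazon ML provides only 2 options to split the datasets, sequential and random split

D. Metadata of files in S3, tables, views, and collections in databases are the data sources

E. Amazon ML data sources can be created on any of the above data sources

F. AttributeType includes Binary, Categorical, Numeric and Text datatypes

G. AttributeType includes Ranking, Categorization, Word Prominence, Count number, and Count percentage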

Answer: B, D, F.

Option A is incorrect. The attributes in the input data and their corresponding data types make up the schema. A data source does not hold the input data itself; it stores a reference to the input location along with metadata.

https://docs.aws.amazon.com/machine-learning/latest/dg/creating-a-data-schema-for-amazon-ml.html

Option B is correct. Amazon ML data sources can be created only from data in Amazon S3, Amazon Redshift, and MySQL databases in Amazon RDS.

https://docs.aws.amazon.com/machine-learning/latest/dg/creating-and-using-datasources.html
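As a rough illustration of what creating a data source from S3 involves, the sketch below assembles the arguments for the Amazon ML CreateDataSourceFromS3 operation without calling AWS. The bucket name, data source ID, attribute names, and the helper build_s3_datasource_request are hypothetical; the commented-out boto3 call assumes valid credentials and an account with access to Amazon ML.

```python
import json


def build_s3_datasource_request(datasource_id, name, data_s3_uri, schema,
                                compute_statistics=True):
    """Assemble keyword arguments for CreateDataSourceFromS3 (hypothetical helper)."""
    if not data_s3_uri.startswith("s3://"):
        raise ValueError("input data must live in Amazon S3")
    return {
        "DataSourceId": datasource_id,
        "DataSourceName": name,
        "DataSpec": {
            "DataLocationS3": data_s3_uri,      # reference to the data, not a copy
            "DataSchema": json.dumps(schema),   # the schema is passed as a JSON string
        },
        "ComputeStatistics": compute_statistics,
    }


# Hypothetical schema for an illustrative CSV file.
schema = {
    "version": "1.0",
    "targetAttributeName": "defaulted",
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": "balance", "attributeType": "NUMERIC"},
        {"attributeName": "defaulted", "attributeType": "BINARY"},
    ],
}

request = build_s3_datasource_request(
    "afs-ds-001", "AFS example", "s3://example-afs-bucket/loans.csv", schema
)

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("machinelearning").create_data_source_from_s3(**request)
print(json.dumps(request, indent=2))
```

Building the request separately from sending it keeps the example runnable offline and makes it easy to inspect what the data source will reference.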

Option C is incorrect. Amazon ML provides three options for splitting your data, not two:

Pre-split the data - Split the data into two data input locations before uploading them to Amazon Simple Storage Service (Amazon S3), and create two separate data sources from them.

Amazon ML sequential split - Configure Amazon ML to split your data sequentially when creating the training and evaluation data sources.

Amazon ML random split - Configure Amazon ML to split your data using a seeded random method when creating the training and evaluation data sources.
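The sequential and random splits are expressed through the DataRearrangement string supplied when a data source is created. The following is a minimal sketch of how the training and evaluation strings might be generated; the helper name split_rearrangements, the 70/30 default, and the seed value are assumptions for illustration, and requiring a seed for the random strategy is a choice made by this sketch.

```python
import json


def split_rearrangements(train_percent=70, strategy="sequential", random_seed=None):
    """Return (training, evaluation) DataRearrangement JSON strings."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    if strategy not in ("sequential", "random"):
        raise ValueError("strategy must be 'sequential' or 'random'")
    if strategy == "random" and random_seed is None:
        raise ValueError("this sketch requires a seed for a random split")

    def rearrangement(begin, end):
        splitting = {"percentBegin": begin, "percentEnd": end}
        if strategy == "random":
            # Both data sources share the same seed so the two ranges
            # select complementary sets of rows.
            splitting["strategy"] = "random"
            splitting["randomSeed"] = random_seed
        return json.dumps({"splitting": splitting})

    return rearrangement(0, train_percent), rearrangement(train_percent, 100)


train, evaluation = split_rearrangements()
print(train)
print(evaluation)

seed = "s3://example-afs-bucket/loans.csv"  # hypothetical seed string
print(split_rearrangements(80, "random", seed)[0])
```

Each returned string would be supplied as the DataRearrangement value of one of the two data sources created over the same input data.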

Option D is correct. Data source objects contain metadata about your input data.

When you create a data source, Amazon ML reads your input data, computes descriptive statistics on its attributes, and stores the statistics, a schema, and other information as part of the data source object.

https://docs.aws.amazon.com/machine-learning/latest/dg/creating-and-using-datasources.html
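To make the idea of "metadata about the input data" concrete, here is a small sketch that computes per-attribute descriptive statistics from CSV text. It only illustrates the kind of information a data source object might hold; it is not how Amazon ML computes its statistics, and the column names, sample values, and the summarize function are made up for the example.

```python
import csv
import io
import statistics


def summarize(csv_text, numeric_columns):
    """Compute simple descriptive statistics for the given numeric columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    attributes = {}
    for column in numeric_columns:
        # Empty cells are treated as missing values.
        values = [float(row[column]) for row in rows if row[column] != ""]
        attributes[column] = {
            "count": len(values),
            "missing": len(rows) - len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
        }
    return {"row_count": len(rows), "attributes": attributes}


sample = "customer_id,balance,age\n1,100.0,30\n2,250.0,\n3,50.0,45\n"
metadata = summarize(sample, ["balance", "age"])
print(metadata)
```

The result describes the data (row counts, ranges, missing values) without containing the rows themselves, which is the distinction between options A and D.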

Option E is incorrect. A data source cannot be created on DynamoDB.

https://docs.aws.amazon.com/machine-learning/latest/dg/creating-and-using-datasources.html

Option F is correct. AttributeType includes the Binary, Categorical, Numeric, and Text data types.

https://docs.aws.amazon.com/machine-learning/latest/dg/creating-a-data-schema-for-amazon-ml.html#assigning-data-types
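A schema assigns one of these four types to each attribute. The sketch below builds a schema dictionary in the JSON layout used by Amazon ML and rejects any type outside the four named above; the attribute names and the build_schema helper are hypothetical.

```python
import json

# The four attribute types named in option F.
ALLOWED_TYPES = {"BINARY", "CATEGORICAL", "NUMERIC", "TEXT"}


def build_schema(attributes, target, has_header=True):
    """Build a schema dict from a {name: type} mapping (hypothetical helper)."""
    unknown = sorted(t for t in attributes.values() if t not in ALLOWED_TYPES)
    if unknown:
        raise ValueError(f"unsupported attribute types: {unknown}")
    if target not in attributes:
        raise ValueError("target must be one of the attributes")
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": has_header,
        "attributes": [
            {"attributeName": name, "attributeType": attr_type}
            for name, attr_type in attributes.items()
        ],
    }


example_schema = build_schema(
    {
        "balance": "NUMERIC",
        "account_type": "CATEGORICAL",
        "complaint_text": "TEXT",
        "defaulted": "BINARY",
    },
    target="defaulted",
)
print(json.dumps(example_schema, indent=2))
```

Passing a value such as "RANKING" as a type raises a ValueError in this sketch, mirroring why option G is not a valid description of AttributeType.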

Option G is incorrect. These are not attribute types; they describe the distribution of text attributes in the data insights.

https://docs.aws.amazon.com/machine-learning/latest/dg/data-insights.html

Each option is evaluated in more detail below:

A. Data of files in S3, tables, views, and collections in databases are the data sources: This option is incorrect. An Amazon ML data source does not contain the input data itself. It stores a reference to the location of the input data together with metadata about it. The attributes in the input data and their data types are described by the schema.

B. Amazon ML data sources can be created only for RDS, Redshift, and S3: This option is correct. Amazon ML can create data sources only from data in Amazon S3, Amazon Redshift, and MySQL databases in Amazon RDS. For AFS, this covers the files and logs on S3, the databases on RDS, and the data warehouse on Redshift.

C. Amazon ML provides only 2 options to split the datasets, sequential and random split: This option is incorrect. Amazon ML provides three options: pre-splitting the data before uploading it to Amazon S3, a sequential split, and a seeded random split.

D. Metadata of files in S3, tables, views, and collections in databases are the data sources: This option is correct. A data source object contains metadata about the input data, not the data itself. When a data source is created, Amazon ML reads the input data, computes descriptive statistics on its attributes, and stores the statistics, a schema, and other information as part of the data source object.

E. Amazon ML data sources can be created on any of the above data sources: This option is incorrect. AFS also stores data in DynamoDB, and Amazon ML cannot create a data source on DynamoDB. Data sources can be created only from S3, Redshift, and RDS.

F. AttributeType includes Binary, Categorical, Numeric and Text datatypes: This option is correct. AttributeType defines the data type of an attribute (a feature or column) in the schema. It is Binary for attributes with two possible states, Categorical for attributes that take a limited set of discrete values, Numeric for quantities, and Text for free-text attributes.

G. AttributeType includes Ranking, Categorization, Word Prominence, Count number, and Count percentage: This option is incorrect. None of these is an attribute type. Ranking, word prominence, count number, and count percentage describe the distribution of text attributes in the data insights, and the attribute type for categorical variables is Categorical, not Categorization.