AWS Certified Machine Learning - Specialty Exam: Data Source Platforms for Quantitative Analysis Model

Question

You work as a machine learning specialist for a financial services firm.

Your machine learning team has been tasked with building a quantitative analysis model for your mutual fund portfolio managers in the firm's quant department.

You need to use data from several financial data providers in your model.

You are looking for the optimal data source platform to ingest data into your machine learning Jupyter notebook environment.

Which options are NOT data source platforms that you can use? (Select TWO)

Answers

A. Athena

B. Redshift

C. EMR

D. DynamoDB

E. RDS

Explanations

Correct Answers: D and E.

Option A is incorrect.

The most commonly used data source for SageMaker is an S3 bucket, but you can also use Athena as a data source for SageMaker, so it is not one of the answers.

Option B is incorrect.

Like Athena, Redshift can be used as a data source for SageMaker.

Option C is incorrect.

Like Athena and Redshift, EMR can be used as a data source for SageMaker.

Option D is correct.

DynamoDB is not a viable data source for ingesting data into your machine learning Jupyter notebook environment.

Option E is correct.

RDS is not a viable data source for ingesting data into your machine learning Jupyter notebook environment.
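
To make the Athena option more concrete, here is a minimal sketch of how a notebook might run an Athena query and locate the result file in S3. It is not the method used in the referenced SageMaker guides. It assumes that `client` is a boto3 Athena client (for example `boto3.client("athena")`), and the function name `run_athena_query` and all of its parameters are hypothetical names chosen for illustration.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run a SQL query on Athena and return the S3 path of the result file.

    `client` is assumed to be a boto3 Athena client; `database` and
    `output_location` are placeholders for your own Glue database and S3 prefix.
    """
    # Submit the query; Athena writes its results under `output_location`.
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # For a successful query this is the full S3 path of the result file.
    return execution["ResultConfiguration"]["OutputLocation"]
```

In a notebook, the returned S3 path could then be loaded into a DataFrame, for example with `pandas.read_csv`, assuming the `s3fs` package is installed and the notebook role can read the bucket.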

References:

Amazon SageMaker Examples, Data Ingestion guide: Get started with data ingestion (https://sagemaker-examples.readthedocs.io/en/latest/ingest_data/index.html)

Amazon SageMaker Examples, Data Ingestion guide: Ingest data with Athena (https://sagemaker-examples.readthedocs.io/en/latest/ingest_data/02_Ingest_data_with_Athena_v1.html)

Amazon SageMaker Examples, Data Ingestion guide: Ingest Data with EMR (https://sagemaker-examples.readthedocs.io/en/latest/ingest_data/04_Ingest_data_with_EMR.html)

Amazon SageMaker Examples, Data Ingestion guide: Ingest data with Redshift (https://sagemaker-examples.readthedocs.io/en/latest/ingest_data/03_Ingest_data_with_Redshift_v3.html)

As a machine learning specialist for a financial services firm, you are tasked with building a quantitative analysis model for your mutual fund portfolio managers. This requires you to ingest data from several financial data providers into your machine learning Jupyter notebook environment.

Out of the given options, two are NOT data source platforms that you can use. Let's take a look at each of them in detail:

A. Athena - Amazon Athena is an interactive query service that allows you to analyze data in Amazon S3 using standard SQL. It is a data source platform that you can use for your machine learning Jupyter notebook environment.

B. Redshift - Amazon Redshift is a fully managed data warehouse that makes it simple and cost-effective to analyze all your data using SQL and your existing business intelligence tools. It is also a data source platform that you can use for your machine learning Jupyter notebook environment.

C. EMR - Amazon EMR (Elastic MapReduce) is a fully managed service that provides big data processing frameworks such as Apache Hadoop, Spark, and Flink. It is also a data source platform that you can use for your machine learning Jupyter notebook environment.

D. DynamoDB - Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. It is NOT a data source platform that you can use for your machine learning Jupyter notebook environment.

E. RDS - Amazon Relational Database Service (RDS) is a fully managed relational database service that makes it easy to set up, operate, and scale a relational database in the cloud. It is NOT a data source platform that you can use for your machine learning Jupyter notebook environment.

Therefore, the two options that are NOT data source platforms that you can use are D. DynamoDB and E. RDS.
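
For one of the supported options above, Redshift, the following is a minimal sketch of pulling query results into a notebook through the Redshift Data API. It is an illustration under stated assumptions, not the approach taken in the referenced guide. It assumes that `client` is a boto3 client created with `boto3.client("redshift-data")`. The function name `fetch_redshift_rows` and the cluster, database, and user arguments are hypothetical placeholders.

```python
import time


def fetch_redshift_rows(client, sql, cluster_id, database, db_user, poll_seconds=1.0):
    """Run `sql` through the Redshift Data API and return rows as dictionaries.

    `client` is assumed to be a boto3 "redshift-data" client. Only the first
    page of results is read; pagination via NextToken is omitted for brevity.
    """
    # Submit the statement; the Data API runs it asynchronously.
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )["Id"]

    # Poll until the statement reaches a terminal status.
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(poll_seconds)

    if status != "FINISHED":
        raise RuntimeError(f"Redshift statement {statement_id} ended with status {status}")

    result = client.get_statement_result(Id=statement_id)
    columns = [column["name"] for column in result["ColumnMetadata"]]

    rows = []
    for record in result["Records"]:
        # Each field is a one-key dict such as {"stringValue": "x"} or {"isNull": True}.
        values = [
            None if field.get("isNull") else next(iter(field.values()))
            for field in record
        ]
        rows.append(dict(zip(columns, values)))
    return rows
```

The returned list of dictionaries can be passed directly to `pandas.DataFrame(rows)` if a DataFrame is more convenient for the model's feature pipeline.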