Tiger Investments - EMR File System Options

Question

Tiger Investments (TI) is a private equity trust manager specializing in border market investments.

The Group is considered a pioneer investor in Southeast Asia's Greater Sub-region and the Caribbean.

Tiger Investments creates private equity funds targeting pre-emerging, post-conflict, or post-disaster economies that are undergoing transition and are poised for rapid growth.

The funds invest commercially in basic businesses, targeting attractive economic and social returns.

Tiger Investments invests through a variety of financial instruments, including equity and debt. TI is planning to launch an Amazon EMR cluster and is evaluating the file system options available for configuring its storage.

Please advise.

Select 3 options.

Answers

Explanations

A. B. C. D. E.

Answer: A, B, C.

Option A is correct.

Ephemeral storage can be enabled through HDFS.

Option B is correct.

EMRFS extends Hadoop to add the ability to directly access data stored in Amazon S3 as if it were a file system like HDFS.

Option C is correct.

Local file system storage is created along with each node: every node is an Amazon EC2 instance that comes with a preconfigured block of pre-attached disk storage called an instance store.

Option D is incorrect.

Ephemeral storage can be enabled only through HDFS.

Option E is incorrect.

Local file system storage is created along with each node: every node is an Amazon EC2 instance that comes with a preconfigured block of pre-attached disk storage called an instance store.

Tiger Investments (TI) is planning to launch an Amazon EMR cluster and is evaluating different file system options for configuring its storage. EMR is a managed Hadoop framework provided by Amazon Web Services (AWS) that allows users to process big data workloads.

There are various file system options available for EMR, and the best option depends on the specific requirements of TI. The following are the three recommended options that TI should consider:

Option A: Enable ephemeral storage using HDFS

This option uses the Hadoop Distributed File System (HDFS) to store data. HDFS is a distributed file system that provides high-throughput access to application data. The data is distributed across the instances in the EMR cluster, and multiple copies are stored on different instances so that no data is lost if an individual instance fails. Ephemeral storage is non-persistent storage that is attached to an EC2 instance and is lost when the instance terminates. This option is best suited for use cases where the data is transient or can be regenerated.
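Because HDFS keeps multiple copies of each block, the usable capacity of the cluster is smaller than the raw disk attached to it. The following is a minimal sketch of that arithmetic. The replication thresholds (1 copy for up to three core nodes, 2 for four to nine, 3 for ten or more) are an assumption taken from EMR's documented defaults for `dfs.replication` and should be checked against the release in use; the function names are hypothetical.

```python
from typing import Optional


def default_replication(core_nodes: int) -> int:
    """Replication factor assumed from EMR's documented dfs.replication defaults."""
    if core_nodes < 1:
        raise ValueError("a cluster needs at least one core node")
    if core_nodes >= 10:
        return 3
    if core_nodes >= 4:
        return 2
    return 1


def usable_hdfs_gib(core_nodes: int,
                    disk_gib_per_node: float,
                    replication: Optional[int] = None) -> float:
    """Approximate usable HDFS capacity: raw capacity divided by the copy count."""
    if replication is None:
        replication = default_replication(core_nodes)
    raw_gib = core_nodes * disk_gib_per_node
    return raw_gib / replication


if __name__ == "__main__":
    # Ten core nodes with 300 GiB each: 3000 GiB raw, three copies of every block.
    print(usable_hdfs_gib(10, 300))
```

The estimate ignores space reserved for the operating system and for non-HDFS use, so it is an upper bound rather than a sizing guarantee.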

Option B: EMRFS to directly access the data stored in S3

This option uses the EMR File System (EMRFS), which extends Hadoop so that the cluster can read and write data in Amazon S3 directly, as if S3 were a file system like HDFS. Every node in the cluster works against the same S3 data, and because that data lives outside the cluster, it remains available after the cluster terminates. This option is best suited for use cases where the data is stored in S3 and needs to be processed by EMR.

Option C: Local file system storage

This option uses the local file system of each node. Every node is created from an Amazon EC2 instance that comes with a preconfigured block of pre-attached disk storage called an instance store. The instance store provides temporary block-level storage for EC2 instances, and the data is lost when the instance terminates. This option is best suited for use cases where the data is transient or can be regenerated.
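On a running cluster, the three options are usually told apart by the URI scheme of the path a job reads or writes: `hdfs://` for HDFS, `s3://` for EMRFS, and `file://` for a node's local file system. The sketch below maps a path to its storage layer on that basis. The helper name, the mapping table, and the sample paths are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical lookup table: URI scheme -> storage option described above.
STORAGE_BY_SCHEME = {
    "hdfs": "HDFS (distributed across the cluster, lost when the cluster terminates)",
    "s3": "EMRFS (Amazon S3, persists beyond the cluster)",
    "file": "local file system (one node only, lost when the instance terminates)",
}


def storage_layer(uri: str) -> str:
    """Return a description of the storage layer implied by the URI scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in STORAGE_BY_SCHEME:
        raise ValueError(f"unrecognized or missing scheme in {uri!r}")
    return STORAGE_BY_SCHEME[scheme]


if __name__ == "__main__":
    for path in ("hdfs:///user/hadoop/input",
                 "s3://example-bucket/logs/",
                 "file:///mnt/tmp/scratch.dat"):
        print(path, "->", storage_layer(path))
```

A path with no scheme is rejected here on purpose, since which file system it resolves to depends on the cluster's default configuration.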

Options D and E are not recommended because both propose combining ephemeral storage or local file system storage with HDFS in a way that is not optimal in most cases. Option A is the better choice for HDFS storage because it distributes data across instances and stores multiple copies, so that data is not lost if an individual instance fails. Option C is the better choice for local file system storage because it uses the instance store, which is designed for temporary block-level storage on EC2 instances.

In summary, Tiger Investments should consider using EMRFS to directly access data stored in S3, or enable ephemeral storage using HDFS by distributing data across instances in the cluster and storing multiple copies of data on different instances. They may also consider using local file system storage with the instance store when each node is created from an Amazon EC2 instance.
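The trade-offs in this summary can be written down as a small decision helper. This is only a sketch of the reasoning in the explanation, not an AWS recommendation engine: the function name and its two yes/no inputs are hypothetical simplifications.

```python
def suggest_storage(must_outlive_cluster: bool, shared_across_nodes: bool) -> str:
    """Pick one of the three storage options from two coarse requirements."""
    if must_outlive_cluster:
        # Only data in Amazon S3 survives cluster termination.
        return "EMRFS (Amazon S3)"
    if shared_across_nodes:
        # HDFS spreads and replicates data across the cluster's instances.
        return "HDFS"
    # Scratch data used by a single node can stay on its instance store.
    return "local file system"


if __name__ == "__main__":
    print(suggest_storage(must_outlive_cluster=True, shared_across_nodes=True))
    print(suggest_storage(must_outlive_cluster=False, shared_across_nodes=True))
    print(suggest_storage(must_outlive_cluster=False, shared_across_nodes=False))
```

In practice a cluster often uses all three at once, for example S3 for input and final output, HDFS for intermediate results, and the local file system for per-node scratch files.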