EMR Hadoop Ecosystem for OLTP and Operational Analytics with Apache HBase Backing Store

Question

Allianz Financial Services (AFS) is a banking group offering end-to-end banking and financial solutions in South East Asia through its consumer banking, business banking, Islamic banking, investment finance, and stock broking businesses, as well as unit trust and asset administration. It has served the financial community for the past five decades. AFS launched an EMR cluster to support its big data analytics requirements.

AFS is planning to build an application running on EMR that supports both OLTP and operational analytics, allowing the use of standard SQL queries and JDBC APIs to work with an Apache HBase backing store. Which EMR Hadoop ecosystem application fulfills these requirements? Select 1 option.

Answers

A. Hue
B. Apache Flink
C. Apache Phoenix
D. Apache HBase

Explanations

Answer: C

Option A is incorrect - Hue (Hadoop User Experience) is an open-source, web-based, graphical user interface for use with Amazon EMR and Apache Hadoop.

Hue groups together several different Hadoop ecosystem projects into a configurable interface.

Amazon EMR has also added customizations specific to Hue.

Hue acts as a front-end for applications that run on your cluster, allowing you to interact with applications using an interface that may be more familiar or user-friendly.

The applications in Hue, such as the Hive and Pig editors, replace the need to log in to the cluster to run scripts interactively using each application's respective shell.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hue.html

Option B is incorrect - Apache Flink is a streaming dataflow engine that you can use to run real-time stream processing on high-throughput data sources.

Flink supports event time semantics for out-of-order events, exactly-once semantics, backpressure control, and APIs optimized for writing both streaming and batch applications.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html

Option C is correct - Apache Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-phoenix.html
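
As a rough illustration of the JDBC side, the sketch below builds a Phoenix thick-driver URL of the form jdbc:phoenix:<ZooKeeper quorum>:<port>:<znode> and opens a connection through the standard java.sql API. The class name PhoenixConnectSketch, the host localhost, the port 2181, and the znode /hbase are assumptions for illustration, and the Phoenix client JAR is assumed to be on the classpath; the values for a real cluster may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class, not part of Phoenix or Amazon EMR.
class PhoenixConnectSketch {

    // Builds a thick-driver URL: jdbc:phoenix:<quorum>:<port>:<znode>.
    static String jdbcUrl(String zkQuorum, int zkPort, String znode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znode;
    }

    // Opens a connection and prints the table names recorded in
    // Phoenix's SYSTEM.CATALOG metadata table.
    static void listTables(String url) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT DISTINCT TABLE_NAME FROM SYSTEM.CATALOG")) {
            while (rs.next()) {
                System.out.println(rs.getString("TABLE_NAME"));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumed values: ZooKeeper on the local node, default HBase znode.
        String url = jdbcUrl("localhost", 2181, "/hbase");
        System.out.println(url);
        // Only attempt a real connection when explicitly requested,
        // since it needs a running cluster and the Phoenix driver.
        if (args.length > 0 && args[0].equals("--connect")) {
            listTables(url);
        }
    }
}
```

Run without arguments, the program only prints the URL it would use; passing --connect on a node that can reach the cluster attempts the query.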

Option D is incorrect - HBase is an open source, non-relational, distributed database developed as part of the Apache Software Foundation's Hadoop project.

HBase runs on top of Hadoop Distributed File System (HDFS) to provide non-relational database capabilities for the Hadoop ecosystem.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase.html

The requirement is to build an application running on EMR that supports both OLTP and operational analytics, allowing the use of standard SQL queries and JDBC APIs to work with an Apache HBase backing store. Based on these requirements, the best option would be Apache Phoenix, which is a relational database layer over HBase that provides support for OLTP workloads and enables the use of standard SQL queries and JDBC APIs.
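
To make the two workload types concrete, the following sketch issues an OLTP-style single-row write and an operational-analytics aggregate over the same JDBC connection. The table ACCOUNT_TXN, its columns, and the class name PhoenixWorkloadSketch are hypothetical. Phoenix uses UPSERT in place of INSERT and UPDATE, and its connections do not auto-commit by default, so the write is committed explicitly.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example class; the table and column names are illustrative.
class PhoenixWorkloadSketch {

    // The PRIMARY KEY columns together form the row key of the HBase table.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS ACCOUNT_TXN ("
        + " ACCOUNT_ID VARCHAR NOT NULL,"
        + " TXN_ID BIGINT NOT NULL,"
        + " AMOUNT DECIMAL(12,2)"
        + " CONSTRAINT ACCOUNT_TXN_PK PRIMARY KEY (ACCOUNT_ID, TXN_ID))";

    // OLTP: write one row, addressed by its full primary key.
    static final String UPSERT_TXN =
        "UPSERT INTO ACCOUNT_TXN (ACCOUNT_ID, TXN_ID, AMOUNT) VALUES (?, ?, ?)";

    // Operational analytics: aggregate across many rows.
    static final String TOTALS_BY_ACCOUNT =
        "SELECT ACCOUNT_ID, COUNT(*), SUM(AMOUNT)"
        + " FROM ACCOUNT_TXN GROUP BY ACCOUNT_ID";

    static void createTable(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(CREATE_TABLE);
        }
    }

    static void recordTxn(Connection conn, String accountId, long txnId,
                          BigDecimal amount) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_TXN)) {
            ps.setString(1, accountId);
            ps.setLong(2, txnId);
            ps.setBigDecimal(3, amount);
            ps.executeUpdate();
        }
        conn.commit(); // explicit commit: auto-commit is off by default
    }

    static void printTotals(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(TOTALS_BY_ACCOUNT);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " count=" + rs.getLong(2)
                    + " total=" + rs.getBigDecimal(3));
            }
        }
    }
}
```

The methods take any java.sql.Connection, so the same code path serves both the transactional write and the aggregate query once a Phoenix connection is available.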

Hue is a web-based interface for interacting with Hadoop and other big data systems. It is a front end to applications on the cluster rather than a SQL and JDBC layer over HBase, and it is not designed for OLTP workloads.

Apache Flink is a distributed processing framework for batch and stream processing, but it is not a SQL and JDBC layer over an HBase backing store and is not designed to support OLTP workloads.

Apache HBase is a NoSQL database that runs on top of Hadoop Distributed File System (HDFS), but it does not provide direct support for SQL queries or JDBC APIs. However, as mentioned earlier, Apache Phoenix can be used as a relational database layer over HBase to provide these capabilities.

Therefore, the correct answer is C. Apache Phoenix.