AWS Certified Big Data - Specialty Exam: Tick-Bank Record Format Conversion in Firehose

Supported Formats for Record Format Conversion in Firehose

Question

Tick-Bank is a privately held Internet retailer of both physical and digital products, founded in 2008.

The company has more than six million clients worldwide.

Tick-Bank aims to serve as a connection between digital content makers and affiliate dealers, who then promote their products to clients.

Tick-Bank's technology aids in payments, tax calculations and a variety of customer service tasks.

Tick-Bank assists entrepreneurs in building visibility and revenue-making opportunities. Tick-Bank runs multiple Java-based web applications on Windows-based EC2 instances in AWS, managed by an internal IT Java team, to serve various business functions.

Tick-Bank is looking to enable website traffic analytics, thereby understanding user navigational behavior, preferences, and other click-related information.

Tick-Bank is also looking at improving operations by ingesting monitoring logs.

The Kinesis Agent is used to collect and forward the logs. Since the amount of data being processed is very large, Tick-Bank prefers data compression and data transformation during processing, and is considering Kinesis Data Firehose to process the streams.

Tick-Bank is considering record-format conversion.

The logs are captured into S3 and further integrated with AWS Glue to provide analytics. What formats are supported when record format conversion is enabled in Firehose?

Select 2 options.

Answers

A. Record format conversion from JSON to Avro format
B. Record format conversion from JSON to Parquet format
C. Record format conversion from JSON to Apache ORC format
D. Record format conversion from Apache Log to Avro format
E. Record format conversion from Apache Log to Parquet format
F. Record format conversion from Apache Log to Apache ORC format

Answer: B, C

Explanations

Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3.

Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON.

https://docs.aws.amazon.com/firehose/latest/dev/record-format-

Amazon Kinesis Data Firehose is a fully managed service that loads data streams into AWS data stores, including Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, in near real time. Firehose can also transform incoming data into a format that is suitable for storage, processing, and analytics. When record format conversion is enabled, Firehose converts the format of the incoming data stream before delivery: the input must be JSON, and the output can be Apache Parquet or Apache ORC. The candidate conversions in the options are evaluated below.
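As a concrete illustration, here is a minimal sketch of enabling record format conversion when creating a delivery stream with boto3. The stream, bucket, role, and Glue database/table names are hypothetical placeholders; in this configuration, Firehose reads the target schema from the AWS Glue Data Catalog.

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # All names and ARNs below are hypothetical placeholders.
    firehose.create_delivery_stream(
        DeliveryStreamName="tickbank-clickstream",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::tickbank-analytics-logs",
            # Format conversion requires a buffer size of at least 64 MiB.
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                # Input records must be JSON.
                "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
                # Emit Parquet; use {"OrcSerDe": {}} instead to emit ORC.
                "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
                # The target schema comes from an AWS Glue table.
                "SchemaConfiguration": {
                    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
                    "DatabaseName": "tickbank_analytics",
                    "TableName": "clickstream_events",
                    "Region": "us-east-1",
                    "VersionId": "LATEST",
                },
            },
        },
    )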

A. Record format conversion from JSON to Avro format: Apache Avro is a binary data-serialization system that helps in exchanging data between distributed systems, and an Avro schema can be defined using JSON or Avro IDL (Interface Definition Language). Avro is well suited for efficient serialization and deserialization and is used in data processing frameworks such as Apache Spark, Apache Hive, and Apache Kafka. However, Firehose record format conversion supports only Parquet and ORC as output formats, not Avro, so this option is incorrect.
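For background only, since Firehose does not emit Avro: the sketch below shows an Avro schema defined in JSON and records serialized and read back with the fastavro Python library. The record and field names are hypothetical.

    from fastavro import parse_schema, reader, writer

    # A hypothetical Avro schema, defined as JSON (here, a Python dict).
    schema = parse_schema({
        "type": "record",
        "name": "ClickEvent",
        "fields": [
            {"name": "user_id", "type": "string"},
            {"name": "page", "type": "string"},
            {"name": "ts", "type": "long"},
        ],
    })

    records = [{"user_id": "u-1", "page": "/checkout", "ts": 1700000000}]

    # Serialize to the binary Avro container format, then read back.
    with open("events.avro", "wb") as out:
        writer(out, schema, records)
    with open("events.avro", "rb") as fp:
        for rec in reader(fp):
            print(rec)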

B. Record format conversion from JSON to Parquet format: Apache Parquet is a columnar storage format that stores data efficiently and enables high-performance analytics on large datasets. When Firehose converts incoming JSON data to Parquet, it maps the schemaless JSON records onto a well-defined Parquet schema obtained from AWS Glue. Parquet is well suited for querying large datasets efficiently and is used in data processing frameworks such as Apache Spark, Apache Hive, and Amazon Athena. This option is correct.
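As a brief illustration of the columnar advantage mentioned above, the following sketch uses the pyarrow Python library (with hypothetical column names) to write a small Parquet file and then read back a single column, which is the access pattern that makes analytical scans cheap.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # A tiny table with hypothetical clickstream columns.
    table = pa.table({
        "user_id": ["u-1", "u-2"],
        "page": ["/home", "/checkout"],
        "ts": [1700000000, 1700000042],
    })
    pq.write_table(table, "events.parquet")

    # Columnar layout: a reader can scan only the columns a query needs.
    pages = pq.read_table("events.parquet", columns=["page"])
    print(pages)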

C. Record format conversion from JSON to Apache ORC format: Apache ORC (Optimized Row Columnar) is a columnar storage format that stores data efficiently and enables high-performance analytics on large datasets. When Firehose converts incoming JSON data to ORC, it maps the schemaless JSON records onto a well-defined ORC schema obtained from AWS Glue. ORC is well suited for querying large datasets efficiently and is used in data processing frameworks such as Apache Spark, Apache Hive, and Amazon Athena. This option is correct.
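Tying options B and C back to the scenario: once Firehose has delivered Parquet or ORC files to S3 and the table is cataloged in AWS Glue, Amazon Athena can query the data in place. A minimal boto3 sketch follows; the database, table, and results-bucket names are hypothetical placeholders.

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Run an ad-hoc aggregation over the converted clickstream data.
    resp = athena.start_query_execution(
        QueryString="SELECT page, COUNT(*) AS hits "
                    "FROM clickstream_events GROUP BY page",
        QueryExecutionContext={"Database": "tickbank_analytics"},
        ResultConfiguration={"OutputLocation": "s3://tickbank-athena-results/"},
    )
    print(resp["QueryExecutionId"])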

D. Record format conversion from Apache Log to Avro format: Apache logs are text-based logs generated by web servers and applications. Firehose record format conversion accepts only JSON as its input format, so it cannot convert Apache log records directly, and Avro is not a supported output format in any case. This option is incorrect. (The Kinesis Agent can pre-convert Apache logs to JSON before delivery, as sketched after option F below.)

E. Record format conversion from Apache Log to Parquet format: although Parquet is a supported output format, Firehose record format conversion accepts only JSON input, so Apache log records cannot be converted directly. This option is incorrect.

F. Record format conversion from Apache Log to Apache ORC format: likewise, ORC is a supported output format, but Apache log input is not, because Firehose record format conversion requires JSON input. This option is incorrect.
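Although Firehose cannot convert Apache log records itself, the Kinesis Agent mentioned in the scenario can pre-convert Apache access logs to JSON before sending them to Firehose, after which the JSON-to-Parquet/ORC conversion applies. Below is a minimal sketch of the Linux agent's /etc/aws-kinesis/agent.json (the Kinesis Agent for Windows uses a different configuration format); the file pattern and delivery stream name are hypothetical placeholders.

    {
      "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
      "flows": [
        {
          "filePattern": "/var/log/httpd/access_log*",
          "deliveryStream": "tickbank-clickstream",
          "dataProcessingOptions": [
            {
              "optionName": "LOGTOJSON",
              "logFormat": "COMBINEDAPACHELOG"
            }
          ]
        }
      ]
    }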

In conclusion, the two options supported when record format conversion is enabled in Firehose are: B. Record format conversion from JSON to Parquet format, and C. Record format conversion from JSON to Apache ORC format.