TerramEarth - Data Ingestion Solutions


Question

TerramEarth manufactures heavy equipment for the mining and agricultural industries.

About 80% of their business is from mining and 20% from agriculture.

They currently have over 500 dealers and service centers in 100 countries.

Their mission is to build products that make their customers more productive.

Solution Concept - There are 20 million TerramEarth vehicles in operation that collect 120 fields of data per second.

Data is stored locally on the vehicle and can be accessed for analysis when a vehicle is serviced.

The data is downloaded via a maintenance port.

This same port can be used to adjust operational parameters, allowing the vehicles to be upgraded in the field with new computing modules.

Approximately 200,000 vehicles are connected to a cellular network, allowing TerramEarth to collect data directly.

At a rate of 120 fields of data per second, with 22 hours of operation per day, TerramEarth collects a total of about 9 TB/day from these connected vehicles.
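The 9 TB/day figure can be sanity-checked with some quick arithmetic. The sketch below makes two assumptions the case study does not state: decimal units (1 TB = 10^12 bytes), and an even split of the volume across the 200,000 connected vehicles. Under those assumptions, each vehicle sends about 45 MB per day, which is roughly 4.7 bytes per field value. If every connected vehicle runs during the same 22 hours, the connected fleet produces about 114 MB/s. The last figure is hypothetical: it scales the same per-vehicle rate to all 20 million vehicles, giving about 900 TB/day.

```python
# Back-of-envelope check of the case study's data-volume figures.
# Assumptions (not stated in the source): 1 TB = 10**12 bytes and the
# daily volume is split evenly across the connected vehicles.

CONNECTED_VEHICLES = 200_000
TOTAL_VEHICLES = 20_000_000
FIELDS_PER_SECOND = 120
OPERATING_HOURS_PER_DAY = 22
BYTES_PER_DAY = 9 * 10**12  # "about 9 TB/day" from the connected vehicles


def ingest_budget() -> dict:
    """Derive per-vehicle and fleet-wide rates from the quoted totals."""
    seconds_per_day = OPERATING_HOURS_PER_DAY * 3600
    per_vehicle_day = BYTES_PER_DAY / CONNECTED_VEHICLES
    per_vehicle_second = per_vehicle_day / seconds_per_day
    return {
        "bytes_per_vehicle_per_day": per_vehicle_day,
        "bytes_per_field_value": per_vehicle_second / FIELDS_PER_SECOND,
        # Assumes all connected vehicles operate in the same 22-hour window.
        "fleet_mb_per_second": BYTES_PER_DAY / seconds_per_day / 10**6,
        # Hypothetical: all 20 million vehicles connected at the same per-vehicle rate.
        "full_fleet_tb_per_day": per_vehicle_day * TOTAL_VEHICLES / 10**12,
    }


if __name__ == "__main__":
    for name, value in ingest_budget().items():
        print(f"{name}: {value:,.1f}")
```

A volume of that order, arriving continuously from many devices, is what pushes the design toward a managed, horizontally scaled ingestion service rather than a single upload server.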

Existing Technical Environment - TerramEarth's existing architecture is composed of Linux- and Windows-based systems that reside in a single data center on the U.S. west coast.

These systems gzip CSV files from the field, upload them via FTP, and place the data in the data warehouse.
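The following minimal sketch shows the compression step of that legacy batch flow, using only the Python standard library. The column names are hypothetical, and the FTP upload is left out; the standard library's `ftplib.FTP.storbinary` could send the resulting bytes. The sketch also shows one reason the flow is slow: a reading cannot reach the warehouse until its whole file has been written, compressed, and uploaded.

```python
import csv
import gzip
import io


def gzip_csv_batch(header, rows) -> bytes:
    """Serialize a batch of readings to CSV and gzip it into one upload blob."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return gzip.compress(buffer.getvalue().encode("utf-8"))


def read_gzip_csv(blob: bytes) -> list:
    """Reverse step on the ingest side: decompress and parse the CSV rows."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    # Hypothetical columns; the real 120-field schema is not given in the source.
    header = ["vehicle_id", "timestamp", "engine_temp_c"]
    rows = [["TE-0001", "2024-01-01T00:00:00Z", 88.5]]
    blob = gzip_csv_batch(header, rows)
    print(read_gzip_csv(blob))
```

Every value comes back as a string after parsing, so the ingest application still has to convert types before loading the warehouse.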

Because this process takes time, aggregated reports are based on data that is 3 weeks old.

With this data, TerramEarth has been able to preemptively stock replacement parts and reduce unplanned downtime of their vehicles by 60%.

However, because the data is stale, some customers are without their vehicles for up to 4 weeks while they wait for replacement parts.

Business Requirements:
- Decrease unplanned vehicle downtime to less than 1 week
- Support the dealer network with more data on how their customers use their equipment to better position new products and services
- Have the ability to partner with different companies, especially with seed and fertilizer suppliers in the fast-growing agricultural business, to create compelling joint offerings for their customers

Technical Requirements:
- Expand beyond a single data center to decrease latency to the American Midwest and east coast
- Create a backup strategy
- Increase security of data transfer from equipment to the data center
- Improve data in the data warehouse
- Use customer and equipment data to anticipate customer needs

Application 1: Data ingest - A custom Python application reads uploaded data files from a single server and writes them to the data warehouse.

Compute: Windows Server 2008 R2, 16 CPUs, 128 GB of RAM, 10 TB local HDD storage

Application 2: Reporting - An off-the-shelf application that business analysts use to run a daily report to see what equipment needs repair.

Only 2 analysts of a team of 10 (5 west coast, 5 east coast) can connect to the reporting application at a time.

Compute: Off-the-shelf application with a license tied to the number of physical CPUs. Windows Server 2008 R2, 16 CPUs, 32 GB of RAM, 500 GB HDD

Data warehouse: A single PostgreSQL server, Red Hat Linux, 64 CPUs, 128 GB of RAM, 4x 6 TB HDD in RAID 0

Considering the technical requirements, which components should you use for the ingestion of the data?

Answers

Explanations


A. B. C. D.

B.

Based on the business requirements, TerramEarth needs to decrease unplanned vehicle downtime to less than 1 week, support the dealer network with more data on how their customers use their equipment to better position new products and services, and have the ability to partner with different companies, especially with seed and fertilizer suppliers in the fast-growing agricultural business, to create compelling joint offerings for their customers. Additionally, they have technical requirements such as expanding beyond a single data center to decrease latency to the American midwest and east coast, creating a backup strategy, increasing security of data transfer from equipment to the data center, and improving data in the data warehouse.

To address the data ingestion component of the solution, the best approach would be to leverage a cloud-based service that can handle the scale of data being collected by TerramEarth. From the given options, the best choice would be Cloud IoT Core with public/private key pairs.

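Cloud IoT Core is a managed service that allows TerramEarth to securely and reliably ingest data from connected devices at this scale. It forwards device telemetry to Cloud Pub/Sub for downstream processing. It also directly addresses the requirement to increase the security of data transfer from equipment to the data center. Each device is registered with a public key. When the device connects, it presents a short-lived JSON Web Token (JWT) signed with the matching private key, and Cloud IoT Core verifies that token before accepting any data. The connection itself runs over TLS through the MQTT or HTTP bridge. (Google retired Cloud IoT Core on August 16, 2023, so this answer reflects the service as it existed when the question was written.)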
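Here is a minimal sketch of the device side of that handshake, using only the Python standard library. It builds two things:

- the JWT header and claims (`iat`, `exp`, and `aud` set to the Google Cloud project ID)
- the MQTT client ID and telemetry topic, in the formats the Cloud IoT Core MQTT bridge documented

It deliberately stops before the signature. Producing the RS256 or ES256 signature with the device's private key needs a crypto library, for example PyJWT's `jwt.encode`, which is not shown. The project, region, registry, and device names used with these helpers are hypothetical.

```python
import base64
import json
import time
from typing import Optional

MAX_LIFETIME_S = 24 * 3600  # IoT Core rejected tokens valid for longer than about 24 h


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as JWT (RFC 7519) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def compact_json(obj: dict) -> bytes:
    """Serialize JSON without whitespace, the usual JWT segment form."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def jwt_signing_input(project_id: str, algorithm: str = "RS256",
                      lifetime_s: int = 3600, now: Optional[float] = None) -> str:
    """Return 'header.claims'; the device signs this string with its private key."""
    if algorithm not in ("RS256", "ES256"):
        raise ValueError("device keys must use RS256 or ES256")
    if not 0 < lifetime_s <= MAX_LIFETIME_S:
        raise ValueError("token lifetime must be between 1 second and 24 hours")
    issued_at = int(time.time() if now is None else now)
    header = {"alg": algorithm, "typ": "JWT"}
    claims = {"iat": issued_at, "exp": issued_at + lifetime_s, "aud": project_id}
    return f"{b64url(compact_json(header))}.{b64url(compact_json(claims))}"


def mqtt_client_id(project_id: str, region: str, registry_id: str, device_id: str) -> str:
    """Client ID format expected by the MQTT bridge."""
    return (f"projects/{project_id}/locations/{region}"
            f"/registries/{registry_id}/devices/{device_id}")


def telemetry_topic(device_id: str) -> str:
    """MQTT topic that devices published telemetry events to."""
    return f"/devices/{device_id}/events"
```

The full token is the signing input, a dot, and the base64url-encoded signature. The device would send it as the MQTT password when connecting to `mqtt.googleapis.com` on port 8883 over TLS; the bridge ignored the username field. Because the server stores only public keys, a compromised data center does not expose credentials that could impersonate a vehicle.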

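Google Kubernetes Engine with an SSL Ingress is not the best option for data ingestion. GKE is a container orchestration service for deploying and managing containerized applications, and an SSL Ingress only terminates TLS for incoming HTTP(S) traffic. This setup can scale to handle large volumes of traffic. However, it provides no per-device identity or device registry, so TerramEarth would have to build device authentication and management itself.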

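Compute Engine with project-wide SSH keys or instance-specific SSH keys is also a poor fit. SSH keys control administrative login to the VMs; they do not authenticate the vehicles or secure the data the vehicles send. TerramEarth would have to build, secure, and operate its own ingestion endpoint on those VMs. Cloud IoT Core provides device authentication and ingestion as a managed service.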

In summary, based on the business and technical requirements, Cloud IoT Core with public/private key pairs is the best option for TerramEarth to ingest data from their connected vehicles.