Cash Logistics Application | Data Collection, Routing, and Scheduling


Question

Your company has developed and manages a Cash Logistics application.

The application collects data from a variety of sources, such as banks, deposit boxes, and ATMs.

It then routes the physical cash to various points, such as chain stores, banks, and ATM locations, and schedules the pickups and drop-offs.

The application receives requests in different formats from different sources, such as CSV, XML, JSON, or even encoded data.

Due to the nature of the application, all transfer requests arrive at least 24 hours before the actual scheduled time so that they can be processed and prepared for scheduling.

Answers

Explanations


Correct Answer: A, B, and E.

Option A is CORRECT because AWS Glue is a fully managed, serverless ETL service that can process large data sets efficiently and handle the transformations with little to no coding.

An AWS Batch job can be triggered once the AWS Glue import completes.

This can invoke the route-scheduling jobs based on the imported data.
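
As a rough sketch of this hand-off (not part of the question), the example below assumes an EventBridge rule that matches Glue job state-change events and targets a Lambda function, which then submits an AWS Batch job once the import has succeeded. The job queue, job definition, and event field names are placeholders.

    import boto3

    batch = boto3.client("batch")

    def handler(event, context):
        detail = event.get("detail", {})

        # Only react when the Glue import job has finished successfully.
        if detail.get("state") != "SUCCEEDED":
            return {"submitted": False, "state": detail.get("state")}

        # Submit the route-scheduling job to AWS Batch (names are assumed).
        response = batch.submit_job(
            jobName="route-scheduling",
            jobQueue="cash-logistics-queue",
            jobDefinition="route-scheduling-job",
            parameters={"glueJobName": detail.get("jobName", "")},
        )
        return {"submitted": True, "batchJobId": response["jobId"]}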

Option B is CORRECT because AWS Batch can process a large number of jobs with varying compute requirements.

Jobs can be prioritized as well as scheduled to run one after another.

Jobs run in elastic containers and consume resources only when there is work to do.

AWS Batch can be integrated with AWS Glue so that post-processing starts once the imported data is available.
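
To illustrate the sequencing point, here is a minimal boto3 sketch in which the route-scheduling job is made to wait for the import job via a dependsOn clause; the queue and job definition names are assumptions, not values from the question.

    import boto3

    batch = boto3.client("batch")

    # First job: import the prepared transfer-request data.
    import_job = batch.submit_job(
        jobName="import-transfer-requests",
        jobQueue="cash-logistics-queue",
        jobDefinition="data-import-job",
    )

    # Second job: build the pickup/drop-off routes, but only after the import succeeds.
    batch.submit_job(
        jobName="schedule-routes",
        jobQueue="cash-logistics-queue",
        jobDefinition="route-scheduling-job",
        dependsOn=[{"jobId": import_job["jobId"], "type": "SEQUENTIAL"}],
    )

Prioritization, by contrast, is generally configured on the job queue rather than on each individual submission.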

Option C is INCORRECT because running the data load and job scheduling entirely in Lambda is not feasible given the business requirements.

The question also mentions that it may take a few minutes to an hour to process the routing information.

Lambda has a maximum run time of 15 minutes per invocation.

So some jobs would not fit within that limit.

Option D is INCORRECT because different importers are required for the different file types, so multiple EMR clusters would need to be run and managed.

EMR serves a similar ETL purpose to Glue, but it exposes additional lower-level controls for managing the underlying Hadoop environment.

It may not be a suitable fit in the current context.

Option E is CORRECT because it can automatically sync the request files from the on-premises data center to the AWS environment.

Once the files are available in S3, the downstream processes can be triggered via S3 event notifications.
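
As one possible way to wire this up, the sketch below configures an S3 event notification so that newly synced request files invoke a processing Lambda function; the bucket name, function ARN, and key prefix are assumed placeholders.

    import boto3

    s3 = boto3.client("s3")

    # Invoke the processing function whenever a new object lands under "incoming/".
    s3.put_bucket_notification_configuration(
        Bucket="cash-logistics-requests",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {
                    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:start-import",
                    "Events": ["s3:ObjectCreated:*"],
                    "Filter": {
                        "Key": {"FilterRules": [{"Name": "prefix", "Value": "incoming/"}]}
                    },
                }
            ]
        },
    )

In practice the Lambda function also needs a resource-based policy that allows S3 to invoke it; that permission step is omitted here for brevity.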

The best solution for this scenario would be option C. Here's why:

Option A, which suggests using AWS Glue to extract and import data to the source database for job scheduling, is not the best fit for the given scenario. AWS Glue is a fully managed ETL service that is used for data cataloging, cleaning, normalization, and preparation. It is not suitable for scheduling jobs.

Option B, which suggests using AWS Batch to execute processing jobs for different importers, is also not the best fit. AWS Batch is used for running batch computing workloads on the AWS Cloud. It is not designed to handle file processing and job scheduling.

Option D suggests using EFS to import data from on-premises data centers to AWS and running EMR jobs to import data into the database for job scheduling. This approach is expensive and time-consuming as it involves setting up an EFS file system and an EMR cluster. Moreover, it is not suitable for processing files of different formats and sources.

Option E, which suggests importing data request files from on-premises data centers to S3 and storing them in one S3 bucket, is not optimal as it does not cater to the different file formats and sources of data. Furthermore, storing all data in a single bucket could lead to naming conflicts and make it difficult to manage and process data.

Option C, which suggests creating different S3 buckets for each data source and using Lambda triggers to process the data, is the best fit for this scenario. Here's how it works:

  • Create different S3 buckets for each data source (e.g., Bank, Deposit Boxes, ATM points).
  • Allocate a folder for each customer within the S3 bucket.
  • When a new data request file arrives in a bucket, a Lambda function is triggered to process the file.
  • The Lambda function reads the file and extracts the data.
  • The extracted data is then imported into the target database for job scheduling, as sketched after this list.
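
A minimal sketch of such a per-file handler is shown below; the format dispatch and the database step are placeholders rather than an actual implementation, and XML or encoded payloads would need their own parsers.

    import csv
    import io
    import json

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Each S3 event record points at one newly arrived request file.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

            # Dispatch on file extension (simplified for illustration).
            if key.endswith(".json"):
                rows = json.loads(body)
            elif key.endswith(".csv"):
                rows = list(csv.DictReader(io.StringIO(body)))
            else:
                raise ValueError(f"Unsupported format: {key}")

            save_to_scheduling_db(rows)

    def save_to_scheduling_db(rows):
        # Placeholder for importing the extracted transfer requests into the
        # target database used for job scheduling.
        print(f"Imported {len(rows)} transfer requests")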

This approach is cost-effective, scalable, and can handle different file formats and sources. Moreover, it allows for efficient management and processing of data.