Distributing Resources for Querying in Redshift Tables - Best Practices for Big Data Optimization

Optimizing Resource Distribution for Querying in Redshift Tables

Question

You have created a platform on which companies can store their order history in Redshift tables.

Five major players account for 80% of the hosted data, and a small number of other players account for the rest.

You need to ensure that query resources are distributed appropriately for the big players, since they account for most of the data on Redshift.

How can you achieve this without much additional maintenance overhead?

Answers

Explanations

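A. Create a separate user group for the major players and use Workload Management (WLM).
B. Segregate the major players into separate clusters.
C. Create the tables with the EVEN distribution style.
D. Create the tables with the ALL distribution style.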

Answer - A.

The AWS Documentation mentions the following.

Amazon Redshift workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won't get stuck in queues behind long-running queries.

Amazon Redshift WLM creates query queues at runtime according to service classes, which define the configuration parameters for various types of queues, including internal system queues and user-accessible queues.

From a user perspective, a user-accessible service class and a queue are functionally equivalent.

For consistency, this documentation uses the term queue to mean a user-accessible service class as well as a runtime queue.

Option B is incorrect since running separate clusters would add maintenance overhead to the entire solution.

Options C and D are incorrect since distribution styles control how table data is placed across the cluster, not how query resources are allocated to the major players.

For more information on workload management, refer to the following URL.

https://docs.aws.amazon.com/redshift/latest/dg/c_workload_mngmt_classification.html

The best way to ensure that the resources for querying are distributed properly for the big players without much additional maintenance overhead is to create a separate user group and make use of Workload Management.

Workload Management (WLM) is a feature of Amazon Redshift that lets you define how queries are prioritized and how cluster resources are allocated among them. With WLM you create query queues and assign user groups or query groups to each queue, and each queue has its own concurrency level and share of memory. By creating a user group for the major players and assigning it to a dedicated queue, you can ensure that their queries receive a larger share of the cluster's resources.
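As a rough illustration, the setup described above could be expressed as a manual WLM configuration. The sketch below builds a JSON value for the `wlm_json_configuration` cluster parameter in Python, along with the one-time SQL for the user group. The group name `major_players`, the user names, and the concurrency and memory figures are assumptions chosen for illustration, not values from the question.

```python
import json

# Hypothetical names: neither the group nor the users come from the question.
GROUP = "major_players"
USERS = ["player_one", "player_two"]

# SQL a superuser would run once to create the group and add users to it.
setup_sql = [
    f"CREATE GROUP {GROUP};",
    f"ALTER GROUP {GROUP} ADD USER {', '.join(USERS)};",
]

# Manual WLM: one queue for the major players, then the default queue
# (the last entry, with no user_group) for everyone else.
# The 70/30 memory split and concurrency values are illustrative assumptions.
wlm_queues = [
    {"user_group": [GROUP], "query_concurrency": 5, "memory_percent_to_use": 70},
    {"query_concurrency": 5, "memory_percent_to_use": 30},
]

# The string to store in the parameter group's wlm_json_configuration parameter.
wlm_json_configuration = json.dumps(wlm_queues)

for statement in setup_sql:
    print(statement)
print(wlm_json_configuration)
```

The resulting string would be applied through the cluster's parameter group. Suitable memory and concurrency values depend on the actual workload and would need tuning.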

This approach carries relatively little maintenance overhead, since the user group and WLM configuration only need to be set up once. It also keeps the workloads apart: the big players' queries run in their own queue, so they receive the resources they need without degrading performance for the smaller players.
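To make the separation concrete, here is a toy model of how queries might be routed once such queues exist: a query goes to the first queue that lists one of the user's groups, and otherwise falls through to the default queue. This is a simplified sketch, not Redshift's implementation. It ignores query groups and the superuser queue, and all queue, group, and user names are hypothetical.

```python
from typing import Dict, List

# Queues in definition order; the last one is the default (no user groups).
QUEUES: List[Dict] = [
    {"name": "major_players_queue", "user_groups": ["major_players"]},
    {"name": "default_queue", "user_groups": []},
]

# Hypothetical group membership for two example users.
USER_GROUPS: Dict[str, List[str]] = {
    "player_one": ["major_players"],
    "small_shop": [],
}

def route(user: str) -> str:
    """Return the name of the first queue matching one of the user's groups."""
    groups = set(USER_GROUPS.get(user, []))
    for queue in QUEUES:
        if groups & set(queue["user_groups"]):
            return queue["name"]
    return QUEUES[-1]["name"]  # no match: fall through to the default queue

print(route("player_one"))  # major_players_queue
print(route("small_shop"))  # default_queue
```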

Placing the major players on separate clusters (option B) may be a viable approach, but it adds maintenance overhead because you would need to manage multiple clusters. It may also not be the most cost-effective approach, since each cluster incurs its own costs.

Creating tables with the EVEN distribution style (option C) does not address query resource distribution. EVEN distribution spreads rows across the slices of a Redshift cluster in round-robin fashion, regardless of their values. It is appropriate when a table does not participate in joins, or when there is no clear choice between KEY and ALL distribution.

Creating tables with the ALL distribution style (option D) does not address query resource distribution either. ALL distribution stores a full copy of the table on every node in the cluster, which multiplies the storage required and makes loads, updates, and inserts slower. It is appropriate only for relatively small, slow-changing tables that are joined frequently, not for large order-history tables.
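A toy model can show why neither distribution style addresses query resources: both only decide where rows are stored. The sketch below places eight rows on a hypothetical cluster of two nodes with two slices each, round-robin across slices for EVEN and one full copy per node for ALL. The cluster size and function names are assumptions for illustration, and the model is a simplification of what Redshift does internally.

```python
from typing import Dict, List

NODES = 2            # hypothetical cluster size
SLICES_PER_NODE = 2  # hypothetical slices per node

def distribute_even(rows: List[int]) -> Dict[int, List[int]]:
    """Round-robin rows across all slices, in the manner of DISTSTYLE EVEN."""
    total_slices = NODES * SLICES_PER_NODE
    placement: Dict[int, List[int]] = {s: [] for s in range(total_slices)}
    for i, row in enumerate(rows):
        placement[i % total_slices].append(row)
    return placement

def distribute_all(rows: List[int]) -> Dict[int, List[int]]:
    """Keep a full copy of the table on every node, in the manner of DISTSTYLE ALL."""
    return {node: list(rows) for node in range(NODES)}

rows = list(range(8))
even = distribute_even(rows)
all_copies = distribute_all(rows)

print(even)
print(all_copies)
# Total rows stored: EVEN keeps each row once, ALL keeps one copy per node.
print(sum(len(v) for v in even.values()))
print(sum(len(v) for v in all_copies.values()))
```

In this model EVEN stores each row once, while ALL stores as many copies as there are nodes. Neither function says anything about which user's queries get memory or concurrency slots, which is what WLM controls.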