Ideal Partition Key for DynamoDB Table

Ideal Partition Key

Prev Question Next Question

Question

Your company currently is maintaining excel sheets with data that now needs to be ported to a DynamoDB table.

The excel sheet contains the following headers for the data. · Customer ID · Customer Name · Customer Location · Customer Age Which of the following would be the ideal partition key for the data table in DynamoDB?

Answers

Explanations

Click on the arrows to vote for the correct answer

A. B. C. D.

Answer - A.

To get better performance on your underlying DynamoDB table, you should choose a partition key that would give an even distribution of values, and this would be the Customer ID.The below snapshot from the AWS Documentation gives some of the recommended partition key values.

Because of the AWS Documentation's recommendations on better partition key values, all other options are invalid.

For more information on partition key design, please refer to the below URL-

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-uniform-load.html
Here is a comparison of the provisioned throughput efficiency of some common partition key schemas:

Partition key value Uniformity
User ID, where the application has many users. Good
Status code, where there are only a few possible status codes. Bad

Item creation date, rounded to the nearest time period (for example, day, hour, or minute). Bad

Device ID, where each device accesses data at relatively similar intervals. Good

Device ID, where even if there are many devices being tracked, one is by far more popular than all the others. Bad

The partition key in DynamoDB is used to partition the table's data across multiple nodes in the database, allowing for efficient access and scalability. It's important to choose the right partition key for the table to optimize performance and minimize costs.

In this case, the ideal partition key for the DynamoDB table would be "Customer ID" (option A).

Here are a few reasons why:

  1. Uniqueness: The partition key should be unique for each item in the table to ensure even distribution of data across partitions. "Customer ID" is likely to be unique for each customer, making it a good candidate for the partition key.

  2. Queryability: The partition key should also be used in queries to retrieve specific items from the table efficiently. "Customer ID" would be a good choice for queries that retrieve data for a specific customer.

  3. Uniformity: The partition key should distribute data evenly across partitions to ensure that no single partition becomes a performance bottleneck. "Customer ID" is likely to distribute data evenly, especially if it's a randomly generated identifier.

In contrast, the other options would not be as good for the following reasons:

  • Customer Name (Option B): Customer names may not be unique, and there's no guarantee that queries for specific names would distribute evenly across partitions.
  • Customer Location (Option C): Customer locations may not be evenly distributed, which could result in uneven data distribution across partitions.
  • Customer Age (Option D): Age is not likely to be unique for each customer, and there's no guarantee that queries for specific ages would distribute evenly across partitions.

Therefore, Option A (Customer ID) would be the best partition key for the DynamoDB table.