AWS Certified Big Data - Specialty Exam: Best Practices for Loading Time Series Tables

Designing Time Series Tables

Question

Allianz Financial Services (AFS) is a banking group that has served the financial community for the past five decades. It offers end-to-end banking and financial solutions in South East Asia through its consumer banking, business banking, Islamic banking, investment finance, and stock broking businesses, as well as unit trust and asset administration. AFS uses Redshift on AWS for its data warehousing needs and S3 as the staging area to host files.

AFS uses other services like DynamoDB, Aurora, and Amazon RDS on remote hosts to fulfill other needs.

The team needs to design time series tables.

Please advise on the best practices for loading the time series tables.

Select 3 options.

Answers

Explanations

A. B. C. D. E. F.

Answer: A, B, E

Option A is correct. If your data has a fixed retention period, AWS recommends organizing it as a sequence of time-series tables.

Each table in the sequence should be identical but contain data for a different time range.

A UNION ALL view can hide the fact that the data is stored in different tables.

https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-time-series-tables.html

Option B is correct. When data with a fixed retention period is organized as a sequence of time-series tables, each table in the sequence should be identical but contain data for a different time range.

A UNION ALL view can then hide the fact that the data is stored in different tables.
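As a rough illustration of "identical tables, different time ranges", the sketch below generates one CREATE TABLE statement per month, each copying its columns from a shared template table. The monthly granularity, the `sales_YYYY_MM` naming scheme, and the `sales_template` table are assumptions made for this example, not part of the question.

```python
from datetime import date


def month_starts(first: date, count: int) -> list[date]:
    """Return the first day of `count` consecutive months, starting at `first`'s month."""
    out = []
    year, month = first.year, first.month
    for _ in range(count):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


def table_name(prefix: str, start: date) -> str:
    """Name one table per month, e.g. sales_2024_01 (hypothetical naming scheme)."""
    return f"{prefix}_{start:%Y_%m}"


def create_statements(prefix: str, template: str, first: date, count: int) -> list[str]:
    """Emit one CREATE TABLE ... (LIKE template) per month so all tables share columns."""
    return [
        f"CREATE TABLE {table_name(prefix, start)} (LIKE {template});"
        for start in month_starts(first, count)
    ]


# Hypothetical names: a "sales" series built from a "sales_template" table.
statements = create_statements("sales", "sales_template", date(2023, 11, 1), 3)
for sql in statements:
    print(sql)
```

Generating the statements from a single template keeps the tables structurally identical, which is what allows them to be combined later in one UNION ALL view.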

Option C is incorrect. It departs from the recommended design for data with a fixed retention period: a sequence of identical time-series tables, each holding a different time range, presented through a UNION ALL view.

Option D is incorrect. A single table with ever-extending time ranges is the opposite of the recommended design for data with a fixed retention period, which is a sequence of identical time-series tables, each holding a different time range.

Option E is correct. Old data can be removed by running DROP TABLE on the corresponding tables, which is much faster than a large-scale DELETE and avoids the subsequent VACUUM needed to reclaim space.

Create a UNION ALL view to hide the fact that the data is stored in different tables. When you delete old data, refine the UNION ALL view to remove the dropped tables. Similarly, as you load new time periods into new tables, add the new tables to the view.
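To make the view maintenance concrete, here is a minimal sketch that rebuilds the UNION ALL view definition whenever the set of tables changes. The view name `sales` and the table names are hypothetical, and `SELECT *` is used only to keep the example short.

```python
def union_all_view(view: str, tables: list[str]) -> str:
    """Build a CREATE OR REPLACE VIEW statement that unions the given tables."""
    if not tables:
        raise ValueError("a UNION ALL view needs at least one table")
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)
    return f"CREATE OR REPLACE VIEW {view} AS\n{selects};"


# Hypothetical monthly tables currently behind the view.
tables = ["sales_2023_11", "sales_2023_12", "sales_2024_01"]

# Loading a new time period: add the new table to the view.
tables.append("sales_2024_02")
# Deleting old data: remove the dropped table from the view.
tables.remove("sales_2023_11")

view_sql = union_all_view("sales", tables)
print(view_sql)
```

Queries keep reading from the one view name while the list of tables behind it changes from one load cycle to the next.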

Option F is incorrect. Running a large-scale DELETE and a subsequent VACUUM to reclaim space is the slower approach that dropping whole tables avoids. With a UNION ALL view over the time-series tables, deleting old data only requires dropping the expired tables and refining the view to remove them.

In summary, the best practices for loading time series tables are:

A. Organize data that has a fixed retention period as a sequence of time-series tables rather than as one growing table.

B. In the sequence, each table should be identical but contain data for different time ranges. This resembles partitioning by a date or timestamp column, except that each time range is a separate Redshift table rather than a partition of one table. Queries and maintenance can then target specific time ranges, and a UNION ALL view can present the tables as one.

E. Use DROP TABLE instead of running a large-scale DELETE and a subsequent VACUUM process to reclaim space. When a time range passes its retention period, the table that holds it is dropped as a whole, which is much faster than deleting its rows and vacuuming.

Option C is incorrect because a time-based column is required to store and manage time series data properly.

Option D is incorrect. A single table with extending time ranges may be workable while the data volume is small, but as the volume grows the table becomes slow to query, and removing old data requires a large-scale DELETE followed by a VACUUM.

Option F is incorrect because using DELETE and VACUUM to reclaim space is much slower than dropping the table that holds the expired data.
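The retention step can be sketched as follows: given the tables in a series and the number of periods to keep, work out which tables have expired and emit a DROP TABLE for each. The table names and the three-month retention window are assumptions for illustration, and the sketch relies on names such as `sales_2024_01` sorting chronologically.

```python
def split_by_retention(tables: list[str], keep: int) -> tuple[list[str], list[str]]:
    """Split table names into (expired, retained), keeping the newest `keep` periods."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(tables)  # names like sales_2024_01 sort chronologically
    return ordered[:-keep], ordered[-keep:]


def drop_statements(expired: list[str]) -> list[str]:
    """Emit one DROP TABLE per expired period; no DELETE or VACUUM is involved."""
    return [f"DROP TABLE {name};" for name in expired]


# Hypothetical series with a three-month retention window.
tables = ["sales_2024_01", "sales_2023_11", "sales_2024_02", "sales_2023_12"]
expired, retained = split_by_retention(tables, keep=3)
drops = drop_statements(expired)
print(drops)
print(retained)
```

Any UNION ALL view over the series would also need to be redefined over the retained tables only, so that it no longer references the dropped ones.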