One of the primary responsibilities of a database administrator is to architect an effective, cost-saving backup and retention policy that manages database backups throughout their lifecycle. In AWS, this could save the company a chunk of change each month. And hear me out when I say that any amount of savings is better than nothing. The elasticity of the cloud makes it easy to carry over lousy architecture habits, and its flexible provisioning model makes it susceptible to over-provisioning. Over-provisioning leads to wasted hardware and resources and can cost the company a lot of money. One critical area that I will focus on is architecting a backup/retention policy that saves money throughout the retention lifecycle.
EBS Volume Types
It is imperative to understand the different EBS volume types when architecting a backup solution for SQL Server. The volume types fall into two categories: SSD-backed volumes optimized for transactional workloads and HDD-backed volumes optimized for large streaming workloads. The table below describes the use cases and performance characteristics for the General Purpose SSD, Provisioned IOPS SSD, and Throughput Optimized HDD volume types.
|Category||Solid-State Drives (SSD)||Solid-State Drives (SSD)||Hard Disk Drives (HDD)|
|Volume Type||General Purpose SSD||Provisioned IOPS SSD||Throughput Optimized HDD|
|Description||General purpose SSD volume that balances price and performance for a wide variety of workloads||Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads||Low-cost HDD volume designed for frequently accessed, throughput-intensive workloads|
|Volume Size||1 GiB – 16 TiB||4 GiB – 16 TiB||500 GiB – 16 TiB|
|Max. Throughput/Volume||160 MiB/s||500 MiB/s†||500 MiB/s|
|Max. Throughput/Instance††||1,750 MiB/s||1,750 MiB/s||1,750 MiB/s|
|Dominant Performance Attribute||IOPS||IOPS||MiB/s|
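To make the throughput row concrete, here is a rough sketch that estimates the best-case time to stream a backup file to each volume type. It uses only the per-volume maximums from the table; the function name and the 500 GiB backup size are my own illustrative choices, and real throughput will usually be lower because of instance limits and volume sizing.

```python
def backup_write_seconds(backup_gib: float, throughput_mib_s: float) -> float:
    """Best-case seconds to stream backup_gib to a volume at throughput_mib_s."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    # 1 GiB = 1024 MiB, so convert the size before dividing by MiB/s.
    return backup_gib * 1024 / throughput_mib_s


# Per-volume maximums copied from the table above (MiB/s).
MAX_THROUGHPUT_MIB_S = {
    "General Purpose SSD": 160,
    "Provisioned IOPS SSD": 500,
    "Throughput Optimized HDD": 500,
}

if __name__ == "__main__":
    # 500 GiB is a hypothetical backup size used only for illustration.
    for name, mib_s in MAX_THROUGHPUT_MIB_S.items():
        minutes = backup_write_seconds(500, mib_s) / 60
        print(f"{name}: about {minutes:.1f} minutes for a 500 GiB backup")
```

These numbers are ceilings, not promises, but they show why a throughput-oriented volume is attractive for a workload that is one long sequential write.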
I recently attended an AWS AWSome Day event in Fort Lauderdale, and almost everyone I spoke with used General Purpose SSDs for everything, including the backup volume. When I asked them to elaborate, most people responded that General Purpose SSD is the default volume type when creating an instance or provisioning an EBS volume. I responded by asking, “If there weren’t a default volume type, which volume type would you choose and why?” Most people were puzzled because they didn’t fully understand the difference between the volume types.
Selecting the right backup volume
While it may be tempting to use either a gp2 or io1 EBS volume type for the backup disk, a Throughput Optimized HDD (st1) volume is more suitable. Backups are written as large sequential streams, so throughput is what matters, not IOPS. Furthermore, st1 is only $0.045 per gigabyte per month, while gp2 is $0.10 and io1 is $0.125 plus $0.065 per provisioned IOPS-month. Note that these charges are region-specific; however, of the three EBS volume types mentioned above, st1 is the cheapest regardless of the region. When architecting a backup solution, do keep in mind the minimum amount of drive space for each volume type. The minimum size for st1 is 500 GiB, while gp2 is 1 GiB and io1 is 4 GiB. A GiB is equivalent to 1024^3 bytes. There is another volume type, the Cold HDD (sc1), that is similar to Throughput Optimized HDD; however, it only offers up to 250 MiB/s of throughput. On the plus side, it costs nearly half the price of st1: $0.025 per GB per month.
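As a quick sanity check on those prices, the following sketch compares the monthly cost of a backup volume across the volume types. It assumes the per-GB prices and minimum sizes quoted in this section (reading the $0.125 figure as the io1 price), treats GB and GiB as interchangeable for billing, and uses a function name of my own; prices vary by region and change over time, so plug in your own.

```python
# Prices (USD) and minimum sizes as quoted in this article. They vary by
# region and over time, so treat them as placeholders.
PRICE_PER_GB_MONTH = {"gp2": 0.10, "io1": 0.125, "st1": 0.045, "sc1": 0.025}
IO1_PRICE_PER_IOPS_MONTH = 0.065
MIN_SIZE_GIB = {"gp2": 1, "io1": 4, "st1": 500}  # sc1 minimum is not stated here


def monthly_volume_cost(volume_type: str, size_gib: float,
                        provisioned_iops: int = 0) -> float:
    """Approximate monthly cost in USD for one EBS volume."""
    if volume_type not in PRICE_PER_GB_MONTH:
        raise ValueError(f"unknown volume type: {volume_type}")
    # You cannot provision below the minimum size, so bill at least that much.
    billed_gib = max(size_gib, MIN_SIZE_GIB.get(volume_type, 0))
    cost = billed_gib * PRICE_PER_GB_MONTH[volume_type]
    if volume_type == "io1":
        # io1 charges separately for every provisioned IOPS.
        cost += provisioned_iops * IO1_PRICE_PER_IOPS_MONTH
    return round(cost, 2)


if __name__ == "__main__":
    # 500 GiB and 1,000 IOPS are hypothetical sizing choices.
    for vt in ("gp2", "io1", "st1", "sc1"):
        print(vt, monthly_volume_cost(vt, 500, provisioned_iops=1000))
```

Note how the st1 minimum works against you for small databases: a 100 GiB backup volume is still billed as 500 GiB, so run the numbers before assuming st1 wins.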
S3 Storage classes
AWS currently offers five storage classes: S3 Standard, S3 Standard-IA, S3 One Zone-IA, S3 Reduced Redundancy, and Glacier. The following table provides an overview of the different storage classes.
|Storage Class||S3 Standard||S3 Standard-IA||S3 One Zone-IA||Amazon Glacier|
|Designed for Durability||99.999999999%||99.999999999%||99.999999999%†||99.999999999%|
|Designed for Availability||99.99%||99.9%||99.5%||N/A|
|Minimum Capacity Charge per Object||N/A||128KB*||128KB*||N/A|
|Minimum Storage Duration Charge||N/A||30 days||30 days||90 days|
|Retrieval Fee||N/A||per GB retrieved||per GB retrieved||per GB retrieved**|
|First Byte Latency||milliseconds||milliseconds||milliseconds||select minutes or hours***|
S3 Reduced Redundancy, though not listed above, is used for frequently accessed data and is recommended for storing “…noncritical, reproducible data at lower levels of redundancy than Standard.”
On another note, S3 One Zone-IA stores data in one availability zone. If the data is not critical to the business, this storage class might be a viable option compared to S3 Standard or S3 Standard-IA. Of the four S3 storage classes, S3 One Zone-IA is the most affordable.
AWS S3 Pricing (varies by region)
|S3 Standard Storage|
|First 50 TB / Month||$0.023 per GB|
|Next 450 TB / Month||$0.022 per GB|
|Over 500 TB / Month||$0.021 per GB|
|S3 Standard-Infrequent Access (S3 Standard-IA) Storage|
|All storage||$0.0125 per GB|
|S3 One Zone-Infrequent Access (S3 One Zone-IA) Storage|
|All storage||$0.01 per GB|
|Amazon Glacier Storage|
|All storage||$0.004 per GB|
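The tiered S3 Standard pricing is easier to reason about with a small calculator. This is a sketch using only the prices listed above; the function name, the storage class keys, and the assumption that one pricing TB equals 1,024 GB are mine.

```python
GB_PER_TB = 1024  # assumption: the pricing tiers use binary terabytes

# (tier size in GB, USD per GB-month) for S3 Standard, from the table above.
STANDARD_TIERS = [
    (50 * GB_PER_TB, 0.023),
    (450 * GB_PER_TB, 0.022),
    (float("inf"), 0.021),
]
# Flat per-GB prices for the other classes, from the table above.
FLAT_PRICES = {"STANDARD_IA": 0.0125, "ONEZONE_IA": 0.01, "GLACIER": 0.004}


def s3_monthly_cost(storage_class: str, stored_gb: float) -> float:
    """Approximate monthly storage cost in USD (no request or retrieval fees)."""
    if stored_gb < 0:
        raise ValueError("stored_gb must not be negative")
    if storage_class == "STANDARD":
        remaining, cost = stored_gb, 0.0
        for tier_gb, price in STANDARD_TIERS:
            in_tier = min(remaining, tier_gb)  # GB billed at this tier's rate
            cost += in_tier * price
            remaining -= in_tier
            if remaining <= 0:
                break
        return round(cost, 2)
    return round(stored_gb * FLAT_PRICES[storage_class], 2)


if __name__ == "__main__":
    # 1,000 GB is a hypothetical amount of stored backups.
    for cls in ("STANDARD", "STANDARD_IA", "ONEZONE_IA", "GLACIER"):
        print(cls, s3_monthly_cost(cls, 1000))
```

Keep in mind that this only covers storage. The infrequent-access classes and Glacier add retrieval fees and minimum storage durations, as shown in the earlier table.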
Here is a general backup policy:
- The database should be backed up locally to a Throughput Optimized HDD (st1) volume.
- Upload the backups to S3 Standard using AWS Multipart Upload API (https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html)
- You can also use AWS CLI for Multipart Upload. See: https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/
- Use a lifecycle policy to transition the backups from S3 Standard to S3 Infrequent Access
- Use a lifecycle policy to transition the backups from S3 Infrequent Access to Glacier
- Use AWS CLI to delete the archives in Glacier when the retention period expires
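The two lifecycle transitions in the list above can be sketched as a single rule. The snippet below only builds the configuration document; the rule ID, the `sqlbackups/` prefix, and the 30/90-day thresholds are hypothetical, and the dictionary follows the shape that, as far as I know, boto3's `put_bucket_lifecycle_configuration` and `aws s3api put-bucket-lifecycle-configuration` accept. The optional expiration is my addition, as a possible alternative to deleting with the CLI.

```python
import json


def build_backup_lifecycle(prefix, ia_after_days=30, glacier_after_days=90,
                           expire_after_days=None):
    """Build an S3 lifecycle configuration: Standard -> Standard-IA -> Glacier."""
    # Assumption: S3 requires objects to sit in Standard for at least 30 days
    # before they can transition to Standard-IA.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be later than ia_after_days")
    rule = {
        "ID": "sql-backup-retention",  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optional: let S3 delete the objects instead of a CLI job.
        if expire_after_days <= glacier_after_days:
            raise ValueError("expire_after_days must be later than glacier_after_days")
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


if __name__ == "__main__":
    print(json.dumps(build_backup_lifecycle("sqlbackups/"), indent=2))
```

Saving the printed JSON to a file and passing it to the CLI with `--lifecycle-configuration file://...` is one way to apply it; test against a scratch bucket first.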
This backup/retention policy will cut costs by more than 50% compared to having backups on a gp2 disk. Furthermore, for more cost savings, you could transition objects directly from S3 Standard to Glacier, or even bypass S3 Standard and go from S3 Infrequent Access to Glacier. To bypass S3 Standard, you have to set the storage class at the object level when you upload the file. You can use the --storage-class parameter to upload a file to S3 Standard-IA, One Zone-IA, or Reduced Redundancy.
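For example, an upload straight into a non-default storage class might be assembled like this. The sketch only builds the `aws s3 cp` command line and does not run it; the bucket, key, and file names are hypothetical, and the flag values are the storage class identifiers the AWS CLI uses as far as I know.

```python
import shlex

# Friendly names from this article mapped to --storage-class values.
STORAGE_CLASS_FLAGS = {
    "S3 Standard": "STANDARD",
    "S3 Standard-IA": "STANDARD_IA",
    "S3 One Zone-IA": "ONEZONE_IA",
    "S3 Reduced Redundancy": "REDUCED_REDUNDANCY",
}


def s3_cp_command(local_path, bucket, key, storage_class="S3 Standard-IA"):
    """Return the argv list for uploading a backup file with a storage class."""
    flag = STORAGE_CLASS_FLAGS[storage_class]  # KeyError on an unknown class
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}",
            "--storage-class", flag]


if __name__ == "__main__":
    # Hypothetical file, bucket, and key names.
    cmd = s3_cp_command("full_backup.bak", "my-backup-bucket",
                        "sqlbackups/full_backup.bak")
    print(shlex.join(cmd))  # pass cmd to subprocess.run() to execute it
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting problems.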
Consider deleting the whole backup chain from local storage at the same time. Let’s say you have a retention policy of 15 days, and full backups are taken on Sundays, differentials nightly, and log backups every 2 hours. The full backup will get deleted first, and the last differential backup that depends on it six days later. Unless there is a regulatory or compliance requirement that your company must adhere to, there’s no need to keep differential and log backups if the full backup they depend on no longer exists.
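Here is a minimal sketch of that idea: group each full backup with the differentials and logs taken before the next full, and mark the whole group for deletion once the full falls outside the retention window. The `Backup` class and function name are hypothetical, and the grouping is a simplification of how SQL Server restore chains work, so treat it as a starting point rather than a finished cleanup job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    kind: str            # "full", "diff", or "log"
    taken_at: datetime


def backups_to_delete(backups, now, retention_days=15):
    """Return every backup whose base full backup is past retention."""
    cutoff = now - timedelta(days=retention_days)
    doomed = []
    base_full = None
    for b in sorted(backups, key=lambda item: item.taken_at):
        if b.kind == "full":
            base_full = b  # start of a new chain
        # Delete orphans (no full before them) and anything whose base
        # full backup has expired, since it cannot be restored anyway.
        if base_full is None or base_full.taken_at < cutoff:
            doomed.append(b)
    return doomed


if __name__ == "__main__":
    sunday = datetime(2023, 1, 1)  # hypothetical schedule start
    chain = [Backup("full", sunday)]
    chain += [Backup("diff", sunday + timedelta(days=d)) for d in range(1, 7)]
    chain.append(Backup("full", sunday + timedelta(days=7)))
    for b in backups_to_delete(chain, now=sunday + timedelta(days=16)):
        print("delete", b.kind, b.taken_at.date())
```

The design choice here is to key everything off the full backup's age, which matches the advice above; if you must guarantee point-in-time restore for the entire retention window, you would instead keep the full until its last dependent backup expires.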