AWS SQL Server Backup/Retention Policy

One of the primary responsibilities of a database administrator is to architect an effective, cost-saving backup and retention policy that manages database backups throughout their lifecycle. In AWS, getting this right can save the company a meaningful amount of money each month, and any savings are better than none. The elasticity of the cloud can encourage lousy architecture habits: because provisioning is so flexible, it is easy to over-provision, which wastes resources and costs the company money. One critical area that I will focus on is architecting a backup/retention policy that saves money throughout the retention lifecycle.

EBS Volume Types

It is imperative to understand the different EBS volume types when architecting a backup solution for SQL Server. The volume types fall into two categories: SSD-backed volumes optimized for transactional workloads and HDD-backed volumes optimized for large streaming workloads. The breakdown below describes the use cases and performance characteristics of the General Purpose SSD, Provisioned IOPS SSD, and Throughput Optimized HDD volume types.

General Purpose SSD (gp2)
  Description: General purpose SSD volume that balances price and performance for a wide variety of workloads
  Use cases:
    • Recommended for most workloads
    • System boot volumes
    • Virtual desktops
    • Low-latency interactive apps
    • Development and test environments
  Volume size: 1 GiB – 16 TiB
  Max IOPS/volume: 10,000
  Max throughput/volume: 160 MiB/s
  Max IOPS/instance: 80,000
  Max throughput/instance: 1,750 MiB/s
  Dominant performance attribute: IOPS

Provisioned IOPS SSD (io1)
  Description: Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads
  Use cases:
    • Critical business applications that require sustained IOPS performance, or more than 10,000 IOPS or 160 MiB/s of throughput per volume
    • Large database workloads, such as MongoDB, Cassandra, Microsoft SQL Server, MySQL, PostgreSQL, and Oracle
  Volume size: 4 GiB – 16 TiB
  Max IOPS/volume: 32,000
  Max throughput/volume: 500 MiB/s
  Max IOPS/instance: 80,000
  Max throughput/instance: 1,750 MiB/s
  Dominant performance attribute: IOPS

Throughput Optimized HDD (st1)
  Description: Low-cost HDD volume designed for frequently accessed, throughput-intensive workloads
  Use cases:
    • Streaming workloads requiring consistent, fast throughput at a low price
    • Big data
    • Data warehouses
    • Log processing
    • Cannot be a boot volume
  Volume size: 500 GiB – 16 TiB
  Max IOPS/volume: 500
  Max throughput/volume: 500 MiB/s
  Max IOPS/instance: 80,000
  Max throughput/instance: 1,750 MiB/s
  Dominant performance attribute: MiB/s

Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

I recently attended an AWSome Day event in Fort Lauderdale, and almost everyone I spoke with used General Purpose SSD volumes for everything, including the backup volume. When I asked them to elaborate on this choice, most responded that gp2 is the default volume type when launching an instance or provisioning an EBS volume. I then asked, "If there weren't a default volume type, which one would you choose, and why?" Most people were stumped because they didn't fully understand the differences between the volume types.

Selecting the right backup volume

While it may be tempting to use either a gp2 or an io1 EBS volume for the backup disk, a Throughput Optimized HDD (st1) volume is more suitable: for backup workloads, throughput matters, not IOPS. It is also cheaper. st1 costs only $0.045 per GB per month, while gp2 costs $0.10 per GB per month and io1 costs $0.125 per GB per month plus $0.065 per provisioned IOPS per month. These charges are region-specific; however, of the three EBS volume types mentioned above, st1 will always be the cheapest regardless of region. When architecting a backup solution, keep in mind the minimum size for each volume type: 500 GiB for st1, 1 GiB for gp2, and 4 GiB for io1. (A GiB is equivalent to 1024^3 bytes.) There is another volume type, the Cold HDD (sc1), that is similar to Throughput Optimized HDD but offers only up to 250 MiB/s of throughput. On the plus side, at $0.025 per GB per month it costs a little more than half the price of st1.
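
As a rough sketch, provisioning and attaching an st1 backup volume with the AWS CLI looks like the following. The Availability Zone, volume ID, instance ID, and device name are placeholders; substitute your own values:

  # 500 GiB is the minimum size for st1; the AZ must match the instance's AZ.
  aws ec2 create-volume --volume-type st1 --size 500 --availability-zone us-east-1a

  # Attach the new volume to the SQL Server instance, then initialize and
  # format it from within Windows before pointing backups at it.
  aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device xvdf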

S3 Storage classes

AWS currently offers five storage classes: S3 Standard, S3 Standard-IA, S3 One Zone-IA, S3 Reduced Redundancy, and Glacier. The breakdown below covers four of them; Reduced Redundancy is discussed afterward.

S3 Standard
  Designed for durability: 99.999999999%
  Designed for availability: 99.99%
  Availability SLA: 99.9%
  Availability Zones: ≥3
  Minimum capacity charge per object: N/A
  Minimum storage duration charge: N/A
  Retrieval fee: N/A
  First byte latency: milliseconds
  Storage type: Object
  Lifecycle transitions: Yes

S3 Standard-IA
  Designed for durability: 99.999999999%
  Designed for availability: 99.9%
  Availability SLA: 99%
  Availability Zones: ≥3
  Minimum capacity charge per object: 128 KB
  Minimum storage duration charge: 30 days
  Retrieval fee: per GB retrieved
  First byte latency: milliseconds
  Storage type: Object
  Lifecycle transitions: Yes

S3 One Zone-IA
  Designed for durability: 99.999999999% (single AZ)
  Designed for availability: 99.5%
  Availability SLA: 99%
  Availability Zones: 1
  Minimum capacity charge per object: 128 KB
  Minimum storage duration charge: 30 days
  Retrieval fee: per GB retrieved
  First byte latency: milliseconds
  Storage type: Object
  Lifecycle transitions: Yes

Amazon Glacier
  Designed for durability: 99.999999999%
  Designed for availability: N/A
  Availability SLA: N/A
  Availability Zones: ≥3
  Minimum capacity charge per object: N/A
  Minimum storage duration charge: 90 days
  Retrieval fee: per GB retrieved
  First byte latency: minutes or hours, depending on the retrieval option selected
  Storage type: Object
  Lifecycle transitions: Yes

Source: https://aws.amazon.com/s3/storage-classes/

S3 Reduced Redundancy, though not listed above, is intended for frequently accessed data; AWS recommends it for storing "…noncritical, reproducible data at lower levels of redundancy than Standard."

On another note, S3 One Zone-IA stores data in a single Availability Zone. If the data is not critical to the business, this storage class can be a viable alternative to S3 Standard or S3 Standard-IA; excluding Glacier, it is the most affordable of the storage classes above.

AWS S3 Pricing (varies by region)

  • US East (N. Virginia)

    S3 Standard storage
      First 50 TB / month: $0.023 per GB
      Next 450 TB / month: $0.022 per GB
      Over 500 TB / month: $0.021 per GB

    S3 Standard-Infrequent Access (S3 Standard-IA) storage
      All storage: $0.0125 per GB

    S3 One Zone-Infrequent Access (S3 One Zone-IA) storage
      All storage: $0.01 per GB

    Amazon Glacier storage
      All storage: $0.004 per GB

Source: https://aws.amazon.com/s3/pricing/

Here is a general backup policy

  1. The database should be backed up locally to a Throughput Optimized HDD (st1) volume.
  2. Upload the backups to S3 Standard using the AWS Multipart Upload API (https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html)
  3. Use a lifecycle policy to transition the backups from S3 Standard to S3 Standard-IA
  4. Use a lifecycle policy to transition the backups from S3 Standard-IA to Glacier
  5. Use the AWS CLI to delete the archives in Glacier when the retention period expires (steps 3 through 5 are sketched below)
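
Here is a minimal sketch of steps 3 through 5 with the AWS CLI. The bucket name, prefix, and day thresholds are placeholders; adjust them to your retention policy. Note that an object must sit in S3 Standard for at least 30 days before it can transition to Standard-IA. Save the rules as lifecycle.json:

  {
    "Rules": [
      {
        "ID": "sql-backup-retention",
        "Filter": { "Prefix": "sql-backups/" },
        "Status": "Enabled",
        "Transitions": [
          { "Days": 30, "StorageClass": "STANDARD_IA" },
          { "Days": 60, "StorageClass": "GLACIER" }
        ],
        "Expiration": { "Days": 365 }
      }
    ]
  }

Then apply the configuration to the bucket, and remove an archive once it outlives its retention period:

  aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket --lifecycle-configuration file://lifecycle.json

  aws s3 rm s3://my-backup-bucket/sql-backups/FULL_20180408.bak

The Expiration rule above deletes expired objects automatically; the explicit aws s3 rm is only needed if you prefer to control deletion yourself, as in step 5.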

This backup/retention policy will cut costs by more than 50% compared to keeping backups on a gp2 disk. For further savings, you could transition objects directly from S3 Standard to Glacier, or even bypass S3 Standard and upload straight to S3 Standard-IA before transitioning to Glacier. To upload a file directly to S3 Standard-IA, you must set the storage class at the object level; with the AWS CLI, the --storage-class parameter accepts STANDARD_IA, ONEZONE_IA, or REDUCED_REDUNDANCY.
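
For example (the bucket and file names are placeholders):

  aws s3 cp FULL_20180408.bak s3://my-backup-bucket/sql-backups/ --storage-class STANDARD_IA

Without the parameter, the object lands in S3 Standard.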

Consider deleting the entire backup chain from local storage at the same time. Say you have a 15-day retention policy, with full backups taken on Sundays, differentials nightly, and log backups every two hours: the full backup will age out first, and the last differential backup six days later. Unless there is a regulation or compliance requirement your company must adhere to, there is no need to keep differential and log backups once the full backup they depend on no longer exists.
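
The same idea applies in S3. As a sketch, assuming you organize each chain (full, differentials, and logs) under a prefix named for its full backup date, the whole chain can be removed in one command once the full backup ages out; the prefix layout here is an assumption for illustration, and any scheme that groups a chain together will do:

  aws s3 rm s3://my-backup-bucket/sql-backups/2018-04-08/ --recursive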

2 comments

  1. Guillermo G
    April 24, 2018 at 8:04 pm

    Great post! What are some of the options to back up SQL Server databases in AWS, aside from SQL Agent jobs? Jobs work just fine when configured correctly; I'm just wondering if there are any alternatives in AWS.

    • Michael
      May 1, 2018 at 2:40 pm

      Customer-managed backups apply to EC2 instances, not RDS. AWS manages RDS backups for you, as well as upgrades to the OS and database software; you schedule the backup window and retention policy, and AWS stores the backups in S3.
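
      As a sketch, the RDS retention period and backup window can be set through the AWS CLI (the instance identifier and values below are placeholders):

        aws rds modify-db-instance --db-instance-identifier my-sql-server --backup-retention-period 15 --preferred-backup-window 03:00-04:00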

      In regard to EC2 instances (customer-managed virtual servers) where SQL Server is installed, the customer is responsible for maintaining the OS and for other maintenance tasks such as configuring SQL Server backups or snapshots of the underlying disk volumes. Currently, there is no way to back up directly to S3 without a third-party tool that can map an S3 bucket as a drive on the server. Quite frankly, I wouldn't use a third-party tool for this because of the overhead that might be involved. I would instead back up locally to a Throughput Optimized HDD (st1) volume and then upload the file to S3.

      One thing to keep in mind is that the maximum size for a single S3 PUT is 5 GB. Because of this limitation, you have to split a larger backup file into multiple parts and use the S3 Multipart Upload API to upload each one. Upon successful upload of all the parts, S3 reassembles them into a single object. I would suggest storing metadata in a database so that you can easily track information such as file name, size, backup date, and retention policy. You can find the metadata for a database backup in msdb tables such as backupset and backupmediafamily.
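
      For what it's worth, the high-level aws s3 cp command handles the multipart mechanics (splitting, uploading the parts, and reassembly) automatically for large files, so a minimal sketch can be this simple (the file and bucket names are placeholders):

        aws s3 cp FULL_20180408.bak s3://my-backup-bucket/sql-backups/2018-04-08/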

      You can use whatever tool is in your toolset for backups. If it is a centrally managed tool such as LiteSpeed, you might need to open specific ports to allow communication with your database server.

      LOCAL STORAGE

      • Scripts (they are free!!! e.g., Ola Hallengren's database maintenance solution)
      • Third-party tools, such as LiteSpeed:
        • will cost some $$$
        • but you can get good compression
        • usually faster than a native backup

      S3

      GLACIER

      • Use S3 lifecycle policy to automatically transition objects to Glacier
      • When your full backup expires, delete the entire backup chain from Glacier
