Creating S3 backups of EC2 data in the AWS cloud



tags: amazon web services ec2 s3 system administration

As part of my switch over to the AWS cloud - and specifically EC2 - after my physical server started having issues, I wanted to make sure that I had reliable, consistent backups. Amazon provides a great solution for this in the form of Amazon S3, a service that provides eleven 9s of durability for stored data. It’s super easy to set up backups, especially if you’re entirely in the AWS cloud - with IAM roles set up appropriately, which I’ll go over in this guide, you won’t even have to deal with authentication keys. Without further ado, let’s get started.

Creating a bucket

First off, let’s create an S3 bucket to hold our backups. To do this, open up your AWS console, click “Services” at the top of the page, click “S3”, and then press the Create Bucket button. Bucket names must be globally unique across all of S3, so you may have to try a few names before you find one that’s available. We don’t need to do anything else with this bucket - we’ll handle access to it via IAM roles and policies.
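If you’d rather work from the command line, the AWS CLI can create the bucket as well; the bucket name below is a placeholder for your own:

aws s3 mb s3://your-bucket-name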

Setting up IAM

We’ll start by creating the policy that will allow our EC2 instance to access the S3 bucket. Open up IAM from the Services menu, then click Policies on the left-hand side of the page. Click Create Policy, then Copy an AWS Managed Policy. When it prompts you, copy the policy from AmazonS3FullAccess. It will then ask you to modify the policy. Look for the “Resource” line and replace “*” with two ARNs: “arn:aws:s3:::your-bucket-name” and “arn:aws:s3:::your-bucket-name/*”. The first covers bucket-level actions like listing; the second covers the objects inside the bucket, which is what our backup uploads need. This limits the policy to granting access to your bucket alone. Validate the policy and save it.
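For reference, the finished policy should look something like this (the bucket name is a placeholder for your own):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}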

Now that we have created our policy, we need to create a role to assign to the EC2 instance. Click Roles on the left-hand side of the page, then create a new role. Name it whatever you’d like - I called mine “ec2-s3-backup” - and then set it up as an Amazon EC2 role. Finally, attach the backup policy you created earlier.
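If you’d rather do this from the CLI, something like the following should produce an equivalent role - the policy ARN and account ID are placeholders, and the trust policy is what lets EC2 instances assume the role:

# Create the role with a trust policy allowing EC2 to assume it
aws iam create-role --role-name ec2-s3-backup \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Attach the backup policy created earlier (ARN is a placeholder)
aws iam attach-role-policy --role-name ec2-s3-backup \
  --policy-arn arn:aws:iam::123456789012:policy/ec2-s3-backup-policy

# Console-created EC2 roles get an instance profile automatically; from the CLI you create one yourself
aws iam create-instance-profile --instance-profile-name ec2-s3-backup
aws iam add-role-to-instance-profile --instance-profile-name ec2-s3-backup --role-name ec2-s3-backup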

Spinning up an EC2 instance

Hop into the EC2 console and start a new instance. Set it up in whatever way fits your needs, but make sure you assign the EC2 backup role when it asks you about assigning roles. If you forget this step, or if you’d like to set this up on an existing server, you can also attach the role to an already-running instance.
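For an already-running instance, the same attachment works from the CLI too (the instance ID here is a placeholder):

aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=ec2-s3-backup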

Once you spin up the server, open up a shell session. (Or RDP, I suppose, if you’re on Windows.) If you end up using my script below, you’ll need the AWS CLI installed for the programmatic backups to run, so let’s do that first.

If you’re on Windows, you’ll need to use the packaged installer. Otherwise, it’s built in on Amazon Linux, and available via apt-get install awscli on Ubuntu and yum install epel-release && yum install awscli on CentOS.
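For example, on the two Linux distributions mentioned above, with a quick check afterwards that the CLI is on your PATH:

# Ubuntu
sudo apt-get install awscli

# CentOS (awscli lives in the EPEL repository)
sudo yum install epel-release && sudo yum install awscli

# Verify the installation
aws --version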

Once you’ve got this all in place, go ahead and move your data onto the server so that we have something to back up.


Automating your backups

I wrote this script under the assumption that you’re running Linux. It zips up your data, uploads the archive to S3 as the daily backup, and keeps separate weekly and monthly copies on Mondays and on the first of each month:

#!/bin/bash
# Day of month (01-31) and day of week (0-6, Sunday is 0)
dom=$(date '+%d')
dow=$(date '+%w')

# Archive everything under the backup path
zip -q -r backup.zip /path/to/files/*

# Always upload today's archive as the daily backup
aws s3 cp backup.zip s3://bucket-name/backup-daily.zip

# On Mondays, keep a weekly copy as well (%w prints a bare "1", not "01")
if [[ $dow == 1 ]]; then
  aws s3 cp backup.zip s3://bucket-name/backup-weekly.zip
fi

# On the first of the month, keep a monthly copy
if [[ $dom == 01 ]]; then
  aws s3 cp backup.zip s3://bucket-name/backup-monthly.zip
fi
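To actually automate this, save the script somewhere sensible and schedule it with cron. Assuming you saved it as /root/backup.sh (my choice of path - use whatever fits your setup), a crontab entry like this runs it every night at 3 AM:

0 3 * * * /root/backup.sh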

There are plenty of changes you could make here. One of my personal use cases is adding individual, gzipped MySQL database dumps to the backup set:

# MYSQL_USER and MYSQL_PASSWORD need to be set earlier in the script
databases=$(mysql --user=$MYSQL_USER -p$MYSQL_PASSWORD -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema)")

for db in $databases; do
  mysqldump --force --opt --user=$MYSQL_USER -p$MYSQL_PASSWORD --databases $db | gzip > "/root/mysql/$db.gz"
done
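If you add this, make sure the dump directory actually makes it into the archive - for instance, by including /root/mysql on the zip line of the main script:

zip -q -r backup.zip /path/to/files/* /root/mysql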

I’m interested in hearing your backup ideas for other data in the comments!

Taking it further

You can do a lot more with Amazon S3 backups if you’re willing to play around a little. Built-in versioning is a good starting point, although costs can mount quickly if you’re not careful. Prefix-based lifecycle rules can keep that in check: for example, you could permanently delete old versions of your daily backups (and clean up their delete markers) after 7 days, weeklies after 30 days, and monthlies after a year. That way you maintain a full year of backups while minimizing storage costs. You could even look into the Infrequent Access storage class for your monthly backups to save even more money - IA costs about a third of the price of S3 Standard storage - and you can move objects into it through the same lifecycle configurations.
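As a rough sketch of what that could look like from the CLI - the rule names and retention periods here are just example values, so adjust them to taste:

aws s3api put-bucket-lifecycle-configuration --bucket your-bucket-name --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "expire-old-dailies",
      "Filter": { "Prefix": "backup-daily" },
      "Status": "Enabled",
      "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
    },
    {
      "ID": "monthlies-to-ia",
      "Filter": { "Prefix": "backup-monthly" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 30, "StorageClass": "STANDARD_IA" } ]
    }
  ]
}'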