[002.1] Automated Backups and Disaster Recovery Part 1

If you haven't tested restore, you don't have a backup

Automated Backups and Disaster Recovery Part 1 [01.31.2018]

Have you ever lost data? It can make for a bad day.

Now, imagine you lose your customer's data. That can make for a bad week, if you're lucky. If you're not so lucky, it can cost you money and customer trust, generate bad publicity, and much more.

So, in today's episode, we are going to be discussing disaster recovery. In particular, we will be focusing on a few techniques to deal with data loss mitigation and resiliency.

Getting Started

When it comes to databases, at one time or another, most of us have dealt with configuration of automated backups, replication, and the headaches that can come from those tasks.

Today, we'll be taking a look at some of AWS's offerings for automatic backups, Multi-AZ, and read replicas.

Let's take a moment to describe some of the available options in AWS.

  • Backup

    • This simply refers to the action of taking a snapshot of a database in its current state. This snapshot can then be used to recreate the database at that specific point in time.
  • Read Replicas

    • Outside of backups, this is the most commonly used configuration. In its most basic form, you have the concept of master and slave. The master propagates changes over to the slave.
  • Multi-AZ

    • Multi-AZ stands for multiple availability zones, which is very telling of its nature. A Multi-AZ setup consists of multiple database instances: a primary and one or more standbys located in different availability zones.

Backups

Starting with the simplest solution, we'll take a look at backups.

By default, when you create an RDS instance in AWS, daily backups are enabled with a 7-day retention policy. The backups are performed during the daily backup window, which defaults to a random time based on the specified region.
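
As an aside, if you already have an instance and want to verify these settings without opening the console, the AWS CLI can report them. This is a rough sketch: it assumes the CLI is installed and configured for your account and region, and test is the instance identifier we'll be using throughout this walkthrough.

# Check the backup retention period, backup/maintenance windows, and Multi-AZ flag.
aws rds describe-db-instances \
  --db-instance-identifier test \
  --query 'DBInstances[0].[BackupRetentionPeriod,PreferredBackupWindow,PreferredMaintenanceWindow,MultiAZ]'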

In the RDS dashboard, let's click Instances in the left navigation menu. Once we are on the Instances page, let's click the Launch DB instance button in the top right corner.

We should now be on a page similar to the one below, to select our database engine for our new instance.

rds instance create step 1

Select PostgreSQL and choose Next.

rds instance create step 2

Now, we should be on the Choose Use Case step. For our scenario, we can choose Dev/Test and click Next.

rds instance create step 3

On this page, we can check Only enable options eligible for RDS Free Usage Tier. This will limit our settings, but will be fine for our use case.

Scroll down and enter your Instance Identifier, Username, and Password, then click Next.

rds instance create step 3

We are now on our last step: Configure Advanced Settings.

On this page, we need to make sure that Public Accessibility is set to Yes, set our Database Name, and scroll down to the Backup section.

Here, we can see the default retention period is set to 7 days. If we set this to 0, it would disable backups. We have the option of scaling this out to a maximum of 35 days.
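
If you ever want to change the retention period after the instance already exists, the AWS CLI's modify-db-instance should handle it as well; the identifier and value below are just examples.

# Bump the retention period on an existing instance (0 disables backups, 35 days is the max).
aws rds modify-db-instance \
  --db-instance-identifier test \
  --backup-retention-period 35 \
  --apply-immediately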

We'll leave the defaults here and move down to the Maintenance section.

In the Maintenance section, you can see where you have the option of selecting a maintenance window, or leaving the default setting of No Preference for a random window to be assigned.

For now, we can leave that to its default setting and click Launch DB instance.
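
If you'd rather script this step than click through the console, something along these lines should create a comparable instance with the AWS CLI. Treat it as a sketch: the identifier, database name, and password are placeholders matching this walkthrough, and you'd swap in your own values.

# Create a small, publicly accessible PostgreSQL instance with daily backups retained for 7 days.
aws rds create-db-instance \
  --db-instance-identifier test \
  --engine postgres \
  --db-instance-class db.t2.micro \
  --allocated-storage 20 \
  --master-username root \
  --master-user-password 'change-me' \
  --db-name test \
  --backup-retention-period 7 \
  --publicly-accessible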

Once we've created the instance, we can click the View DB Instance Details button, which will take us to the overview page for our instance. You may notice that DB Instance Status is in a Creating state. It will take a few minutes for the instance to become available.

Once the instance is available, we can click Instance Actions in the top right and see a dropdown of available actions. Specifically, we'll notice that we can trigger a manual snapshot with Take Snapshot and restore with Restore To Point In Time.

If we scroll further down the screen to the Snapshots section, we can see that we have one snapshot. This initial snapshot is created as part of the creation of the instance and plays into how Restore To Point In Time works.

A point-in-time restore uses the latest snapshot taken before that point and then replays the transaction logs up to it, so you can restore to any moment within your retention period.
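
For reference, a point-in-time restore can also be driven from the AWS CLI. It always produces a new instance; the target identifier and timestamp below are purely illustrative.

# Restore the instance to a specific moment within the retention window,
# producing a brand new instance named test-pitr.
# (Pass --use-latest-restorable-time instead of --restore-time to restore to the latest possible point.)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier test \
  --target-db-instance-identifier test-pitr \
  --restore-time 2018-01-31T12:00:00Z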

So, let's dig into actually creating a snapshot and restoring.

Taking a snapshot

First, let's connect up to our database and add some data:

psql -Uroot -dtest -htest.cxywwt2zdqub.us-east-1.rds.amazonaws.com
test=> CREATE TABLE todos ( id SERIAL PRIMARY KEY, name varchar(100) );
test=> INSERT INTO todos (name) VALUES
test-> ('setup automatic backups'),
test-> ('take snapshot');
test=> select * from todos;
 id |          name
----+-------------------------
  3 | setup automatic backups
  4 | take snapshot
(2 rows)

Now, let's take a snapshot.

We can scroll up to the top of the screen and click Instance Actions and choose Take Snapshot. It will ask us to enter a Snapshot Name and then we can click Take Snapshot.

Once we've created the snapshot, it should redirect us back to the Snapshots page and we should see our new snapshot.

snapshots

As you can see, I've named mine multiple-todos.
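
The same snapshot could be taken from the command line; this sketch assumes the AWS CLI is configured and uses the multiple-todos name from above.

# Take a manual snapshot of the test instance, then check its status.
aws rds create-db-snapshot \
  --db-instance-identifier test \
  --db-snapshot-identifier multiple-todos

aws rds describe-db-snapshots \
  --db-snapshot-identifier multiple-todos \
  --query 'DBSnapshots[0].Status'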

Restoring

Now that we've created a snapshot, let's delete some data from our database.

test=> delete from todos;
DELETE 2

Oops — we only wanted to delete one item, but accidentally deleted everything. Let's restore from our manual snapshot and see if we can get it back.

Let's go back to our Snapshots page and choose our snapshot. Then, in the top right, we'll click Snapshot Actions and choose Restore Snapshot.

We should now be on the Restore DB Instance page where we will configure a new instance to be created that our snapshot will be applied to.

On this page, let's select db.t2.micro for the instance class, which is the same as the instance we took our snapshot from. I'm going to set the DB Instance Name to test-snapshot. Now, we can scroll to the bottom and click Restore DB instance.

After a few minutes, our new instance should be available.
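
If you prefer to do the restore from the command line, restore-db-instance-from-db-snapshot is the equivalent call; the identifiers here mirror the ones used in the console.

# Create a new db.t2.micro instance named test-snapshot from the manual snapshot,
# then block until it is available.
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier test-snapshot \
  --db-snapshot-identifier multiple-todos \
  --db-instance-class db.t2.micro

aws rds wait db-instance-available \
  --db-instance-identifier test-snapshot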

Before we can connect to the new RDS instance, we will need to go into the security group, click on the Inbound tab and set up a rule to allow traffic from our IP.

inbound rule - new db instance
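
The same inbound rule can be added with the AWS CLI; the security group ID and IP address below are placeholders you'd replace with your own.

# Allow PostgreSQL traffic (port 5432) from a single IP address.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 5432 \
  --cidr 203.0.113.10/32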

Afterwards, we should be able to connect to the new DB instance and verify that our data is now available.

psql -Uroot -dtest -htest-snapshot.cxywwt2zdqub.us-east-1.rds.amazonaws.com
test=> select * from todos;
 id |          name
----+-------------------------
  3 | setup automatic backups
  4 | take snapshot
(2 rows)

Now, we have access to all of the data from our snapshot; we could manually move over the bits of data we need, or point our application at the new database.

Summary

In this video, we went through an overview of backups, read replicas, and Multi-AZ. Then we configured a backup and created a snapshot. In part 2, we will configure a read replica and a Multi-AZ deployment.

Automated Backups and Disaster Recovery Part 2

Configure a Read Replica and a Multi-AZ

Read Replica

Let’s discuss replicas.

As we mentioned earlier, in a replica configuration, you have the concept of a master and slave. The master is the primary database where the application is writing data. The master then propagates the changes to the replica. The method of propagation is usually asynchronous, utilizing transaction logs, but can be synchronous as well.

There are several benefits of using a read replica. One benefit is minimizing downtime. If you only utilize database backups as part of your disaster recovery plan (DRP) and you have a database failure, you will have to plan for a longer window of downtime. However, with a read replica, if the master fails, a slave can be promoted to master and your app will continue to run.

Another benefit of using replicas is the ability to split up reads between databases. If your application produces lots of database lookups for reporting purposes, it may make sense to offload some of those reads to a slave server, allowing your master to stay performant. Although we won't be covering that in this video, there are gems like octopus that allow you to easily support multiple databases for these types of scenarios.

Now, let's take a look at setting up a read replica.

Setting up a replica

If we head back over to the RDS dashboard and go to the Instances page, we should see the test database that we created above. Let's select our database, click the Instance Actions button at the top right, and choose Create Read Replica.

At this point, we should be on a page similar to the page we used to create our initial database.

Most of the settings we will leave at their defaults. If we were going to keep this replica in place, we would preferably put it in a different availability zone than our test database. But, since this is not going to be kept, we're going to leave it at the default.

What we will be changing is the DB Instance Identifier. I'm going to set this to test-replica. Then, scroll down to the bottom and click Create Read Replica.

This will return us to the test database instance page. If we click on Instances in the left navigation menu, we should see our replica is being created.

read replica creation

After a few minutes, the replica should be available, and we now have replication set up.
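
For completeness, the same replica could be created with the AWS CLI; the availability zone here is only an example.

# Create a read replica of the test instance in another availability zone,
# then wait until it is available.
aws rds create-db-instance-read-replica \
  --db-instance-identifier test-replica \
  --source-db-instance-identifier test \
  --availability-zone us-east-1b

aws rds wait db-instance-available \
  --db-instance-identifier test-replica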

Testing replication

We can test this by connecting to our test DB, writing some data, and verifying that it shows up in our replica.

➜  ~ psql -Uroot -dtest -htest.cxywwt2zdqub.us-east-1.rds.amazonaws.com
test=> insert into todos (name) values ('configure replica');


➜  ~ psql -Uroot -dtest -htest-replica.cxywwt2zdqub.us-east-1.rds.amazonaws.com
test=> select * from todos;
 id |        name
----+-------------------
  5 | configure replica
(1 row)

test=> insert into todos (name) values ('configure replica');
ERROR:  cannot execute INSERT in a read-only transaction

As we see here, our replication is up and running and we are only able to read from our replica.
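
If you're curious how far behind the replica is, PostgreSQL exposes the replay timestamp on a standby, and RDS PostgreSQL read replicas use native streaming replication, so a quick check like this should work (the video doesn't cover it):

psql -Uroot -dtest -htest-replica.cxywwt2zdqub.us-east-1.rds.amazonaws.com
test=> -- approximate replication lag on the replica (NULL if nothing has been replayed yet)
test=> SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;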

One last thing to note is that if the master were to go down, we can easily promote the slave by going to RDS Dashboard > Instances, selecting the test-replica instance, clicking Instance Actions at the top right, and choosing Promote Read Replica.

promote read replica

This will promote the replica, so it will no longer fetch changes via transaction logs.
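
The promotion can also be scripted; a single AWS CLI call should do it.

# Promote the replica to a standalone instance (this ends replication from the old master).
aws rds promote-read-replica \
  --db-instance-identifier test-replica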

Multi-AZ

Multi-AZ configuration is the last option we will discuss today.

In a Multi-AZ configuration, you’ll need a minimum of two database instances in different availability zones. With this setup, we have the notion of a primary and standby.

There are some major differences between a Multi-AZ configuration and a read replica. For instance, in a Multi-AZ configuration, data is replicated synchronously, so that all of the instances have the same data at any given time.

Also, with a Multi-AZ configuration, you can’t use the standby to offload reads from an application. The standby is only there for failover.

Lastly, failover is handled automatically: if the primary goes down, AWS changes the DNS record to point to a standby, so no manual intervention is needed for failover.

Multi-AZ configuration is just as easy to configure as a replica (actually, easier). If you followed along in our previous videos, when we initially set up the production database instance, we chose the Production use case, which enables Multi-AZ by default.

production use case

But, even if you select the Dev/Test use case, once you're on the details page for creating the instance, you have the option of choosing a Multi-AZ setup.

multi-az setup

After that, the remaining configuration is the same as any other instance.
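
And if an instance already exists, Multi-AZ can be enabled later without recreating it; something like this should work with the AWS CLI.

# Convert an existing instance to Multi-AZ. A standby is created in another
# availability zone; there may be a brief performance impact while it is built.
aws rds modify-db-instance \
  --db-instance-identifier test \
  --multi-az \
  --apply-immediately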

Summary

In this video, we configured a read replica and a Multi-AZ deployment. We discussed some of the pros and cons of each and some of the reasons to use one over the other.

At the end of the day, your disaster recovery plan is going to be catered to the needs of your business or application. In certain cases, longer downtime windows might be acceptable. High availability may or may not be a concern and your application might not have the amount of traffic that justifies replication.

We've only scratched the surface of what's available in terms of data loss mitigation and resiliency. We've added a few resources below that have far more information about what's available in AWS. If you're interested in this topic, you should definitely spend some time looking through them.

Resources