How to Use Playbooks to Execute an Incident Recovery Plan

A playbook is the official, formal written record that describes policies and processes that will reliably produce a working deployment of an organization's resource stack. When it comes to generating predictable results, the playbook is the plan.

I'll describe all the key elements of a good playbook in just a moment. But it's important to emphasize that a playbook on its own is more or less useless unless your team is able to read it and convert it into real-world results.

To do that you'll need to make sure every relevant member of your team is completely familiar with their roles and how they'll be expected to carry them out. That'll require you to distribute copies of the plan and ensure that everyone gets the training they'll need to perform perfectly when the time arrives.

This article comes from my Introduction to Cybersecurity course. If you'd like, you can follow the video version here:

How to Define Playbook Scope

At any rate, a good plan begins with clear definitions:

Where can you find up-to-date and clean copies of the source code?
Where should your production environment be hosted? In a public cloud like AWS? On-premises?
What is the infrastructure supposed to accomplish?
What's the scope of your operation: what scale of hardware resources will it require?

A playbook should also clearly define the policies that must be followed through the rebuilding process:

How is organizational data to be protected?
What decisions must be made only by senior company officers?
Are there restrictions on what software and third party solutions can be used...or from which countries they can be acquired?
Are there stack components that must remain local, or can everything live in the cloud?

How to Define the Tools

Perhaps the core of any playbook is the section addressing the software and deployment tools and procedures that you'll use at every stage of your workflow.

This section should include the complete code for the scripts handling moving resources from code to deployment, along with links to all the software code in use, and instructions for authenticating to the services you'll be using.

How to Define the Participants

IT deployments are performed by people. But which people?

Who do you speak to who has access to a credit card so you can purchase needed resources?
Who has access to the key codebases and online accounts you'll need?
Who's responsible for testing and signing off before code is pushed to production?
What if that person isn't available?

Each and every role relating to the project you're documenting needs to be defined, and the person responsible must be identified – along with current contact information.

Beyond operational contacts, the playbook should also include a complete company communications directory. If you're paying someone a paycheck each month, the odds are they'll be expected to perform some important function during a recovery. So you'll want a reliable source for contact information – preferably containing multiple contact endpoints for each person.

How to Document Your Recovery

Recovery operations can be chaotic. But it's nevertheless critically important that log records for every step – pre-, post-, and during recovery – should be kept. So log generation and storage should also be part of your playbook.

Even if you don't have the time to read them right now, they'll be invaluable later as you try to review events and figure out exactly what happened. The existence of accurate and reliable logs and other records might actually be legally mandated.

Any code review and application testing you would normally incorporate in your deployment lifecycles should be included in your recovery playbook. After all, bugs and failures aren't going to be any more fun after a crisis than they were before it. The actual code for all the scripts that would normally power your testing should be included here, too.

How to Keep Your Playbook Current

Finally, your playbook should be regularly updated to reflect changes to your application and its supporting environment. Naturally, you want to keep all details up-to-date, including changes to the personnel responsible for specific roles, along with their correct contact information.

A complete playbook created for a relatively complex operation can easily run into the hundreds of pages. When you add the task of coordinating the actions of all the many individuals who will be involved in your recovery, the whole thing might feel a bit hard to manage. Unfortunately, you just have to do this: there's really no alternative.

How to Automate Your Recovery

Well, there's almost no alternative. Remember how I told you that you should include complete operations scripts and links to your code base in the playbook? Do you think our playbook could be convinced to play itself? Why not?

Think about it. Orchestration tools like Ansible or Terraform – or cloud-specific tools like Amazon's CloudFormation – allow you to very closely define every layer of your infrastructure in a format that can be invoked and launched with a single command.

In theory at least, there's no reason why you couldn't build your playbook as an actual script, complete with commands to pull software repos, launch complex virtual networks and compute instances, and route DNS domains. That would be a fantastic example of the power of infrastructure as code.

How to Test Your Playbook

While we're still on the topic of plans and playbooks, I should add one more very important note. If you're going to go to all the trouble of researching and then writing a playbook, you don't want to discover in the middle of a crisis that your plans don't actually work.

The safe assumption is that nothing in technology will work unless it's been carefully and repeatedly tested in advance. That's true of recovery playbooks, and it's just as true of backups: until you've successfully restored a backup archive into a real environment, you should assume it'll fail.

With what you've now seen about the scope, tools, documentation, updates and automation for playbooks, you're now all set to get to work creating your own. Well don't let me get in your way!

This article comes from my Introduction to Cybersecurity course. And there's much more technology goodness available at bootstrap-it.com