09 Mar

How to Design a Disaster Recovery Plan

[Image: DR plan, from dilbert.com]

Here is a quick rundown of some of the elements I consider when consulting on disaster recovery plans:

1. Inventory Systems

The first step in a disaster recovery plan is to inventory which systems will need to be restored in the event of a disaster, and in what order (the order depends on the data classification covered in the next step). In general, I find it helpful to group systems by category (see below), but you can organize them however makes sense for your organization.

a. Applications – All applications that will need to be backed up in case of a disaster.

b. Databases – All related application databases.

c. Infrastructure – All physical (and virtual) infrastructure which houses any data and processes associated with system functionality. This typically includes physical and virtual servers.

d. Hardware Components – Any physical hardware that may need to be duplicated or replaced if a disaster physically destroys property. Examples include physical servers, switches, routers, racks, and UPS systems.
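
One way to capture such an inventory, and the restore order it implies, is to record what each system depends on and let a topological sort work out the sequence. The sketch below is only an illustration: the `System` class, the `restore_order` helper, and every system name are hypothetical, and it assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class System:
    name: str
    category: str  # "application", "database", "infrastructure", or "hardware"
    depends_on: list[str] = field(default_factory=list)  # must be restored first


def restore_order(inventory: list[System]) -> list[str]:
    """Return system names so that every dependency precedes its dependents."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    # A dependency that is named but missing from the inventory still appears
    # in the result, which makes gaps in the inventory easy to spot.
    graph = {s.name: set(s.depends_on) for s in inventory}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical inventory, for illustration only.
INVENTORY = [
    System("billing-app", "application", depends_on=["billing-db"]),
    System("billing-db", "database", depends_on=["vm-host-01"]),
    System("vm-host-01", "infrastructure", depends_on=["core-switch"]),
    System("core-switch", "hardware"),
]

print(restore_order(INVENTORY))
# ['core-switch', 'vm-host-01', 'billing-db', 'billing-app']
```

If two systems depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful signal that the inventory needs a second look.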

2. Data Classification

Once you have inventoried your systems, you need to understand how important the data on each system is. This ultimately helps you prioritize backups, assess the impact of a disaster, and address regulatory or compliance concerns. Data can be grouped into as many categories as necessary, according to the types of data your organization stores.

a. Critical Data – Critical data is any data that must be restored immediately to resume operations or to meet regulatory requirements. This might include vital applications and databases, healthcare data, financial data, or data associated with the operation of systems that keep people safe (e.g., utilities, communications, etc.).

b. Ancillary Data – Ancillary data is any information that is not required to restore the system. Typically this data improves the efficiency and effectiveness of a product or service, but is not required for operations to resume.
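
As a rough sketch of how the two classes above could drive restore priority, the snippet below tags each system with a class and sorts critical systems to the front. The `DataClass` enum, the `restore_priority` helper, and the system names are all made up for illustration. Your organization may need more than two classes.

```python
from enum import IntEnum


class DataClass(IntEnum):
    CRITICAL = 1   # must be restored immediately
    ANCILLARY = 2  # helpful, but not required for operations to resume


# Hypothetical classification of inventoried systems.
CLASSIFICATION = {
    "patient-records-db": DataClass.CRITICAL,
    "marketing-analytics": DataClass.ANCILLARY,
    "payments-api": DataClass.CRITICAL,
}


def restore_priority(classification: dict[str, DataClass]) -> list[str]:
    """Sort system names by class (critical first), then alphabetically."""
    return sorted(classification, key=lambda name: (classification[name], name))


print(restore_priority(CLASSIFICATION))
# ['patient-records-db', 'payments-api', 'marketing-analytics']
```

Using an `IntEnum` keeps the ordering explicit: adding a class in between later only requires renumbering the enum rather than changing the sort.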

3. Duplication Strategies

After you have inventoried and classified data and systems, a strategy should be developed to determine how systems will be backed up and restored in the event of a disaster. Backup and restoration may be done in-house, by a third party, or some combination thereof.

a. In-House Ability – Does your organization have the resources to back up systems in-house at another location? Many companies have capacity at a nearby location to duplicate vital data and records. Those that do not may need to outsource backup services to a third-party data center.

b. Third-Party Resources – Many companies rely on third-party facilities to maintain copies of critical data and infrastructure. Because third-party data centers typically have the infrastructure and training required to restore data quickly and reliably, this may be a more efficient and cost-effective solution.

4. Restoration Management

Whether your company decides to manage resources internally or via a third party, it is vital that all parties understand their roles and responsibilities for restoring systems after a disaster.

a. Understand Roles – Clearly understand, articulate, document, and communicate to relevant parties all roles required to successfully recover from a disaster. Be sure to cover everything from reporting the disaster and identifying the downed components to contacting vendors and employees and carrying out the specific steps necessary to restart the inventoried systems.

b. Agreed-Upon Procedures – Restoring a potentially large number of systems requires a great deal of technical know-how and coordination. This is why it is very important to document the steps required to restore each application, server, and database. Doing so makes restoration more efficient and ensures that employees clearly understand their role in the DR process.
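
A simple completeness check can help confirm that every inventoried system has both an owner and documented restore steps. The sketch below is one possible shape for that check. The `Runbook` class, the `runbook_gaps` helper, and the sample data are assumptions for illustration, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    system: str
    owner: str = ""  # person or team responsible for the restore
    steps: list[str] = field(default_factory=list)  # ordered restore steps


def runbook_gaps(inventory: list[str], runbooks: list[Runbook]) -> dict[str, list[str]]:
    """Map each system with missing documentation to a list of problems."""
    by_system = {r.system: r for r in runbooks}
    gaps: dict[str, list[str]] = {}
    for name in inventory:
        book = by_system.get(name)
        if book is None:
            gaps[name] = ["no runbook"]
            continue
        problems = []
        if not book.owner:
            problems.append("no owner")
        if not book.steps:
            problems.append("no steps")
        if problems:
            gaps[name] = problems
    return gaps


# Hypothetical data, for illustration only.
systems = ["billing-app", "billing-db", "mail-server"]
runbooks = [
    Runbook("billing-app", owner="ops-team", steps=["Provision VM", "Deploy build"]),
    Runbook("billing-db"),  # exists, but nobody owns it and no steps are written
]

print(runbook_gaps(systems, runbooks))
# {'billing-db': ['no owner', 'no steps'], 'mail-server': ['no runbook']}
```

An empty result means every inventoried system has a named owner and at least one documented step. It says nothing about whether the steps actually work, which is what DR testing is for.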

5. Disaster Recovery Testing

After you design a disaster recovery program, it is important to design and perform periodic DR tests. A DR test helps to ensure that everyone in the organization understands their roles and can perform them efficiently and effectively. In addition, DR testing helps uncover restoration and recovery gaps prior to a real event.

a. DR Policies and Procedures – All DR policies and procedures should be documented and communicated across the organization. Those individuals who have DR responsibilities should help design the policies and procedures. (Also see agreed upon procedures above.)

b. Periodic Tests – Periodic DR tests should be thoroughly documented and performed on a regular basis, typically semi-annually. Some companies choose to segment DR testing by location, application, or both. This may reduce the burden that DR testing places on the business and on the company’s commitments to customers.
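
To keep a semi-annual cadence from slipping, it can help to compute when the next test is due from the date of the last one. The sketch below assumes a six-month interval as the default. The helper names `add_months`, `next_test_due`, and `is_overdue` are hypothetical, and the dates are examples only.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_test_due(last_test: date, interval_months: int = 6) -> date:
    """Date by which the next DR test should have been performed."""
    return add_months(last_test, interval_months)


def is_overdue(last_test: date, today: date, interval_months: int = 6) -> bool:
    """True once today is past the due date."""
    return today > next_test_due(last_test, interval_months)


print(next_test_due(date(2024, 3, 9)))                 # 2024-09-09
print(is_overdue(date(2024, 3, 9), date(2024, 10, 1)))  # True
```

If you segment testing by location or application, the same check can be run per segment by keeping a separate last-test date for each one.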

3 thoughts on “How to Design a Disaster Recovery Plan”

  1. Christian – Where is your Business Impact Assessment? This is the best way to identify what systems are needed and what the timing of the restore needs to be. Also, this can be used to define where prevention of outage can be used to effectively support the systems and also the dollars when building a solution.

    Hope you are well!! Good Stuff!

  2. Another great article. I’d also like to see your thoughts regarding Business Continuity – which I see as a different take on DR. A lot of companies prepare for the “smoking hole” scenario. But a small outage affecting a system or two is more likely. Ideally a DR/BC plan is flexible to handle both scenarios. But it takes a little more planning and consideration. You don’t want to be stuck invoking the entire DR plan when only one system is down.
