[NTLUG:Discuss] Disaster recovery

Robert Pearson rdpears at gmail.com
Fri Aug 12 15:18:11 CDT 2005


On 8/12/05, Leroy Tennison <leroy_tennison at prodigy.net> wrote:
> At the risk of starting a "religious" war, what would you consider to be
> the crucial things to document about a Linux system for the purpose of
> disaster recovery?

It sounds like you are asking about rebuilding a server that is lost
completely, whether quickly or even at all. That is one common type
of Disaster Recovery (DR). IMHO DR is more of a process than a series
of discrete steps. If you start by laying out the process, the
discrete steps fall right into place. This is the best reason to test
DR before you really need it: did you get the process right?

You could think of Disaster Recovery as one of two types:
(1) Local Disaster Recovery - loss of a server, a cluster of servers,
a group of servers, a roomful of servers, a part of a floor of servers
or the entire floor, or a site building housing some of the servers.

You can group these by impact:
Performance Hit - loss of a server or a cluster of servers
Revenue Hit, or LOB (Line or Lines of Business) Hit - loss of a group
of servers, a roomful of servers, part or all of a floor of servers,
or a site building housing some of the servers.

(2) Site Disaster Recovery - loss of the passive mirror site, the
active mirror site, the site (building) that houses all the Production
servers, or the "Total (Nuclear) Meltdown".
Total (Nuclear) Meltdown Definition - loss of the Production site, the
mirror site (or sites), the Disaster Recovery site and all the
recovery material stored offsite, and/or loss of all key Disaster
Recovery personnel. In other words, the knowledge left with the
building.

Both of the above are server scenarios. 
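
To make the grouping above concrete, here is a minimal Python sketch
that maps the scope of a loss to the kind of hit it represents. The
scope labels and the names Impact and classify are my own illustrative
choices, not a standard vocabulary; adjust them to your own sites.

```python
from enum import Enum


class Impact(Enum):
    PERFORMANCE_HIT = "Performance Hit"
    REVENUE_HIT = "Revenue Hit (LOB Hit)"
    SITE_DISASTER = "Site Disaster Recovery"


# Local losses, grouped as in the text. Keys are illustrative labels.
LOCAL_SCOPES = {
    "server": Impact.PERFORMANCE_HIT,
    "cluster": Impact.PERFORMANCE_HIT,
    "group of servers": Impact.REVENUE_HIT,
    "room": Impact.REVENUE_HIT,
    "floor": Impact.REVENUE_HIT,
    "site building": Impact.REVENUE_HIT,
}

# Losses that fall under Site Disaster Recovery.
SITE_SCOPES = {
    "passive mirror site",
    "active mirror site",
    "production site",
    "total meltdown",
}


def classify(scope: str) -> Impact:
    """Return the impact group for a given scope of loss."""
    key = scope.strip().lower()
    if key in LOCAL_SCOPES:
        return LOCAL_SCOPES[key]
    if key in SITE_SCOPES:
        return Impact.SITE_DISASTER
    raise ValueError(f"unknown scope: {scope!r}")
```

A table like this is mostly useful as a checklist: every scope you
list should have a tested recovery process behind it.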

There is actually a third type of Disaster Recovery that has to do with Storage.
(1) Revenue Hit - you lose the storage for the Information that
generates 80% of your revenue
(2) Out of Business Hit - you lose the storage for the "key"
Information that keeps you in business

Since most people do not know which Information falls under (1) and
(2), they back up everything and plan to recover everything in case of
a Disaster. Content Management can be a big help here.

In the past this was not a big issue because the storage was Direct
Attached to the servers. Now that NAS, SAN, and Virtualization have
become real, it has become one. On the one hand, if your servers and
storage are Geographically Dispersed from each other you are more
secure from Disasters, since one event is less likely to take out
both. On the other hand you are not, since the servers now depend on
storage that can be lost separately from them.

The Electronic Inventory and Imaging Server solutions address the
original question more directly and will be in a separate email. The
Electronic Inventory is similar to what "cfengine" or "servdoc" or any
good Configuration Management software does. It gets a "point-in-time
snapshot" of the configuration to give you "Configuration at a Glance"
for really quick recovery. An Imaging Server is just a bunch of "dd"
image files for quick recovery of servers. It can be more, much more.
If you have the time...
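
As a rough illustration of the "point-in-time snapshot" idea, here is
a minimal Python sketch. It is not how cfengine or any other tool
works; it only records a few host facts plus the size and SHA-256 of
a list of files you choose. The function name snapshot and the file
list are assumptions made for the example.

```python
import hashlib
import json
import platform
import time
from pathlib import Path


def snapshot(paths):
    """Return a point-in-time record of host facts and file hashes."""
    record = {
        "taken_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "hostname": platform.node(),
        "kernel": platform.release(),
        "machine": platform.machine(),
        "files": {},
    }
    for name in paths:
        path = Path(name)
        try:
            data = path.read_bytes()
        except OSError:
            # Missing or unreadable: record that fact rather than fail.
            record["files"][str(name)] = None
            continue
        record["files"][str(name)] = {
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
    return record


if __name__ == "__main__":
    # Hypothetical selection; pick the files that matter for a rebuild.
    wanted = ["/etc/fstab", "/etc/hosts", "/etc/resolv.conf"]
    print(json.dumps(snapshot(wanted), indent=2))
```

Run from cron and stored off the server, successive snapshots can be
compared to see what changed between the last known-good
configuration and the one you are trying to rebuild.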

Thanks,  Robert



