Perils of Improper Disaster Recovery

I have been using LVM for both personal and client systems, and I have found it to be quite robust for managing large amounts of disk space. Lately I began to realize that I needed to address redundancy and possibly consider cloning some of the running applications. I primarily run web-based applications (e.g. SugarCRM, Moodle and OpenEMR); at the core, each is basically Apache and a MySQL database. I have heard of the advantages of Amazon EC2 and other so-called cloud solutions. Nonetheless, the strategy I have deployed is a steady dose of rsync coupled with a very reliable storage repository called rsync.net.

For those who might not be familiar with rsync.net, they provide robust archiving and storage space at a reasonable cost, and they are absolutely a DIY shop. That said, if you need help with scripting backup solutions, they provide a comprehensive documentation archive to help end users get the most out of their offerings. Essentially, all you need is an SSH key pair and the ability to run rsync over ssh.
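As a rough illustration of the rsync-over-ssh approach described above, here is a minimal Python sketch that only assembles the rsync command line. The function name, the account name, the host name and the key path are all hypothetical placeholders and not anything rsync.net prescribes; the rsync options themselves (`--archive`, `--compress`, `--delete`, `-e`, `--dry-run`) are standard ones.

```python
import os
import shlex


def build_rsync_command(source_dir, remote_user, remote_host, remote_dir,
                        ssh_key="~/.ssh/id_rsa", dry_run=False):
    """Return the rsync argument list for pushing source_dir to a remote host over ssh.

    All names and paths passed in are placeholders chosen by the caller.
    """
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    source = source_dir.rstrip("/") + "/"
    key_path = os.path.expanduser(ssh_key)
    command = [
        "rsync",
        "--archive",    # preserve permissions, ownership, timestamps and symlinks
        "--compress",   # compress data in transit
        "--delete",     # remove remote files that no longer exist locally
        "-e", "ssh -i " + shlex.quote(key_path),  # use ssh with a specific key
    ]
    if dry_run:
        command.append("--dry-run")  # report what would change without copying
    command.append(source)
    command.append(f"{remote_user}@{remote_host}:{remote_dir}")
    return command
```

Passing the returned list to `subprocess.run(command, check=True)` would execute it. Doing a dry run first is a cheap way to see what `--delete` would remove before trusting the job to cron.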

All of this aside, I can say with certainty that the robustness of LVM comes with a complexity penalty. A blend of dd and carelessness corrupted one of my LVM partitions, which unfortunately held several directories that were not backed up. I spent time with a hex editor, carefully parsing the metadata on each of the LVM volumes in the hope of finding discernible differences. The idea was to look at the time stamps and then restore the LVM partition to its earlier state. All of these attempts failed miserably.
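For what it is worth, comparing copies of the metadata by time stamp does not have to be done by eye. The sketch below is not what I did at the time. It assumes the metadata is in LVM2's plain-text format, where (as far as I recall) each copy carries `seqno`, `creation_time` and `description` fields, and that you have already saved each copy into a string. LVM2 normally also keeps such copies under /etc/lvm/archive, and `vgcfgrestore` is the tool meant to apply one. The function names are my own.

```python
import re
from datetime import datetime, timezone


def summarize_metadata(text):
    """Extract seqno, creation time and description from LVM text metadata.

    The field names are assumed from the LVM2 text format; missing fields
    come back as None rather than raising.
    """
    seqno = re.search(r"^\s*seqno\s*=\s*(\d+)", text, re.MULTILINE)
    created = re.search(r"^\s*creation_time\s*=\s*(\d+)", text, re.MULTILINE)
    description = re.search(r'^\s*description\s*=\s*"(.*)"', text, re.MULTILINE)
    return {
        "seqno": int(seqno.group(1)) if seqno else None,
        # creation_time is stored as seconds since the Unix epoch
        "created": (datetime.fromtimestamp(int(created.group(1)), tz=timezone.utc)
                    if created else None),
        "description": description.group(1) if description else None,
    }


def newest_first(summaries):
    """Sort summaries so the metadata with the highest seqno comes first."""
    return sorted(summaries, key=lambda s: s["seqno"] or 0, reverse=True)
```

Sorting by sequence number rather than by time stamp avoids trouble if the system clock was ever wrong. The description line usually records which command was about to run when the copy was written.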

LVM (Photo credit: Luis M. Gallardo D.)

Ultimately, after being unable to make sense of the inconsistencies within the various LVM volume groups, I simply had to punt and reinstall the operating system. Luckily there _were_ some backups available. Unfortunately, ssh keys, VPN certificates and such had to be recreated. Lots of work was lost, and some very painful but necessary lessons were learned:

  • Establish a fault tolerant means of restoration
  • Automation of system configuration

Because LVM can be fairly complex to restore when a volume gets corrupted, it is important to back up at the filesystem level. It is not enough to take snapshots of the logical volume, since a snapshot lives in the same volume group as its origin and is lost along with it. This is particularly true when your volume group spans two physical disks. Though it is possible to discern where data begins and ends within a logical volume, it certainly is not for the faint of heart. So, robust filesystem backups should be the order of the day. Again, I would encourage the use of rsync and a remote repository like rsync.net.

LVM offers a means to take snapshots of selected logical volumes. A snapshot is copy-on-write: it preserves the original contents of blocks that change after it is taken, giving a frozen view of the volume at that moment. Though this is not the same as the snapshots in ZFS or in HAMMER on DragonFly BSD, it does provide a measure of fault tolerance.
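One common way to combine snapshots with a filesystem-level backup is to snapshot the volume, mount the snapshot read-only, copy the files off, and then remove the snapshot. The following is a minimal sketch of that sequence which only builds the commands. The volume group, volume, mount point and destination are caller-supplied placeholders, the `_snap` suffix and the 1G default are arbitrary choices of mine, and some filesystems may need extra mount options that are not shown here.

```python
def snapshot_backup_commands(vg, lv, mount_point, destination,
                             snapshot_size="1G"):
    """Return the commands for a snapshot-then-copy backup of one logical volume."""
    snapshot = lv + "_snap"           # hypothetical naming convention
    device = f"/dev/{vg}/{snapshot}"
    return [
        # Create a copy-on-write snapshot; snapshot_size bounds how much
        # change on the origin volume the snapshot can absorb.
        ["lvcreate", "--snapshot", "--size", snapshot_size,
         "--name", snapshot, f"/dev/{vg}/{lv}"],
        # Mount the frozen view read-only so the copy cannot alter it.
        ["mount", "-o", "ro", device, mount_point],
        # Copy the frozen files to storage outside the volume group.
        ["rsync", "--archive", mount_point.rstrip("/") + "/", destination],
        ["umount", mount_point],
        # Drop the snapshot so it stops consuming space in the volume group.
        ["lvremove", "--force", device],
    ]
```

Each entry can be run in order with `subprocess.run(command, check=True)` as root. Stopping at the first failure matters here, because copying from a mount point that never got mounted would back up an empty directory.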

Next, automated system configuration is the dream of every sysadmin. The gruesome prospect of having to reinstall software and recreate user accounts on a system that must be restored is the equivalent of a splinter in the eyeball. Not much fun at all.

I had heard about CFEngine, Chef and Puppet, but had never deployed any of these configuration management tools. Puppet appears to have a vibrant community around it, so I thought it would be good to learn how to use it. I won't get into a how-to for deploying Puppet, as it is beyond the scope of this entry, and there is a plethora of information on the Interwebs. My plan is to mirror the bare metal server configuration on a virtual machine that I lease. The challenge of spinning up a database and an assortment of PHP-based web applications was a bit daunting for a newcomer to Puppet, so it will take me a bit more time to learn configuration management automation. Until then, I will be using bash scripts and manual labor to restore the user accounts and software.
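In the meantime, even a small script can borrow the core idea of configuration management: describe the desired state once, and only act on what is missing. Below is a minimal sketch of that idea in Python rather than bash. The function names and the shape of the input lists are my own invention, and the package install command is passed in by the caller because it depends on the distribution.

```python
import pwd


def user_exists(name):
    """Return True if the account is already present on this system."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def restoration_commands(users, packages, install_command, exists=user_exists):
    """Return the commands still needed to recreate accounts and reinstall packages.

    users is a list of (name, login_shell) pairs; install_command is the
    distribution-specific prefix to which the package names are appended.
    """
    commands = []
    for name, shell in users:
        if not exists(name):
            # --create-home gives the account a home directory;
            # --shell sets its login shell.
            commands.append(["useradd", "--create-home", "--shell", shell, name])
    if packages:
        # Sort so the generated command is the same on every run.
        commands.append(list(install_command) + sorted(packages))
    return commands
```

Because accounts that already exist are skipped, the script can be rerun safely after a partial restore, which is the property that makes tools like Puppet attractive in the first place.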

    About this Entry

    This page contains a single entry by AG published on September 1, 2012 11:00 AM.
