Minty to the rescue, tales of LVM basics and recoveries

In this post I’ll document a few reproducible techniques that might help the less experienced in managing and recovering a faulty hard drive that uses LVM. Note: it may contain Windows and virtual machines.

Why Windows?

Due to my gaming and drawing habits, and even though I love the freedom of open-source platforms, I find myself using Win7 most of the time. We could argue that in this day and age it’s not even necessary, and that I’d be better off with a XenServer and a couple of passthroughs to run everything in parallel, and I would agree with you. But I’m also lazy, and why fix something that isn’t broken? In any case, all my headless-server expertise was of no use when I found myself having to deal with a faulty LVM root partition on a notebook hard drive. Sort of a jackpot, of a kind. While LVM undoubtedly has its advantages, I’m more comfortable in the physical realm than in the logical one, so I wasn’t much of an expert in that regard, and the notebook wouldn’t boot properly, which made all the usual on-machine troubleshooting useless. Being a Linux installation, I couldn’t just plug the drive into my main PC and scan the extN filesystem either, since I’m sporting Win7 for my daily routines. But then it dawned on me…

Why not virtual Zoidberg?

Given the monstrous specs of my PC, and the marvels of virtualization and passthrough technology, I thought I’d put them all to use and resurrect the dusty VMware Workstation I had lying around for such a long time. By attaching the external hard drive to a USB3 port, I could simply pass it through to a *nix virtual machine, and while at it I’d finally try that neat Linux Mint distro I’d been meaning to test for so long (hence the name of the article). At this point it becomes a simple *nix recovery, which is for the best.

Dealing with LVMs

I armed myself with what documentation I could find, and started going at it:

# pvscan
  PV /dev/sda5   VG mint-vg   lvm2 [9,76 GiB / 0    free]
  Total: 1 [9,76 GiB] / in use: 1 [9,76 GiB] / in no VG: 0 [0   ]
# lvm pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sda5  mint-vg lvm2 a--  9,76g    0

# lvm vgs
  VG      #PV #LV #SN Attr   VSize VFree
  mint-vg   1   2   0 wz--n- 9,76g    0 
# vgdisplay
  --- Volume group ---
  VG Name               mint-vg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               9,76 GiB
  PE Size               4,00 MiB
  Total PE              2498
  Alloc PE / Size       2498 / 9,76 GiB
  Free  PE / Size       0 / 0   
  VG UUID               [UUID]

# lvm lvs
  LV     VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
  root   mint-vg -wi-ao--- 8,76g                                           
  swap_1 mint-vg -wi-ao--- 1,00g
# lvdisplay
  --- Logical volume ---
  LV Path                /dev/mint-vg/root
  LV Name                root
  VG Name                mint-vg
  LV UUID                [UUID]
  LV Write Access        read/write
  LV Creation host, time mint, 2015-02-08 22:44:19 +0100
  LV Status              available
  # open                 1
  LV Size                8,76 GiB
  Current LE             2242
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
   
  --- Logical volume ---
  LV Path                /dev/mint-vg/swap_1
  LV Name                swap_1
  VG Name                mint-vg
  LV UUID                [UUID]
  LV Write Access        read/write
  LV Creation host, time mint, 2015-02-08 22:44:19 +0100
  LV Status              available
  # open                 2
  LV Size                1,00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

What does all this mean? Let’s break it down for simplicity. When dealing with LVM rather than plain physical partitions, there are three different actors in play: physical volumes, volume groups, and logical volumes:

  • Physical Volumes (pvscan, lvm pvs): the classical partitions. They can be grouped into a single Volume Group, which virtualizes disk space and access so that space is handled as a purely virtual entity.
  • Volume Group (lvm vgs, vgdisplay): can be thought of as a union of partitions. Much like a RAID array, a VG supports adding (and removing) drives, which makes it easy to silently expand the available space without touching a partition table. Suppose, for example, that we want an additional hard drive in our PC: we can install the new drive, initialize it as a physical volume, and attach it to our current VG. From there we simply “expand” the Logical Volume(s) we want the additional space to go to, and we’re done (see the sketch after this list). No need to mount partitions in separate directories or similar; it just becomes a de-facto stripe.
  • Logical Volume (lvm lvs, lvdisplay): the usable “virtual” partition. LVs can be mounted just like good ol’ partitions, and can be used as such. If a hard drive is added to the VG, we can simply expand an LV to make use of the new space, with no need to account for separate mount points, since it remains a single partition (possibly striped across hard drives of different sizes).
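
To make that workflow concrete, here is a minimal sketch of adding a new drive to an existing VG and growing an LV with it. It assumes the new disk shows up as /dev/sdb and reuses the mint-vg/root names from the outputs above (with an ext4 filesystem on the LV); adapt device and volume names to your own setup.

# pvcreate /dev/sdb                        # initialize the new disk as a physical volume
# vgextend mint-vg /dev/sdb                # add it to the existing volume group
# lvextend -l +100%FREE /dev/mint-vg/root  # hand all the new free extents to the root LV
# resize2fs /dev/mint-vg/root              # grow the ext4 filesystem to fill the enlarged LV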

With this knowledge at hand, it was simply a matter of running mount /dev/<VGNAME>/root /mnt and recovering whatever salvageable data was left. One question I had never found myself asking (for obvious reasons) now needed an answer, though: what if the damage to the hard drive was fatal only by unlucky coincidence, and could I just fix the drive and reuse it for less-than-critical duties?
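
For the record, here is roughly what that boils down to from a live environment; the vgchange step is only needed if the VG was not activated automatically, the VG name is the one reported by vgdisplay above, and the backup destination is just a placeholder.

# vgscan                               # look for volume groups on the attached disks
# vgchange -ay mint-vg                 # activate the VG so its LVs appear under /dev/mint-vg/
# mount -o ro /dev/mint-vg/root /mnt   # mount the root LV read-only, to be gentle with a dying disk
# cp -a /mnt/home /path/to/backup/     # copy out whatever is salvageable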

Once upon a time, in a bad block far, far away…

While everybody agrees that a bad block is a great signal for “duck and cover”, I’ve always been more of an inquisitorial type. Armed with a live Mint distro and some idiot-proof documentation, I proceeded to simply do the following:

# badblocks -b 4096 /dev/mint-vg/root > ./bad-blocks
# e2fsck -l ./bad-blocks /dev/mint-vg/root
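
One caveat worth spelling out (this is my reading of the e2fsck man page rather than something from the original session): e2fsck -l expects block numbers expressed in the filesystem’s own blocksize, which is why badblocks gets -b 4096 above (the mke2fs default for a filesystem of this size). If you’d rather not worry about matching blocksizes, e2fsck can drive badblocks by itself:

# e2fsck -c /dev/mint-vg/root    # read-only badblocks scan, results added to the bad block inode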

While reinstalling the OS from scratch, I thought it would be helpful, in order to avoid the pesky “Unrecovered read error – auto reallocate failed”, to leave a GB or two of unallocated space at the end of the disk. So far everything has worked fine; let’s hope and pray that it continues to do so, but given the hard drive’s reassignment to non-aggressive duties, it probably will.
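
Purely as an illustration of that partitioning trick (device name and sizes below are placeholders, not the ones from my notebook), with parted it is just a matter of ending the main partition short of the end of the disk:

# parted /dev/sdb -- mklabel msdos
# parted /dev/sdb -- mkpart primary ext4 1MiB -2GiB    # leave the last ~2 GiB unallocated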

XenServer GrubConf.py fix script

In a previous article (Fixing XenServer error “Unable to find partition containing kernel”) I described how to fix a recurring problem after patching XenServer 6.2 installations. While the fix has been known for years, it has never been adopted upstream, and various distros (such as Ubuntu 14.04 LTS) fail to boot properly whenever GrubConf.py (on dom0) gets reset to its default state.

Being the lazy person that I am, I decided to set up a script to do the work for me; after all, we’re admins, not monkeys.

#!/bin/bash
# Location of the pygrub GrubConf.py on the XenServer 6.2 dom0
GRUBCONF="/usr/lib/python2.4/site-packages/grub/GrubConf.py"
# A patched file has two lines matching "_entry" (saved_entry + next_entry),
# the stock one has only one.
PATCHED=$(grep -c "_entry" "$GRUBCONF")
if [ "$PATCHED" -eq 2 ]; then
  echo "GrubConf.py is already patched"
else
  echo "Patching GrubConf.py to fix boot..."
  # Append the "${next_entry}" handling right after the "${saved_entry}" check
  sed -i 's/_entry}":/_entry}":\n                        arg = "0"\n                    elif arg.strip() == "${next_entry}":/' "$GRUBCONF"
  # Re-count to verify that the patch actually landed
  PATCHED=$(grep -c "_entry" "$GRUBCONF")
  if [ "$PATCHED" -eq 2 ]; then
    echo "- Patch was applied successfully."
  else
    echo "- There was a problem while applying the patch."
  fi
fi
echo
echo

This does just what we used to do manually: it detects whether GrubConf.py has been reverted to its stock state and, if it has, patches it up again. A couple of supplementary checks were added for paranoia 🙂
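
As for usage (the file name and location are just my choice): I keep it on dom0 and simply re-run it after every patching round.

# chmod +x /root/fix-grubconf.sh
# /root/fix-grubconf.sh
Patching GrubConf.py to fix boot...
- Patch was applied successfully.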