Minty to the rescue, tales of LVM basics and recoveries

In this post I’ll document a few reproducible techniques that might help less experienced users manage and recover a faulty hard drive with LVM in use. Note: it may contain Windows and Virtual Machines.

Why Windows?

Due to my gaming and drawing habits, even though I love the freedom of open source platforms, I find myself using Win7 most of the time. We could argue that in this day and age it’s not even necessary, and that I would be better off with a XenServer and a couple of passthroughs to run everything in parallel, and I would agree with you. But I’m also lazy, and why fix something that is not broken? In any case, all my headless server expertise was of no use when I found myself having to deal with a faulty LVM root partition on a notebook hard drive. A jackpot, of a kind. While LVM undoubtedly has its advantages, I find myself more comfortable in the physical realm than in the logical one, so I wasn’t much of an expert in that regard, and the notebook wouldn’t boot properly, making all the usual on-machine troubleshooting useless. Being a Linux installation, I couldn’t even just plug the drive into my main PC and scan the extN filesystem, since I am sporting Win7 for my daily routines. But then it dawned on me…

Why not virtual Zoidberg?

Given the monstrous specs of my PC, and the marvels of virtualization and passthrough technology, I thought I’d put them all to use and resurrect the dusty VMware Workstation I had lying around for such a long time. After attaching the external hard drive to a USB3 port, I could simply pass it through to the *nix virtual machine, and while at it I’d try that neat Linux Mint distro I had wanted to try for so long (hence the name of the article). At this point it becomes a simple *nix recovery, which is for the best.

Dealing with LVMs

I armed myself with what documentation I could find, and started going at it:

What does this all mean? Let’s divide it for simplicity. When not dealing with physical partitions but with LVM, there are three different actors in play: physical volumes, volume groups, and logical volumes:

  • Physical Volumes: (pvscan, lvm pvs) are the classical partitions. They can be grouped into a single Volume Group, which pools their disk space so that it is handled only as a virtual entity.
  • Volume Group: (lvm vgs, vgdisplay) can be considered as a union of partitions. Just like logical volumes in RAID setups, VGs support the addition (and/or removal) of drives, which makes it easier to silently expand the space available without touching a partition. Suppose for example that we want an additional hard drive in our PC: we can install the new hard drive, initialize it as a Physical Volume and attach it to our current VG. From there we can simply “expand” the Logical Volume(s) we want the additional space to go to, and it’s done. No need to mount partitions in directories or similar: the new space is simply appended to the volume (linearly by default, striping is an explicit option).
  • Logical Volume: (lvm lvs, lvdisplay) the usable “virtual” partition. These LVs can be mounted just like the good ol’ partitions, and can be used as such. If a hard drive is added to the VG we can simply expand the LV to make use of the new space. At the same time, we don’t have to account for different mount points, since it’s just a single partition (possibly spanning several hard drives of different sizes).
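The expansion described in the list above can be sketched as a small shell function. This is only an illustration: the function name grow_lv and the example values (/dev/sdb1, vg0, root) are placeholders, and the last step assumes an ext2/3/4 filesystem on the LV.

```shell
# Grow a logical volume onto a newly added partition. Run as root.
# Arguments (all placeholders): new partition, VG name, LV name.
grow_lv() (
    part="$1"; vg="$2"; lv="$3"
    pvcreate "$part" &&                        # label the partition as a physical volume
    vgextend "$vg" "$part" &&                  # add it to the volume group
    lvextend -l +100%FREE "/dev/$vg/$lv" &&    # hand all free extents to the LV
    resize2fs "/dev/$vg/$lv"                   # grow the ext2/3/4 filesystem to match
)

# Example: grow_lv /dev/sdb1 vg0 root
```

Each step only runs if the previous one succeeded, so a failure in pvcreate or vgextend leaves the logical volume untouched.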

With these concepts in hand, it was simply a matter of running mount /dev/<VGNAME>/root /mnt and recovering what salvageable data I had left. There was one question, though, that I had never found myself asking before (for obvious reasons): “what if the damage to the hard drive was a one-off, and could I just fix the drive and reuse it for less than critical duties?”.
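Put together, the recovery steps might look like the sketch below. The function name mount_root_lv is made up for illustration, and mounting read-only is my own assumption, chosen to avoid writing to a disk that is already failing.

```shell
# Find, activate and mount the root logical volume read-only. Run as root.
# "$1" is the volume group name as reported by vgs, "$2" the mount point.
mount_root_lv() (
    vg="$1"; target="$2"
    vgscan &&                                  # look for volume groups on all attached disks
    vgchange -ay "$vg" &&                      # activate the group so /dev/<vg>/<lv> appears
    lvs "$vg" &&                               # list its logical volumes
    mount -o ro "/dev/$vg/root" "$target"      # read-only, to spare the failing disk
)

# Example: mount_root_lv <VGNAME> /mnt
```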

Once upon a time, in a bad block far, far away…

While everybody agrees that a bad block is a great signal for “duck and cover”, I’ve always been more of an inquisitive type. Armed with a live Mint distro and idiot-proof documentation, I proceeded to simply do the following:
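For reference, a typical non-destructive survey of a suspect drive looks roughly like the sketch below. The tools are assumptions (smartctl from smartmontools, badblocks from e2fsprogs) and not necessarily the ones used here; the function name check_disk is a placeholder.

```shell
# Survey a suspect drive. Run as root; "$1" is the whole disk, e.g. /dev/sdb.
check_disk() (
    disk="$1"
    smartctl -H "$disk"      # overall SMART health verdict
    smartctl -A "$disk"      # attribute table, incl. reallocated and pending sectors
    badblocks -sv "$disk"    # read-only surface scan with progress, lists unreadable blocks
)

# Example: check_disk /dev/sdb
```

Behind a USB bridge smartctl may need an explicit device type (for instance -d sat) before it can read anything.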

While reinstalling a new OS from scratch, I thought it would be helpful, in order to avoid the pesky “Unrecovered read error – auto reallocate failed”, to leave a GB or two as unallocated space. So far everything has worked fine. Let’s hope and pray that it will continue to do so, but given the hard drive’s reassignment to non-aggressive duties, it probably will.

XenServer fix script

In a previous article (Fixing XenServer error “Unable to find partition containing kernel”) I described how to fix a recurring problem after patching XenServer 6.2 installations. While the fix has been known for years, it has never been adopted, and different distros (such as Ubuntu LTS 14.04) fail to boot properly when the patched file (on dom0) gets reset to its default state.

Being the lazy person that I am, I decided to set up a script to do the work for me. After all, we’re admins, not monkeys.

This does just what I/we used to do manually: it checks whether the fix is still in place and, if not, patches it up. Supplementary tests added for paranoia 🙂
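The check-then-patch logic can be sketched generically as below. Everything here is hypothetical: the function name, the marker string used to recognise an already patched file, and the diff file. It only illustrates the idea of making the fix idempotent.

```shell
# Apply a patch only if the target does not already contain a marker string.
# "$1" = file to fix, "$2" = text that only exists once patched, "$3" = diff to apply.
# All three are placeholders to be supplied by the caller.
ensure_patched() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 2; }
    if grep -qF -- "$2" "$1"; then
        echo "already patched: $1"
        return 0
    fi
    cp -p "$1" "$1.orig" || return 1          # keep a backup before touching anything
    patch "$1" < "$3" && echo "patched: $1"
}
```

Run from cron or after each host update, a function like this does nothing when the fix is still in place and reapplies it when an update has reverted the file.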

Better OSSEC syslog parsing for Splunk

Just as predicted by the documentation, the syslog parsing of the OSSEC app for Splunk was a bit meh: while it would work in several instances, it would fail terribly in others, like HTTP access for example. Below you can find the current version I’m using, which also provides additional fields that can be used for reports.

What you see commented out are the original instructions, which can be safely removed. The new REGEX is more complex than the original, maybe too much so, but through it I can extract more information that was previously hidden or not easily accessible, and at the same time remove redundant timestamps while having all the important messages correctly extracted.

If you have suggestions, feel free to comment below.

OSSEC Agent/Server + Splunk installation

There is a lot of documentation to be read about the installation of OSSEC, but it’s usually sparse and focused either on a local autonomous setup or on setups with hundreds of VMs. In this article we will navigate through the necessary steps to set up a small OSSEC installation with the OSSEC agent running offsite on a web/mail server and the OSSEC server running onsite. Additionally we will take a look at Splunk and install it on the OSSEC server machine, which will make it easier to manage bigger volumes of data later on.


In order to compile and install OSSEC you will need build-essential on Ubuntu machines and MySQL/PostgreSQL for database support. You can read more details about this here.

Agent/Server installation

Installing the agent and the server is as easy as running the script (after checksumming it) and answering a few questions. You should keep (most of) the defaults, since they’re solid, and then build on them.
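The checksum step can be sketched as follows. SHA-256 is an assumption on my part (use whichever digest the download page publishes), and the function and file names are placeholders.

```shell
# Compare a file's SHA-256 with the published value before running anything from it.
# "$1" is the downloaded file, "$2" the expected hex digest.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Example with placeholder names:
#   verify_sha256 ossec-hids.tar.gz "<digest from the download page>" &&
#       tar xzf ossec-hids.tar.gz && (cd ossec-hids-* && ./install.sh)
```

Chaining with && means the installer never starts if the digest does not match.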

Basic server/agent configuration

After the server configuration, you will need to manage the agents. On the server you will use the manage_agents command to add the agents with their IDs, names and IP addresses.

After adding the agents on the server, you need to extract the agent keys.

You now need to import the key on the agent, through manage_client.

If you remembered to configure the firewall rules properly, allowing traffic on UDP 1514, you should now have them synced upon restart. If everything is working as expected you will find the ossec-agentd connection in the logs within /var/ossec/logs/ossec.log: ossec-agentd(4102): INFO: Connected to the server (hostname/ipaddress:1514).
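Both checks can be scripted. The sketch below assumes iptables is the firewall in use; the function names and the example address are placeholders.

```shell
# On the server, as root: accept agent traffic on UDP 1514 from one address.
# "$1" is the agent's IP address.
allow_agent() {
    iptables -A INPUT -p udp --dport 1514 -s "$1" -j ACCEPT
}

# On the agent: succeed if the log shows a completed connection.
# "$1" is the log file and defaults to the standard location.
agent_connected() {
    grep -q 'INFO: Connected to the server' "${1:-/var/ossec/logs/ossec.log}"
}

# Example:
#   allow_agent 203.0.113.5
#   agent_connected && echo "agent is talking to the server"
```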

Adding global agent configurations

One of the smart moves that extends the capability of OSSEC is the possibility of pushing configurations to the agents. Anyone who has managed a botnet knows how powerful this can be, and OSSEC is no exception. Let’s suppose we work from behind a static IP: by logging in through SSH, moving files through FTP and changing configuration files around we would generate a lot of noise, but we can fix that by adding a simple agent configuration on our server side:

After a restart of the OSSEC processes the agent.conf will be pushed/pulled, and the IP should now be successfully white-listed. This method also allows setting specific rules for sets of agents, by specifying the names to which the configurations apply.

Agent configuration: we need to go deeper

As explained in this article, stopping at the defaults is not good practice. While all the base scenarios have been covered, specific needs have not. Using multi-user hosting or logging? You need to add these logs manually. Mail servers? These too. For some reason you have verbose MySQL logging? This will need to be added too. That’s easily done by appending the relevant log locations and formats to either the agent ossec.conf or the server agent.conf, whichever suits your needs best:
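As an illustration, the helper below prints a <localfile> stanza for a given format and path. The function name and the log paths are placeholders for wherever your services actually log; apache and mysql_log are two of the log_format values OSSEC understands.

```shell
# Print a <localfile> stanza for OSSEC. "$1" is the log_format, "$2" the file path.
localfile_stanza() {
    cat <<EOF
  <localfile>
    <log_format>$1</log_format>
    <location>$2</location>
  </localfile>
EOF
}

# Example paths, to be adapted:
localfile_stanza apache /var/log/apache2/access.log
localfile_stanza mysql_log /var/log/mysql/mysql.log
```

The output goes inside <ossec_config> in the agent’s ossec.conf, or inside <agent_config> in the server’s agent.conf.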

Remember that you can use wildcards and strftime for the logs, but not together. Also there are a few pitfalls in using wildcards you should be aware of.

Tweaking the server for Splunk

At this point we have a working agent/server configuration, but we want to push it a step further to make use of Splunk. Even though my setup has OSSEC and Splunk sharing the same machine I chose a syslog client configuration, and the reason is simple: through the use of syslog_output I am able to increase the granularity by raising or lowering the alert level as I see fit, while also allowing me to add a separate OSSEC server elsewhere without the need to reconfigure Splunk. It’s a win-win. The changes are to be made inside ossec.conf:

You should put the syslog_output before the <rules> tag. Apart from enabling OSSEC’s client-syslog daemon and restarting, this is all it takes to be ready for Splunk.
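A stanza of this kind might look like the output of the helper below. The address, port and alert level are placeholders (the port simply has to match whatever Splunk will listen on), and the function name is made up.

```shell
# Print a <syslog_output> stanza. "$1" = receiver address, "$2" = UDP port,
# "$3" = minimum alert level to forward.
syslog_output_stanza() {
    cat <<EOF
  <syslog_output>
    <server>$1</server>
    <port>$2</port>
    <level>$3</level>
  </syslog_output>
EOF
}

syslog_output_stanza 127.0.0.1 10002 5

# After pasting the stanza into ossec.conf, as root:
#   /var/ossec/bin/ossec-control enable client-syslog
#   /var/ossec/bin/ossec-control restart
```

Raising or lowering the level value is what controls how much of the alert stream reaches Splunk.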

Where to start Splunking

Silly puns aside, we will need the Splunk software and the Reporting and Management for OSSEC app. Given my setup I downloaded the deb package on the server, and the app tgz on my workstation. The installation is as easy as running a few commands:

On an Ubuntu server this will install the required files and make Splunk start on boot, running as the splunk user. Before running it, though, we need to make a change that will allow us to receive information from OSSEC. The following code can be added to inputs.conf after the [default] section:
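A minimal UDP input of this kind could be generated as below. The port is a placeholder and must match the one set in syslog_output; the function name is hypothetical.

```shell
# Print a UDP input stanza for Splunk's inputs.conf. "$1" is the listening port.
udp_input_stanza() {
    cat <<EOF
[udp://$1]
disabled = false
sourcetype = ossec
EOF
}

udp_input_stanza 10002
```

The sourcetype value is what the later sourcetype="ossec" search relies on.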

This will start the UDP server, as per our mission. There are other modes available if you choose not to use the syslog_output method, but I will not go into that for now. I will just leave you the app documentation as reference.

At this point most of our work is done. Once the server is started (with service splunk start in my case) you can connect to it through its web interface, which should be up and running at http://ipaddress:8000/. After the login you can navigate to App > Manage Apps… and click Install app from file, selecting the app tgz we downloaded earlier. If everything has been done correctly, data should now be flowing, and a simple sourcetype="ossec" query should return all the collected information.

What to do with it, you ask? Well, that’s your job now 🙂

Solution to XenServer VM landing on initramfs

In my journey through XenServer lands, I once experienced a change in the UUID of the root partition, which resulted in a failed boot and being dropped into initramfs. Although this solution should have worked just fine, I either didn’t know of it at the time or it wouldn’t work for some reason.

While inside the VM initramfs I also had the pleasure of not having any text editor of sorts: no vi, no vim, no nano. Nothing at all. Even though I found the new UUID through the use of ls -al /dev/disk/by-uuid/ (and some guesswork), I had no way to edit the grub configuration. So, after some trial and error, I came up with the following:
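One editor-free way to do this, assuming the initramfs busybox ships sed, is a plain search and replace. The function name, device and UUIDs below are placeholders.

```shell
# Replace every occurrence of one UUID with another in a file, using only sed.
# "$1" = file, "$2" = old UUID, "$3" = new UUID.
swap_uuid() {
    sed "s/$2/$3/g" "$1" > "$1.new" &&     # write the edited copy next to the original
    cat "$1.new" > "$1" &&                 # overwrite in place, keeping permissions
    rm -f "$1.new"
}

# Inside initramfs, with placeholder device and UUIDs:
#   mkdir -p /mnt/root && mount /dev/xvda1 /mnt/root
#   swap_uuid /mnt/root/boot/grub/grub.cfg OLD-UUID NEW-UUID
#   swap_uuid /mnt/root/etc/fstab OLD-UUID NEW-UUID
#   umount /mnt/root
```

UUIDs contain only hex digits and dashes, so they are safe to drop into the sed expression unescaped.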

After the proper root partition UUID was set in place, a reboot was all it took to set the machine back up and running.

Fixing XenServer error “Unable to find partition containing kernel”

Edit: I provided a scripted solution in this article. To know why the error happens and its fixes just keep reading.

Error: Starting VM ” – The bootloader for this VM returned an error — did the VM installation succeed? Unable to find partition containing kernel

This has been the major nightmare I’ve had so far with XenServer machines. When upgrading distros it might just so happen that they will refuse to boot ever after. In my case it affects Ubuntu 14.04.x, which is not officially supported. Let’s look at the solutions.


Although the fix has been in Citrix’s repository since 2012, give or take, it has not made its way into the shipped executables yet, for reasons unknown. If you open /usr/lib/python2.4/site-packages/grub/ at line 428 you see:

This causes a problem during the parsing and two lines should be added:

After the file has been modified and saved you will be able to properly start the virtual machines. This currently holds true for Ubuntu 14.04 & 14.04.1 LTS server installations, but might also work for other distributions. Also take into consideration that applying some patches to the host might revert this change, so you might need to do it again at some point in the future.

Modifying grub.cfg

This might not be enough if the problem does not relate to PyGrub but rather to the configuration file itself. While on the host machine, you can run the following command:

This command will prepare and mount the drives assigned to the virtual machine, open the boot loader configuration in vi and, after you quit vi, unmount and clean up. If you installed Grub2 or you made mistakes in its configuration, this will allow you to edit it from inside the host machine, after which you will be able to properly boot it up.
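XenServer ships a helper, xe-edit-bootloader, that behaves as described. The wrapper below is a sketch under the assumption that its -n (VM name), -p (partition number) and -f (file to edit) options are available on your host; the wrapper name and example values are placeholders.

```shell
# Open a VM's boot loader configuration in vi from the XenServer host (dom0).
# "$1" = VM name label, "$2" = number of the partition holding /boot,
# "$3" = path of the configuration file inside that partition.
edit_vm_bootloader() {
    xe-edit-bootloader -n "$1" -p "$2" -f "$3"
}

# Example with placeholder values (the VM must be shut down first):
#   edit_vm_bootloader "ubuntu-vm" 1 /boot/grub/grub.cfg
```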

XenServer VMs and easy autostart

One of the most tedious tasks I find myself doing on XS installations is switching VMs’ autostart off and on. While no big deal, it got boring real fast. I thus crafted a couple of bash scripts to be run on the XS host that speed the task up.

This script will parse all the VMs while skipping a few system instances, and will clearly show which ones are starting automatically, together with the corresponding UUID, so no guesswork or matching is needed.
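A listing along those lines can be sketched with the xe CLI. It assumes autostart is tracked through the auto_poweron key in each VM’s other-config map; the function name and output layout are my own.

```shell
# List every guest VM with its autostart flag, UUID and name. Run on the XS host.
list_autostart() {
    for uuid in $(xe vm-list is-control-domain=false --minimal | tr ',' ' '); do
        name=$(xe vm-param-get uuid="$uuid" param-name=name-label)
        # The key is absent on VMs that never had it set, hence the fallback.
        auto=$(xe vm-param-get uuid="$uuid" param-name=other-config \
                 param-key=auto_poweron 2>/dev/null) || auto=false
        printf '%-5s %s  %s\n' "$auto" "$uuid" "$name"
    done
}
```

Filtering on is-control-domain=false is what keeps dom0 out of the list.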

With our UUID in hand, we can now enable/disable it through this tiny script:
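A toggle of this kind could be as small as the sketch below, again assuming the auto_poweron key in other-config; the function name is hypothetical.

```shell
# Set a VM's autostart flag. "$1" = VM UUID, "$2" = true or false.
set_autostart() {
    case "$2" in
        true|false) xe vm-param-set uuid="$1" other-config:auto_poweron="$2" ;;
        *) echo "usage: set_autostart <vm-uuid> true|false" >&2; return 1 ;;
    esac
}

# Example: set_autostart <vm-uuid> true
```

On XenServer 6.x the pool itself also needs other-config:auto_poweron=true (set with xe pool-param-set) before per-VM flags are honoured.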

With these two scripts the task becomes checking the VM name and switching it on/off in a matter of seconds. Mission complete.

Mass XenServer updates with batch script

I was looking for a way to optimize (read: not doing repetitive tasks by hand) the upload of patches to my XenServer machine. I based it on a script, shown in this article, that I modified slightly for my lesser needs and built around a Win7 XenCenter installation.

The first four parameters are the path to xe.exe, the path to the patches location, the remote server data and finally the UUID of the XS host. I download the packages into the directory and unzip their content. When the script is run without parameters it removes any sources package I might have mistakenly extracted, scans the path for any and all ZIP packages and, based on their names, uploads the extracted XSUPDATE to the XenServer host, returning a list of patch UUIDs. After that I can launch the script again, passing the returned patch UUIDs as parameters, and it will cycle through them all and apply them. After they have been applied correctly I can reboot the XenServer host and delete the ZIP packages.
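The same two-pass flow can be sketched in shell rather than batch. This is a simplified rendering, not the script itself: it skips the ZIP handling and assumes the xe patch-upload and patch-apply commands of XenServer 6.x; function names are placeholders.

```shell
# Pass 1: upload every .xsupdate in a directory; xe prints one patch UUID per upload.
# "$1" = directory holding the extracted updates.
upload_patches() {
    for f in "$1"/*.xsupdate; do
        [ -e "$f" ] || continue            # directory had no matching files
        xe patch-upload file-name="$f"
    done
}

# Pass 2: apply the given patch UUIDs to one host.
# "$1" = host UUID, remaining arguments = patch UUIDs.
apply_patches() {
    host="$1"; shift
    for p in "$@"; do
        xe patch-apply uuid="$p" host-uuid="$host"
    done
}

# From a remote machine, xe additionally takes -s <server> -u <user> -pw <password>.
```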

This is a bit rough around the edges, but it works when you only have a handful of machines to upgrade. There is a lot of room for improvement though, and I might get back to it at a later date.