By Richard Bullington-McGuire on Fri, 2006-04-28 01:00. Software
Raid and Logical Volume Managers are great, until you lose data.
The combination of Linux software RAID (Redundant Array of Inexpensive Disks) and LVM2 (Logical Volume Manager, version 2) offered in modern Linux operating systems offers both robustness and flexibility, but at the cost of complexity should you ever need to recover data from a drive formatted with software RAID and LVM2 partitions. I found this out the hard way when I recently tried to mount a system disk created with RAID and LVM2 on a different computer. The first attempts to read the filesystems on the disk failed in a frustrating manner.
I had attempted to put two hard disks into a small-form-factor computer that was really only designed to hold only one hard disk, running the disks as a mirrored RAID 1 volume. (I refer to that system as raidbox for the remainder of this article.) This attempt did not work, alas. After running for a few hours, it would power-off with an automatic thermal shutdown failure. I already had taken the system apart and started re-installing with only one disk when I realized there were some files on the old RAID volume that I wanted to retrieve.
Recovering the data would have been easy if the system did not use RAID or LVM2. The steps would have been to connect the old drive to another computer, mount the filesystem and copy the files from the failed volume. I first attempted to do so, using a computer I refer to as recoverybox, but this attempt met with frustration.
Getting to the data proved challenging, both because the data was on a logical volume hidden inside a RAID device, and because the volume group on the RAID device had the same name as the volume group on the recovery system.
Some popular modern operating systems (for example, Red Hat Enterprise Linux 4, CentOS 4 and Fedora Core 4) can partition the disk automatically at install time, setting up the partitions using LVM for the root device. Generally, they set up a volume group called VolGroup00, with two logical volumes, LogVol00 and LogVol01, the first for the root directory and the second for swap, as shown in Listing 1.
The original configuration for the software RAID device had three RAID 1 devices: md0, md1 and md2, for /boot, swap and /, respectively. The LVM2 volume group was on the biggest RAID device, md2. The volume group was named VolGroup00. This seemed like a good idea at the time, because it meant that the partitioning configuration for this box looked similar to how the distribution does things by default. Listing 2 shows how the software RAID array looked while it was operational.
If you ever name two volume groups the same thing, and something goes wrong, you may be faced with the same problem. Creating conflicting names is easy to do, unfortunately, as the operating system has a default primary volume group name of VolGroup00.
To recover, the first thing to do is to move the drive to another machine. You can do this pretty easily by putting the drive in a USB2 hard drive enclosure. It then will show up as a SCSI hard disk device, for example, /dev/sda, when you plug it in to your recovery computer. This reduces the risk of damaging the recovery machine while attempting to install the hardware from the original computer.
The challenge then is to get the RAID setup recognized and to gain access to the logical volumes within. You can use sfdisk -l /dev/sda to check that the partitions on the old drive are still there.
To get the RAID setup recognized, use mdadm to scan the devices for their raid volume UUID signatures, as shown in Listing 3.
This format is very close to the format of the /etc/mdadm.conf file that the mdadm tool uses. You need to redirect the output of mdadm to a file, join the device lines onto the ARRAY lines and put in a nonexistent second device to get a RAID1 configuration. Viewing the the md array in degraded mode will allow data recovery:
[root@recoverybox ~]# mdadm --examine --scan /dev/sda1 ↪/dev/sda2 /dev/sda3 >> /etc/mdadm.conf [root@recoverybox ~]# vi /etc/mdadm.conf
Edit /etc/mdadm.conf so that the devices statements are on the same lines as the ARRAY statements, as they are in Listing 4. Add the “missing” device to the devices entry for each array member to fill out the raid1 complement of two devices per array. Don’t forget to renumber the md entries if the recovery computer already has md devices and ARRAY statements in /etc/mdadm.conf.
Then, activate the new md devices with mdadm -A -s, and check /proc/mdstat to verify that the RAID array is active. Listing 5 shows how the raid array should look.
If md devices show up in /proc/mdstat, all is well, and you can move on to getting the LVM volumes mounted again.
The next hurdle is that the system now will have two sets of lvm2 disks with VolGroup00 in them. Typically, the vgchange -a -y command would allow LVM2 to recognize a new volume group. That won’t work if devices containing identical volume group names are present, though. Issuing vgchange -a y will report that VolGroup00 is inconsistent, and the VolGroup00 on the RAID device will be invisible. To fix this, you need to rename the volume group that you are about to mount on the system by hand-editing its lvm configuration file.
If you made a backup of the files in /etc on raidbox, you can edit a copy of the file /etc/lvm/backup/VolGroup00, so that it reads VolGroup01 or RestoreVG or whatever you want it to be named on the system you are going to restore under, making sure to edit the file itself to rename the volume group in the file.
If you don’t have a backup, you can re-create the equivalent of an LVM2 backup file by examining the LVM2 header on the disk and editing out the binary stuff. LVM2 typically keeps copies of the metadata configuration at the beginning of the disk, in the first 255 sectors following the partition table in sector 1 of the disk. See /etc/lvm/lvm.conf and man lvm.conf for more details. Because each disk sector is typically 512 bytes, reading this area will yield a 128KB file. LVM2 may have stored several different text representations of the LVM2 configuration stored on the partition itself in the first 128KB. Extract these to an ordinary file as follows, then edit the file:
dd if=/dev/md2 bs=512 count=255 skip=1 of=/tmp/md2-raw-start vi /tmp/md2-raw-start
You will see some binary gibberish, but look for the bits of plain text. LVM treats this metadata area as a ring buffer, so there may be multiple configuration entries on the disk. On my disk, the first entry had only the details for the physical volume and volume group, and the next entry had the logical volume information. Look for the block of text with the most recent timestamp, and edit out everything except the block of plain text that contains LVM declarations. This has the volume group declarations that include logical volumes information. Fix up physical device declarations if needed. If in doubt, look at the existing /etc/lvm/backup/VolGroup00 file to see what is there. On disk, the text entries are not as nicely formatted and are in a different order than in the normal backup file, but they will do. Save the trimmed configuration as VolGroup01. This file should then look like Listing 6.
Once you have a volume group configuration file, migrate the volume group to this system with vgcfgrestore, as Listing 7 shows.
At this point, you can now mount the old volume on the new system, and gain access to the files within, as shown in Listing 8.
Now that you have access to your data, a prudent final step would be to back up the volume group information with vcfgbackup, as Listing 9 shows.
LVM2 and Linux software RAID make it possible to create economical, reliable storage solutions with commodity hardware. One trade-off involved is that some procedures for recovering from failure situations may not be clear. A tool that reliably extracted old volume group information directly from the disk would make recovery easier. Fortunately, the designers of the LVM2 system had the wisdom to keep plain-text backup copies of the configuration on the disk itself. With a little patience and some research, I was able to regain access to the logical volume I thought was lost; may you have as much success with your LVM2 and RAID installation.
Resources for this article: www.linuxjournal.com/article/8948.
Richard Bullington-McGuire is the Managing Partner of PKR Internet, LLC, a software and systems consulting firm in Arlington, Virginia, specializing in Linux, Open Source and Java. He has been a Linux sysadmin since 1994. You can reach him at rbulling@pkrinternet.com.
Last-Modified: 2007-09-12 07:46:56