Virtual Instance Fails to Boot After an Upgrade

If you have upgraded your installed Linux virtual machine and it no longer boots, you may have FooBar'd the boot parameters for your installation. In most cases this can be fixed, and in the following lines I will show you how.

In this article I am using XenServer 5.6+ as my hypervisor and Ubuntu as my example Linux installation. However, this could happen to any Linux variety, including but not limited to Ubuntu, Debian, Fedora, RedHat, and CentOS.


Linux System Has Failed to Boot After an Upgrade

When you have a virtualized infrastructure, there are a whole bunch of nuances you will learn as your infrastructure grows. One thing that can happen to a Linux OS is an unintended modification to the GRUB configuration and/or the menu.lst file. In this article I will show you how to un-FooBar your instance.

When you are managing Linux servers, back-port patching and kernel updates are essential to maintaining a well-running operating system. In a virtualized infrastructure, however, modifying the GRUB config can be detrimental to your instance. When upgrading most Linux operating systems, there are times when configuration files and/or scripts will need to be replaced by new versions. These new versions sometimes contain new syntax or updated features.

Modern Linux upgrades have brought a systemic change to how devices are referenced: instead of a name label or device path, devices are now commonly identified by UUID. The primary benefit of the UUID is that it stays the same even when the device label or name changes. While this benefit is GREAT on standard hardware, on abstracted (virtual) hardware it can cause some frustration.


The Symptoms

If you have upgraded your installation and chosen to use the package maintainer's version of the GRUB configuration and menu.lst, the new files may have introduced UUIDs as device parameters. If this is the case, your instance will no longer boot. It will either throw errors stating that the UUID cannot be processed in a pygrub setup, or fail because it is unable to find the root device.

When starting the VM, you will see an error on the host similar to
this:

ERROR:Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst:
[ Traceback (most recent call last):;
    File "/usr/bin/pygrub", line 746, in ?;
    raise RuntimeError, "Unable to find partition containing kernel";
RuntimeError: Unable to find partition containing kernel;  ]
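
If you want to confirm that UUID directives are what pygrub is tripping over, you can scan a copy of menu.lst for them. The following is only a rough sketch: the function name find_uuid_directives and the UUID pattern are my own assumptions for illustration, not part of pygrub or XenServer.

```python
import re

# Assumed shape of a filesystem UUID: 8-4-4-4-12 hexadecimal digits.
UUID_VALUE = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
# A "uuid <value>" line, or a "root <value>" line whose value is a UUID.
UUID_DIRECTIVE = re.compile(r"^\s*(uuid|root)\s+(" + UUID_VALUE + r")\s*$")


def find_uuid_directives(menu_lst_text):
    """Return (line_number, title, directive) for each UUID-based directive."""
    hits = []
    title = None  # title of the boot entry currently being read
    for number, line in enumerate(menu_lst_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("title"):
            title = stripped[len("title"):].strip()
        elif UUID_DIRECTIVE.match(line):
            hits.append((number, title, stripped))
    return hits
```

An empty result would suggest that the boot failure has some other cause than the one described here.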

The Fix

The fix for these issues is relatively simple: you need to correct the
menu.lst file. This can be done in a number of ways. Here are some of
the ways that I have performed this fix.

  • If the VHD can be mounted, you can edit the menu.lst file directly.
  • If you do not know how to mount the VHD, you can use the
    xe-edit-bootloader command from a XenServer host. This command will edit the menu.lst file on the named instance. Here -n gives the VM's name label and -p the partition number to look in. The syntax for this command looks like this:
    xe-edit-bootloader -p 1 -n KevinsVirtualServer
  • If you do not have host access to the server but are able to
    place it into rescue mode, do that instead. Once in a rescued
    state, you will have to mount the partition and then edit the
    /boot/grub/menu.lst file.
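
Whichever route you take, it may be worth keeping a copy of the file before you touch it. Here is a minimal sketch, assuming the guest's partition is already mounted at a path you supply. The function name backup_menu_lst and the .bak suffix are my own choices for illustration, not part of any XenServer tooling.

```python
import shutil
from pathlib import Path


def backup_menu_lst(mount_point):
    """Copy <mount_point>/boot/grub/menu.lst to menu.lst.bak beside it."""
    menu = Path(mount_point) / "boot" / "grub" / "menu.lst"
    backup = menu.with_name(menu.name + ".bak")
    # copy2 also preserves timestamps and permission bits where it can.
    shutil.copy2(menu, backup)
    return backup
```

If the edit goes wrong, copying menu.lst.bak back over menu.lst returns you to where you started.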

Once you have access to the menu.lst file, you will see something like
the sample below. There may be differences between the example provided
and the file you have on your instance. What you need to modify are the
directives found at the bottom of the file.

Here is the ORIGINAL menu.lst file

title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.2.0-26-generic

title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic (recovery mode)
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.2.0-26-generic

title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual (recovery mode)
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual

title Chainload into GRUB 2
root ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/grub/core.img

title Ubuntu 12.04 LTS, memtest86+
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/memtest86+.bin

If your menu.lst file looks similar to the previous example, you will
have to adjust it to look like the following example.

Here is the CORRECTED menu.lst file

title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.2.0-26-generic

title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic (recovery mode)
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.2.0-26-generic

title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual (recovery mode)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual

title Chainload into GRUB 2
root (hd0,0)
kernel /boot/grub/core.img

title Ubuntu 12.04 LTS, memtest86+
root (hd0,0)
kernel /boot/memtest86+.bin

You will notice that the uuid lines have been removed and the root directives have been changed to point at the partition that holds the boot files. (hd0,0) is GRUB's notation for the first partition on the first disk. In my case, my BOOT partition as well as my ROOT partition were found at (hd0,0).
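
If you have many instances to repair, the same edit can be scripted. What follows is only a sketch of the change between the two listings, not a general GRUB tool. The function names fix_stanza and fix_menu_lst are hypothetical, and the default of (hd0,0) is an assumption that only holds if your layout matches mine. The rule it applies is inferred from my example: an entry whose kernel line already carries root=... simply loses its UUID line, and any other entry gets root (hd0,0) in its place.

```python
import re

# Assumed shape of a filesystem UUID: 8-4-4-4-12 hexadecimal digits.
UUID_VALUE = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
# A "uuid <value>" line, or a "root <value>" line whose value is a UUID.
UUID_DIRECTIVE = re.compile(r"^\s*(?:uuid|root)\s+" + UUID_VALUE + r"\s*$")


def fix_stanza(lines, grub_root="(hd0,0)"):
    """Rewrite one boot entry: a title line plus the lines that follow it."""
    # Does any kernel line in this entry pass root=... to the kernel?
    has_root_param = any(
        parts[0] == "kernel" and any(p.startswith("root=") for p in parts[1:])
        for parts in (line.split() for line in lines)
        if parts
    )
    fixed = []
    for line in lines:
        if UUID_DIRECTIVE.match(line):
            if not has_root_param:
                # No root= on the kernel line, so point GRUB at the partition.
                fixed.append("root " + grub_root)
            continue  # the UUID line itself is always dropped
        fixed.append(line)
    return fixed


def fix_menu_lst(text, grub_root="(hd0,0)"):
    """Return menu.lst text with UUID directives removed or replaced."""
    out, stanza = [], []
    for line in text.splitlines():
        # A new title line closes the entry collected so far.
        if line.strip().startswith("title") and stanza:
            out.extend(fix_stanza(stanza, grub_root))
            stanza = []
        stanza.append(line)
    out.extend(fix_stanza(stanza, grub_root))
    return "\n".join(out) + "\n"
```

Run it against a copy of the file and compare the result with the original by eye before putting it in place, since your partition may not be (hd0,0).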


Wrap up

After making the needed changes, save the file and exit. Unmount any partitions you need to, and then restart your instance. With any kind of luck, your instance will start without further issues.

In my experience, when I have come across these types of problems the root partition has also needed to be FSCK'd, but after these simple modifications and the subsequent FSCK the instance recovered without further incident.