XenServer upgrade failed to boot
Virtual Instance Fails to boot after an upgrade
If you have upgraded a Linux virtual machine and it no longer boots, the upgrade may have FooBar'd the boot parameters for your installation. In most cases this can be fixed, and in the following lines I will show you how.
In this article I am using XenServer 5.6+ as my hypervisor and Ubuntu as my example Linux installation. However, this can happen to any Linux variety, including but not limited to Ubuntu, Debian, Fedora, RedHat, and CentOS.
Linux system has failed to Boot after an Upgrade
When you have a virtualized infrastructure, there are a whole bunch of nuances you will learn as your infrastructure grows. One thing that can happen to a Linux OS is an unintended modification to the GRUB configuration and/or the menu.lst file. In this article I will show you how to un-FooBar your instance.
When you are managing Linux servers, back-ported patches and kernel updates are essential to maintaining a well-running operating system. In a virtualized infrastructure, however, modifying the GRUB config can be detrimental to your instance. When upgrading most Linux operating systems, there are times when configuration files and/or scripts need to be replaced by new versions. These new versions sometimes contain new syntax or updated features.
In the case of upgrading modern Linux, there has been a systemic change to the device parameters. Devices that used to be referenced by a name label or device path are now referenced by UUID. The primary benefit of the UUID is that it stays the same even if the device label or name changes. While these benefits are GREAT on standard hardware, in an abstracted (virtual) hardware infrastructure they can cause some frustration.
The Symptoms
If you have upgraded your installation and chose to use the package maintainer's version of the GRUB configuration and menu.lst, the upgrade may have changed your GRUB configuration to use UUIDs as device parameters. If this is the case, your instance will no longer boot. It will throw errors stating either that the UUID cannot be processed in a pygrub setup or that the installation cannot boot because the root device cannot be found.
When starting the VM, you will see an error on the host similar to this:
ERROR:Using <class \'grub.GrubConf.GrubConfigFile\'> to parse /grub/menu.lst:
[ Traceback (most recent call last):;
File \"/usr/bin/pygrub\", line 746, in ?;
raise RuntimeError, \"Unable to find partition containing kernel\";
RuntimeError: Unable to find partition containing kernel; ]
The Fix
The fix for these issues is relatively simple: you need to correct the menu.lst file. This can be done in a number of ways. Here are some of the ways that I have performed this fix.
- If the VHD can be mounted, you can edit the menu.lst file directly.
- If you do not know how to mount the VHD, you can use the xe-edit-bootloader command from a XenServer host. This command will edit the menu.lst file on the described instance. The syntax for this command looks like this:
  xe-edit-bootloader -p 1 -n KevinsVirtualServer
- If you do not have host access to the server but can put it into a rescue state, place the server into rescue mode. Once it is in a rescued state, you will have to mount the partition and then edit the /boot/grub/menu.lst file.
Once you have access to the menu.lst file, here is a sample of what you will see. There may be differences between the example provided and the file on your instance. What you need to modify are the directives found at the bottom of the file.
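To see which directives are affected before editing anything, a small shell helper can list them. This is only a minimal sketch: the function name list_uuid_directives is made up for illustration, and the path you pass in depends on where the guest's partition is mounted (the /mnt/boot/grub/menu.lst path in the comment is an assumed example, not something XenServer guarantees).

```shell
# Hypothetical helper: print every menu.lst line (with its line number)
# whose "uuid" or "root" directive carries a filesystem UUID.
# Commented-out lines (starting with #) are not reported.
# Example call, assuming the guest partition is mounted at /mnt:
#   list_uuid_directives /mnt/boot/grub/menu.lst
list_uuid_directives() {
    grep -nE '^[[:space:]]*(uuid|root)[[:space:]]+[0-9a-fA-F]{8}-[0-9a-fA-F-]+' "$1"
}
```

If the function prints nothing, the file has no UUID-based uuid or root directives and the boot failure probably has a different cause.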
Here is the ORIGINAL menu.lst file
title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.2.0-26-generic
title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic (recovery mode)
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.2.0-26-generic
title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual
title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual (recovery mode)
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual
title Chainload into GRUB 2
root ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/grub/core.img
title Ubuntu 12.04 LTS, memtest86+
uuid ff64509f-a0ea-448e-9c33-aaa8889c7d76
kernel /boot/memtest86+.bin
If your menu.lst file looks similar to the previous example, you will have to adjust it to look like the following example.
Here is the CORRECTED menu.lst file
title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.2.0-26-generic
title Ubuntu 12.04 LTS, kernel 3.2.0-26-generic (recovery mode)
kernel /boot/vmlinuz-3.2.0-26-generic root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.2.0-26-generic
title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual
title Ubuntu 12.04 LTS, kernel 3.0.0-12-virtual (recovery mode)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual
title Chainload into GRUB 2
root (hd0,0)
kernel /boot/grub/core.img
title Ubuntu 12.04 LTS, memtest86+
root (hd0,0)
kernel /boot/memtest86+.bin
You will notice that the UUID references have been removed and the root directives have been changed to reflect the location of the mounted partitions. In my case, both my BOOT partition and my ROOT partition were found at (hd0,0).
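If you would rather script the edit than do it by hand, the same change can be sketched with sed. Treat this as an illustration under assumptions, not a drop-in tool: fix_menu_lst is a hypothetical name, the default GRUB root of (hd0,0) is simply what applied in my case, and the function only deletes uuid lines and rewrites root lines that hold a UUID. It does not add a root line to entries that had none, so an entry such as memtest86+ still needs its root (hd0,0) line added by hand.

```shell
# Hypothetical helper: rewrite a legacy GRUB menu.lst so pygrub can parse it.
#   $1  path to menu.lst (required)
#   $2  GRUB root to substitute (optional; "(hd0,0)" is an assumed default)
# A copy of the untouched file is kept next to it as menu.lst.bak.
fix_menu_lst() {
    menu="$1"
    grub_root="${2:-(hd0,0)}"

    # Keep a backup and stop if it cannot be written.
    cp -p "$menu" "$menu.bak" || return 1

    # 1st expression: drop "uuid <filesystem-uuid>" lines entirely.
    # 2nd expression: turn "root <filesystem-uuid>" into "root <grub_root>".
    sed -E \
        -e '/^[[:space:]]*uuid[[:space:]]+[0-9a-fA-F]{8}-[0-9a-fA-F-]+[[:space:]]*$/d' \
        -e "s/^([[:space:]]*root[[:space:]]+)[0-9a-fA-F]{8}-[0-9a-fA-F-]+[[:space:]]*\$/\\1${grub_root}/" \
        "$menu.bak" > "$menu"
}
```

Called as fix_menu_lst /mnt/boot/grub/menu.lst (again assuming the guest partition is mounted at /mnt), it leaves kernel lines such as root=/dev/xvda1 alone, because only lines that begin with the uuid or root directive are matched. Compare the result against menu.lst.bak before unmounting.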
Wrap up
After making the needed changes, save the file and exit. Unmount any partitions you mounted, then restart your instance. With any kind of luck, your instance will start without further issues.
In my experience, when I have come across these types of problems the root partition has also needed to be FSCK'd. After these simple modifications and the subsequent FSCK, the instance was recovered without further incident.