Migrating BTRFS Subvolumes locally

This post focuses on moving a BTRFS file system from a loopback file to a mounted BTRFS subvolume on a local file system.

The Scenario

A server has BTRFS as its root file system and is used with systemd-nspawn. Although the server has been online for several months, it was only recently discovered that the containers on the host were running out of a loopback file formatted as BTRFS. systemd creates this loopback file automatically when /var/lib/machines does not exist and/or is not a BTRFS subvolume.
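
To check whether a host is in the same situation, one option is to look at which device backs /var/lib/machines. This is a minimal sketch that assumes findmnt from util-linux is installed; is_loop_source is a hypothetical helper name used only for illustration.

```shell
# Return success when a mount source looks like a loop device.
# is_loop_source is a hypothetical helper, not part of systemd or util-linux.
is_loop_source() {
  case "$1" in
    /dev/loop*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Ask findmnt which device backs the container path (empty when not mounted).
src=$(findmnt -n -o SOURCE /var/lib/machines 2>/dev/null || true)

if is_loop_source "$src"; then
  echo "/var/lib/machines is backed by a loopback device: $src"
else
  echo "/var/lib/machines is not loop-mounted (source: ${src:-none})"
fi
```

On an affected host the source should be a /dev/loopN device rather than the root block device.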

The mission

Move the containers from the loopback file system to the locally mounted subvolume, which will improve general container performance and ensure we're not bound by the limitations of the loopback file. The stretch goal is to have less than 10 minutes of downtime.


Container Stop

To begin the maintenance the containers must all be stopped. This can be done with the following loop which will invoke systemctl to stop all of the containers. The systemd-escape command is used in conjunction with the container name to ensure all non-standard characters are properly escaped.

for i in $(machinectl list-images --no-legend | awk '{print $1}'); do
  systemctl stop "$(systemd-escape --template=systemd-nspawn@.service "$i")"
done
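
Before unmounting anything, it may be worth confirming that nothing is still running. The sketch below counts the lines of a machine listing; count_machines is a hypothetical helper, and the sample input is made up to stand in for real machinectl output.

```shell
# Count non-empty lines on stdin. Each line of
# `machinectl list --no-legend` describes one running machine.
# count_machines is a hypothetical helper name.
count_machines() {
  awk 'NF { n++ } END { print n + 0 }'
}

# Made-up sample input standing in for two machines that are still running.
printf 'web1 container systemd-nspawn\ndb1 container systemd-nspawn\n' | count_machines
```

On the host itself, machinectl list --no-legend | count_machines should report 0 before the unmount step.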

Once all of the containers are stopped, unmount the container path /var/lib/machines and remount the loopback file elsewhere. For the purpose of this post, the temporary location will be /mnt.

umount /var/lib/machines
mount -t btrfs /var/lib/machines.raw /mnt
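
The later loops treat every entry under /mnt as a subvolume, so it can help to see what the loopback file system actually contains first. This sketch extracts the path column from btrfs subvolume list output; subvol_paths is a hypothetical helper, and the sample lines are invented to mirror the usual "ID ... gen ... top level ... path ..." layout.

```shell
# Print only the subvolume path from `btrfs subvolume list` output,
# which looks like: ID 257 gen 12 top level 5 path <name>
# subvol_paths is a hypothetical helper name.
subvol_paths() {
  sed -n 's/^ID [0-9]* gen [0-9]* top level [0-9]* path //p'
}

# Invented sample input; on the host, pipe the real command instead.
printf 'ID 257 gen 12 top level 5 path web1\nID 258 gen 14 top level 5 path db1\n' | subvol_paths
```

On the host this would be run as btrfs subvolume list /mnt | subvol_paths.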

Subvolume Create

Next, create the new subvolume which will be used for the container workloads. This new subvolume will be a system level subvolume (child of the root file system) and will eventually be mounted at /var/lib/machines.

btrfs subvolume create /@machines

Mounting the subvolume

For a complete overview on mounting subvolumes, have a look here. This next command mounts the root disk with the new subvolume specified as an option.

Remember to change the XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX UUID to the actual UUID of the local block device.

# Ensure the target directory exists
mkdir -p /var/lib/machines
# Mount the subvolume
mount -t btrfs -o defaults,noatime,nodiratime,compress=lzo,commit=120,space_cache=v2,subvol=@machines /dev/disk/by-uuid/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX /var/lib/machines
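
To look up the UUID and confirm the mount afterwards, something like the following may help. It is a sketch that assumes findmnt is available and that the kernel reports the subvolume as a subvol= mount option; mounted_subvol_is is a hypothetical helper name.

```shell
# Succeed when a comma-separated option string names the given subvolume.
# $1 = mount options, $2 = subvolume name without a leading slash.
# mounted_subvol_is is a hypothetical helper name.
mounted_subvol_is() {
  case ",$1," in
    *",subvol=/$2,"*|*",subvol=$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# UUID of the file system mounted at / (the value to use in place of the Xs).
findmnt -n -o UUID / 2>/dev/null || true

# Options of the new mount (empty when nothing is mounted there).
opts=$(findmnt -n -o OPTIONS /var/lib/machines 2>/dev/null || true)
if mounted_subvol_is "$opts" "@machines"; then
  echo "/var/lib/machines is mounted from @machines"
else
  echo "/var/lib/machines is not mounted from @machines"
fi
```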

Preparing the Original Subvolumes

Before the original subvolumes can be sent to their new home they must be marked read-only. To mark these subvolumes read-only the btrfs property command will be used. The following loop will iterate over all of the subvolumes and mark them read-only.

for i in /mnt/*/; do
  btrfs property set -ts "${i%/}" ro true
done

Transferring the Subvolumes

With everything prepared, it's time to transfer the original subvolumes to their new home. This is done by piping btrfs send into btrfs receive. The following loop sends all of the subvolumes to the new location.

Be aware that sending the subvolumes crosses a file system boundary and can take some time. Have patience.

for i in /mnt/*/; do
  btrfs send "${i%/}" | btrfs receive /var/lib/machines/
done
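
Since the old loopback file is removed at the end, a quick comparison of source and destination may be worthwhile once the loop finishes. This sketch only checks that each name exists on the other side, and missing_in_dst is a hypothetical helper. An existing directory does not prove that a receive completed, so treat the result as a first sanity check.

```shell
# Print the name of every directory in $1 that has no counterpart in $2.
# missing_in_dst is a hypothetical helper name.
missing_in_dst() {
  for sub in "$1"/*/; do
    [ -d "$sub" ] || continue        # the glob matched nothing
    name=$(basename "$sub")
    [ -d "$2/$name" ] || echo "$name"
  done
}

# Guarded so the sketch is harmless on hosts without these paths.
if [ -d /mnt ] && [ -d /var/lib/machines ]; then
  missing_in_dst /mnt /var/lib/machines
fi
```

No output means every subvolume name under /mnt also exists under /var/lib/machines.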

Before the containers can resume their normal workloads, the newly transferred subvolumes need to be marked read-write. Once again the btrfs property command will be used, this time to clear the read-only flag.

for i in /var/lib/machines/*/; do
  btrfs property set -ts "${i%/}" ro false
done

Restart the Containers and Cleanup

With all of the transfers complete it's time to resume the container workloads on the host. This loop will list all of the container images and invoke the systemctl command to start the containers. The container names will be escaped using systemd-escape to ensure we're escaping everything properly.

for i in $(machinectl list-images --no-legend | awk '{print $1}'); do
  systemctl start "$(systemd-escape --template=systemd-nspawn@.service "$i")"
done
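
Before cleaning up, it may help to confirm that every container unit really came back. The following is a sketch, not a definitive check: not_active and container_states are hypothetical helper names, and container_states is only defined here, not called, because it needs machinectl and systemctl on the real host.

```shell
# Read "unit state" pairs on stdin and print the units that are not active.
# not_active is a hypothetical helper name.
not_active() {
  awk '$2 != "active" { print $1 }'
}

# Print one "unit state" pair per container image.
# Defined but not called here; it needs machinectl and systemctl on a real host.
container_states() {
  for i in $(machinectl list-images --no-legend | awk '{print $1}'); do
    unit=$(systemd-escape --template=systemd-nspawn@.service "$i")
    printf '%s %s\n' "$unit" "$(systemctl is-active "$unit")"
  done
}
```

Running container_states | not_active on the host should print nothing when all containers started.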

Assuming all containers started and the workloads resumed successfully, unmount the old loopback file and remove it.

umount /mnt
rm /var/lib/machines.raw

Making the Subvolume mounts persistent

Because the newly created @machines subvolume will be mounted, it will need to be made persistent and available on boot. This can be done in two ways, the systemd style or the fstab style. Both are perfectly valid.

systemd style

For a systemd mount, create or modify the file /etc/systemd/system/var-lib-machines.mount so that it contains the following lines. The file name must match the mount path.

Remember to change the XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX UUID to the actual UUID of the local block device.

[Unit]
Description=Auto mount for /var/lib/machines

[Mount]
What=/dev/disk/by-uuid/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Where=/var/lib/machines
Type=btrfs
Options=defaults,noatime,nodiratime,compress=lzo,commit=120,space_cache=v2,subvol=@machines

[Install]
WantedBy=multi-user.target
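
A mount unit only takes effect at boot once systemd has reloaded its configuration and the unit is enabled. The sketch below shows one way to do that. mount_unit_name and enable_machines_mount are hypothetical helpers, and the name derivation is deliberately simplified; systemd-escape -p --suffix=mount is the authoritative tool for paths with unusual characters.

```shell
# Derive the mount unit name for a simple absolute path.
# Simplified sketch: only valid for paths made of letters, digits, and "/".
# mount_unit_name is a hypothetical helper name.
mount_unit_name() {
  printf '%s.mount\n' "$(printf '%s' "${1#/}" | tr '/' '-')"
}

# Reload systemd and enable the unit so it is mounted at boot.
# Defined but not called here; run it as root on the real host.
enable_machines_mount() {
  unit=$(mount_unit_name /var/lib/machines)
  systemctl daemon-reload
  systemctl enable "$unit"
}

mount_unit_name /var/lib/machines   # prints var-lib-machines.mount
```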

fstab style

Updating the local fstab file is the most common (legacy) way to create persistent mounts. Simply updating the /etc/fstab file with the following content will be all that's needed.

Remember to change the XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX UUID to the actual UUID of the local block device.

UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX /var/lib/machines  btrfs  defaults,noatime,nodiratime,compress=lzo,commit=120,space_cache=v2,subvol=@machines 0 0
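
After editing the file, a quick check that the entry is present can catch a typo before the next reboot. This is a minimal sketch; fstab_has_mount is a hypothetical helper that only looks for an uncommented line whose second field is the mount point.

```shell
# Succeed when the fstab file ($1) has an uncommented entry for mount point $2.
# fstab_has_mount is a hypothetical helper name.
fstab_has_mount() {
  awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$1"
}

if [ -r /etc/fstab ] && fstab_has_mount /etc/fstab /var/lib/machines; then
  echo "fstab entry for /var/lib/machines found"
else
  echo "no fstab entry for /var/lib/machines"
fi
```

For a fuller syntax check of the whole file, findmnt --verify from util-linux can be run as well.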

That's all folks

It's my hope that this simple post shines a light on some of the many capabilities found in BTRFS and nspawn, and how they can be used to your advantage in real operational environments. If you liked this post or have questions, reach out: hit me up on Twitter, LinkedIn or IRC. Until next time!