OSIC Baremetal (Ironic) Network Separation

Generating Isolated (Separated) Networks on Baremetal

In the OSIC-DFW environment we're running Ironic with OpenStack integration to give our consumers access to physical compute resources. Because of limitations in Ironic at the time of this writing, we're not able to fully provide L3 tenant networking or full project isolation. This limitation means all hosts enrolled in Ironic exist within the same broadcast domain from an initial connectivity standpoint. However, we're able to overcome some of the tenant isolation issues by using a user-specific mesh topology.

Encapsulating user traffic in a VXLAN mesh allows hosts to communicate in isolation. This post covers how this is done within the OSIC-DFW cloud and provides tooling to automate the setup across most Linux distributions.

If you're only interested in getting access to the tooling please click here


Pre-requisites

Before you go any further, make sure you're running kernel 3.10 or greater (preferably greater) or that your base OS has proper VXLAN support. Once you've verified that the system is capable of supporting VXLAN, install the following packages.

# Debian based installation
apt-get update && apt-get install -y bridge-utils python-requests
# RHEL based installation
yum install -y bridge-utils python-requests
# SUSE based installation
zypper in -y bridge-utils python-requests

While not required, it's recommended to run the mesh topology with an MTU of 9000. Everything should work if the underlying network does not support jumbo frames, but there will be performance impacts and some applications may need to be tweaked in order to work with an MTU of less than 1500. A VXLAN network generally operates with an MTU 50 bytes smaller than that of the underlying network, which means that on a default 1500 byte network a VXLAN device would use an MTU of 1450, causing traffic to fragment more frequently than it would otherwise.
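The 50 byte figure is the sum of the headers that VXLAN wraps around each frame when the outer transport is IPv4. A minimal sketch of that arithmetic follows; the function name `vxlan_inner_mtu` is only illustrative and is not part of the tooling described in this post.

```python
# Header sizes, in bytes, added by VXLAN encapsulation over IPv4.
OUTER_IPV4_HEADER = 20
OUTER_UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET_HEADER = 14

VXLAN_OVERHEAD = (OUTER_IPV4_HEADER + OUTER_UDP_HEADER
                  + VXLAN_HEADER + INNER_ETHERNET_HEADER)


def vxlan_inner_mtu(underlay_mtu):
    """Return the largest MTU a VXLAN device can use on the given underlay."""
    return underlay_mtu - VXLAN_OVERHEAD


for mtu in (1500, 9000):
    print("underlay MTU {} -> VXLAN MTU {}".format(mtu, vxlan_inner_mtu(mtu)))
# underlay MTU 1500 -> VXLAN MTU 1450
# underlay MTU 9000 -> VXLAN MTU 8950
```

With jumbo frames on the underlay, the VXLAN device still has room for a full 1500 byte payload and more, which is why the larger MTU is recommended.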

If the underlying network supports running with an MTU greater than 1500, make sure your network interfaces are set up accordingly.

You can change the MTU of a given device using this command:

# Example command for changing the MTU of an existing interface
ip link set $DEVICE_NAME mtu 9000

Basic VXLAN network setup

This is a basic overview of how a VXLAN mesh is created. If you simply want to know how to create an isolated VXLAN mesh on baremetal hosts jump straight here.

Getting started

Creating the VXLAN network interface is simple. Before we run the command we'll need a device name, a VXLAN ID between 1 and 16777215 (the ID is a 24-bit value), a multicast address between 224.0.0.0 and 239.255.255.255 which is used to broadcast communication, and the name of a physical interface which will be the basis for connectivity.

# Defined variables
DEVICE_NAME=vxlan-1
VXLAN_ID=100
MULTI_CAST_ADDRESS=230.0.0.1
PHYSICAL_INTERFACE=bond0

# Command to create the interface
ip link add ${DEVICE_NAME} type vxlan id ${VXLAN_ID} group ${MULTI_CAST_ADDRESS} ttl 4 dev ${PHYSICAL_INTERFACE}

Once the interface is created you can bring it up.

ip link set ${DEVICE_NAME} up

With the interface active you're now able to add an IP address to the device or plug it into a bridge. At this point the device is comparable to a VLAN tagged interface and can be used as you see fit.
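The two numeric inputs to the `ip link add` command above can be sanity checked before they are used. The following is a small sketch under stated assumptions: the helper name `check_vxlan_inputs` is hypothetical, and the upper bound reflects the fact that the VXLAN network identifier is a 24-bit field.

```python
import ipaddress

MAX_VXLAN_ID = 2 ** 24 - 1  # 16777215, the largest value a 24-bit ID can hold


def check_vxlan_inputs(vxlan_id, group):
    """Raise ValueError if the ID or the multicast group is out of range."""
    if not 1 <= vxlan_id <= MAX_VXLAN_ID:
        raise ValueError("VXLAN ID out of range: {}".format(vxlan_id))
    # is_multicast is True for IPv4 addresses in 224.0.0.0 - 239.255.255.255
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError("not a multicast address: {}".format(group))


# The values used in the example above pass both checks.
check_vxlan_inputs(100, "230.0.0.1")
print("inputs look valid")
```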

The interesting points, however, are the MULTI_CAST_ADDRESS and the VXLAN_ID. These two entries together define the specific "isolated network". It should be noted that a VXLAN network is not truly segregated from other consumers in a shared cloud environment and does no encryption. Should another user "join your network", the joining user will have access to everything within that network.

The main purpose of this entire post is to illustrate how one logical Layer 2 network can be partitioned into roughly sixteen million logical networks, each identified by a VXLAN ID and using a multicast group between "224.0.0.0" and "239.255.255.255". While not fully segregated or encrypted, the traffic is obscured, which can be beneficial in environments that need more internal networks than are normally available when using an automated provisioning system like Ironic.


Preventing VXLAN Collisions at Scale

While creating a VXLAN network with a random ID and a random multicast address is decent at preventing collisions, we want to be more consistent and programmatic about it.

Generating the magic numbers

In OpenStack we can consume metadata and consistently generate variables on a per-user basis, which are then used to isolate traffic from other users within the cloud. To isolate networks programmatically, we generate an integer from the user-provided public key found in the OpenStack metadata service. To get the "magic number", the public key is hashed using sha256, the hex digest is read as a base36 integer, and the result is taken modulo "16776216", which is "1000" less than the maximum number of VXLAN IDs available. This little Python script will do everything needed.

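import hashlib
import random

import requests

# Metadata URL for the first public key registered with the instance
KEY_URL = 'http://169.254.169.254/1.0/meta-data/public-keys/0/openssh-key'

try:
    key = requests.get(KEY_URL, timeout=10)
except Exception:
    # Metadata service unreachable: fall back to a random seed
    seed = str(random.randrange(1, 16776216)).encode('utf-8')
else:
    # key.content is already a byte string, so no encoding is needed
    seed = key.content

# Hash the seed, read the hex digest as a base36 integer, then take the modulo
digest = hashlib.sha256(seed).hexdigest()
print(int(digest, 36) % 16776216)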
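To check the derivation without access to the metadata service, the hashing step can be pulled into a plain function. This is a sketch only: the function name `vxlan_id_from_key` and the sample key are made up for illustration and do not come from the tooling.

```python
import hashlib

ID_SPACE = 16776216  # modulus used in this post: 1000 below 16777216


def vxlan_id_from_key(public_key):
    """Derive the "magic number" from the raw bytes of a public key."""
    digest = hashlib.sha256(public_key).hexdigest()
    # A hex digest only uses 0-9 and a-f, which are all valid base36 digits.
    return int(digest, 36) % ID_SPACE


# Hypothetical key material, used only to show that the result is repeatable.
sample_key = b"ssh-rsa AAAAB3NzaC1yc2E-example-only user@example"
print(vxlan_id_from_key(sample_key))
print(vxlan_id_from_key(sample_key) == vxlan_id_from_key(sample_key))  # True
```

The result depends on the exact bytes that are hashed, including any trailing newline, so every host needs to hash the key exactly as the metadata service returns it in order to land on the same ID.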

After executing this script, or running the commands by hand, the output will be used as the VXLAN ID tag, and with it we can generate the rest of the data we need to isolate a specific user's internal traffic. Set the output as a variable named VLAN_ID.

Now grab the primary network interface. If you know it, just set the PIF variable accordingly. Otherwise, running the following bash command will get the interface providing the default route.

PIF=$(ip -o r g 1 | awk '{print $5}')
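The one-liner above takes the fifth field of the route lookup, which matches the interface name when the route goes through a gateway. As a hedged alternative, the sketch below looks for the token that follows "dev" instead of relying on a fixed position. The function name `interface_from_route` is hypothetical, and the sample lines are illustrative rather than captured from a real host.

```python
def interface_from_route(route_line):
    """Return the token following "dev" in `ip -o route get` style output."""
    fields = route_line.split()
    if "dev" not in fields:
        raise ValueError("no device in route output: {!r}".format(route_line))
    return fields[fields.index("dev") + 1]


# Illustrative output lines (assumed format, not taken from a real system).
via_gateway = "0.0.0.1 via 10.0.0.1 dev bond0 src 10.0.0.5 uid 0"
on_link = "10.0.0.9 dev bond0 src 10.0.0.5 uid 0"

print(interface_from_route(via_gateway))  # bond0
print(interface_from_route(on_link))      # bond0
```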

Now set the following variables to define the multicast group.

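# First octet seed: the first three digits of the ID
read -n 3 FRN_OCT <<< "${VLAN_ID}"
FRN_OCT=$(( ${FRN_OCT} % 254 ))
# Second octet seed: the whole ID
MID_OCT="$(( ${VLAN_ID} % 254 ))"
# Third octet seed: the last three digits, forced to base 10 so a leading zero is not read as octal
END_OCT="$(( 10#${VLAN_ID:${#VLAN_ID}<3?0:-3} % 254 ))"
# Keep any GROUP_ADDR that is already set, otherwise build one in 230.0.0.0/8
GROUP_ADDR="${GROUP_ADDR:-230.$FRN_OCT.$MID_OCT.$END_OCT}"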
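To make the octet arithmetic easier to follow, here is the same derivation sketched in Python with a few worked values. The function name `group_address` is illustrative, and the sketch assumes the last three digits are meant to be read as a decimal number.

```python
def group_address(vxlan_id):
    """Build the 230.x.y.z multicast group from a VXLAN ID."""
    digits = str(vxlan_id)
    first = int(digits[:3]) % 254    # first three digits of the ID
    middle = vxlan_id % 254          # the whole ID
    last = int(digits[-3:]) % 254    # last three digits, read as decimal
    return "230.{}.{}.{}".format(first, middle, last)


print(group_address(1234567))  # 230.123.127.59
print(group_address(42))       # 230.42.42.42
print(group_address(5089))     # 230.0.9.89
```

Because every octet is reduced modulo 254, each one falls between 0 and 253, so the result is always a valid address inside the multicast range.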

Name the VXLAN network

DEVICE_NAME="vxlan-0"

Now string it all together to create a specific vxlan network.

# VLAN_ID holds the VXLAN ID generated above
ip link add ${DEVICE_NAME} type vxlan id ${VLAN_ID} group ${GROUP_ADDR} ttl 4 dev ${PIF}
ip link set ${DEVICE_NAME} up

Network Setup Script

This is the mesh creation script (easy button). The script will generate everything needed to isolate a user's traffic between hosts, create 10 VXLAN networks, and create 6 bridges with unique IP addresses on them. Once run, the script will drop all of the persistent configs in /opt/network-mesh.

Rerunning the network config

If you ever need to recreate an interface or reset a value, you have two options. You can execute the scripts found in the scripts directory directly, each of which creates a part of the stack. Alternatively, you can remove the file /var/run/mesh-active and rerun /opt/network-mesh/run_network_setup.sh, which will rerun the entire setup.

NOTICE: If you force a rerun of /opt/network-mesh/run_network_setup.sh, the bridges will be re-created and any ephemeral devices plugged into those bridges may break (containers and VMs can be greatly affected by this).

Joining another user's network

If you find yourself in a situation where you need to join another user's deployment, you can do so by changing the network-mesh defaults found in /opt/network-mesh/defaults. To join another user's deployment you will need to copy over their defaults file and rerun the network setup. Rerunning the network setup can be done by rebooting the host or by executing the following.