Xen ARM with Virtualization Extensions whitepaper

Revision as of 20:17, 29 April 2014 by GarveyPatrickD

Xen on ARM

What is Xen?

Xen is a lightweight, high performance, Open Source hypervisor. Xen has a very low footprint: the ARM port amounts to less than 90K lines of code. Xen is licensed under GPLv2 and has a healthy and diverse community that supports it and funds its development. Xen is hosted by the Linux Foundation, which provides stewardship for the project.

The Xen Architecture

Xen is a type-1 hypervisor: it runs directly on the hardware, and everything else in the system runs as a virtual machine on top of Xen, including Dom0, the first virtual machine. Dom0 is created by Xen, is privileged and drives the devices on the platform. Xen virtualizes CPU, memory, interrupts and timers, providing virtual machines with one or more virtual CPUs, a fraction of the memory of the system, a virtual interrupt controller and a virtual timer. Xen assigns devices such as SATA controllers and network cards to Dom0, taking care of remapping MMIO regions and IRQs. Dom0 (typically Linux, but it could also be FreeBSD or another operating system) runs the same device drivers for these devices that it would use when running natively.

Dom0 also runs a set of drivers called paravirtualized backends to give the other, unprivileged virtual machines access to disk, network, etc. The operating system running as DomU (unprivileged guest in Xen terminology) gets access to a set of generic virtual devices by running the corresponding paravirtualized frontend drivers. A single backend services multiple frontends. A pair of paravirtualized drivers exists for each of the most common classes of devices: disk, network, console, framebuffer, mouse, keyboard, etc. They usually live in the operating system kernel, e.g. Linux. A few PV backends can also run in userspace in QEMU. The frontends connect to the backends using a simple ring protocol over a shared page in memory. Xen provides all the tools for discovery and for setting up the initial communication. Xen also provides a mechanism for the frontend and the backend to share additional pages and notify each other via software interrupts.
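
The ring protocol mentioned above can be pictured with a small model. This is only a sketch: the class name, the slot count and the request format are assumptions made for illustration and do not reproduce the actual Xen ring layout. It shows one producer (the frontend) and one consumer (the backend) sharing a fixed set of slots, each side advancing only its own index.

```python
# Illustrative model of a frontend/backend request ring held in one shared page.
# All names here are hypothetical; this is not the Xen ring ABI.

RING_SIZE = 8  # number of slots; a power of two so indices wrap with a mask


class SharedRing:
    """Single-producer, single-consumer ring with free-running indices."""

    def __init__(self, size=RING_SIZE):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.slots = [None] * size
        self.mask = size - 1
        self.prod = 0  # advanced only by the frontend
        self.cons = 0  # advanced only by the backend

    def push(self, request):
        """Frontend side: queue a request, return False if the ring is full."""
        if self.prod - self.cons == len(self.slots):
            return False
        self.slots[self.prod & self.mask] = request
        self.prod += 1
        return True

    def pop(self):
        """Backend side: take the oldest request, or None if the ring is empty."""
        if self.cons == self.prod:
            return None
        request = self.slots[self.cons & self.mask]
        self.cons += 1
        return request


ring = SharedRing()
for sector in range(3):
    ring.push({"op": "read", "sector": sector})

# The backend drains the ring in the order the frontend filled it.
served = []
request = ring.pop()
while request is not None:
    served.append(request["sector"])
    request = ring.pop()
```

In a real system the two sides run in different virtual machines and notify each other when they have pushed new entries; the model leaves that notification out.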

Xen arch1.png

Even though it is the most common configuration, there is no reason to run all the device drivers and all the paravirtualized backends in Dom0. The Xen architecture allows driver domains: unprivileged virtual machines with the sole purpose of running the driver and the paravirtualized backend for one class of devices. For example, you can have a disk driver domain, with the SATA controller assigned, running the driver for it and the disk paravirtualized backend. You can have a network driver domain with the network card assigned, running the driver for it and the network paravirtualized backend. As driver domains are regular unprivileged guests, they make the system more secure because they allow large pieces of code, such as the entire network stack, to run unprivileged. Even if a malicious guest manages to take over the paravirtualized network backend and the network driver domain, it would not be able to take over the entire system. Driver domains also improve isolation and resilience: the network driver domain is fully isolated from the disk driver domain and Dom0. If the network driver crashes, it cannot take down the entire system, only the network. It is possible to reboot just the network driver domain while everything else remains online. Finally, driver domains allow Xen users to disaggregate and componentize the system in ways that would not be possible otherwise. For example, they allow users to run a real-time operating system alongside the main OS to drive a device that has real-time constraints. They allow users to run a legacy OS to drive old devices that do not have drivers in modern operating systems. They allow users to separate and isolate critical functions from less critical ones. For example, they allow users to run an OS such as QNX to drive most devices on the platform alongside Android for the user interface.

Xen arch2.png

Xen on ARM: a cleaner architecture

Xen on ARM is not just a straight 1:1 port of x86 Xen. We exploited the opportunity to clean up the architecture and get rid of the cruft that we accumulated during the many years of x86 development. Firstly we removed any need for emulation. Emulated interfaces are slow and insecure. QEMU, used for emulation on x86 Xen, is a well maintained Open Source project but is big both in terms of binary size and lines of source code. The smaller, the simpler, the better. Xen on ARM does not need QEMU because it does not do any emulation. It accomplishes the goal by exploiting virtualization support in hardware as much as possible and using paravirtualized interfaces for IO. As a result Xen on ARM is faster and more secure.

On x86 two different kinds of Xen guest coexist: PV guests, such as Linux and other Open Source OSes, and HVM guests, usually Microsoft Windows, although any OS can run as an HVM guest. PV and HVM guests are quite different from the hypervisor's point of view. The difference is exposed all the way up to the user, who needs to choose how to run the guest by setting a line in the VM config file. On ARM we did not want to introduce this differentiation: we felt that it is artificial and confusing. Xen on ARM only supports one kind of guest that is the best of both worlds: it does not need any emulation and relies on paravirtualized interfaces for IO as early as possible in the boot sequence, like x86 PV guests. It exploits virtualization support in hardware as much as possible and does not require invasive changes to the guest operating system kernel in order to run, like x86 HVM guests.

The new architecture designed for Xen on ARM is much cleaner and simpler and it turned out to be a very good match for the hardware.

Xen on ARM: virtualization extensions

ARM virtualization extensions provide 3 levels of execution: EL0 (user mode), EL1 (kernel mode) and EL2 (hypervisor mode). They introduce a new instruction, HVC, that the kernel uses to enter hypervisor mode. The MMU supports 2 stages of translation. The generic timers and the GIC interrupt controller are virtualization aware.

Xen arm arch1.png

ARM virtualization extensions are a great fit for the Xen architecture:

  • Xen runs entirely and only in hypervisor mode
    Xen leaves kernel mode for the guest operating system kernel and EL0 for guest user space applications. Type-2 hypervisors need to frequently switch between hypervisor mode and kernel mode. By running entirely in EL2 Xen significantly reduces the number of context switches required.
  • HVC, the new instruction, is used by the kernel to issue hypercalls to Xen
  • Xen uses 2-stage translation in the MMU to assign memory to virtual machines
  • Xen uses generic timers to receive timer interrupts, to inject timer interrupts into virtual machines and to expose the counter to them
  • Xen uses the GIC to receive interrupts and to inject interrupts into guests
Xen arm arch2.png
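
The 2-stage translation listed above can be illustrated with a toy model. It is a sketch under simplifying assumptions: flat single-level page maps stand in for the real multi-level page tables, and the page numbers are made up. The guest kernel controls the first stage (virtual to intermediate physical address) and the hypervisor controls the second (intermediate physical to physical), so a guest can only reach memory that the hypervisor has mapped for it.

```python
PAGE_SIZE = 4096


def translate(table, addr):
    """Look up the page of addr in a flat page map and keep the page offset."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"fault: page {page:#x} is not mapped")
    return table[page] * PAGE_SIZE + offset


# Stage 1, owned by the guest kernel: virtual page -> intermediate physical page
stage1 = {0x10: 0x2}
# Stage 2, owned by the hypervisor: intermediate physical page -> physical page
stage2 = {0x2: 0x80}


def guest_access(vaddr):
    """Apply both stages in order, as the MMU does for a guest access."""
    ipa = translate(stage1, vaddr)
    return translate(stage2, ipa)


physical = guest_access(0x10 * PAGE_SIZE + 0x24)
```

If the guest edits its own stage 1 map to point at an intermediate page that is absent from stage 2, the second lookup faults instead of reaching memory that belongs to someone else.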

Xen discovers the hardware via device tree. It assigns all the devices that it does not use to Dom0 by remapping the corresponding MMIO regions and interrupts. It generates a flattened device tree binary for Dom0 that describes exactly the environment exposed to it. Dom0's device tree contains:

  • the exact number of virtual CPUs that Xen created for it (possibly fewer than the number of physical CPUs on the platform)
  • the exact amount of memory that Xen gave to it (always less than the amount of physical memory available)
  • the devices that Xen re-assigned to it and no more (not all devices are assigned to Dom0; at the very least one UART is not)
  • a hypervisor node to advertise the presence of Xen on the platform

Dom0 boots exactly the same way it would boot natively. By using device tree to discover the hardware, Dom0 finds out what is available and loads the drivers for it. It does not try to access interfaces that are not present and therefore Xen does not need to do any emulation. By finding the Xen hypervisor node, Dom0 knows that it is running on Xen and can therefore initialize the paravirtualized backends. DomUs would load the paravirtualized frontends instead.
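
The way the Dom0 device tree is derived from the host one can be sketched as a simple filter. This is a simplified model, not Xen code: the function name, the dictionary layout and the device names are hypothetical, and the "xen,xen" compatible string is used here on the assumption that it matches the hypervisor node binding.

```python
def build_dom0_tree(host_devices, used_by_xen, vcpus, memory_mb):
    """Return a dict describing only what Dom0 is allowed to see."""
    return {
        # One node per virtual CPU, not per physical CPU.
        "cpus": [f"cpu@{i}" for i in range(vcpus)],
        # Only the memory given to Dom0.
        "memory_mb": memory_mb,
        # Every host device except the ones Xen keeps for itself.
        "devices": [d for d in host_devices if d not in used_by_xen],
        # Extra node advertising the presence of the hypervisor (assumed value).
        "hypervisor": {"compatible": "xen,xen"},
    }


host = ["uart0", "uart1", "sata", "ethernet"]
dom0 = build_dom0_tree(host, used_by_xen={"uart0"}, vcpus=2, memory_mb=512)

# Dom0 decides whether to start the PV backends by looking for the node.
running_on_xen = "hypervisor" in dom0
```

Because the UART kept by Xen never appears in the tree, Dom0 has no reason to probe it, which is why no emulation is needed.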

Xen on ARM: code size

We wrote previously that the new architecture turned out to be a very good match for the hardware. This is reflected in the code size: the smaller the better. Xen on ARM is about 1/6 of the code size of x86_64 Xen, while still providing a similar level of features. In Xen 4.4.0:

                      Common   ARMv7   ARMv8    Total
xen/arch/arm          11,767   3,503   1,812   17,082
  C                   11,587     954     813   13,354
  ASM                    180   2,549     999    3,728
xen/include/asm-arm    4,786     984   1,050    6,820
Total ARM             16,553   4,487   2,862   23,902

                      x86_64 Total
xen/arch/x86               124,615
xen/include/asm-x86         18,530
Total x86_64               143,145
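
The 1/6 figure can be checked directly from the totals in the table. The short calculation below only restates numbers that appear above; the variable names are chosen for illustration.

```python
# Totals per directory, copied from the Xen 4.4.0 table above.
arm = {"xen/arch/arm": 17082, "xen/include/asm-arm": 6820}
x86 = {"xen/arch/x86": 124615, "xen/include/asm-x86": 18530}

total_arm = sum(arm.values())
total_x86 = sum(x86.values())

# Fraction of the x86_64 code size taken by the ARM port.
ratio = total_arm / total_x86
```

The ratio comes out slightly above 1/6, which is consistent with the claim in the text.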

Porting Xen to a new SoC

Assuming that you already have a functional Dom0 kernel (usually Linux) for your SoC, porting Xen to it is a very simple task. In fact in terms of devices, Xen only uses:

  • GIC
  • generic timers
  • SMMU
  • one UART for debugging

Therefore the porting effort is limited to writing a new UART driver for Xen (if the SoC comes with an unsupported UART) and the code to bring up secondary CPUs (if the platform does not support PSCI, for which Xen already has a driver). See for example the Exynos 4210 Xen driver and the Exynos5 platform code.

Porting an operating system to Xen on ARM

Porting an OS to Xen on ARM is easy: it does not require any changes to the operating system kernel, only a few new drivers to get the paravirtualized frontends running and to obtain access to network, disk, console, etc. The paravirtualized frontends rely on:

Once the OS has support for the basic building blocks, the next step is introducing the paravirtualized frontend drivers. You are likely to be able to reuse the existing ones:

Mobile platforms and new PV protocols

Virtualizing a modern mobile platform involves dealing with devices such as camera, compass, GPS, etc., for which PV frontend and backend drivers do not exist today. If only one VM needs access to one of these devices at a time, you can simply assign the device to the VM, remapping the corresponding MMIO regions and interrupts. If multiple VMs need access to the device simultaneously, you have to write a new pair of PV frontend and backend drivers. Fortunately many open source implementations of PV frontends and backends for different classes of devices already exist in Linux and other operating systems. Something similar is likely to already exist. The difficulty of writing a new pair of PV frontends and backends increases with the complexity of the device you are trying to share. If the device is simple, such as the compass, writing the new pair of drivers is going to be very easy. If the device is complex, such as a 3D graphics accelerator, writing the new pair of frontends and backends is going to be difficult.
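
To give a feeling for how small a PV protocol for a simple device can be, here is a toy compass frontend and backend. Everything in it is hypothetical: the class names, the "get_heading" operation and the message format are invented for illustration, and the frontend calls the backend directly, where a real pair of drivers would exchange the same messages over a shared ring and notify each other.

```python
class CompassBackend:
    """Owns the real device and answers heading requests from any frontend."""

    def __init__(self, read_hardware):
        self.read_hardware = read_hardware  # callable returning degrees

    def handle(self, request):
        if request["op"] != "get_heading":
            return {"id": request["id"], "status": "error"}
        return {
            "id": request["id"],
            "status": "ok",
            "heading": self.read_hardware(),
        }


class CompassFrontend:
    """Runs in a guest and turns local calls into requests for the backend."""

    def __init__(self, domid, backend):
        self.domid = domid
        self.backend = backend
        self.next_id = 0

    def heading(self):
        request = {"id": self.next_id, "op": "get_heading"}
        self.next_id += 1
        response = self.backend.handle(request)
        if response["status"] != "ok":
            raise RuntimeError("backend rejected request")
        return response["heading"]


# One backend serves two guests that read the same compass.
backend = CompassBackend(read_hardware=lambda: 270)
guests = [CompassFrontend(domid, backend) for domid in (1, 2)]
headings = [guest.heading() for guest in guests]
```

A device with more state, such as a graphics accelerator, needs many more operations and careful resource tracking per guest, which is where the difficulty mentioned above comes from.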