Xen Project 4.10 Feature List
Re-architecting the Xen Project Hypervisors x86 Core
Starting from Xen Project 4.8, the Xen Project team has been working on overhauling the x86 core of the Xen Project Hypervisor. The major work items in progress are PVHv2, de-privileging QEMU and splitting the PV and HVM code paths, to enable building a HVM and PVHv2 only hypervisor.
Xen 4.10 adds a PVHv2 xl/libxl stable interface for unprivileged guests. Although PVHv2 DomU support has been present in previous Xen releases the lack of a stable toolstack interface prevented users from reliably testing and deploying this feature. From Xen 4.10 onwards PVHv2 DomU is a supported feature.
PVHv2 guests are lightweight HVM guests which use PV drivers for I/O and native interfaces for the rest of the operations. Unlike HVM guests, PVHv2 guests do not use QEMU device emulation. This reduces the memory footprint of Xen Project based systems significantly, as a QEMU instance runs in Dom0 for each HVM guest.
In addition, PVHv2 relies on hardware virtualization extensions and uses neither the PV kernel infrastructure nor the PV MMU, significantly reducing the number of Xen specific interfaces a PVHv2 guest uses compared to PV.
Consequently, PVHv2 guests have a much smaller TCB and attack surface compared to PV and HVM guests. Removing a large component such as QEMU, which consists of approximately 1.2 million lines of code (twice as much as the Xen Project Hypervisor itself), significantly reduces the potential for security vulnerabilities in a Xen Project based software stack compared to HVM guests.
In contrast to HVM based virtualization, PVHv2 does require operating system support, which is available in Linux 4.11 or newer.
In Xen Project 4.4 and 4.5 we introduced a virtualization mode called PVH: essentially a PV guest using PV drivers for boot and I/O and hardware virtualization extensions for everything else. In late 2015, we started an initiative to re-architect and simplify PVH: PVHv2 was born. PVHv2 addresses key limitations of PVHv1: it is not restricted to a specific paging mode decided at boot time, it makes less use of hypercalls, and it can use some emulated platform devices provided by Xen itself.
Xen Project 4.8 laid the groundwork for PVHv2. In Xen Project 4.9 we completed most of the Hypervisor portion of PVHv2 and removed PVHv1. In Xen Project 4.10 we completed PVHv2 DomU by providing a stable interface in the toolstack in order to manage PVHv2 guests. We also delivered all necessary Linux functionality for PVHv2 guest support in Linux 4.11.
What is next?
PVHv2 support for FreeBSD is currently being reviewed, but has not yet been committed. Support for Dom0 PVHv2 and pci-passthrough for DomU will follow in a subsequent release. Work on supporting EFI boot in addition to support for Direct Kernel Boot is in progress.
In addition, we started the groundwork for wrapping the PV ABI inside a PVH container, which will eventually allow removal of the PV ABI from Xen and the Linux kernel, while allowing users to run legacy PV guest images on hardware with virtualization extensions.
User Interface Changes
In Xen Project 4.10 we have made significant changes to the user interface. Guest types are now selected using the type option in the configuration file, where users can select a pv, pvh or hvm guest. The builder option is deprecated in favor of the type option. The pvh option has been removed and a set of PVH specific options has been added (see here).
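For illustration, a minimal PVH guest configuration using the new type option might look like the following sketch (the domain name, kernel path, and resource values are placeholders; the remaining option names follow the standard xl.cfg documentation):

```
# Select the guest type: "pv", "pvh" or "hvm" (replaces the deprecated builder= option)
type = "pvh"

name   = "pvh-example"          # placeholder domain name
kernel = "/path/to/vmlinuz"     # placeholder path, used for direct kernel boot
memory = 1024                   # in MB
vcpus  = 2
disk   = [ "/path/to/disk.img,raw,xvda,rw" ]
vif    = [ "bridge=xenbr0" ]
```

The guest can then be started with xl create as with any other guest type.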
It is now also possible to modify certain hypervisor boot parameters without the need to reboot Xen. In 4.10 the main focus was on adding the needed infrastructure to the hypervisor and the tools. Modifying console message parameters (such as the log level) at runtime is now possible, removing the need to reboot the host when a higher log level is needed, e.g. to analyze a problem.
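Assuming the xl set-parameters command provided by this new infrastructure (syntax per the Xen 4.10 xl documentation; run as root on the Xen host), raising and later restoring the log level without a reboot could look like:

```
# Raise hypervisor and guest log levels to capture more diagnostics
xl set-parameters "loglvl=all guest_loglvl=all"

# Restore the default levels once the problem has been analyzed
xl set-parameters "loglvl=warning guest_loglvl=none/warning"
```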
The maximum number of grant table entries of a domain can now be set on a per-domain basis in 4.10. Previously this value was only modifiable via a hypervisor boot parameter; that parameter now acts as an upper bound for the per-domain value and can itself be modified at runtime, too.
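As a sketch of how the two knobs interact (parameter names as documented for Xen 4.10; the values shown are placeholders):

```
# On the hypervisor command line (e.g. in the GRUB entry):
# system-wide upper bound on grant table frames
gnttab_max_frames=64

# In the guest's xl configuration file: per-domain value,
# which must not exceed the hypervisor-wide bound above
max_grant_frames = 32
```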
In Xen 4.10 we added a machine-readable file (SUPPORT.md) describing support related information about this release, collating all support related information into a single place. The document defines the support status of Xen Project features and whether features are security supported and to which degree (e.g. a feature may be security supported on x86, but not on ARM). This file will be back-ported to older Xen releases and will be used to generate support information for Xen Project releases, published on xenbits.xen.org/docs/ (for Xen 4.10, see here). Centralizing security support related information is a pre-condition for becoming a CVE Numbering Authority.
dm_restrict: In Xen 4.9 the interface between Xen Project software and QEMU was completely re-worked and consolidated via DMOP. In Xen 4.10 we built on DMOP and added a Technology Preview for dm_restrict to restrict the scope of what device models such as QEMU can do after startup. This feature limits the impact of security vulnerabilities in QEMU, which is used for HVM guests only (PV, PVH and ARM guests do not use QEMU): in other words, QEMU vulnerabilities that could normally be used to escalate privileges to the whole host can no longer escape QEMU's sandbox.
Splitting PV and HVM code: a slimmer x86 Hypervisor
One of the long-term goals of the Xen Project is to build a slimmer x86 Hypervisor, allowing users to build a Xen Hypervisor without PV guest support via KCONFIG. Once completed, this will improve security by significantly reducing the Hypervisor’s attack surface, reduce memory footprint, and allow us to reclaim precious address space, enabling Xen to support >16TB of host memory.
To do this, we started re-factoring the Xen Project Hypervisor x86 code, cleanly separating PV and PVH/HVM code, while in parallel reviewing the code for security issues. In Xen 4.10 several major x86 components have been re-factored and reviewed. This work will require several release cycles to complete.
L2 CAT for Intel CPUs: In Xen 4.10 we added support for Intel’s L2 Cache Allocation Technology (CAT), which is available on certain models of (micro) server platforms. Xen L2 CAT support provides Xen users a mechanism to partition or share the L2 cache among virtual machines, if such technology is present on the hardware Xen runs on. This allows users to make better use of the shared L2 cache depending on VM characteristics (e.g. priority).
Local Machine-Check Exception (LMCE) for Intel CPUs: In Xen 4.10 we implemented LMCE support for HVM guests. If the affected vCPU is known, an LMCE is injected into that vCPU only; otherwise the MCE is broadcast to all vCPUs running on the host. This allows MCEs to be passed more efficiently from the hypervisor to virtual machines for further handling. In addition, the quality of the existing MCE code has been improved and test code is provided to verify the MCE functionality.
User Mode Instruction Prevention (UMIP) for Intel CPUs: User-Mode Instruction Prevention (UMIP) is a security feature present in newer Intel processors. If enabled, it prevents the execution of certain instructions when the Current Privilege Level (CPL) is greater than 0. Xen 4.10 exposes UMIP to virtual machines so they can take advantage of this feature.
SBSA UART Emulation for ARM CPUs: In Xen 4.10 we implemented SBSA UART emulation support in the Xen Project Hypervisor and made it accessible through the command line tools. This enables the guest OS to access the console when no PV console driver is present. In addition, SBSA UART emulation is required for compliance with the VM System Specification.
ITS support for ARM CPUs: Xen 4.10 adds support for ARM’s Interrupt Translation Service (ITS), which accompanies the GICv3 interrupt controller such as the ARM CoreLink GIC-500. ITS support allows the Xen Project Hypervisor to harness all of the benefits of the GICv3 architecture, improving interrupt efficiency and allowing for greater on-chip virtualization. ITS support is essential to virtualize systems with large numbers of interrupts. In addition, ITS increases isolation of virtual machines by providing interrupt remapping, enabling safe PCI passthrough on ARM.
GRUB2 on ARM 64: The GRUB community recently merged support to boot Xen on 64-bit ARM platforms. GRUB2 support for ARM 64 improves the user experience when installing Xen via distribution packages on UEFI platforms.
Secure Monitor Call (SMC) Compliance for ARM CPUs: Xen Project 4.10 added support for the SMC Calling Convention. This allows VMs to issue standards compliant SMC calls to TrustZone Secure Monitors such as OP-TEE, which is a key element of secure embedded electronics stacks as used in IoT and automotive.
Support for latest System-on-chip (SoC) technology: The Xen Project now supports Qualcomm Centriq 2400 and Cavium ThunderX.
Improvements to Existing Functionality
Credit 2 scheduler improvements: In Xen 4.10, we added soft-affinity support for the Credit 2 scheduler, which allows users to specify a preference for running a VM on a specific CPU. This enables NUMA aware scheduling for the Credit 2 scheduler. In addition, we added cap support, which allows users to set the maximum amount of CPU a VM will be able to consume, even if the host system has idle CPU cycles.
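As an illustrative sketch of these two features (the domain name, CPU range, and cap value are placeholders; the cap option is assumed to mirror the existing xl sched-credit interface):

```
# In the xl guest configuration: prefer pCPUs 0-3, e.g. the CPUs of one
# NUMA node, without strictly pinning the vCPUs (soft affinity)
cpus_soft = "0-3"
```

```
# At runtime: cap the guest at 50% of one physical CPU,
# even when the host has idle cycles
xl sched-credit2 -d pvh-example --cap=50
```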
Null scheduler improvements: A number of improvements to the "null" scheduler, which guarantees near-zero scheduling overhead, significantly lower latency, and more predictable performance, have been delivered in Xen 4.10. We introduced tracing support to enable users to optimise workloads, and soft-affinity support, which can improve caching behavior and performance for some workloads.
RTDS scheduler improvements: The extratime option was added to the RTDS scheduler.
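A hedged example of setting RTDS parameters including extratime (flag names assumed from the xl sched-rtds interface; the domain name and values are placeholders):

```
# Give every vCPU of the guest a 4000us budget per 10000us period, and
# allow it to consume extra, unreserved CPU time when available (extratime)
xl sched-rtds -d pvh-example -v all -p 10000 -b 4000 -e 1
```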
VMI improvements: a number of performance improvements have been made to VMI. In addition, we added a software page table walker to VMI on ARM, which lays the groundwork for altp2m on ARM CPUs. More information on altp2m is available here.
PV Calls Drivers in Linux: In Xen 4.9 we introduced the PV Calls ABI which allows forwarding POSIX requests across guests, which for example allows guest networking socket calls to be executed in Dom0, enabling a new networking model that is a natural fit for cloud-native apps. The PV Calls backend driver was added to Linux 4.14.