Xen paravirt_ops for x86 Linux
What is paravirt_ops?
paravirt_ops (pv-ops for short) is a relatively new piece of Linux kernel infrastructure that allows the kernel to run paravirtualized on a hypervisor. It currently supports VMware's VMI, Rusty Russell's lguest, and, most interestingly, Xen.
The infrastructure allows you to compile a single kernel binary which will either boot natively on bare hardware (or in HVM mode under Xen), or boot fully paravirtualized in any of the environments you've enabled in the kernel configuration.
It uses various techniques, such as binary patching, to make sure that the performance impact when running on bare hardware is effectively unmeasurable when compared to a non-paravirt_ops kernel.
At present paravirt_ops is only available on the x86 architecture, though the ia64/Xen developers are implementing a form of it for their architecture (XenIA64/UpstreamMerge) and are sharing the architecture-independent pieces of Xen support (such as the pv device drivers).
Xen support has been in mainline Linux since 2.6.23, and is the basis of all ongoing Linux/Xen development (the old Xen patches officially ended with 2.6.18.x-xen, though various distros have their own forward-ports of them). Red Hat has decided to base all its future Xen-capable products on the in-kernel Xen support, starting with Fedora 9.
Current state
Xen/paravirt_ops has been in mainline Linux since 2.6.23, though it is probably first usable in 2.6.24. While I wouldn't put it in production just yet, for normal desktop/developer workloads it has proven to be pretty stable.
It is definitely a work in progress, so I recommend using as recent a kernel as possible.
- Features in 2.6.26:
  - x86-32 support
  - SMP
  - Console (hvc0)
  - Blockfront (xvdX)
  - Netfront
  - Balloon (contraction only)
  - Paravirtual framebuffer + mouse (pvfb)
  - 2.6.26 onwards will be PAE-only
- Queued for 2.6.27:
  - x86-64 support
  - Save/restore/migration
  - Further pvfb enhancements
- Work in progress:
  - dom0 support
  - pv-hvm driver support
  - Balloon expansion (using memory hotplug)
- To be done:
  - CPU hotplug
  - Device hotplug
  - Other device drivers
  - kdump/kexec
  - ...?
Using Xen/paravirt_ops
Building
- Get a current kernel. The latest kernel.org kernel is generally a good choice.
- Configure as normal; you can start with your current .config file.
- Make sure you have CONFIG_X86_PAE enabled (which is set by selecting CONFIG_HIGHMEM64G).
  - Non-PAE mode doesn't work in 2.6.25, and has been dropped altogether from 2.6.26.
- Enable these core options:
  - CONFIG_PARAVIRT_GUEST
  - CONFIG_XEN
- Enable the Xen pv device drivers:
  - CONFIG_HVC_DRIVER and CONFIG_HVC_XEN
  - CONFIG_XEN_BLKDEV_FRONTEND
  - CONFIG_XEN_NETDEV_FRONTEND
- Build as usual.
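Before building, it can be worth confirming that the options listed above actually ended up in the generated .config, since an option whose dependencies are not met will not survive make oldconfig. The following is a minimal POSIX shell sketch; check_xen_config is a hypothetical helper name, and it treats either =y or =m as "enabled" for every option it checks.

```shell
#!/bin/sh
# Sketch: report which of the Xen pv-ops options listed above are missing
# from a kernel .config. check_xen_config is a hypothetical helper name.
# Usage: check_xen_config [path/to/.config]   (defaults to ./.config)
check_xen_config() {
    cfg=${1:-.config}
    missing=0
    for opt in CONFIG_X86_PAE CONFIG_PARAVIRT_GUEST CONFIG_XEN \
               CONFIG_HVC_DRIVER CONFIG_HVC_XEN \
               CONFIG_XEN_BLKDEV_FRONTEND CONFIG_XEN_NETDEV_FRONTEND; do
        # Built in (=y) or modular (=m) both count as enabled here
        if ! grep -Eq "^${opt}=(y|m)\$" "$cfg"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return $missing
}
```

Run it from the top of the kernel tree after configuring; a non-zero exit status means at least one option still needs enabling.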
Running
The kernel build process produces two kernel images: arch/x86/boot/bzImage and vmlinux. They are two forms of the same kernel and are functionally identical. However, only relatively recent versions of the Xen tool stack (post-Xen 3.2) can load bzImage files, so with older tools you must use the vmlinux form of the kernel (gzipped, if you prefer). If you've built a modular kernel, the modules are the same either way. Some aspects of the guest configuration have changed:
- The console is now /dev/hvc0, so put "console=hvc0" on the kernel command line.
- Disk devices are always /dev/xvdX. If you want to dual-boot a system on both Xen and native hardware, it's best to use LVM, LABEL or UUID to refer to your filesystems in /etc/fstab.
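Putting these points together, a guest using the pv-ops kernel might be prepared roughly as follows. This is a sketch only: make_pv_guest is a hypothetical helper, and the output directory, domain name, memory size and LVM volume (/dev/vg0/NAME) are placeholders to adapt. It gzips vmlinux, which every Xen 3.x tool stack can load, and writes an xm domain configuration that uses the hvc0 console and an xvd disk.

```shell
#!/bin/sh
# Sketch: package a pv-ops vmlinux and write a minimal xm domain config.
# make_pv_guest is a hypothetical helper; all paths and sizes are placeholders.
# Usage: make_pv_guest KERNEL_TREE OUT_DIR NAME
make_pv_guest() {
    tree=$1 out=$2 name=$3
    mkdir -p "$out" || return 1
    # Older Xen tool stacks cannot load bzImage, so ship (gzipped) vmlinux
    gzip -c "$tree/vmlinux" > "$out/vmlinux-$name.gz" || return 1
    # Guest disks appear as /dev/xvdX, and the console is hvc0
    cat > "$out/$name.cfg" <<EOF
name   = "$name"
kernel = "$out/vmlinux-$name.gz"
memory = 256
disk   = [ 'phy:/dev/vg0/$name,xvda1,w' ]
root   = "/dev/xvda1 ro"
extra  = "console=hvc0"
EOF
}
```

Inside the guest, referring to the root filesystem by label in /etc/fstab (for example "LABEL=root / ext3 defaults 1 1", assuming the filesystem was labelled root) keeps the same fstab working whether the disk appears as xvda1 under Xen or under another name on bare hardware.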
Testing
Xen/paravirt_ops has not had wide use or testing, so any testing you do is extremely valuable. If you have an existing Xen setup, try updating the kernel to a current pv-ops kernel and using it as you usually would; any feedback on how well that works (success or failure) would be very interesting. In particular, information about:
- performance: better/worse/same?
- bugs: outright crash, or something just not right?
- missing features: what can't you live without?
Debugging
If you do encounter problems, then getting as much information as possible is very helpful. If the domain crashes very early, before any output appears on the console, then booting with: should provide some useful information. If you are running a debug build of Xen (set "debug = y" in Config.mk in the Xen source tree), then you should get crash dumps on the Xen console. You can view those with "xm dmesg".
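To keep the hypervisor output for a bug report, it can help to save it to a file rather than just viewing it. A minimal sketch, assuming the xm tool stack and that it is run as root in dom0; collect_xen_log and the default file name are hypothetical.

```shell
#!/bin/sh
# Sketch: save the Xen hypervisor's message buffer (including any crash dump
# from a debug build of Xen) to a file for a bug report.
# collect_xen_log is a hypothetical helper; the file name is a placeholder.
# Usage: collect_xen_log [OUTFILE]
collect_xen_log() {
    out=${1:-xen-dmesg-$(date +%Y%m%d-%H%M%S).txt}
    # "xm dmesg" prints the hypervisor's message buffer
    xm dmesg > "$out" || return 1
    echo "$out"
}
```

Attach the resulting file together with whatever the guest printed on its console (starting the domain with "xm create -c" attaches to the console immediately, so early messages are not missed).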
Contributing
Xen/paravirt_ops is very much a work in progress, and there are still feature gaps compared to 2.6.18-xen. Many of these gaps are not a huge amount of work to fill in.
Devices
The Xen device model is more or less unchanged in the pv-ops kernel, so converting a driver from the xen-unstable or 2.6.18-xen tree should mostly be a matter of getting it to compile. The Linux device model has changed between 2.6.18 and 2.6.26, so most of the work is forward-porting the driver to the newer kernel rather than dealing with Xen-specific issues.
CPU hotplug
All the mechanisms needed for CPU hotplug should already be in place; it should just be a matter of making them work.
Device hotplug
In principle this is already implemented and should work. I'm not sure, however, that it's all plumbed through properly, so that hot-adding a device generates the appropriate udev events to cause devices to appear.
Device unplug/module unload
The 2.6.18-xen patches don't really support device unplug (and driver module unload), mainly because of the difficulty of dealing with granted pages. This needs to be fixed in the pv-ops kernel. The main thing to implement is that on driver termination, rather than freeing granted pages back into the kernel heap, the driver adds them to a list; a kernel thread polls that list, periodically trying to ungrant each page and returning it to the kernel heap once that succeeds.
Getting the current development version
All x86 Xen/pv-ops changes queued for upstream (Linus's tree) are in Ingo Molnar's tip.git tree. You can get general information about fetching and using this tree in his README. The x86/xen topic branch contains most of the Xen-specific work, though changes in other branches may be necessary too. The auto-latest branch is the merged product of all the other topic branches.
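A minimal sketch of tracking the Xen topic branch in an existing Linux git clone. The tip.git URL is deliberately left as a parameter (take it from Ingo's README); fetch_tip_xen, the remote name tip and the local branch name xen are hypothetical choices. Passing auto-latest as the third argument checks out the merge of all topic branches instead.

```shell
#!/bin/sh
# Sketch: fetch tip.git into an existing Linux clone and check out a topic
# branch. fetch_tip_xen is hypothetical; pass the tip.git URL yourself.
# Usage: fetch_tip_xen REPO_DIR TIP_URL [BRANCH]   (BRANCH defaults to x86/xen)
fetch_tip_xen() {
    repo=$1 url=$2 branch=${3:-x86/xen}
    # Add the "tip" remote, or repoint it if it already exists
    git -C "$repo" remote add tip "$url" 2>/dev/null ||
        git -C "$repo" remote set-url tip "$url" || return 1
    git -C "$repo" fetch tip || return 1
    # Create (or reset) a local branch "xen" at the chosen tip branch
    git -C "$repo" checkout -B xen "tip/$branch"
}
```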
Bleeding edge patches
The current day-to-day development is happening in a mercurial patch queue. This queue is very raw, and not guaranteed to work (or even apply) from moment to moment. It is, however, the best place to see the current state of work-in-progress items.
At any given time, the queue is based on the mercurial mirror of linux-2.6.git, available here. The first patch is always "x86/x86.patch", which includes the current state of tip.git, followed by the current in-progress patch queue.
To get a working tree:
hg clone http://www.kernel.org/hg/linux-2.6
cd linux-2.6/.hg
hg clone http://xenbits.xensource.com/paravirt_ops/patches.hg patches
cd ..
ln -s .hg/patches . # for convenience
hg update `cat patches/KERNEL_VERSION`
hg qpush -a
(You may need to add this to your ~/.hgrc file:
[extensions]
hgext.mq =
)
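If you would rather not edit ~/.hgrc by hand, a small sketch like the following appends the mq stanza only when no mq entry is already present. enable_mq is a hypothetical helper; it relies on Mercurial accepting a section header such as [extensions] more than once in the same file.

```shell
#!/bin/sh
# Sketch: enable Mercurial's mq extension in an hgrc file, idempotently.
# enable_mq is a hypothetical helper. Usage: enable_mq [HGRC] (default ~/.hgrc)
enable_mq() {
    rc=${1:-$HOME/.hgrc}
    # Leave the file alone if an mq entry ("hgext.mq =" or "mq =") is already there
    if [ -f "$rc" ] && grep -Eq '^[[:space:]]*(hgext\.)?mq[[:space:]]*=' "$rc"; then
        return 0
    fi
    # Make sure the new stanza starts on its own line
    if [ -s "$rc" ] && [ -n "$(tail -c 1 "$rc")" ]; then
        echo >> "$rc"
    fi
    # Repeated [extensions] headers are merged by Mercurial, so appending is safe
    printf '[extensions]\nhgext.mq =\n' >> "$rc"
}
```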
Contact
Please mail questions/answers/patches/etc to the Xen-devel mailing list.
