Windows PV Drivers Presentation
This page contains the speaker's notes for the overview presentation of the Windows PV Drivers of Xen Project.
Windows PV Drivers: Architecture, History, and Usage by Paul Durrant, 2015
Hi, I’m Paul Durrant. I’m a principal engineer in the XenServer group at Citrix and I’m project lead for the XenProject Windows PV Drivers.
In this presentation I’m going to be giving an overview of the drivers.
We’ll start with the origins of the drivers, and the journey from the original XenServer-specific closed-source ‘Legacy’ drivers, through the open source XenServer drivers (dubbed the ‘Standard’ drivers in Citrix and available on GitHub), to the current generic XenProject drivers, the source of which is now hosted on Xenbits.
I’ll then move on to the way that functionality is broken down into interfaces, how they are provided and consumed, and how compatibility is maintained as they evolve.
And finally I’ll give a brief overview of what you need to do to build and install the drivers, and contribute to the project.
To start with I need to introduce some Windows driver terminology and some conventions I’ve used in the diagrams in this presentation.
Windows devices are organized into a tree, or a set of trees, rooted at what’s called a Physical Device Object or PDO. In my view of the world trees grow downwards, so I put PDOs at the top.
Normally a PDO just represents a piece of hardware, which is not that useful unless you have some code to talk to it. That code is called a Function Driver and when a function driver attaches to a PDO it creates a corresponding Function Device Object or FDO.
Unlike some OSes, such as Linux, Windows has a concept of demand-loading drivers. Hence function drivers do not contain code to discover their hardware. Instead they are part of a package described by what is called an INF file. In that INF file there are entries to tell Windows what PDO ‘names’ a particular function driver will ‘bind’ to. So, as Windows builds its device tree it can look at the names of newly created PDOs and determine which Function Drivers to load.
A Function Driver can also be what’s called a Bus Driver. That means that, having created its FDO, it can also create PDOs. For example, the root PCI driver binds to a PDO created by the ACPI driver (which is parsing the DSDT). It will create an FDO to bind to that, enumerate the root bus (using PCI config cycles) and create a PDO for each unique bus/device/function that it finds.
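To make the binding and enumeration idea concrete, here is a toy model in Python. It is only a sketch of the concept, not Windows code: the table standing in for the INF files, the list of devices each bus ‘finds’, and the driver names are all illustrative assumptions.

```python
# Toy model of PnP binding. NOT Windows code, just the idea.
# INF_BINDINGS stands in for the INF files: PDO name -> function driver.
# All names here are illustrative assumptions.
INF_BINDINGS = {
    "ACPI\\PNP0A03": "pci",              # root PCI bus PDO created by ACPI
    "PCI\\VEN_5853&DEV_0001": "xenbus",  # a PCI device with a driver installed
}

# What each bus driver "finds" once its FDO is attached (hypothetical).
CHILDREN = {
    "pci": ["PCI\\VEN_5853&DEV_0001", "PCI\\VEN_8086&DEV_1237"],
    "xenbus": [],
}


def build_tree(pdo_name, depth=0, out=None):
    """Walk the tree: bind a driver to each PDO, then enumerate its children."""
    out = [] if out is None else out
    driver = INF_BINDINGS.get(pdo_name)   # None means no driver binds
    out.append((depth, pdo_name, driver))
    if driver is not None:
        for child in CHILDREN.get(driver, []):
            build_tree(child, depth + 1, out)
    return out


for depth, pdo, driver in build_tree("ACPI\\PNP0A03"):
    print("  " * depth + pdo, "->", driver or "(no function driver)")
```

The point to take away is that the driver never searches for its hardware; the tree walk looks up each new PDO name and loads whatever is registered against it.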
The first set of drivers we’ll mention are the closed source ‘Legacy’ drivers.
Before XenServer 6.1 was released, these were the only PV drivers and they were getting pretty long in the tooth. I believe they were written for Windows 2000 support on the first version of XenServer (or possibly even XenEnterprise?) to support HVM guests.
They are still used in XenServer today, but only for Windows Server 2003 (and XP before it went EOL).
Citrix have never provided source for these drivers, and that is mainly because there is code in them that is of unknown origin. Also, there is less and less point in doing so as time goes by. Server 2003 will be EOL this year (2015), at which point these drivers will finally be consigned to history.
To give you an idea of why Citrix made these drivers ‘Legacy’ and replaced them with a new set for Vista onwards, let’s take a look at the structure of the driver packages and how they (just about) hang together…
The first thing you’ll notice is there are essentially two ‘root’ PDOs. The one on the right is the Xen Platform PCI device, created by QEMU, and a key part of any HVM guest running on pretty much any Xen distribution. The one on the left, however, is synthesized by a driver installer package.
The main virtual bus driver is called XENEVTCHN (don’t know why) and that, along with the export driver XENUTIL (an export driver is like a kernel DLL), is where most of the code that talks to Xen lives. XENEVTCHN is the ultimate parent of the PV network devices, but not the storage devices. Those are dealt with by XENVBD, which binds directly to the PCI device, but uses code in XENUTIL to co-ordinate with XENEVTCHN.
The XENVBD package also installs a filter driver, SCSIFILT. The reason for this driver is that (because it needs to work on versions of Windows older than Vista) XENVBD uses a very old storage driver API in Windows called SCSIPORT, and SCSIPORT has very poor locking semantics and only a single request queue for an HBA. This makes it very slow. SCSIFILT is designed to sit between the generic Windows DISK driver and the XENVBD and intercept storage requests. Being a filter driver it’s not bound by any logo requirement to use a standard Windows storage API and so it bypasses the whole SCSIPORT queuing and locking framework and talks directly to the PV backends, which is a lot faster.
Back over on the left, you can see the XENNET driver for PV network devices but in between that and XENEVTCHN is another driver, XENVIF. Because the legacy drivers used to be used for versions of Windows all the way from Windows 2000 through to 7 and Server 2008R2 they actually had to have two distinct versions of the XENNET driver. Between releasing Server 2003 and Vista Microsoft changed the NDIS API in an incompatible way, so anyone writing Windows network drivers needed to fork their code. Server 2003 and earlier use NDIS version 5.x and Vista onwards uses version 6.x.
The original code had both these flavours of XENNET but there was a lot of code duplicated between them and when bugs cropped up it was easy to end up applying a fix to one driver that really should be applied to both. I therefore re-wrote the drivers, moving all the common code into a driver called XENVIF which I also made the parent of all XENNETs, to allow for dynamic interface discovery which is something we’ll come onto later.
So this rather complex structure causes some problems…
SCSIFILT, whilst working round the deficiencies of SCSIPORT, causes some problems. There are utilities which directly open storage devices (SCSIPORT allows this) and send read and write requests. Those requests, because they did not come from the DISK driver, bypass SCSIFILT and thus XENVBD has to have a very odd ‘loopback’ path where it injects the requests into the storage stack as if they did come from the DISK driver to allow them to be intercepted by SCSIFILT. Also, because there are some circumstances where SCSIFILT is not loaded (e.g. if a disk is disabled in Device Manager) both XENVBD and SCSIFILT must have code to deal with the PV state modes, for purposes of VBD unplug… which is more code duplication.
Cross-package linkage dependencies (generally to XENUTIL) are a massive problem. There never was a defined ABI and so it was very easy for packages to become binary incompatible leading to very odd BSODs during upgrade. Really there is no safe way to upgrade legacy drivers… it is best to remove the old set before adding the new set. But, that requires two reboots.
The two root nodes also cause a big problem. Initialization of the PV interfaces to Xen needs to be done before either XENEVTCHN or XENVBD can fully function, but you never know which one is going to come up first, and worse… a resource rebalance (something Windows may need to do to redistribute interrupts for example) means either one can be unloaded and reloaded at any time. This makes the initialization code very very complicated, non-obvious and fragile.
Finally, the use of a synthetic root node completely precludes deployment via Windows Update as those nodes can only be created by a driver installer.
Windows Update deployment has always been a goal for Citrix and so that final point is really a showstopper for these drivers.
So, what did we do… We wrote some new drivers, which are now dubbed the ‘standard’ drivers.
Now, it so happened that around the time these drivers were getting towards being fully functional Microsoft changed the landscape and said that all drivers for the then new Windows 8 and Server 2012 release had to be built with the new WDK and the oldest version of Windows supported by that WDK is Vista. So, it was decided that the new set of drivers would only support Vista onwards and older OSes would continue with the legacy drivers.
This is the structure of the standard drivers…
As you can see, it’s a bit simpler than the structure of the legacy drivers.
It’s basically a single tree structure with the only complex part surrounding the root node. The new parent bus driver XENBUS binds to the PCI device (which you’ll note has a new ID… more on that later), but makes use of an export driver called XEN. Then there’s a filter driver called XENFILT which actually sits not only between the Xen platform PCI device’s PDO and XENBUS but also between all PCI PDOs and their function drivers.
The reason for the presence of XENFILT is that it allows us to execute code before QEMU emulated devices are exposed to the Windows PnP subsystem and hence we can make sure that emulated device unplug occurs early enough in boot such that Windows does not see devices disappearing when the unplug occurs.
The use of the XEN export driver also gives us a useful hook. Its DllInitialize() routine is called only at boot time, allowing us to perform Xen operations which only need doing after initial domain creation and do not need to be, or should not be, repeated on domain resume (i.e. after a suspend or migrate).
This new set of drivers addressed the major shortcomings…
Because the drivers only support Vista onwards, XENVBD could use the newer and much better performing STORPORT storage API and thus SCSIFILT was no longer needed, which reduced complexity.
Because there are no cross package link dependencies, the installation ordering issues and binary compatibility issues during upgrade were solved.
And crucially, because of the single PCI enumeration root node the drivers no longer require an installer and this allows them to be deployed via Windows Update.
But there were still some problems…
That new device ID… The idea of changing it from the standard Xen platform ID was…
When drivers are posted to Windows Update you can only control their deployment by OS version and physical device name. So, if we were to post drivers that deployed on the standard Xen platform physical device, anyone anywhere in the world with a Windows HVM guest (not just on XenServer but on AWS, for instance) would suddenly start getting drivers from Windows Update! The standard drivers are also completely incompatible with the legacy drivers: installing them before removing legacy drivers leads to instant BSOD. Unsurprisingly, we did not want this to happen.
The big problem with this new device ID is that, to use standard drivers you need to have a PCI device with the new ID and that made upgrade from legacy to standard somewhat complex. Another problem though was that the new device ID required changes to QEMU and the host toolstack, which was unacceptable to the upstream community. Another way was needed.
There was also a second problem. Use of interface discovery removed the cross package link dependencies and made the load ordering of drivers flexible, but the interface compatibility check is an exact match, which means drivers must still be upgraded together otherwise you may get a non-functional system… and that’s a bit of a problem if you are getting drivers from Windows Update and you upgrade your XENNET driver first and then find the new one is not compatible with your XENVIF. How are you going to get the new version of XENVIF without a network? This required some more thought.
Now, in 2013 XenServer went fully open source and this included the standard PV driver source (which, as I said, went onto GitHub).
However, there was a desire to make the Windows drivers even more open such that they would work on most Xen installations.
I therefore proposed, in mid 2014, that the Linux Foundation adopt the PV drivers as a sub-project of the Xen Project. The advisory board agreed to this in June and there is now a project front-page on xenproject.org, source repositories on xenbits and even publicly available binaries courtesy of a build VM hosted by Rackspace.
I’m the project lead, chief maintainer and committer. My Citrix colleagues Ben and Owen are also committers and maintainers.
Citrix plan to use these upstream drivers in the next version of XenServer for all versions of Windows (since XP is already gone and Server 2003 will be gone by mid 2015).
Like the standard drivers, they will be built for XenServer with the Windows 8 WDK and VS 2012, although they can be built with the 8.1 WDK and VS 2013. The reason we don’t plan to update our toolchain for XenServer is that the 8.1 WDK doesn’t support any OS prior to Windows 7, and Vista and Server 2008 are still in support.
Also, the Xen Project drivers have addressed the device ID and compatibility problems of the standard drivers. We’ll come to how in a moment…
The structure of the drivers is basically identical to the standard drivers…
The crucial differences are in how we handle binding to the PCI device and the details of how interface discovery is managed…
The new XENBUS can now bind to 3 different devices…
Depending on where your VM comes from you will have one of the two devices on the left. However, for Windows Update purposes in a XenServer VM you may also end up with the new device on the right. (The C000 device ID is reserved for XenServer for the purposes of Windows Update in a header in the main Xen source repository.)
When XENBUS installs, there’s a module called a co-installer (that is part of the package) that runs just before the driver binds to the PDO and again just after. This module can be used to control how XENBUS behaves…
If the Windows Update device (on the right) is present then the XENBUS instance bound to that will be ‘active’ and the other instance bound to the device on the left will not be. However if the Windows Update device is not present then, when XENBUS binds to one of the devices on the left, that will be active.
Only the active instance of XENBUS will talk to XEN and only the active instance of XENBUS will enumerate child PDOs and those PDOs will carry the device ID of the PCI device in their name. This makes all PDOs that Citrix will target for Windows Update distinct from those in, say, an AWS VM but still allows the drivers to be installed into a generic Xen VM without needing toolstack or QEMU modifications.
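The selection rule is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as described, not the co-installer’s code: the IDs other than C000 are placeholders, and the child PDO name format is hypothetical (the real format is whatever XENBUS defines).

```python
WINDOWS_UPDATE_ID = "C000"   # reserved for XenServer, as noted above


def choose_active(present_ids):
    """Return the PCI device ID whose XENBUS instance should be active."""
    if WINDOWS_UPDATE_ID in present_ids:
        return WINDOWS_UPDATE_ID
    # Otherwise the instance bound to a platform device is the active one.
    others = sorted(present_ids)
    return others[0] if others else None


def child_pdo_name(active_id, device_class):
    # Hypothetical name format, chosen only to show the ID being carried.
    return "XENBUS\\ID_{}&CLASS_{}".format(active_id, device_class)


print(child_pdo_name(choose_active({"0001", "C000"}), "VIF"))
print(child_pdo_name(choose_active({"0001"}), "VIF"))
```

Because the device ID ends up in the child PDO names, a package posted to Windows Update against the C000-derived names would not match the children in a VM that lacks that device.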
Dynamic interface discovery works using an IRP_MJ_PNP minor number that Microsoft dedicate for that purpose: IRP_MN_QUERY_INTERFACE.
A child driver will pass an IRP_MN_QUERY_INTERFACE IRP to its parent, identifying the interface it wants to get hold of with a GUID and version number. The IRP references a data buffer and, if the parent recognises the GUID it fills the buffer with a jump table and some context information and completes the IRP with a success code. If the parent doesn’t recognize the GUID it could fail the IRP but generally it will pass it on up to its parent. Hence a driver can usually get hold of an interface implemented by any of its ancestors (or filters thereof). For example XENNET makes use of an interface provided by XENBUS, even though XENVIF sits between the two.
Let’s look at a concrete example; XENVIF getting hold of the EVTCHN interface from XENBUS.
The XENVIF codebase carries a copy of the evtchn_interface header from XENBUS. The code uses this header to allocate a structure of type XENBUS_EVTCHN_INTERFACE. It then passes this buffer via an IRP_MN_QUERY_INTERFACE message using the GUID defined in the header and always requesting XENBUS_EVTCHN_INTERFACE_VERSION_MAX, so as to get the latest version of the interface specified in the header. (We’ll discuss the subject of interface compatibility in a moment.)
XENBUS, recognizing the GUID, will fill the buffer with a jump table of functions implementing the various interface methods, and an opaque context pointer and pass this back to XENVIF by completing the IRP with a success code.
XENVIF then acquires a reference to the interface by calling the ‘Acquire’ method (using the convenience macro in the header) before using any of the other methods. When XENVIF is done with the interface it calls the ‘Release’ method. All the interfaces in the PV drivers have the same semantics.
You may ask why the acquisition of the interface is not done during the query? The reason is that during a VM transition to S4 (hibernation), most interface providers need to know that their interfaces are not in use (because internal state will need to be re-built on resume – since the VM will resume in a new domain). Thus subscribers can call Release to say they are done, without the need to re-query for the interface on resume (which is problematic since querying must be done at passive level and most of the resume code has to execute at dispatch level – because it needs to use spin locks).
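The query, Acquire and Release steps can be modelled in Python as a rough sketch. None of this is driver code: the class names, the placeholder GUID string and the single ‘Send’ method are assumptions made for illustration, and the real interfaces are C structures filled in via an IRP.

```python
class Interface:
    """What a provider hands back: a 'jump table' plus a reference count."""

    def __init__(self, version, methods):
        self.version = version
        self.methods = methods      # method name -> callable
        self.references = 0

    def acquire(self):
        self.references += 1

    def release(self):
        assert self.references > 0
        self.references -= 1

    def call(self, name, *args):
        if self.references == 0:
            raise RuntimeError("interface not acquired")
        return self.methods[name](*args)


class Driver:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.provided = {}          # (guid, version) -> Interface

    def provide(self, guid, interface):
        self.provided[(guid, interface.version)] = interface

    def query_interface(self, guid, version):
        """Answer if we recognise the GUID, otherwise pass it up to the parent."""
        if any(g == guid for g, _ in self.provided):
            return self.provided.get((guid, version))  # None: version unsupported
        if self.parent is not None:
            return self.parent.query_interface(guid, version)
        return None


# XENNET reaches an interface on XENBUS even though XENVIF sits in between.
xenbus = Driver("XENBUS")
xenvif = Driver("XENVIF", parent=xenbus)
xennet = Driver("XENNET", parent=xenvif)
EVTCHN_GUID = "evtchn-guid-placeholder"
xenbus.provide(EVTCHN_GUID, Interface(1, {"Send": lambda port: "sent %d" % port}))

evtchn = xennet.query_interface(EVTCHN_GUID, 1)   # query once...
evtchn.acquire()                                  # ...acquire before use
print(evtchn.call("Send", 3))
evtchn.release()                                  # release, but keep the table
```

Note that after the release the consumer still holds the table, so it can acquire again later without a second query, which is the property that matters on resume.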
Now for how compatibility is managed.
A driver that is providing interfaces maintains a PDO revision for each combination of interface versions that it provides. For example, a driver may provide the FOO and BAR interfaces and support version 1 and version 2 of each. It therefore maintains 4 PDO revisions:
1 -> FOO v1 BAR v1
2 -> FOO v1 BAR v2
3 -> FOO v2 BAR v1
4 -> FOO v2 BAR v2
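The table above is just the product of the supported versions of each interface. A minimal Python sketch that generates it, assuming revisions are numbered from 1 in that order (the function name is mine, not something from the drivers):

```python
from itertools import product


def pdo_revisions(interfaces):
    """interfaces: dict of interface name -> list of supported versions.
    Returns {revision: {name: version}}, numbered from 1."""
    names = list(interfaces)
    combos = product(*(interfaces[n] for n in names))
    return {rev: dict(zip(names, combo))
            for rev, combo in enumerate(combos, start=1)}


table = pdo_revisions({"FOO": [1, 2], "BAR": [1, 2]})
for rev, versions in table.items():
    print(rev, "->", " ".join("{} v{}".format(n, v) for n, v in versions.items()))
```

It also shows how quickly the number of revisions grows as interfaces and versions are added, since every combination gets its own revision.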
A consuming driver needs to bind to the PDO revision representing the versions of the interfaces it will use. The reason that it’s important to bind to the right PDO revision is slightly complex…
If, say, it is decided to retire version 1 of the FOO interface then any consuming driver coded to use it would no longer work. By removing the PDO revisions that represent that version (1 and 2 in this case) consuming drivers will actually no longer bind – which will prompt Windows to look for a new driver on Windows Update. If the providing driver were not to remove old PDO revisions in this way, then nothing would trigger that search for a new driver and the system would remain non-functional until an update was scheduled or manually triggered.
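Here is a small sketch of that retirement step in Python, using the FOO and BAR example. The helper names are hypothetical; the only point is that a consumer built against a removed revision no longer finds anything to bind to.

```python
REVISIONS = {
    1: {"FOO": 1, "BAR": 1},
    2: {"FOO": 1, "BAR": 2},
    3: {"FOO": 2, "BAR": 1},
    4: {"FOO": 2, "BAR": 2},
}


def retire(revisions, name, version):
    """Drop every PDO revision that includes the retired interface version."""
    return {rev: v for rev, v in revisions.items() if v[name] != version}


def will_bind(revisions, wanted_revision):
    """A consuming driver binds only if its PDO revision still exists."""
    return wanted_revision in revisions


after = retire(REVISIONS, "FOO", 1)
print(sorted(after))            # revisions that remain
print(will_bind(after, 2))      # consumer of FOO v1 / BAR v2
print(will_bind(after, 4))      # consumer of FOO v2 / BAR v2
```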
The other important thing that a consuming driver package should do is to set some registry keys in the providing driver’s Interfaces subkey that state which version of each of the interfaces it consumes it is expecting to query for. The reason this is important is when an update of the providing driver is attempted the co-installer in that package can veto the upgrade if the new driver no longer supports a version of an interface relied upon by a consuming driver. The intention is that updates to consuming drivers should always be rolled out in advance of retirement of an interface version, to try to ensure that version is no longer in use by the time the update of the providing driver is rolled out. But, we cannot control the behaviour of administrators or require that they always update in the ‘correct’ order, so the registry key mechanism ensures that the system remains functional even if it is out-of-date.
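The veto check can be sketched like this in Python. It is a model of the decision only: plain dictionaries stand in for the registry keys, and the version numbers and the min/max representation of what the new provider supports are assumptions for illustration.

```python
def vetoes(registered, new_provider):
    """registered: {consumer: {interface: version}}, standing in for what
    consumers recorded under the provider's Interfaces subkey.
    new_provider: {interface: (min_version, max_version)} for the new driver.
    Returns the (consumer, interface, version) entries that block the upgrade."""
    blocked = []
    for consumer, wants in sorted(registered.items()):
        for interface, version in sorted(wants.items()):
            lo, hi = new_provider.get(interface, (1, 0))  # empty range if gone
            if not lo <= version <= hi:
                blocked.append((consumer, interface, version))
    return blocked


registered = {"XENVIF": {"EVTCHN": 1, "STORE": 1}}
new_xenbus = {"EVTCHN": (2, 3), "STORE": (1, 2)}   # hypothetical versions
problems = vetoes(registered, new_xenbus)
print("veto" if problems else "upgrade", problems)
```

An empty result means the upgrade can go ahead; anything else means a consumer would be left querying for a version that no longer exists.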
So, now that I’ve discussed how interfaces are discovered at run time and how compatibility is maintained let’s just quickly run through the interfaces that exist today and what they do. I’ve listed them by driver…
XENFILT provides a couple of interfaces…
The UNPLUG interface actually only has a single method. The internals of the code deal with unplugging emulated devices early in boot if PV drivers are present in the system and this can be pretty much entirely dealt with internally except for one case… Resume from suspend. In this case you have a new domain, with new emulated devices, but the OS has already booted so the usual unplug sequence doesn’t get re-run. So, just after the return from the suspend hypercall the XENBUS code has to call the Replay method to re-unplug the emulated devices that were unplugged when the OS booted.
The EMULATED interface on the other hand is there so that other drivers can find out what devices are currently being emulated. This is used by, for instance, XENVBD so that it can avoid bringing a PV disk online when the equivalent emulated disk is present.
XENBUS provides the bulk of the interfaces…
The DEBUG interface is there so that drivers can register functions to be called when the ‘q’ debug key is hit (e.g. xl debug-keys q).
The SUSPEND interface is there so that drivers can register functions to be called on resume from suspend. These functions need to perform such tasks as attaching to PV backends since, upon resume, the OS will be running in a fresh domain with new backends. There are two flavours of callback:
- Early: called on VCPU 0, with interrupts disabled (all other VCPUs spinning with interrupts disabled)
- Late: called again on VCPU 0, but with interrupts enabled (all other VCPUs spinning with interrupts enabled)
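As a sketch of the registration pattern, here is a toy callback registry in Python. It models only the ordering (every early callback runs before any late one) and says nothing about VCPUs or interrupt state; the class and method names are made up for the example.

```python
class SuspendRegistry:
    """Toy model of early/late resume callbacks. Ordering only."""

    def __init__(self):
        self.callbacks = {"early": [], "late": []}

    def register(self, kind, function):
        self.callbacks[kind].append(function)

    def resume(self):
        """Run once the guest is executing again in its new domain."""
        for kind in ("early", "late"):      # all early callbacks come first
            for function in self.callbacks[kind]:
                function()


log = []
registry = SuspendRegistry()
registry.register("late", lambda: log.append("late: reconnect to backend"))
registry.register("early", lambda: log.append("early: rebuild internal state"))
registry.resume()
print(log)
```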
The SHARED_INFO interface is actually mostly for XENBUS’s internal use as it underpins the use of Xen event channels but one of the methods, GetTime, is used by the XENIFACE driver to resynchronise VM time with Xen on resume from suspend (otherwise the OS time would go backwards by however long the VM was suspended).
The EVTCHN interface is an abstraction of Xen event channels that provides methods to Open and Close channels, send events, etc.
The STORE interface is a xenstore frontend providing methods to Read, Write and enumerate xenstore keys, start and end transactions and register watch events.
The RANGE_SET interface exposes functionality that’s used internally by XENBUS for maintaining sets of discontiguous ranges. This sort of API is commonly provided by an OS kernel (and there is even an implementation in Xen, for internal use) but Windows doesn’t provide such a thing.
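If you’ve not met a range set before, here is a minimal Python sketch of the data structure. It is not the XENBUS implementation, and the method names and the inclusive-range convention are my own choices for the example.

```python
class RangeSet:
    """Sorted, non-overlapping, inclusive [start, end] ranges of integers."""

    def __init__(self):
        self.ranges = []

    def put(self, start, end):
        """Add [start, end], merging with overlapping or adjacent ranges."""
        merged = []
        for s, e in self.ranges:
            if e + 1 < start or end + 1 < s:    # disjoint and not adjacent
                merged.append((s, e))
            else:                               # absorb into the new range
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        self.ranges = sorted(merged)

    def pop(self):
        """Remove and return the lowest item in the set."""
        if not self.ranges:
            raise KeyError("empty range set")
        s, e = self.ranges[0]
        if s == e:
            self.ranges.pop(0)
        else:
            self.ranges[0] = (s + 1, e)
        return s


rs = RangeSet()
rs.put(0, 3)
rs.put(10, 12)
print(rs.ranges)     # two discontiguous ranges
rs.put(4, 9)         # fills the gap, so everything merges
print(rs.ranges)
```

The attraction is that a large, mostly contiguous set of items is stored as a handful of start/end pairs rather than one entry per item.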
Similarly, the CACHE interface is a caching object allocator – similar in some ways to the slab allocator in Solaris or Linux – but here layered on top of Windows non-paged pool. Again, this is useful functionality that would normally be provided by an OS kernel but that’s missing in Windows.
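The idea of a caching object allocator can be shown with a very small Python sketch: constructed objects are kept on a free list and handed out again rather than being built from scratch each time. The names and the reservation parameter are assumptions for illustration, and nothing here reflects how the CACHE interface is actually implemented.

```python
class Cache:
    """Toy caching allocator: reuse constructed objects instead of rebuilding."""

    def __init__(self, constructor, reservation=0):
        self.constructor = constructor
        self.free = [constructor() for _ in range(reservation)]
        self.constructed = reservation

    def get(self):
        if self.free:
            return self.free.pop()      # fast path: reuse a cached object
        self.constructed += 1
        return self.constructor()       # slow path: build a new one

    def put(self, obj):
        self.free.append(obj)           # keep it for the next get()


cache = Cache(dict)
first = cache.get()
cache.put(first)
second = cache.get()
print(first is second, cache.constructed)
```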
The GNTTAB interface manages the guest grant tables. Granting is the way one domain gives permission to another domain to access its memory and is a crucial part of all PV protocols. The interface allows creation of caches of grant entries dedicated to individual PV frontend instances so that they can allocate and free entries without contention on a single lock.
The last provider is XENVIF…
The VIF interface is actually the frontend of the PV network protocol. So, it can be used to send and receive network packets, set multicast filters etc. All the stuff that a network driver needs to do. Doing this work in a ‘class’ driver like XENVIF makes life a lot easier because it is not subject to the restrictions of the NDIS miniport wrapper as XENNET is. This makes XENNET a very small and relatively simple driver and that has proven to be a very good thing when it comes to Windows logo testing.
Onto building the drivers…
Each repository contains a BUILD.md file (md for markdown) which should tell you what you need to know but it’s basically identical for all drivers so here’s what it says…
You’re going to need a Windows instance (obviously) and in that you’ll need to install Visual Studio (2012 or 2013). If you’re using 2012 though, you’ll need the paid-for edition. Also you’ll need the relevant Windows Driver Kit, or WDK: the Windows 8 one if you’re using VS 2012, or the 8.1 one if you’re using 2013. Lastly you’ll also need a copy of Python 3.x as all the build scripts are written in Python.
There’s also a couple of environment variables to set:
- KIT tells the scripts where the WDK is.
- SYMBOL_SERVER tells the scripts where you want the PDBs (debug symbols) generated by the build to be stored. You’ll need these if you use WinDBG.
Now you’re in a position to build the driver so, at a shell prompt, navigate to the repository and execute build.py specifying whether you want a free (non-debug) or checked (debug) build and whether you want to skip the Static Driver Verifier post-build phase. (It’s probably best to skip it while you’re actively developing new code, as it can be really slow.)
However, if you’re not actually modifying the code and don’t mind using the bleeding edge source then you can actually skip all of the above and download the latest greatest builds from the XenProject website. Beware that those builds are just development builds though; the code is not extensively tested before the builds are posted.
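Since the build scripts are Python anyway, a quick pre-flight check of the two environment variables is easy to write. This is a hypothetical helper of my own, not part of any repository, and it deliberately does not try to invoke build.py; check BUILD.md for the exact arguments.

```python
import os

# The two variables named above; nothing else is assumed about the build.
REQUIRED = ("KIT", "SYMBOL_SERVER")


def missing_build_variables(environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


missing = missing_build_variables()
if missing:
    print("Set these before running build.py:", ", ".join(missing))
else:
    print("Environment looks ready for build.py")
```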
As BUILD.md is to building drivers, INSTALL.md is to installing drivers… There’s one in each repository and it tells you what you need to know.
Again the information is largely common and it boils down to copying the driver package onto your target VM, choosing the x86 or x64 variant as appropriate and then running dpinst.exe.
You can use device manager if you prefer:
- right click on the device described in INSTALL.md
- select ‘Update Driver Software…’
- select the ‘browse’ option and then point it at wherever you put your driver package
The scripts will sign the driver with a test certificate – each repository has its own, in the src directory – so on 64-bit Windows you’ll need to enable test-signing. Also, if you want to avoid the big scary warning when you install the driver then you should copy the pfx file onto the target system and pre-install the certificate in the ‘trusted root’ store. Installation will ask you for the pfx password; it’s blank for all of them so just hit enter.
Ok, so you’ve built and installed the drivers, so how about making a contribution?
Bug-fixes and new features are very welcome and, being part of the XenProject, we use the Xen workflow (see the general guidance URL). Check the MAINTAINERS file in the repository you’re modifying for who to Cc on your patches, send them to the dedicated win-pv-devel mailing list, and someone should get back to you (promptly, I’d hope).
Also, the win-pv-devel list is not just for posting patches. It’s for discussion too, although try to keep it development related.
Now onto the Q & A