XenFS
The XenFS project is a high-performance alternative to NFS for sharing files within a single Xen host. It is currently under development by Mark Williamson (homepage http://www.cl.cam.ac.uk/~maw48/) at the University of Cambridge.
The public Mercurial (http://www.selenic.com/mercurial/) repository for XenFS development is at http://xenbits.xensource.com/maw/xenfs.hg. This contains only the stuff I've committed, so expect it to lag what I'm working on for the foreseeable future.
Planned Features
- Super fast coherent filesystem sharing between Xen domains.
- Shared buffer cache functionality for improved sharing performance and reduced memory footprint.
- Application level interdomain memory sharing using standard mmap API.
- Copy-on-write filesystem functionality enabling multiple domains to share a common base filesystem. This will be supported by CoW mechanisms at the memory level.
The features are being implemented in roughly this priority order, and a beta version of the code should be available within a few months. The first 3 features are quite strongly linked and should appear at about the same time. Copy-on-write functionality will then be implemented on top.
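As a rough illustration of what "memory sharing using standard mmap API" means from an application's point of view, here is a minimal sketch in Python. It is not XenFS code: it uses an ordinary temporary file on a local filesystem, and the function name demo_shared_mapping is made up for this example. The assumption is that, once mmap is supported, a file on a XenFS share mapped by applications in two domains would behave like the two mappings below, with writes through one mapping visible through the other.

```python
import mmap
import os
import tempfile


def demo_shared_mapping():
    """Write through one shared mapping and read the bytes back through another.

    Assumption: a local temp file stands in for a file on a XenFS share, and
    the two mappings stand in for applications running in two domains.
    """
    fd, path = tempfile.mkstemp()
    try:
        # A mapping needs a non-empty file, so size it to one page first.
        os.ftruncate(fd, mmap.PAGESIZE)
        # ACCESS_WRITE gives a shared, write-through mapping of the file.
        writer = mmap.mmap(fd, 0, access=mmap.ACCESS_WRITE)
        reader = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        try:
            writer[0:5] = b"hello"
            # Both mappings are backed by the same cached pages, so the
            # reader sees the update without any explicit read() or copy.
            return bytes(reader[0:5])
        finally:
            reader.close()
            writer.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

The point of the sketch is only that no special API is involved: the application uses plain mmap, and the sharing happens underneath it.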
For more information, you might like to take a look at:
my 1st year report, for what's been done - http://www.cl.cam.ac.uk/~maw48/report_final.pdf
my thesis proposal, for what I'm planning - http://www.cl.cam.ac.uk/~maw48/proposal_final.pdf
my 2nd year report, for a progress update and plans for the future. This is a bit less technical, and more an administrative document: http://www.cl.cam.ac.uk/~maw48/2nd_year_report.pdf
Current Status
29/03/07
Gosh yes, I'm really still alive. I've been somewhat distracted by other stuff, other work, and confusion but I'm still here and working on stuff. I can transfer sizeable files through XenFS without corruption and without anything blowing up. There is a lot of ground still to cover, but this is something of a positive milestone, since I'm now able to say I can transfer realistically sized data... I'm going to take a look at the performance, which ought to be pretty good...
9/11/06
The exciting code cleaning extravaganza continues. I've now converted pretty much all of the XenFS code to follow the Linux CodingStyle guidelines reasonably closely (in terms of indenting, bracing, use of structs, etc). This should make it a bit more consistent, and easier to read and work with. Previously it had been a mix of the Xen coding style originally used for drivers and a few interesting formatting mishaps perpetrated by myself and emacs.
Expect checkins of actual new code again in the near future! I'd quite like to look at getting large files to transfer through the filesystem interface. Untarring a Linux kernel from XenFS (and eventually, *onto* XenFS) seems like a worthy goal, so let's see what happens.
7/11/06
I've been doing loads of code cleaning in the past few days, converting more stuff to Linux coding style, moving over to pr_debug for most of my tracing, handling more error cases in more correct ways. I've also disabled (by default) all of the verbose printing, so it's now possible to use the filesystem without it spewing debug output everywhere. It's almost like being there! (TM)
I really need to improve the reporting of errors from the backend to the frontend (e.g. why things failed, etc) so that more meaningful errors can be returned to the user. I might do this soon.
Looking further ahead, I need to get some of the more advanced functionality working - I'll need to start looking at getting mmap working. I anticipate this being difficult (will wrap a wet towel around head just in case) but it should be rather rewarding, as it'll make interdomain shared memory work, and allow executable text to be shared directly. That would be spiffy.
Now it's almost 6am, which is late even for me - time for bed. Stay tuned for the next thrilling installment.
3/11/06
I now support removing extended attributes, and have put in the appropriate server-side error paths for extended attributes. I've also done a load of cleanups and finally converted the two main.c files wholesale to the Linux coding style. I think I'm going to go on another spring clean and try to weed out a load of the compile warnings. The frontend could probably use better error handling so I'll take a look at that too.
I've also been doing some tests into passing large files through XenFS. I *think* I managed to get a 17Meg file through successfully, but I'm not sure why it failed just yet - I'm messing up the memory management code in the server kernel somewhere.
1/11/06
Extended attributes are now supported if the server filesystem supports them. The xattr code is a bit grim as I haven't put in most of the error paths yet; however, it does appear to work. Xattr support is necessary for some newer apps (Beagle being a big example) and for SELinux. For xattr-heavy applications it might be nice to have a means of fetching xattrs in bulk - and possibly a means of prefetching all or a selection of xattrs when they might be needed.
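To make the bulk-fetch idea a little more concrete, here is a minimal sketch in Python of what a "fetch all (or a selection of) xattrs in one go" helper could look like from the caller's side. It is not part of XenFS and says nothing about the actual frontend/backend protocol: fetch_xattrs_bulk and its parameters are hypothetical names, and it is built on the standard Linux-only os.listxattr and os.getxattr calls, which are looked up defensively so the sketch still loads elsewhere.

```python
import os


def fetch_xattrs_bulk(path, names=None,
                      listxattr=getattr(os, "listxattr", None),
                      getxattr=getattr(os, "getxattr", None)):
    """Return a {name: value} dict of extended attributes for `path`.

    Hypothetical helper: `names` selects a subset to prefetch; if it is None,
    every attribute reported for the file is fetched. The two callables
    default to the Linux-only os functions and can be replaced for testing.
    """
    if names is None:
        names = listxattr(path)
    result = {}
    for name in names:
        try:
            result[name] = getxattr(path, name)
        except OSError:
            # The attribute does not exist (or vanished between the list and
            # the get); leave it out rather than failing the whole batch.
            continue
    return result
```

Here each attribute still costs one call underneath. The gain hoped for in the text would come from a filesystem answering the whole dictionary in a single request instead of one round trip per attribute.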
Things seem to be going quite nicely, but there are plenty of niggling bugs and important features that I still need to see to. I'll keep cranking away...
20/10/06
I've implemented a couple of tweaks to improve inotify support at the backend. It should be quite feasible to monitor guest filesystem activity using inotify in the server domain's userspace. The caveat is that we currently need the frontend to inform us which files we're writing to, which is a pain. It might be nicer if we could somehow infer this and generate the correct notification when the writepage is pushed through. Nevertheless, it's a start. We can, in any case, monitor opens and suchlike to infer when changes might have been made.
The inotify support is not fully tested, so YMMV. It may not generate all the appropriate events yet.
I've also started looking into implementing a versioning filesystem. I can quite straightforwardly back up the XenFS filesystem at the server side using something like rdiff-backup (stores backups in an efficient format using compressed diffs, handles links, permissions, etc). I intend to add the ability for the guest to request a snapshot of its filesystem state. From the POV of the domU's owner this will create a tamperproof, named backup of the filesystem state that can be accessed in future should the need arise. The administrator at the server can trivially add routine scheduled backups to this functionality just by taking extra snapshots.
19/10/06
It's the small hours of the start of 19th October - (almost) a whole day's work still to come. I just managed to debug write support for multi-page files. I can now fire up emacs, write a helloworld.c, save it, compile it - all on XenFS. The resulting binary needs to be copied off of XenFS to run because I don't support mmap (and equivalents) yet. Still it feels good to be able to do something resembling the real world. I also checked the binary appeared correctly on the server - seems to execute fine - although the executable bit doesn't always get set for some reason. Still, rather spiffy overall.
Even without binary execution / mmap support (both intended to appear at some point not terribly far away from now!) plain write support enables (for instance) a revision controlled / snapshottable filesystem to be implemented fairly trivially. This is something I will also be looking into.
A few days ago I went through and added a whole load of extra error handling code serverside, so things are looking much better there. I know there are still some potential error cases unhandled. In the future, a "crashme" equivalent for XenFS server stress testing would definitely be useful. Of course, that's not so important whilst there are still plenty of bugs I actually know about myself.
I also checked in some minor fixes for spurious warnings today, just minor header tweaks.
12/10/06
Things are going at a reasonable rate again now. I've been back from holiday for a bit and have been working on implementing write support. Basic write support actually seems to work, but there's still a long way to go. I need to robustify and stabilise a lot of the code, and track down some of the causes of existing bugs.
7/09/06
Long time since I've written a status report here, but the code is still in progress. I've added a whole load of metadata operations. I just checked in support for running the server in a domU - to my surprise this worked with only changes to userspace. If I'd known it'd be that easy then I would have done it ages ago.
This should make development much less tedious by avoiding my having to reboot dom0 all the time.
24/04/06
The XenFS on Xenbits repository has been recreated. It shouldn't affect anyone, but if something funny happens let me know. (I did this at the same time as updating my local repository to use Hg's RevlogNG, with index inlining. I originally intended to change the contents of the repository, but then didn't. But I did recreate the public one, even though there wasn't any point.)
I'll be spending the next while trying to get XenFS up and running with Xen-unstable again, so that I can figure out more of how the memory sharing should work (alongside DCSSBLK).
28/03/06
DCSSBLK sharing between multiple domains now works properly. Just add the dcss segment config to multiple domain config files, and access in the usual way from within the domain.
22/03/06
DCSSBLK just successfully attached to one of my emulated dcss segments, formatted it with an ext2 filesystem and allowed me to mount it. I can save files into it, and run shell scripts, etc. There's still plenty of work to be done, but things should be much easier now the basic DCSS implementation is done.
07/03/06
Yes, I'm still alive!!! At the moment, I'm on a slight tangent, implementing an emulation of IBM S390 DCSS segments for Xen. This will allow us to use an unmodified S390 DCSSBLK shared Ramdisk driver. This is likely to be useful to some customers, but eventually it'd be nice to see a comparison of DCSSBLK vs XenFS. The latter should provide more flexibility in the long term.
The infrastructure for the two projects is shared in important places: the xenstore service advertisement protocol, and (more crucially) the likely extensions to the grant tables API will be needed by both. I intend to develop them for the (simpler) DCSSBLK project before leveraging them in XenFS.
09/11/05
Have had various useful discussions regarding how things should work. In particular, we've been looking at exactly what the XenFS setup protocol should look like and how to (ab)use xenstore for it (thanks to Ewan Mellor for comments on this) and how XenFS might work in fully translated guests (i.e. translated shadow mode, external shadow mode...) (thanks to Michael Vrable for making me think about this some more ;-)). One of the key principles of XenFS is to avoid hiding the underlying memory abstraction from the guest; if that is being done *anyhow* I'd like to investigate offloading CoW semantics to Xen rather than relying on the CTW fault.
Side note: recent problems cloning the XenFS repository on Xenbits have been fixed. They were due primarily to a resource limits issue. Let me know if you have any future problems.
03/11/05
Have spent quite a lot of time running around writing reports, attending conferences, etc. Am still looking at using Xenbus / Xenstore for XenFS. Annoyingly it seems to be a relatively bad fit, and I'm missing the control message API. I'm trying to decide how best to implement ad-hoc mount requests using it. They may have to wait... Right now I'm having issues getting mainline Xen to run properly for me, so that needs fixing first! Oh Xen, why do you torture me so?
23/08/05
I've got XenFS working on a recent tip from unstable. I've included just enough functionality in the Xend code to get the filesystem mounted. Full functionality in the control plane will now only be implemented via XenBus (which I'm implementing support for right now). I've checked in the current prototype code for the filesystem, server, tools, etc and it should be in the xenbits repository.
07/08/05
As promised, more frequent updates now.
I've pushed a couple of infrastructure changes to the kernel grant tables API and the balloon driver, to make it easier for XenFS to work with foreign pages, page transfers, etc. I'm in the process of updating the Xend code to use the new threaded model. This is just a "proof of concept" to show that everything still works. Once I've committed all the code, I'll be converting over to XenStore before they can deprecate the control message API from underneath me.
Stay tuned...
04/08/05
I've been a bit lax about updating this page. There has been progress in the intervening time but I've also been working on other things, which have slowed me down a bit. Directory read performance is now dramatically better since I implemented metadata caching. I'm currently in the process of moving over to Mercurial (http://www.selenic.com/mercurial/) for development (my old tree is still in BK) and pulling myself up to the most recent builds of the unstable tree. Once this is done, my XenFS commits will be mirrored nightly at http://xenbits.xensource.com/maw/xenfs.hg.
04/05/05
OK, everything (save the device channel memory page) is now using grant tables for mapping and working happily. I've also robustified somewhat, including fixing a latent bug in grant tables that I was getting bitten by. I'm now integrating directory reads with the Linux page cache - this should provide a very large speedup for navigating directory trees. Doing this will take a day or two, then it'll probably be time to think about extending the grant tables mechanism. - Mark
03/05/05
My implementation of interdomain filehandles is now working. This removes a lot of ugly cases where full paths had to be generated. Along with grant-tracking and inode translation (initial implementations of which are already in place) this is one of the 3 pillars on which I'll build the rest of the filesystem. I've also done various code cleanups. Before bedtime tonight I'll hopefully move all memory mapping across to grant-tables. - Mark
28/04/05
I now have a reasonably good idea how XenFS dentry handles are going to look, will start implementation shortly. Infrastructure for virtualising the inode number space is now in place, with inode translation working. For early development I'd exposed server inode numbers directly but this made it hard to protect the server's VFS layer. The combined effect of inode translation and interdomain dentry handles will yield more robust and efficient sharing. - Mark
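As an illustration of the inode translation idea described above, here is a minimal sketch in Python of a table that hands out virtual inode numbers to a client and maps them back at the server. The real XenFS data structures are not described here, so everything in it is an assumption made for the example: the class name InodeTranslator, its methods, the starting number, and the choice of ESTALE for unknown numbers.

```python
import errno
import itertools


class InodeTranslator:
    """Hypothetical two-way map between server inode numbers and the
    virtual inode numbers shown to one client domain."""

    def __init__(self, first_virtual=2):
        # Virtual numbers are handed out sequentially, so they reveal
        # nothing about the server's real inode numbering.
        self._next = itertools.count(first_virtual)
        self._to_virtual = {}
        self._to_server = {}

    def export(self, server_ino):
        """Return the virtual number for a server inode, allocating a new
        one the first time that inode is exported."""
        virtual = self._to_virtual.get(server_ino)
        if virtual is None:
            virtual = next(self._next)
            self._to_virtual[server_ino] = virtual
            self._to_server[virtual] = server_ino
        return virtual

    def resolve(self, virtual_ino):
        """Map a client-supplied number back to the server inode, rejecting
        anything that was never exported to this client."""
        try:
            return self._to_server[virtual_ino]
        except KeyError:
            raise OSError(errno.ESTALE, "unknown virtual inode number")
```

The property the sketch tries to show is the protective one mentioned in the entry: the client can only name inodes that the server has explicitly exported to it, so a made-up number never reaches the server's VFS layer.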
26/04/05
Prototype code for maintaining file and metadata consistency is under development. In support of this, new code for tracking XenFS grants is now in place in both the server and clients - this will be useful later for the shared buffer cache. Device nodes, fifos and symlinks are now supported in XenFS shares. I've begun implementing support for metadata writes and I'm thinking about introducing a dentry abstraction. - Mark
13/04/05
Initial implementation is progressing well. It has been possible for some time to mount XenFS and list directories, cat files, etc. I am now implementing the coherent shared buffer cache, which will give XenFS much of its performance and flexibility. - Mark
