
(Users may want to jump down to the "How to get it" section)

Overview

dm-userspace provides for userspace control over device-mapper. It does this by sending messages to a userspace application when block requests are made, allowing userspace to respond with a destination device and sector location for the request to be satisfied.

When used with the "cowd" daemon, the two provide a method for presenting a generic block device interface to a complex disk format, such as qcow. Further, reads and writes are trapped separately, allowing for Copy-on-Write behavior, which is highly useful in a virtualized environment.

Justification

Xen needs a mechanism for effective utilization of CoW system images, as well as native access to complex disk formats made popular by other virtualization technologies such as QEMU (qcow) and VMware (vmdk).

Use Cases

  • Utilization of existing images (such as qcow and vmdk)
  • Providing a block device interface to Linux and Xen with CoW semantics (possibly with a xen-specific, or xen-friendly CoW format)
  • Migration of images from one format to another
  • Other block-related tasks such as debugging, testing, etc

Implementation

Below I describe the components required for a working system. The first (device-mapper) is in the mainline kernel tree, but the others are new pieces provided by this project.

device-mapper

NB: This is just an overview for those who don't know much about it. Skip this section if you already know about device-mapper.

Currently, device-mapper provides a framework for presenting pseudo block devices that consist of ranges of sectors, each mapped to one of one or more targets. When a block request is made against the pseudo-device, the target assigned to the sector range of the request is determined, and responsibility for remapping the request is transferred to it. The target then adjusts the destination block device and sector offset of the request appropriately.

For example, the "dm-linear" target simply rewrites the destination block device to a static value (such as /dev/hda1), and rewrites the sector offset by adding a static offset into that device. This is how LVM works; it redirects all requests for, say, /dev/vols/myLogVol to some region of a device, such as /dev/hda1. So, if /dev/hda1 is 4G and you have a 1G logical volume, dm-linear may be configured to redirect everything from 0-1G in the pseudo-device to the 2-3G range in /dev/hda1.
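As a rough sketch of the example above, the following builds the device-mapper table line that would describe that 1G volume. It assumes 512-byte sectors; the device /dev/hda1 and the name myLogVol are only illustrative, and the dmsetup call is left commented out because it needs root and a real device.

```shell
#!/bin/sh
# Sketch: build a dm-linear table line for the 1G volume described above.
# Assumption: 512-byte sectors, so 1G is 2097152 sectors.
SECTORS_PER_GIB=$((1024 * 1024 * 1024 / 512))

length_sectors=$((1 * SECTORS_PER_GIB))   # size of the pseudo-device (1G)
offset_sectors=$((2 * SECTORS_PER_GIB))   # where it starts inside /dev/hda1 (2G)

# Table format: <start> <length> linear <device> <offset>
table="0 $length_sectors linear /dev/hda1 $offset_sectors"
echo "$table"

# To create the device for real (requires root and an actual /dev/hda1):
# echo "$table" | dmsetup create myLogVol
```

With these values the script prints `0 2097152 linear /dev/hda1 4194304`.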

dm-userspace

The purpose of dm-userspace is very similar to that of dm-linear (as described above), with two important differences:

  1. The destination device and offset are dynamic
  2. The values of the destination device and offset are determined by userspace as needed

To reduce communication and simplify the internal data structures, the pseudo-device is divided into logical blocks. Each logical block is a whole number of sectors, and the block size is passed on the command line. All requests to a single logical block are handled together. The logical block size is the smallest unit that can be handled by userspace.

When a request comes in, a message is generated containing information about the block accessed and the type of request (read or write), and is placed on a queue. Periodically, userspace reads a batch of these messages from a character control device. It then determines where each request should be sent (details below) and writes the responses back to the control device. Each response specifies:
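The relationship between sectors and logical blocks is plain integer arithmetic. The sketch below assumes 512-byte sectors and a 64k logical block (128 sectors); the function names are made up for illustration and are not part of dm-userspace.

```shell
#!/bin/sh
# Sketch: map between sector numbers and logical block numbers.
# Assumption: 512-byte sectors and 64k logical blocks, i.e. 128 sectors each.
SECTORS_PER_BLOCK=128

# Print the logical block that contains the given sector.
sector_to_block() {
    echo $(( $1 / SECTORS_PER_BLOCK ))
}

# Print the first sector of the given logical block.
block_to_sector() {
    echo $(( $1 * SECTORS_PER_BLOCK ))
}

sector_to_block 127    # last sector of block 0
sector_to_block 128    # first sector of block 1
```

Every sector from 0 to 127 falls into block 0, so a single answer from userspace covers all of them.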

  1. Where the request should be sent (block device and sector offset)
  2. Whether the mapping should be for reading only, or read/write
  3. Whether or not the block should be copied to the destination device from another device before flushing the requests
  4. Whether or not this mapping should be remembered after the pending requests are satisfied

Requests that come in while waiting for userspace to respond, while a block is being copied, or while pending requests are being flushed are queued and flushed in order. If the mapping is to be remembered, then future requests of the same type (read or write) are remapped and flushed without contacting userspace. Further requests of a different type (a write request to a read-only mapping, for example) are "faulted" back to userspace as new requests.
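The decision described above can be sketched as a small lookup. This is only a model of the behavior, not the kernel code: the function name and the "none"/"ro"/"rw" labels are hypothetical, and it assumes that a read/write mapping also satisfies reads.

```shell
#!/bin/sh
# Sketch: decide whether a request can be remapped from a remembered
# mapping or must be faulted to userspace.
#   $1 = remembered mapping for the block: none, ro, or rw
#   $2 = request type: read or write
handle_request() {
    case "$1:$2" in
        none:read|none:write) echo fault ;;  # nothing remembered, ask userspace
        ro:write)             echo fault ;;  # write to a read-only mapping
        ro:read)              echo remap ;;  # same type as the mapping
        rw:read|rw:write)     echo remap ;;  # assumption: rw also serves reads
        *)                    echo invalid; return 1 ;;
    esac
}

handle_request ro read     # served without contacting userspace
handle_request ro write    # faulted back to userspace as a new request
```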

cowd

The userspace component of a working CoW system is provided by the cowd daemon. It is just one example of a userspace application that can take advantage of dm-userspace. The goals of cowd are:

  1. Provide a userspace daemon for enabling CoW behavior with dm-userspace
  2. Be as format-independent as possible
  3. Provide a generic plugin loader to perform format-dependent tasks

cowd handles the logic of polling the control device for dm-userspace, reading batches of messages, calling on a plugin to perform mappings, etc.

qcow plugin

The qcow cowd plugin provides qcow format support.

dscow plugin

The dscow cowd plugin provides an example CoW format that should be high-performance and extremely Xen- and dm-userspace-friendly. It uses a sparse file to represent the virtual disk, keeping each remapped block in the same position as the original block, apart from a constant linear shift that makes room at the beginning of the file for metadata. Large block sizes are supported to reduce latency and communication, which is important in a Xen environment.
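The constant linear shift makes block lookup a one-line calculation. The sketch below assumes a 64k block size and a 64k metadata area in front of the first block; the function name is made up, and the real on-disk layout is defined by the dscow plugin, not by this example.

```shell
#!/bin/sh
# Sketch: byte offset of a remapped block inside a dscow-style sparse file.
# Assumption: metadata occupies the first 65536 bytes and blocks are 65536
# bytes each.
FIRST_BLOCK=65536
BLOCK_SIZE=65536

# Print the byte offset in the CoW file for the given logical block number.
dscow_offset() {
    echo $(( FIRST_BLOCK + $1 * BLOCK_SIZE ))
}

dscow_offset 0     # first block sits right after the metadata
dscow_offset 10
```

Because the file is sparse, blocks that were never written take no space even though their offsets are reserved.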

Test Plan

The main dm-userspace tree contains several tests for verifying the stability and correctness of the components. Existing tests such as fsstress, bonnie, and dbench are used, and wrappers for running them against dm-userspace are provided in the tests/ subdirectory.

Future Thoughts

  • (add stuff here)

Current Status

  • The dscow plugin seems to be pretty stable and error-free
  • The qcow plugin needs work (do not depend on it)
  • cowd is stable
  • dm-userspace is stable
  • Everything needs lots of polish, documentation, etc.

How to get it

Option 1: Patch Xen

Probably the easiest way for Xen users to try dm-userspace is to apply the latest patch set to their xen tree. The latest patches (as of this writing) are available here:

http://lists.xensource.com/archives/html/xen-devel/2006-08/msg01370.html
http://lists.xensource.com/archives/html/xen-devel/2006-08/msg01372.html
http://lists.xensource.com/archives/html/xen-devel/2006-08/msg01371.html
http://lists.xensource.com/archives/html/xen-devel/2006-08/msg01373.html
http://lists.xensource.com/archives/html/xen-devel/2006-08/msg01374.html
http://lists.xensource.com/archives/html/xen-devel/2006-08/msg01375.html

Save out the patches and apply to your Xen tree like this:

% patch -p1 < patch1.patch
 ...

Then rebuild and install Xen:

% make world && make install

Option 2: Get the latest source

Clone the dm-userspace repository:

% hg clone http://static.danplanet.com/hg/dm-userspace

How to build it

Build and insert the kernel module:

% cd dm-userspace/module
% make
% sudo insmod ./dm-userspace.ko

Build and install cowd:

% cd ../tools/cowd
% ./autogen && ./configure --enable-internal-dmu
% make
% sudo make install

How to use it

If you have patched your Xen tree, all you need to do is change a Xen configuration file to use it. For example, if you have a disk entry in a config file like this:

disk = [ "phy:/dev/vols/Fedora4,hda1,w" ]

You can change it to this:

disk = [ "dmu:dscow:/tmp/fedora4.dscow:/dev/vols/Fedora4,hda1,w" ]

When the domain starts, all changes made to the disk will be stored in /tmp/fedora4.dscow. The base image /dev/vols/Fedora4 will not be touched.

If you don't have a patched Xen, or want to use it manually, you can start cowd yourself.

Given a base image of /tmp/base.img, create a dscow image with 64k blocks:

% dscow_tool -c /tmp/mydom.dscow /tmp/base.img
Size:         537919488
First block:  65536
Block size:   65536
Blocks:       8208
Bitmap count: 257
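The numbers in that output can be cross-checked with a little arithmetic. The sketch below re-derives the block and bitmap counts from the size and block size; the figure of 32 blocks per bitmap entry is a guess that happens to match the output, not something taken from the dscow format definition.

```shell
#!/bin/sh
# Sketch: re-derive the counts printed by dscow_tool above.
# Assumption: each bitmap entry tracks 32 blocks (inferred from the output).
SIZE=537919488
BLOCK_SIZE=65536
BLOCKS_PER_BITMAP=32

# Round up so a partial trailing block or bitmap entry is still counted.
blocks=$(( (SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
bitmaps=$(( (blocks + BLOCKS_PER_BITMAP - 1) / BLOCKS_PER_BITMAP ))

echo "Blocks:       $blocks"
echo "Bitmap count: $bitmaps"
```

Under that assumption the script reports 8208 blocks and 257 bitmap entries, the same values dscow_tool printed.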

Start cowd to create a pseudo device called mydev:

% cowd -p dscow mydev /tmp/mydom.dscow

This will create a /dev/mapper/mydev device, which can then be used as a normal block device with CoW semantics. Writes will not propagate to /tmp/base.img, but will instead be saved in /tmp/mydom.dscow.

DmUserspace (last edited 2006-08-29 16:34:02 by DanSmith)