Remus provides transparent high availability to ordinary virtual machines running on Xen. It does this by continually live migrating a copy of a running VM to a backup server, which automatically activates if the primary server fails. Key features:
- The backup VM is an exact copy of the primary VM (disk/memory/network). When failure occurs, the VM continues running on the backup host as if failure had never occurred.
- The backup is completely up-to-date: even active TCP sessions are maintained without interruption.
- Protection is transparent: existing guests can be protected without modifying them in any way.
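The mechanism behind these guarantees can be shown with a small toy model. This is a minimal Python sketch that assumes a simplified checkpoint loop in the spirit of the Remus design:

- At each checkpoint, the primary's dirty memory is copied to the backup.
- Outgoing network packets are buffered until the backup acknowledges the checkpoint that produced them.

The class and method names (`Primary`, `Backup`, `checkpoint`) are hypothetical and are not part of the Xen tools. Disk replication and the pause/resume of the VM are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup host: holds the last committed copy of VM memory."""
    memory: dict = field(default_factory=dict)

    def apply_checkpoint(self, dirty_pages: dict) -> bool:
        self.memory.update(dirty_pages)
        return True  # acknowledge the checkpoint to the primary


@dataclass
class Primary:
    """Hypothetical primary host running the protected VM."""
    memory: dict = field(default_factory=dict)
    dirty: dict = field(default_factory=dict)
    buffered_output: list = field(default_factory=list)
    released_output: list = field(default_factory=list)

    def write(self, addr, value):
        # The guest modifies memory; remember the page as dirty for this epoch.
        self.memory[addr] = value
        self.dirty[addr] = value

    def send_packet(self, packet):
        # Outgoing packets are held back: the outside world must not see
        # state that the backup does not have yet.
        self.buffered_output.append(packet)

    def checkpoint(self, backup: Backup):
        # Capture this epoch's dirty pages and buffered output together,
        # then start a fresh epoch.
        dirty, outbox = self.dirty, self.buffered_output
        self.dirty, self.buffered_output = {}, []
        if backup.apply_checkpoint(dirty):
            # Output is released only after the backup has committed the state.
            self.released_output.extend(outbox)


primary, backup = Primary(), Backup()
primary.write("x", 1)
primary.send_packet("reply-1")
primary.checkpoint(backup)
primary.write("x", 2)
primary.send_packet("reply-2")
# If the primary fails here, the backup resumes with x == 1, and "reply-2"
# was never sent, so no client has observed the lost state.
print(backup.memory, primary.released_output)  # {'x': 1} ['reply-1']
```

Buffering output this way is what lets an active TCP session survive failover. A client can only have seen packets whose underlying state the backup already holds.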
Host (dom0) requirements
- Xen hypervisor with Remus support and tools (included with Xen 4.0+)
- Note: Remus is not included with XCP, XenServer, or some pre-packaged Linux distribution builds of Xen. Check your distribution; you may need to build Xen from source.
- Xen dom0 kernel that meets the Remus dom0 requirements
- Shared storage is not required
- DRBD shared storage is supported, allowing faster, automatic resynchronization after a failed host is brought back online
- Otherwise, the VM must be shut down to bring a failed node back online
Guest (domU) requirements
- Xen PV guests that meet the Remus PV domU requirements
- Xen HVM guests don't require any changes for Remus
Installation varies slightly depending upon the host platform, so please see the guides below for examples.
Using DRBD instead of blktap2 for storage replication allows quick resynchronization of the disk backend after a failed host is back online. Because storage (re)synchronization happens online, while the VM keeps running, there is no need to shut down the VM. Once storage is synchronized, Remus can be started, stopped and restarted on a running VM at any time.
However, DRBD must be custom-built with support for protocol D (see the install guides above), so the standard packaged versions of DRBD are not suitable.
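A rough sketch shows why online resynchronization can be fast. A replicated block device can record which blocks changed while its peer was unreachable, then copy only those blocks when the peer returns. DRBD tracks out-of-sync blocks in a bitmap.

The Python below is a simplified toy model built on that idea, using a hypothetical `ReplicatedDisk` class. It ignores several real-world factors:

- protocol D
- Remus checkpoint barriers
- writes that race with the resynchronization itself

```python
class ReplicatedDisk:
    """Toy model of bitmap-style resync (hypothetical; not DRBD code)."""

    def __init__(self, nblocks: int):
        self.local = [b""] * nblocks   # surviving host's copy
        self.peer = [b""] * nblocks    # the other host's copy
        self.peer_online = True
        self.out_of_sync = set()       # stands in for DRBD's dirty-block bitmap

    def write(self, block: int, data: bytes):
        self.local[block] = data
        if self.peer_online:
            self.peer[block] = data        # replicate immediately
        else:
            self.out_of_sync.add(block)    # remember it for later resync

    def peer_failed(self):
        self.peer_online = False

    def peer_returned(self) -> list:
        # Copy only the blocks written while the peer was away; the guest
        # never has to stop. Concurrent writes during resync are not modeled.
        copied = sorted(self.out_of_sync)
        for block in copied:
            self.peer[block] = self.local[block]
        self.out_of_sync.clear()
        self.peer_online = True
        return copied


disk = ReplicatedDisk(8)
disk.write(0, b"a")
disk.peer_failed()
disk.write(3, b"b")
disk.write(5, b"c")
disk.write(3, b"d")
print(disk.peer_returned())  # [3, 5]
```

Only two of the eight blocks are transferred, no matter how many times they were rewritten during the outage.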
- Remus PV domU guest support works with most pvops kernels, but you may see this warning when you start Remus: "WARNING: suspend event channel unavailable, falling back to slow xenstore signalling". This means the guest kernel lacks "suspend event channel" support. Remus will still work, but it will not perform well.
In Xen 4.0.0:
- Xen hypervisor and tools have Remus support.
- Only linux-2.6.18-xen is supported as Xen dom0 kernel with Remus.
- If using a PV domU you need to run linux-2.6.18-xen as domU kernel.
In Xen 4.0.1:
- Pvops dom0 kernel support for Remus was added in Xen 4.0.1-rc4, so it is available in the Xen 4.0.1 final release. You can use a Linux 2.6.32-based pvops dom0 kernel with Remus.
- PV domU kernel still needs to be linux-2.6.18-xen.
In Xen 4.2:
- Many bugfixes to Remus.
- Remus support for pvops domU kernels: Linux 220.127.116.11 and later upstream kernel.org versions are now supported as PV domU kernels, in addition to Jeremy's xen.git xen/stable-2.6.32.x branch.
- For better Remus performance, use a domU kernel with "suspend event channel" support. This means linux-2.6.18-xen or one of the xenlinux forward ports (for example, the Novell SLES 11 SP1 2.6.32 kernel). Pvops domU kernels do not have suspend event channel support yet.
- Checkpoint compression for less data to transfer between hosts.
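This page does not describe how checkpoint compression works. One plausible illustration assumes that consecutive checkpoints often change only a few bytes of each dirty page. Under that assumption, the sender can XOR each page with the copy it sent last time and transmit only the non-zero runs of the result.

The Python below sketches that idea. The function names are hypothetical, and this is not a description of Xen's actual implementation.

```python
PAGE_SIZE = 4096


def xor_delta(old: bytes, new: bytes) -> bytes:
    """Bytes that are unchanged since the last checkpoint become zero."""
    return bytes(a ^ b for a, b in zip(old, new))


def encode_runs(delta: bytes) -> list:
    """Encode a delta as (offset, literal) pairs covering only non-zero bytes."""
    runs, i, n = [], 0, len(delta)
    while i < n:
        if delta[i] == 0:
            i += 1
            continue
        start = i
        while i < n and delta[i] != 0:
            i += 1
        runs.append((start, delta[start:i]))
    return runs


def apply_runs(old: bytes, runs: list) -> bytes:
    """Receiver side: rebuild the new page from its previous copy."""
    page = bytearray(old)
    for offset, literal in runs:
        for k, byte in enumerate(literal):
            page[offset + k] ^= byte
    return bytes(page)


previous = bytes(PAGE_SIZE)
current = bytearray(previous)
current[100:104] = b"abcd"
runs = encode_runs(xor_delta(previous, bytes(current)))
print(runs)  # [(100, b'abcd')]
```

In this example, only four bytes of payload plus one offset need to cross the wire instead of a full 4096-byte page.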
Note that if you use a linux-2.6.18-xen kernel, it must be recent enough to include the Remus support patches. For use with Remus, it is recommended to download the latest version from the linux-2.6.18-xen.hg Mercurial repository.
- The Remus project web site is http://nss.cs.ubc.ca/remus/
- Remus documentation: http://nss.cs.ubc.ca/remus/doc.html
- Research/design paper about Remus: http://nss.cs.ubc.ca/remus/papers/remus-nsdi08.pdf
- Configuring and installing Remus tutorial: http://remusha.wikidot.com/
- Latest linux-2.6.18-xen kernel is available from the Mercurial tree at http://xenbits.xen.org/linux-2.6.18-xen.hg
- Remus version of DRBD: git://aramis.nss.cs.ubc.ca/drbd-8.3-remus