Transparent high availability ("Fault Tolerance") for Xen VMs
Remus provides transparent, comprehensive high availability to ordinary virtual machines running on the Xen virtual machine monitor. It does this by maintaining a completely up-to-date copy of a running VM on a backup server, which automatically activates if the primary server fails. Key features:
- The backup VM is an exact copy of the primary VM. If the primary fails, the VM continues running on the backup host as if the failure had never occurred.
- The backup is completely up-to-date. Even active TCP sessions are maintained without interruption.
- Protection is transparent. Existing guests can be protected without modifying them in any way.
Remus requires the following components:
- Xen hypervisor and tools with support for Remus.
- Xen dom0 kernel with support for Remus.
- PV (Paravirtual) guests need to have Remus support in the domU kernel.
- Xen HVM guests need no special changes for Remus; no kernel patches are required in the guest.
- For initial testing only: shared storage accessible by both dom0s.
- Outside of initial testing, Remus neither requires nor supports shared storage!
- The DRBD storage backend is supported, allowing faster, automatic resynchronization after a failed host comes back online.
In Xen 4.0.0:
- Xen hypervisor and tools have Remus support.
- Only linux-2.6.18-xen is supported as the Xen dom0 kernel with Remus.
- A PV domU must run linux-2.6.18-xen as its domU kernel.
In Xen 4.0.1:
- Pvops dom0 kernel support for Remus was added in Xen 4.0.1-rc4, so it is available in the Xen 4.0.1 final release. You can use a Linux 2.6.32 based pvops dom0 kernel with Remus.
- The PV domU kernel still needs to be linux-2.6.18-xen.
In Xen 4.2:
- Many bugfixes to Remus.
- Remus support for pvops domU kernels: Linux 18.104.22.168 and later upstream kernel.org versions are now supported as PV domU kernels, in addition to Jeremy's xen.git xen/stable-2.6.32.x branch.
- For better Remus performance, use a domU kernel with "suspend event channel" support: linux-2.6.18-xen or any of the xenlinux forward ports (for example, the Novell SLES11 SP1 2.6.32 kernel). Pvops domU kernels don't have suspend event channel support yet.
- Checkpoint compression, which reduces the amount of data transferred between hosts.
Note that a linux-2.6.18-xen kernel must be recent enough to include the Remus support patches. It's recommended to download the latest version from the linux-2.6.18-xen.hg Mercurial repository for use with Remus.
Config options for Linux 2.6.32 pvops dom0 kernel for Remus
These are the .config options you need to enable for Remus when using Jeremy's xen.git xen/stable-2.6.32.x branch as the dom0 kernel:
CONFIG_IFB=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_FILTER=m
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_U32=m
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_MIRRED=m
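Checking a kernel .config against the options listed above is easy to automate. The following Python sketch is only an illustration and not part of Remus or the Xen tools: the function names are hypothetical, and it assumes that an option built in as "y" is an acceptable substitute where the list says "m".

```python
import re

# Option names and values copied from the list above.
REQUIRED_OPTIONS = {
    "CONFIG_IFB": "m",
    "CONFIG_IP_NF_IPTABLES": "m",
    "CONFIG_IP_NF_FILTER": "m",
    "CONFIG_NET_SCHED": "y",
    "CONFIG_NET_SCH_PRIO": "m",
    "CONFIG_NET_SCH_INGRESS": "m",
    "CONFIG_NET_SCH_PLUG": "m",
    "CONFIG_NET_CLS": "y",
    "CONFIG_NET_CLS_BASIC": "m",
    "CONFIG_NET_CLS_TCINDEX": "m",
    "CONFIG_NET_CLS_U32": "m",
    "CONFIG_NET_CLS_ACT": "y",
    "CONFIG_NET_ACT_MIRRED": "m",
}


def parse_kernel_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line in a .config."""
    options = {}
    for line in text.splitlines():
        # Lines such as '# CONFIG_X is not set' do not match and are skipped.
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_remus_options(config_text, required=REQUIRED_OPTIONS):
    """List required options that are neither built in (y) nor modular (m).

    Assumption: 'y' is accepted even where the list above says 'm'.
    """
    present = parse_kernel_config(config_text)
    return sorted(name for name in required
                  if present.get(name) not in ("y", "m"))


# Small demonstration with a deliberately incomplete config fragment.
sample = "CONFIG_IFB=m\n# CONFIG_NET_SCH_PLUG is not set\nCONFIG_NET_SCHED=y\n"
print(missing_remus_options(sample))
```

To check a real kernel tree, read its .config file into a string and pass it to missing_remus_options(); an empty list means every option above is enabled.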
Using Linux kernel v3.x as dom0 kernel for Remus Xen hosts
The upstream Linux v3.x kernel contains Xen pvops dom0 support, but it does not contain the "sch_plug" driver, which is required for Remus. It's possible to add that driver manually to your custom upstream Linux v3.x kernel build. The "sch_plug" driver is available, for example, from http://pasik.reaktio.net/xen/remus/linux3x/
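After building a custom kernel, it can be useful to confirm that the "sch_plug" module was actually installed. The Python sketch below is a hypothetical helper, not part of Remus: it assumes modules are installed as .ko files (possibly compressed, for example .ko.xz) under a directory such as /lib/modules/<kernel release>, and it cannot detect a driver that was built into the kernel image rather than built as a module.

```python
import os


def find_module_files(modules_root, module_name="sch_plug"):
    """Return sorted paths of <module_name>.ko files under modules_root.

    Compressed variants such as sch_plug.ko.xz are also matched.
    A missing modules_root simply yields an empty list.
    """
    base = module_name + ".ko"
    hits = []
    for dirpath, _dirnames, filenames in os.walk(modules_root):
        for filename in filenames:
            if filename == base or filename.startswith(base + "."):
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

On a running dom0, calling find_module_files(os.path.join("/lib/modules", os.uname().release)) searches the module tree of the currently booted kernel; an empty result suggests the driver still needs to be added to the build.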
DRBD storage backend support for Remus
A Remus version of DRBD is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus . Using DRBD instead of blktap2 for storage replication allows quick resynchronization of the disk backend after a failed host comes back online. Because storage (re)synchronization happens online, while the VM is operational, there is no need to shut down the VM. Once storage is synchronized, you can start, stop, and restart Remus on a running VM at any time.
Remus links and documentation
- The Remus project web site is http://nss.cs.ubc.ca/remus/
- Remus documentation: http://nss.cs.ubc.ca/remus/doc.html
- Research/design paper about Remus: http://nss.cs.ubc.ca/remus/papers/remus-nsdi08.pdf
- Tutorial on configuring and installing Remus: http://remusha.wikidot.com/
- The latest linux-2.6.18-xen kernel is available from the Mercurial tree at http://xenbits.xen.org/linux-2.6.18-xen.hg