Friday, 21 March 2014

Why I dislike Xen

I have a hate/hate relationship with Xen. All forms of virtualisation
*except* Xen emulate a CPU so faithfully that it is extremely difficult
to tell the difference from real hardware. Xen replaces certain
instructions with an API and a portal to the outside world. Consider
instructions such as CLI and STI - classic 8086 instructions (with
direct ancestors, DI and EI, going all the way back to the 8-bit 8080).
They work in a well understood way.

Fast forward to Xen. Those instructions don't exist - nor does any other
instruction which can affect the virtual state of the CPU. Instead,
the kernels are compiled to call a function via the hypercall portal.

Now, if you do the wrong thing, you are on your own. You won't know what
you did wrong. It is so easy to crash the VM *host* if you are not careful.

So, let's look at dtrace. Dtrace appears to work well. But if you are
on Xen, as of this writing, the results are erratic - after a few
probes, you will crash the process, or the kernel, or the VM.

Here's an example of a process crash:

[ 2390.769639] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 2390.769640] IP: [<ffffffffa01cd193>] cpu_core_exec+0x133/0xfffffffffffbcfa0 [dtracedrv]
[ 2390.769641] PGD 17319067 PUD 172b9067 PMD 0
[ 2390.772090] Oops: 0002 [#1] SMP
[ 2390.772090] CPU 1
[ 2390.772090] Modules linked in: dtracedrv(PO) nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc psmouse microcode serio_raw i2c_piix4 e1000 [last unloaded: dtracedrv]
[ 2390.772090]
[ 2390.772090] Pid: 3057, comm: bash Tainted: P W O 3.4.0 #1 innotek GmbH VirtualBox/VirtualBox
[ 2390.772090] RIP: e030:[<ffffffffa01cd193>] [<ffffffffa01cd193>] cpu_core_exec+0x133/0xfffffffffffbcfa0 [dtracedrv]
[ 2390.772090] RSP: e02b:ffff8800173d5f78 EFLAGS: 00010193

When a dtrace probe fires (because we replaced the instruction with a
breakpoint), we have to single step the replaced instruction from a
buffer. What is happening above is that the single step is not honored.
The EFLAGS register (00010193) shows bit 0x100, the trap flag, is set -
a single step is in effect. But Xen has decided it isn't. So execution
does not stop after the one instruction at the buffer location; it
carries on into whatever follows. Here's the disassembly:

(gdb) x/i 0xffffffffa01cd190
0xffffffffa01cd190: push %rbp
0xffffffffa01cd191: nop
0xffffffffa01cd192: nop <=== shouldn't get here
0xffffffffa01cd193: add %al,(%rax) <=== but we died here
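
As a cross-check on the EFLAGS reading, here is a small C sketch that
decodes the value from the oops (00010193). The bit positions are the
standard x86 ones; the macro and function names are made up for this
illustration and are not taken from dtracedrv or the kernel.

```c
#include <stdint.h>
#include <stdio.h>

/* Standard x86 EFLAGS bit positions. The names are illustrative only. */
#define FLAG_CF 0x00000001u /* carry */
#define FLAG_PF 0x00000004u /* parity */
#define FLAG_AF 0x00000010u /* auxiliary carry */
#define FLAG_ZF 0x00000040u /* zero */
#define FLAG_SF 0x00000080u /* sign */
#define FLAG_TF 0x00000100u /* trap flag: single step requested */
#define FLAG_IF 0x00000200u /* interrupt enable */
#define FLAG_DF 0x00000400u /* direction */
#define FLAG_OF 0x00000800u /* overflow */
#define FLAG_RF 0x00010000u /* resume flag */

static const struct {
    uint32_t mask;
    const char *name;
} flag_names[] = {
    { FLAG_CF, "CF" }, { FLAG_PF, "PF" }, { FLAG_AF, "AF" },
    { FLAG_ZF, "ZF" }, { FLAG_SF, "SF" }, { FLAG_TF, "TF" },
    { FLAG_IF, "IF" }, { FLAG_DF, "DF" }, { FLAG_OF, "OF" },
    { FLAG_RF, "RF" },
};

/* Non-zero if the saved flags ask for a single step. */
int eflags_single_step(uint32_t eflags)
{
    return (eflags & FLAG_TF) != 0;
}

/* Non-zero if the saved flags have interrupts enabled. */
int eflags_interrupts_enabled(uint32_t eflags)
{
    return (eflags & FLAG_IF) != 0;
}

/* Write the names of the set flags into buf, separated by spaces. */
void eflags_describe(uint32_t eflags, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof flag_names / sizeof flag_names[0]; i++) {
        if ((eflags & flag_names[i].mask) == 0 || used >= len)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", flag_names[i].name);
        if (n < 0)
            break;
        used += (size_t)n; /* may pass len if the output was truncated */
    }
}
```

Calling eflags_describe(0x00010193, buf, sizeof buf) with a 64 byte
buffer yields "CF AF SF TF RF": the trap flag really is present in the
saved flags, which is what makes the missing single step trap so odd.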

Such sweet joy to figure out what "rule" I broke. I reported
in a prior blog entry that the Xen documentation is very poor.

More than likely I am setting the trap flag by a bit-mask operation
on the saved flags, without telling Xen what we did.
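
To make "bit-mask operation" concrete, here is a minimal sketch of
arming and disarming a single step by editing saved flags directly.
Everything in it is hypothetical: struct saved_regs stands in for a
real saved register frame, and the function names are invented for
illustration. It is not the actual dtracedrv code, and it does not
attempt to show what informing Xen would involve.

```c
#include <stdint.h>

/* x86 trap flag, bit 8 of the flags register. */
#define TRAP_FLAG UINT64_C(0x100)

/* Hypothetical stand-in for a saved register frame. */
struct saved_regs {
    uint64_t ip;    /* instruction pointer to resume at */
    uint64_t flags; /* saved flags register */
};

/*
 * Point the frame at the buffer holding the copied instruction and
 * request a single step by OR-ing the trap flag into the saved flags.
 */
void arm_single_step(struct saved_regs *regs, uint64_t buffer_ip)
{
    regs->ip = buffer_ip;
    regs->flags |= TRAP_FLAG;
}

/*
 * After the step trap arrives, clear the trap flag and continue at
 * the instruction following the original probe site.
 */
void disarm_single_step(struct saved_regs *regs, uint64_t resume_ip)
{
    regs->flags &= ~TRAP_FLAG;
    regs->ip = resume_ip;
}
```

On real hardware, returning to the buffer with the trap flag set in
the restored flags raises a debug trap after one instruction. The oops
above suggests that under Xen the same flag edit on its own does not
have that effect.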

Off to find the code which is stopping Ubuntu 12.04 from working.

