I think I found the Xen + DTrace culprit. I have a couple
of routines in dtrace which invoke the CLI and STI instructions.
They are a bit naughty, and are designed to shield dtrace from the large
number of interrupt enable/disable primitives in the kernel.
(The kernel has multiple layers of interrupt functions and macros
to try to hide the implementation details of each CPU, along with
preemption and scheduling functions.)
I spent a lot of time thinking about DTrace and Xen - what was
I doing wrong? I spent a lot of time not thinking about DTrace and Xen.
As much as I wanted and needed help to find this horrible bug, I couldn't
even phrase the right question.
The evidence was clear: occasional single-step traps would not properly
single step - it was as if the single-step trap flag was not set
in the FLAGS register. I tried poking at the code in the interrupt
routines - checking the assembler code very carefully.
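For reference, this is roughly the check I mean when I say the trap flag "was not set". It is only a minimal sketch: the helper name is made up for illustration, and the only hard facts in it are the architectural bit positions (TF is bit 8 and IF is bit 9 of the x86 FLAGS register).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in the x86 FLAGS/EFLAGS/RFLAGS register. */
#define X86_FLAGS_TF (UINT64_C(1) << 8)  /* trap flag: trap after the next instruction */
#define X86_FLAGS_IF (UINT64_C(1) << 9)  /* interrupt enable flag */

/*
 * Hypothetical helper (not from the dtrace driver): given the flags image
 * saved on the interrupt stack, report whether returning with it would
 * single-step the next instruction.
 */
static bool flags_will_single_step(uint64_t saved_flags)
{
    return (saved_flags & X86_FLAGS_TF) != 0;
}
```

A saved flags value of 0x302 has TF set and would single step; 0x202 has only IF set and would run straight on, which is what the bad traps looked like.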
The Linux kernel has two sets of interrupt handlers for many of the core
traps (int1, int3 and others). Which one is used depends on whether
Xen is active or not. When Xen is active, there are two extra registers
on the interrupt stack.
After much playing with the dtrace code and the linkage from dtrace to
the underlying kernel, I started to investigate something else.
I noticed the dtrace crashes happened at the point where we switched from
one CPU to another. My VirtualBox VM had 4 CPUs, so I set it to 1 CPU.
DTrace worked remarkably better - but soon enough, we still core dumped, with
$ dtrace -n fbt::sys_chdir:
in one window and
$ while true
in the other. The fact that we had a single-step trap which was not
single stepping is really curious. So I started grepping for CLI/STI
in the driver code, and came across my old friends, dtrace_enable_interrupts()
and its disable counterpart. These functions use assembler to execute the
CLI/STI instructions.
Now, what if we had done the wrong thing? Xen wouldn't really know
(except by accident) that we had executed these instructions, and we
may be running without interrupts correctly disabled. Xen is a hypervisor
and if it doesn't know what's going on, all bets are off.
So I replaced the CLI/STI with arch_local_irq_restore and
arch_local_irq_save, and it works! I had gotten away with this for
a short time, and this is the Xen issue (I hope it's the only
issue). Whilst writing this blog, my test case ran with no problems.
Now I can formalise the change and push out a new release.
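To make the theory concrete, here is a toy user-space model of what I think was going on. It is not the dtrace or kernel code: the struct and all the toy_* names are invented for illustration, and it assumes a paravirtualised guest where the hypervisor decides whether to deliver an event by looking at a mask it shares with the guest, not at the CPU's interrupt flag.

```c
#include <stdbool.h>

/* Toy model only. Every name here is hypothetical. */
struct toy_vcpu {
    bool hw_if;      /* the CPU's own interrupt-enable flag */
    bool hv_masked;  /* what the hypervisor consults before delivering an event */
};

/* The naughty way: flip the CPU flag and tell nobody. */
static void toy_raw_cli(struct toy_vcpu *v) { v->hw_if = false; }
static void toy_raw_sti(struct toy_vcpu *v) { v->hw_if = true; }

/* The polite way: update the state the hypervisor actually looks at,
 * returning the previous state so the caller can restore it later. */
static bool toy_irq_save(struct toy_vcpu *v)
{
    bool was_masked = v->hv_masked;
    v->hv_masked = true;
    return was_masked;
}

static void toy_irq_restore(struct toy_vcpu *v, bool was_masked)
{
    v->hv_masked = was_masked;
}

/* In this model the hypervisor ignores hw_if entirely. */
static bool toy_hv_would_deliver(const struct toy_vcpu *v)
{
    return !v->hv_masked;
}
```

In the model, a raw CLI leaves toy_hv_would_deliver() true, so an event can still arrive in the middle of what we believed was a protected region. The save/restore pair masks delivery and then puts the old state back. In the kernel the matching pattern is to keep the unsigned long returned by arch_local_irq_save() and hand it back to arch_local_irq_restore().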
I still don't like Xen - I would rather have my VM die and tell me
that something is wrong immediately, not lie through its teeth that
the guest code is working when it isn't.