Friday, 18 November 2011

Dtrace and the NMI Interrupt

Nigel has helped greatly in moving us forward on Dtrace. The latest
release works well inside a VM, but alas, might not work on real hardware.

I ran some tests on Ubuntu 11.04 on real hardware and dtrace was
rock solid (well, it survived 500m+ probes and running the test
suite twice over).

But if the real hardware is generating NMI interrupts, then we are toast.
Ubuntu doesn't do this by default. Maybe Fedora does, or it's a function
of the hardware and CPU.
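
A quick way to see whether a given box is taking NMIs at all is to look
at the "NMI:" row of /proc/interrupts twice and compare. Here is a minimal
sketch of that check. The function names are my own invention, and it
assumes the usual x86 layout where the row is a label followed by one
count per CPU and then a description.

```python
import time


def parse_nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Counts come first; stop at the trailing description text.
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []  # no NMI row (e.g. a non-x86 machine)


def nmi_delta(before, after):
    """Total number of NMIs delivered between two samples."""
    return sum(after) - sum(before)


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        first = parse_nmi_counts(f.read())
    time.sleep(5)
    with open("/proc/interrupts") as f:
        second = parse_nmi_counts(f.read())
    print("NMIs in the last 5 seconds:", nmi_delta(first, second))
```

If the delta stays at zero with nothing like oprofile running, the
machine is probably not generating NMIs on its own.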

What I found is that if I load the oprofile package (which uses NMI
interrupts to feed the profiler), then the host reboots when dtrace
is loaded or invoked.

The reason is more than likely that if we place a dtrace probe on a
function called from the kernel's NMI handler, then we will trigger a
breakpoint trap from inside the NMI handler. I don't believe this is
valid or meaningful (nothing should interrupt an NMI; it should only be
used for small, lightweight and contextless operations, such as watchdogs).

So we have a problem: we don't know the call graph of an NMI
interrupt, so we don't know what is safe to probe (and even if we did
know, chances are high that common and useful routines would have to be
excluded).
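
To make the point concrete, here is a toy sketch of what "knowing the
call graph" would buy us: walk everything reachable from the NMI entry
point and refuse to probe any of it. The graph and every function name
in it are made up for illustration; this is not the real kernel call
graph, which is exactly the thing we don't have.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
TOY_CALL_GRAPH = {
    "nmi_entry": ["notify_handlers", "take_lock"],
    "notify_handlers": ["log_message"],
    "open_file": ["take_lock", "lookup_path"],
}


def reachable_from(call_graph, entry):
    """Return every function reachable from entry, including entry itself."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def safe_to_probe(candidates, call_graph, nmi_entry):
    """Keep only the candidates that the NMI path can never call."""
    unsafe = reachable_from(call_graph, nmi_entry)
    return [fn for fn in candidates if fn not in unsafe]
```

In this toy graph, take_lock is shared between the NMI path and
open_file, so it is excluded even though it is just the sort of common
routine one would want to probe.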

I will experiment with turning off the NMI whilst dtrace is loaded.
That's a very unfair thing to do (disabling oprofile, or other real
hardware events which need NMI), but at least we would be safe.
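
As a rough user-space approximation of that experiment, the kernel's own
NMI watchdog can be switched off through /proc/sys/kernel/nmi_watchdog
and restored afterwards. The sketch below assumes that file is present
and that we are root. Note that it only covers the watchdog; it does
nothing about oprofile or NMIs raised by the hardware itself, so it is
not a full fix. The helper name is hypothetical.

```python
from contextlib import contextmanager

NMI_WATCHDOG_PATH = "/proc/sys/kernel/nmi_watchdog"


@contextmanager
def nmi_watchdog_disabled(path=NMI_WATCHDOG_PATH):
    """Write 0 to the watchdog control file, restoring the old value on exit."""
    with open(path) as f:
        previous = f.read().strip()
    with open(path, "w") as f:
        f.write("0\n")
    try:
        yield previous
    finally:
        # Put back whatever was there before, even if the body raised.
        with open(path, "w") as f:
            f.write(previous + "\n")
```

Usage would be along the lines of wrapping the dtrace run in
"with nmi_watchdog_disabled():" so the watchdog comes back once we
are done.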

I am going to research what Solaris does for NMIs. Maybe
that will educate me about the problem.

Post created by CRiSP v10.0.17a-b6103

