Now that I am using VMware and the gdb debugger to debug this issue
reliably, I have an update.
It turns out that as soon as the page-fault interrupt handler is enabled,
when we take the first page fault interrupt, we pass it over to the kernel
default handler. On return from the kernel, the page where the
dtrace handler is located is no longer mapped in the page tables
(for some reason).
On the next interrupt, the CPU jumps to a non-existent page, resulting
in a nested page fault interrupt. This continues for a few thousand
iterations, until the kernel stack blasts through something, leading to
a double fault.
Interestingly, the kernel stack contains thousands of copies
of the same data (the pushed page fault error code, CS:IP and flags).
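
To make the repeated stack contents concrete, here is a rough sketch in C of what a run of nested page faults leaves behind. It is only a model, not kernel code: the struct name pf_frame, the helper names and the sample register values are all made up for illustration, and it assumes a 32-bit fault taken without a privilege change, where the CPU pushes EFLAGS, CS, EIP and then the error code. Real frame sizes and stack sizes depend on the kernel build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the frame the CPU pushes for a page fault taken
 * in kernel mode on 32-bit x86 (no privilege change). The error code is
 * pushed last, so it sits at the lowest address. */
struct pf_frame {
    uint32_t error_code;
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* Model the nested faults: every fault pushes the same frame, because the
 * CPU keeps faulting on the same unmapped handler address. The stack grows
 * downward, so fill from the highest slot to the lowest. Returns the number
 * of frames pushed before the (modelled) stack is exhausted. */
static size_t simulate_nested_faults(struct pf_frame *stack, size_t capacity,
                                     struct pf_frame frame)
{
    size_t pushed = 0;

    assert(stack != NULL);
    for (size_t slot = capacity; slot > 0; slot--) {
        stack[slot - 1] = frame;
        pushed++;
    }
    return pushed;
}

/* Count how many consecutive frames, starting at the lowest address,
 * are identical to the first one. This is the pattern that shows up
 * when dumping the stack in the debugger. */
static size_t count_identical_frames(const struct pf_frame *stack, size_t n)
{
    size_t count;

    assert(stack != NULL);
    if (n == 0)
        return 0;
    for (count = 1; count < n; count++) {
        if (stack[count].error_code != stack[0].error_code ||
            stack[count].eip != stack[0].eip ||
            stack[count].cs != stack[0].cs ||
            stack[count].eflags != stack[0].eflags)
            break;
    }
    return count;
}
```

With these assumptions each nested fault costs 16 bytes of stack, so how many iterations fit before the stack runs into something else depends entirely on how much room there is below the starting stack pointer.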
gdb under VMware Player lets me set hardware breakpoints, so I can
single step the kernel page fault handler. I've just had my first
attempt, but unfortunately, maybe because of how long I took, the
page fault handler decided to reschedule a different process to run,
so I lost control.
It's truly great single stepping whilst gdb is showing me the line of
code we are on.
Let's see if I can find out what caused the mapping to disappear.