Sunday, 29 January 2012


Having "understood" my nested page fault issues, I have been trying
to finalise the code changes. However, any attempt to do so leads me
to a lot of pain.

$ dtrace -n fbt::page_fault:

is a dangerous thing to do: it intercepts the page fault handler.
But the page fault handler can itself be called if a D script tries
to access an unmapped page. We could deter users from putting a probe
on page_fault, but that seems a real shame - that's a very useful
and interesting function to probe.

This works brilliantly on x86/64 systems but fails abysmally on i386.
I have chased the problem down to the kernel page tables and the
user process page tables disagreeing about what is "visible". Because
of the way the kernel does "lazy page table population", it's very
difficult to stop a page fault occurring, for instance, in the
breakpoint handler.

(We hit the page_fault function, which generates a breakpoint trap to
execute the probe, but whilst processing the breakpoint trap, we induce
a page_fault trap: BOOM!)

I've experimented with various mechanisms to avoid these lazy page
faults. There's a function in the kernel, vmalloc_sync_all(), which
ensures all page tables are in sync with the kernel, so that these
minor (lazy) page faults cannot happen. If I ensure this is called
during the driver load, then the problem of a nested page fault
appears to go away.

(This is a better job than the code I wrote which does something similar
but only for specific locations in dtrace itself; vmalloc_sync_all is
a generalised function to sync all page tables of all processes).
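
To make the "lazy page table population" idea concrete, here is a toy
user-space model in C. It is only a sketch and not kernel code: the
names (kmap, touch, sync_all, NPROC, NSLOT) are made up for
illustration, and the real vmalloc_sync_all() works on actual page
directories. The model just shows why an eager sync at load time means
no later access has to take a minor fault.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3   /* hypothetical number of processes */
#define NSLOT 8   /* hypothetical number of kernel mapping slots */

static bool master[NSLOT];          /* reference kernel page table */
static bool proc_pt[NPROC][NSLOT];  /* each process's view of kernel mappings */
static int  minor_faults;           /* count of lazy fix-ups taken */

/* The kernel maps a new region: only the master table is updated. */
static void kmap(int slot)
{
    assert(slot >= 0 && slot < NSLOT);
    master[slot] = true;
}

/*
 * Process p touches a kernel mapping. If its own table lacks the entry
 * but the master has it, take a "minor fault" and copy the entry over.
 * Returns false if the slot is genuinely unmapped.
 */
static bool touch(int p, int slot)
{
    assert(p >= 0 && p < NPROC && slot >= 0 && slot < NSLOT);
    if (proc_pt[p][slot])
        return true;            /* already visible, no fault */
    if (!master[slot])
        return false;           /* a real fault: nothing is mapped here */
    minor_faults++;             /* lazy population happens here */
    proc_pt[p][slot] = true;
    return true;
}

/* Eagerly copy every master entry into every process table. */
static void sync_all(void)
{
    for (int p = 0; p < NPROC; p++)
        for (int s = 0; s < NSLOT; s++)
            if (master[s])
                proc_pt[p][s] = true;
}
```

In this model, a touch() after kmap() alone costs one minor fault per
process, whereas a touch() after kmap() followed by sync_all() costs
none. That second case is what we want inside a trap handler.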

So, I will need to recode and remove the cruft from my work-in-progress.

(I am trying to track down whether vmalloc_sync_all is called by the
x86/64 kernel but not the i386 one; it would certainly explain why I
see such a difference in behavior when tracing the page_fault code).

More later this week if I can successfully resolve this issue, once and
for all.

Post created by CRiSP v10.0.23a-b6159
