Tuesday 19 April 2011

smp_call_function and dtrace_sync

I've been reading Adam Leventhal's (old) blog entry on the
IPI (inter-processor interrupt) mechanism and how some of the
code in dtrace works, to try to debug the Linux problem
whereby the timer interrupt can invoke a call to dtrace_sync and hence
dtrace_xcall. dtrace_xcall in turn invokes smp_call_function() (and
its friends), but Linux explicitly disallows calling that function
from an interrupt handler or whilst interrupts are disabled.
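
To make the restriction concrete, here is a minimal user-space model of
the rule as I understand it. It is not kernel code: struct cpu_ctx and
model_xcall_allowed() are hypothetical names invented for illustration.
My understanding of the reason for the rule is that a cpu waiting for
acknowledgements with its own interrupts masked cannot service a
cross-call aimed at it, so two cpus doing this at once can deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one cpu's context; not a kernel structure. */
struct cpu_ctx {
    bool in_hardirq;     /* currently inside an interrupt handler */
    bool irqs_disabled;  /* local interrupts are masked */
};

/* Returns 0 if a cross-call would be allowed from this context,
 * -1 if it would break the rule described above. */
int model_xcall_allowed(const struct cpu_ctx *c)
{
    assert(c != NULL);
    if (c->in_hardirq || c->irqs_disabled)
        return -1;
    return 0;
}
```

In this model the timer interrupt path is the failing case: it runs in
interrupt context, so the check rejects it before any cross-call is sent.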

Linux's implementation seems ok, but it is at right angles to the Solaris
implementation. Solaris seems to offer richer semantics in
this area, allowing interrupt code to invoke inter-cpu synchronisation.

I have a question - and if anyone (Adam?) knows the answer, feel
free to educate me. Without an answer, I may have to wrap the
Linux APIC interrupt handlers with code that resembles Solaris.

What does dtrace_sync() *actually do*? In reading the code, it is trying
to ensure other cpus are synchronised in terms of the probe states,
but the implementation *looks wrong*. dtrace_sync() is a way of ensuring
that another cpu is either not running the dtrace code, or, if it is,
is at a sync point. But there are many sync points in the dtrace driver, and
no guarantee that the other cpus are anywhere close to where the invoking
cpu is asking for help.

It's a bit like a 3-dimensional goto - trying to prove safety via
the various code paths is not easy (maybe not possible).

Normally, mutexes are used to guard regions of code, along
with interrupt enable/disable to prevent nested interrupts.

But dtrace_sync() is different - it suspends the invoking cpu
until the other cpus have acknowledged the interrupt - and the
acknowledgement is not done based on where the other cpus are.
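
The rendezvous described above can be sketched in user space. This is
only a model under stated assumptions, not the dtrace or Solaris code:
threads stand in for cpus, a polled flag stands in for the interrupt,
and sim_sync(), cpu_loop() and run_demo() are hypothetical names. The
"handler" does nothing except acknowledge, wherever the simulated cpu
happens to be at the time.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPU 4  /* hypothetical number of simulated cpus */

static atomic_int pending[NCPU];  /* 1 = cross-call requested for this cpu */
static atomic_int acked;          /* number of cpus that have acknowledged */
static atomic_int stop;           /* tells the simulated cpus to exit */

/* Simulated cpu: polls for a cross-call request, standing in for a
 * real interrupt being delivered at an arbitrary point in its work. */
static void *cpu_loop(void *arg)
{
    int id = (int)(intptr_t)arg;
    while (!atomic_load(&stop)) {
        if (atomic_exchange(&pending[id], 0)) {
            /* Empty handler: the acknowledgement is all that happens. */
            atomic_fetch_add(&acked, 1);
        }
        sched_yield();
    }
    return NULL;
}

/* Stand-in for the invoking cpu: request an acknowledgement from every
 * other cpu, then spin until all have answered. Returns the ack count. */
static int sim_sync(int self)
{
    int expected = 0;
    assert(self >= 0 && self < NCPU);
    atomic_store(&acked, 0);
    for (int i = 0; i < NCPU; i++) {
        if (i == self)
            continue;
        atomic_store(&pending[i], 1);
        expected++;
    }
    while (atomic_load(&acked) < expected)
        sched_yield();  /* the invoking cpu is suspended here */
    return atomic_load(&acked);
}

/* Start cpus 1..NCPU-1, run one sync from cpu 0, then shut down. */
int run_demo(void)
{
    pthread_t t[NCPU];
    atomic_store(&stop, 0);
    for (intptr_t i = 1; i < NCPU; i++) {
        int rc = pthread_create(&t[i], NULL, cpu_loop, (void *)i);
        assert(rc == 0);
        (void)rc;
    }
    int n = sim_sync(0);
    atomic_store(&stop, 1);
    for (int i = 1; i < NCPU; i++)
        pthread_join(t[i], NULL);
    return n;
}
```

Calling run_demo() returns once every other simulated cpu has
acknowledged. Note what the sketch does not do: nothing in the
acknowledgement depends on what the other cpu was executing when the
request arrived, which is exactly the property that worries me.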

The problem that dtrace/linux is having is mostly around the timer
interrupt - breaking the kernel contract on interrupts and
scheduling state. It's not possible (without hacking or damage) to conform
to the contract, which means I need to either stop using smp_call_function
or seek some other mechanism.

I need to sleep on this and work out the various permutations of
code paths.

[Note: I do have a safe workaround, but it will cause the
odd timer tick to be dropped within dtrace (the rest of the kernel is not
affected). I may have to release this as a temporary safe fix.]



