Now, this has been driving me nuts for months. Why was my spanking new
cross-cpu code hanging occasionally? I had spent ages building up
the courage to write it, and was fairly proud of it. But it just
wasn't reliable enough, and I disabled it in recent releases of dtrace.
Here's the problem: a cross-cpu synchronisation call is needed in dtrace.
Not often, but in key components. I feel like the way this was done in
dtrace was almost laziness, because there are other ways to achieve
this (I believe). But dtrace relies on that single cross call
(in dtrace_sync()).
Interestingly, I was surprised it was called so often. It's called
during the tear-down of /usr/bin/dtrace as the process exits. I had
wondered why dtrace intercepts ^C and doesn't die immediately. It
does something very curious: it intercepts ^C and asks the driver nicely
to tear down the probes we may have set up. Of course, you can kill -9
the process, and it works. *But*. *But*. If you do that, the probes
aren't torn down! Instead, they are left running. After about 20-30s,
since nothing in user land empties the buffers, the kernel auto
garbage-collects, but it means in a kill -9 scenario, whatever
you were tracing may continue to take effect.
I don't like the way ^C works in dtrace and I may attempt to fix it
(e.g. fork a child to tear down the probes; tear-down is done by a STOP
ioctl to the driver).
Ok - so cross calls happen a lot, especially during tear-down (and also
during timer/tick interrupt handling).
So .. what happens? Well, on a two-cpu system, the cpu invoking the
cross call deadlocks against the other cpu while waiting for the remote
procedure call to be acknowledged.
With the original Linux smp_call_function() there were lots of issues
in calling it with interrupts disabled (i.e. from the timer tick
interrupt). This is not allowed: two cpus calling each other at the same
time, each with interrupts disabled, will deadlock. The cross-call code
has to run with interrupts enabled, and that means being very careful
with reentrancy and mutual invocation.
One day I put some debug into the code to try and spot mutual or
nested invocations, and I got a hit. On a real machine. But never on
a VM.
I modified the code to allow a break-out: after too long waiting, the
code gives up and allows the machine to stay intact. Without this, the
machine would lock up (deadlock with interrupts disabled).
I fixed the code to handle mutual invocation and recursion.
But I could not figure out what the locked-up CPU was doing. I tried
to get stack dumps from the locked CPU, but these would only appear
after dtrace had given up waiting. It's as if the other CPU was asleep
and wouldn't wake up until the primary CPU had given up looking
(a definite Heisenbug!).
The web link at the top of this page illustrates the exact same
setup I was seeing. So, I followed the page's advice (it explains that
acknowledging an end-of-interrupt to the APIC too early may not
work on a VM).
Not only had I spent a huge amount of time understanding, fixing and
engineering a solution, but I had almost had a working solution without
realising it. I had previously moved the APIC_EOI code to the end of the
interrupt routine, but because of the lack of support for mutual
invocation, it hadn't worked. So I put it back again.
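The shape of that change, as pseudocode (the names here are
illustrative, not the actual dtrace/xcall source): the EOI is written to
the local APIC only after the cross-call work is done, rather than on
entry to the handler:

```c
/* Pseudocode - illustrative names, not the actual dtrace source. */
void xcall_interrupt(void)
{
    /* Acking here ("early EOI") lets the APIC - or the hypervisor
     * emulating it - deliver the next IPI before this one is finished,
     * the case the linked page warns may misbehave on a VM:
     *
     *     apic_write(APIC_EOI, 0);          // too early
     */

    service_pending_cross_calls();           /* do the real work first */

    apic_write(APIC_EOI, 0);                 /* ack only when done */
}
```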
So I think this is looking good - much better than before. I need
to do more torture testing and cleanup before I release.
Along the way, I tried, or started trying, lots of things: using a
crash dump to analyse this problem (which wasn't successful), or using
NMI interrupts instead of normal interrupts.
I've learnt a lot and been frustrated by a lot too along the way.
Keep an eye on twitter .. I'll report a status update if I think
I am not close enough.