The fast teardown was proving too erratic, so this release re-enables
the standard teardown. The performance difference between fast teardown
and classic teardown is 40+ seconds when terminating an fbt::: trace.
I've put in a different optimisation which, while not as good as the
original fast teardown, is still a decent improvement (around 10s).
The new optimisation rests on two observations about teardown. First,
the current cpu is itself invoking probes (e.g. dtrace_xcall calls the
functions that send an IPI to the other cpus), and the probes for our
current cpu are totally pointless. Second, the other cpus are just
passing the time, doing "stuff" and invoking probes of their own. So we
try to "slow down" the other cpus: as soon as one of them tries to
probe, we put it in a small poll loop, checking for xcalls.
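To make the idea concrete, here is a minimal user-space sketch of the
"slow down" path. It is not the driver code: every name in it
(teardown_active, xcalls_pending, POLL_SPINS, probe_enter, service_xcall)
is hypothetical, and the bounded spin count and the choice to let the
probe proceed once the loop ends are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* All names here are hypothetical; they are not the driver's symbols. */
enum { POLL_SPINS = 1000 };          /* assumed bound on the poll loop */

static atomic_bool teardown_active;  /* set by the cpu running the teardown */
static atomic_int  xcalls_pending;   /* cross-calls waiting for this cpu */
static int         xcalls_serviced;  /* counter, for illustration only */

/* Stand-in for handling one cross-call request. */
static void service_xcall(void)
{
    atomic_fetch_sub(&xcalls_pending, 1);
    xcalls_serviced++;
}

/*
 * Called at probe entry on a cpu that is not doing the teardown.
 * Returns the number of poll iterations spent before the probe proceeds.
 */
static int probe_enter(void)
{
    int spins = 0;

    if (!atomic_load(&teardown_active))
        return 0;                    /* normal path: no delay */

    /* Teardown in progress: poll for xcalls instead of racing ahead. */
    while (spins < POLL_SPINS && atomic_load(&teardown_active)) {
        while (atomic_load(&xcalls_pending) > 0)
            service_xcall();
        spins++;
    }
    return spins;
}
```

The point of the loop is that a cpu which would otherwise keep firing
probes spends that time answering xcalls instead, so the cpu doing the
teardown waits less for each one.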
I put some stats into /proc/dtrace/trace to show the tail of the
teardown. Here's an example (4-cpu VM):
46.251921756 #3 teardown start 1322415220.918787912 xcalls=35 probes=2964975
51.106543255 #3  x_call: re-entrant call in progress.
56.645252318 #3 teardown done 10.393330562 xcalls=108922 probes=1161780
The teardown took 10.39 seconds, and during this time 1,161,780 probes
fired. We did 108,922 xcalls. (Previously we did only a handful; the
xcalls are very expensive.)
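As a sanity check on those figures, the short sketch below recomputes
the elapsed time from the two timestamps in the trace and derives
per-second rates. The numbers are copied from the trace lines above; the
function names are made up for this example, and it assumes the counters
on the "teardown done" line cover only the teardown window.

```c
#include <stdio.h>

/* Figures copied from the "teardown start" / "teardown done" lines. */
static const double t_start = 46.251921756;
static const double t_done  = 56.645252318;
static const long   xcalls  = 108922;
static const long   probes  = 1161780;

/* Elapsed teardown time in seconds, from the two timestamps. */
static double elapsed_secs(void)
{
    return t_done - t_start;
}

/* Average events per second over the teardown window. */
static double per_second(long count)
{
    return (double)count / elapsed_secs();
}

/* Print the derived figures. */
static void print_rates(void)
{
    printf("elapsed:  %.9f s\n", elapsed_secs());
    printf("probes/s: %.0f\n", per_second(probes));
    printf("xcalls/s: %.0f\n", per_second(xcalls));
}
```

Calling print_rates() from a main() prints the three values; the elapsed
time should agree with the duration reported on the "teardown done"
line.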