Friday, 25 November 2011

New dtrace release

After the release last weekend, more work was done to chase down
reliability issues. They never seem to disappear. Lets hope this is
better than the last "good one".

The Fedora Core 15 kernel has some "different" stuff in it compared
to Ubuntu - I tracked down a few unique instructions at the probe
entry points which werent emulated properly.

I also fixed some other bugs in the instruction emulator.

Next, the fast-teardown mode wasnt resilient enough and had to be
rewritten. Without this, Ctrl-C a fbt::: probe could take 20+ seconds
due to the frantic xcall conversations as each probe is removed one by one.
Ubuntu and Fedora would complain about a stuck CPU when this happens,
which isnt very social.

The new fast-teardown removes all dtrace_sync() calls, bar a few.
(I would like to know how Apple does it; I know how Solaris does it; I
dont know if FreeBSD is "as good as" Apple, or very poor). What is very
interesting in this area is that I dont believe the Solaris mechanism
can work on Linux, since Linux can have interrupt-disabled spinlocks which
can prevent cross-cpu interrupts from happening and causing deadlock.
My solution was to keep the lockless solution from before, but
avoid the 2x calls to dtrace_sync per probe teardown, but "using a better
algorithm". For simple one/two probes, it doesnt matter, but when you
have 40,000+ probes in flight it makes a big difference.

Now the next issue is the tick/profile provider. I found today
that the "profile" provider isnt implemented (it generates a TODO
warnings). On re-reading the Solaris tick/profile provider pages, I
found they are "wrong". The documentation refers to the fact that
you can use, e.g. "tick-1us" to have microsecond level granularity probes,
but they are not supported. Dont know why (maybe my source code is
out of date, or they felt it too worrisome to support them).

My favorite provider at present is "tick-1ms" - a one millisecond tick.
I knew dtrace/linux could "drop" probes when under pressure, but I havent
had time to investigate why. A simple test:


dtrace -n ' int cnt_1ms, cnt_1s;
tick-1ms {
cnt_1ms++;
printf("%d %d.%09d", cnt_1ms, timestamp / 1000000000, timestamp % (1000000000));
}
tick-1s { cnt_1s++;
printf("tick-1ms=%d tick-1s=%d", cnt_1ms, cnt_1s);
}
tick-5s {
printf("the end: got %d + %d\n", cnt_1ms, cnt_1s);
exit(0);
}
'


Lets you quickly see if there are 1000 tick-1ms probes in a second.

There arent.

Linux is achieving about 750/sec, and MacOS achieve around 997/second.

I suspect my attempts to get it to "work" have left a gap in that the
timer is not a cyclic timer, but a refiring timer, so the latency between
handling the end of one tick and repriming the timer, is causing is to expose
a window. Thats probably a part of the explanation; there may be more.
On an idle system, losing 25% of the ticks seems a bit excessive.
(Dtrace/Linux can achieve in excess of 1-2m probes/sec, so, 1000 measily
ticks shouldnt be too bad).

I need to experiment further. Some things that could be affecting this include:
(a) bad (my) code, (b) virtualisation, (c) high latency kernel scheduling.

I hope it is (a).

Post created by CRiSP v10.0.17a-b6112


No comments:

Post a Comment