Tuesday 31 May 2011

dtrace update 20110531

It's been a while since I put out a dtrace update, so I thought it
worthwhile to give a brief update of what is annoying me.

The most annoying thing in dtrace at the moment is me. I have spent
the last two months trying hard to resolve some resilience issues.

At the moment, there are two of them: (1) the xcall code, and
(2) something to do with syscall tracing.

As related in prior posts, dtrace_xcall() (which does inter-cpu
synchronisation) doesn't map to Linux very well. On
Solaris, the inter-cpu calling code works from interrupt context,
but we cannot do that in Linux. (Linux will write a warning to
/var/log/messages when this happens, although it does mostly
work).

I have tried a number of variants of a native IPI system in dtrace and
they have failed with various problems. The biggest problem is that
on an SMP system, cpu#1 will invoke a call to cpu#2 but cpu#2 won't respond
until cpu#1 finishes the xcall (a deadlock). In the code, I have
resolved the deadlock by giving up after a suitable period of time,
but that's not good enough. Trying to find out what cpu#2 is doing
when it refuses to respond to the interrupt is very tricky. Various
ad-hoc debug tricks (like using the native smp_call_function() to dump
stacks) failed. Additionally, the ordering of messages
written to /var/log/messages is horrendous when I am running my implementation
of xcall: the cpus write out of order, with timestamps going backwards.
(I can understand why, but it doesn't help).
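
The give-up-after-a-while workaround described above can be modelled in
user space. What follows is only a sketch, not the dtrace code: the names
xcall(), run_scenario() and the 0.2 second timeout are made up for
illustration, and Python threads stand in for cpus.

```python
import threading


def xcall(ack: threading.Event, timeout_s: float) -> bool:
    """Wait for the other 'cpu' to acknowledge; give up after timeout_s.

    Returns True if the acknowledgement arrived, False if we gave up.
    """
    return ack.wait(timeout=timeout_s)


def run_scenario(responder_waits_for_caller: bool, timeout_s: float = 0.2) -> bool:
    """Run one cross-call between two threads standing in for cpu#1 and cpu#2."""
    ack = threading.Event()          # set by cpu#2 when it services the call
    caller_done = threading.Event()  # set by cpu#1 once its xcall has returned

    def cpu2() -> None:
        if responder_waits_for_caller:
            # The deadlock case: cpu#2 will not respond until cpu#1 is done,
            # while cpu#1 is waiting for cpu#2 to respond.
            caller_done.wait()
        ack.set()

    worker = threading.Thread(target=cpu2)
    worker.start()
    acked = xcall(ack, timeout_s)    # cpu#1 waits, but only for so long
    caller_done.set()                # cpu#1 has finished its xcall
    worker.join()
    return acked
```

Without the timeout, the second scenario (run_scenario(True)) would never
return. With it, the call comes back unacknowledged, which breaks the
deadlock but leaves the caller not knowing whether the other side ever ran.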

I have given up on resolving the SMP cross-call issue, and have instead
been trying something different. The only place where the
xcall issue is a problem is the profile/tick provider hr_timer clock
interrupts. So I have modified the code to use a tasklet structure instead.
This seems to work (I have some race condition problem to fix before I
can release it).
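
The general shape of that kind of deferral can be shown with a toy model.
This is an assumption-laden sketch, not the real change: TimerDeferral and
its methods are hypothetical names, and a real tasklet is a kernel object
scheduled with tasklet_schedule() and run later in softirq context, none of
which is modelled here. The point is only that the interrupt path queues the
work and returns, and the heavier work runs afterwards.

```python
from collections import deque


class TimerDeferral:
    """Toy model: the 'interrupt' only queues work; a later pass runs it."""

    def __init__(self):
        self.pending = deque()   # work queued by the interrupt path
        self.trace = []          # order in which things actually happened

    def timer_interrupt(self, cpu, tick):
        # Interrupt path: do the bare minimum and return quickly.
        self.pending.append((cpu, tick))
        self.trace.append(("irq", cpu, tick))

    def run_deferred(self, fire_probe):
        # Deferred path: runs later, outside the interrupt path, so it is
        # free to do the heavier work (here, calling fire_probe).
        while self.pending:
            cpu, tick = self.pending.popleft()
            fire_probe(cpu, tick)
            self.trace.append(("deferred", cpu, tick))


fired = []
model = TimerDeferral()
model.timer_interrupt(cpu=0, tick=1)
model.timer_interrupt(cpu=1, tick=1)
model.run_deferred(lambda cpu, tick: fired.append((cpu, tick)))
```

After the run, model.trace shows both "irq" entries ahead of both "deferred"
entries: nothing heavy happened while the pretend interrupts were being
handled.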

But, during all this testing, I hit another strange and annoying scenario.
Doing:


$ dtrace -n syscall:::


and doing intensive things in another window, like:


$ while true ; do date ; done
...
date: error while loading shared libraries: /lib64/ld-linux-x86-64.so.2: cannot apply additional memory protection after relocation: Error 9
...


Very occasionally, the system barfs. I have seen the output from dtrace
hang (it hangs until I press a key on the keyboard). I have tracked
this down: when a write() syscall is being executed, it's being
turned into a read() syscall.

The event is very rare - 1 in hundreds of thousands of syscalls, but it's
horrible. And it's *this* problem which is likely what prompted me to go
on the xcall wild-goose chase. The "make test" regression suite is very
good at pushing the cpu load to the max whilst doing dtrace things, but
it occasionally would have issues.

So, if I can chase the 1:100,000 issue in syscall tracing, then I can
move forward. (I suspect a timer interrupt coming in during a syscall might
be causing the issue).
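
To put a number on how much stress testing a 1:100,000 bug needs, here is
a small back-of-envelope calculation. It assumes each syscall fails
independently with the same probability, which is a simplification (if a
timer interrupt is involved, failures are probably tied to timing rather
than spread evenly). The function names are made up for illustration.

```python
import math


def chance_of_hit(p: float, n: int) -> float:
    """Probability of seeing at least one failure in n independent syscalls."""
    return 1.0 - (1.0 - p) ** n


def calls_needed(p: float, confidence: float) -> int:
    """Smallest n for which chance_of_hit(p, n) reaches the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


p = 1e-5  # the 1:100,000 failure rate, taken as a per-syscall probability
print(f"chance of a hit in 100,000 syscalls: {chance_of_hit(p, 100_000):.3f}")
print(f"syscalls needed for a 95% chance:    {calls_needed(p, 0.95)}")
```

Under those assumptions, 100,000 syscalls give only about a 63% chance of
seeing the bug even once, and it takes roughly 300,000 to reach 95%. So a
test run that passes cleanly does not say much unless it is a long one.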

As always, I will release the code when I feel it's better than where
we are.


Post created by CRiSP v10.0.10a-b6012


2 comments:

  1. Just stumbled across this blog, and I am probably the gazillionth person to point this out, but the link to dtrace at the crisp.demon.co.uk blog doesn't resolve. At least not for me.

    A post at openindiana.org provided a link that resolves, to wit:

    ftp://crisp.dyndns-server.com/pub/release/website/dtrace

    This is either correct, or an entertaining hoax.

    Might I ask which dtrace tarball is the currently anointed one? Or is linux dtrace still too unstable to take out for a dance?

    Thanks..

    PS: apologies if this posts multiple times. blogger keeps eating my attempts to comment.

  2. Hello Jim

    Yes - crisp.dyndns-server.com is the correct place for dtrace. Apologies - I didn't notice my side bar is out of date. I screwed up somewhere.

    Given you found the dtrace repository, the latest dated release is the place to go to and try out. I am not happy for this to be used under heavy load on a production system. I have spent a lot of effort to find the cause of instability, and I will publicise my findings and results as soon as I am happy. For now, treat it with kid gloves.

    It does work remarkably well, but under my stress tests, strange things happen - some I have fixed in my unreleased code base, but others are an embarrassment for me.

    I will try and fix that side bar link now!

    thanks
