CRiSP, DTrace, and other technobabble: Couple of tools

In the previous post, I talked about the difficulty of debugging
the cpu-stuck lock syndrome.

I thought it worth documenting a couple of trivial tools I wrote.

The first tool, is a simple "tail -f" tool. It seems that the GNU
"tail" tool doesnt work for character special device files. So:


$ tail -f /proc/dtrace/trace

doesnt track the new data in there. A char-special file doesnt have
a "size", so, when EOF is hit, you cannot necessarily do the standard
trick of seeking and rereading the new data.

Whilst debugging dtrace, its useful to keep an eye on /proc/dtrace/trace.
I had given up using kernel printk() messages for debugging, because
when faced with cpu lockups, you may not be able to see the messages
and earlier lockups had implied that the lockups were a by product
of multiple cpus causing printk() which in the later kernels, has
its own mutex.

So, tools/tails.pl does the magic of doing the tail. The algorithm
is very simple: keep reading lines, storing them in a hash table. If
we havent seen the line before, print it out. On EOF, sleep for a while
and keep rereading the file. Its cpu intensive (and potentially memory
intensive) but for the purposes, it fits the bill.

I also found it useful for tracking /var/log/messages, since in the event
of a stuck-cpu, you may not have a chance to see the file.

cpustuck.pl

The next tool - which is quite interesting, is trying to visually see
a stuck cpu. Almost, like a black hole, you cannot see it - its likely
stuck in a tight loop with interrupts disabled.

Heres how we do it: have a program which prints, to the console, once
a second, the cpu we are running on. Using the taskset utility, we can
ensure that this process *only runs on the nominated cpu*.

So, now the algorithm is: foreach cpu, spawn a child process, which can only run
on the specified cpu, and prints out the cpu it is on. By running these
processes in parallel, we will see an output sequence like (this is a 3-cpu
example):


0120120122100010002222220012012...

The order of printing is indeterminate. Under normal circumstances,
we expect all cpus to say something (in any order). If one or more
cpus lock up, then we will miss the notification, and the pattern breaks.
Eventually, if all cpus lock up, we will see no more output, and
usually no response to the keyboard, and pretty soon...the need to reset
the machine.

What this allowed was to detect *which* cpu was stuck, quite easily.
Unfortunately, that is all it does. We know a cpu is stuck but we
dont know what/where it is stuck.

Combine this with the "tail.pl /var/log/messages", and, if we are lucky,
when the kernel sisues the cpu stuck diagnostic, we can get a stack trace,
and get a better understanding of whats going on.

Its not sufficient - if we know the stack trace for a stuck cpu,
it may not be clear *why* it is stuck. In my case, it seemed to be
inside the code for kfree(), on a spinlock.

But to be stuck on a spinlock implies some other cpu is doing something,
which was not found to be the case.

In the end, the problem was really a deadlock which was not
easily understandable. One cpu is asking the other cpus to do something.
Whilst this is happening, the original cpu is being asked to do something.
But the first cpu is not listening (interrupts disabled).

The ultimate cure was to understand fully the two main areas where
dtrace blocks (mutexes and xcalls) and ensure they "knew" about
each other (mutexes will drain the xcall queue whilst waiting for
a mutex).

Post created by CRiSP v10.0.19a-b6127

CRiSP, DTrace, and other technobabble

Friday, 9 December 2011

Couple of tools

No comments:

Post a Comment