Sunday 30 October 2011

Dtrace and printk()

I have been debugging lockups in 3.0 kernels. They are very difficult
to debug, since every attempt to diagnose the cause was itself met
with a kernel lockup.

Sometimes a trace like:


$ dtrace -n 'fbt::[a-e]*:'


would work, and sometimes not. I tried lots of variations and theories,
and nothing worked.

Then, after a little rest, I went back to basics. Let's assume one
function is blocking us, and use a binary search to find out which fbt
function it is.
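
As a rough illustration of that binary search, here is a minimal sketch
in C. It assumes exactly one culprit function. The names
bisect_culprit, locks_up_fn and fake_locks_up are hypothetical, and the
function names in the list are placeholders. In real life the "test" is
running dtrace against a subset of fbt probes and seeing whether the
machine survives; here it is simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns non-zero if enabling probes on
 * names[lo..hi) reproduces the lockup. */
typedef int (*locks_up_fn)(const char **names, size_t lo, size_t hi);

/* Bisect a list of function names, assuming exactly one culprit.
 * Returns the index of the culprit. */
size_t bisect_culprit(const char **names, size_t n, locks_up_fn locks_up)
{
    size_t lo = 0, hi = n;

    assert(n > 0);
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (locks_up(names, lo, mid))
            hi = mid;   /* culprit is in the lower half */
        else
            lo = mid;   /* otherwise it must be in the upper half */
    }
    return lo;
}

/* Stand-in for a real test run: pretends "func_d" is the culprit. */
int fake_locks_up(const char **names, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++)
        if (strcmp(names[i], "func_d") == 0)
            return 1;
    return 0;
}
```

Each round halves the candidate set, so even a few thousand probes
need only a dozen or so test runs (and reboots).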

It turns out that the new kernel has modified printk(), the kernel
printing function, in some way. (I think it's to do with recursive
prints, but I have not confirmed this yet.)

What appears to be happening is that if printk() is called at the
wrong time, the kernel locks up, waiting on a semaphore that tells it
whether the console is free for printing.

printk() is not normally called much while dtrace is running, but there
seem to be enough call sites to cause trouble. If I map printk() to a
do-nothing function, then sanity appears to be restored and I can run
against fbt:::.
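
One way to picture mapping printk() to a do-nothing function is to
route all diagnostic output through a single wrapper that can be
pointed at a stub. This is a user-space sketch only, not the actual
dtrace driver code: dtrace_log, dtrace_set_console_output, real_print
and null_print are hypothetical names, and vfprintf stands in for the
kernel's printk().

```c
#include <stdarg.h>
#include <stdio.h>

/* Signature shared by the real printer and the stub. */
typedef int (*print_fn)(const char *fmt, va_list ap);

/* Stand-in for printk(); here it just writes to stderr. */
static int real_print(const char *fmt, va_list ap)
{
    return vfprintf(stderr, fmt, ap);
}

/* Do-nothing replacement: never touches the console or any lock. */
static int null_print(const char *fmt, va_list ap)
{
    (void)fmt;
    (void)ap;
    return 0;
}

/* Output is off by default, so probe context never reaches the console. */
static print_fn current_print = null_print;

/* Switch between the real printer and the stub. */
void dtrace_set_console_output(int enabled)
{
    current_print = enabled ? real_print : null_print;
}

/* Single entry point used in place of direct printk() calls. */
int dtrace_log(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = current_print(fmt, ap);
    va_end(ap);
    return n;
}
```

With one choke point like this, output can be re-enabled selectively in
places known to be safe, rather than being all or nothing.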

So I need to either avoid printk() in dtrace, or be judicious about
where it's used. (Dtrace already has an internal dtrace_printf function
that writes to an internal circular buffer, but that's not visible if
the kernel crashes; I may need to fix that.)
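
For readers unfamiliar with the idea, a circular log buffer can be
sketched as below. This is not the real dtrace_printf implementation,
just a minimal user-space illustration: ring_printf, ring_read and
LOG_SIZE are hypothetical names, and the buffer is deliberately tiny.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_SIZE 16     /* deliberately tiny for illustration */

static char log_buf[LOG_SIZE];
static size_t log_head; /* total number of bytes ever written */

/* Format a message and append it to the ring, overwriting the
 * oldest data once the buffer is full. No console, no locks. */
int ring_printf(const char *fmt, ...)
{
    char tmp[128];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof tmp, fmt, ap);
    va_end(ap);
    if (n < 0)
        return n;
    if ((size_t)n >= sizeof tmp)
        n = (int)(sizeof tmp - 1);  /* message was truncated */
    for (int i = 0; i < n; i++)
        log_buf[log_head++ % LOG_SIZE] = tmp[i];
    return n;
}

/* Copy the surviving contents, oldest first, into out and
 * NUL-terminate it. outsz must be at least 1. */
size_t ring_read(char *out, size_t outsz)
{
    size_t avail = log_head < LOG_SIZE ? log_head : LOG_SIZE;
    size_t start = log_head - avail;
    size_t n = avail < outsz - 1 ? avail : outsz - 1;

    for (size_t i = 0; i < n; i++)
        out[i] = log_buf[(start + i) % LOG_SIZE];
    out[n] = '\0';
    return n;
}
```

The catch described above still applies: a buffer like this lives only
in memory, so its contents are lost unless something reads it out
before the machine goes down.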

So, if you are having trouble on Ubuntu 11.04 or 11.10, or another
equivalent distribution, using Linux 3.0.x, then stay tuned.

Thanks to Nigel Smith for pushing me to go hunt this down.

Post created by CRiSP v10.0.17a-b6103

