Monday 9 February 2015

no dtrace updates

People have been questioning why there are no dtrace updates.
I hope to be in a position to properly respond shortly. Just before
Christmas, I started work on Debian Jessie (3.16 kernel) and hit a number
of issues. Although I made good progress fixing x32 syscalls on
an x64 system, and systematically fixing other issues, I had to hack the
driver tremendously. These hacks are experiments to figure out why
I could so easily crash the kernel. The usual panic behaviour
did not hold - normally a stray issue causes a kernel message,
and I can debug around the issue to isolate the cause.

The issues I hit were all very low level - the cross-cpu calls, the
worker interrupt thread, and the current issue - relating to invalid
pointers when accessed via a D script. I have a "hard" test which won't
pass without crashing the kernel - crashing the kernel really hard,
requiring a VM reboot. This is nearly impossible to debug. The first
thing I had to do was increase the console mode terminal size - when
the panic occurs, the system is totally unresponsive and all I have
is the console output to look at, with no scrolling ability. Having
a bigger console helps - but it seems like the GPF or page fault interrupt,
when occurring inside the kernel, does not work the same way as
it has on all prior Linux kernels. Looking closely at the
interrupt routines shows some changes in the way this works - enough
to potentially cause a panicking interrupt to take out the whole
kernel; this makes debugging tough.

If I am lucky, the area of concern is related to the interrupt
from kernel space. If I am unlucky, it is not this, but something else.
(I am hypothesising that the kernel stacks may be too small.)

I have been holding back any updates, despite some
pull requests from people, because I am not happy that the driver is in
a consistent enough state to release. When I have finished this area
of debugging, I can cross-check the other/older kernels, and see if
I have broken anything.

It is very painful dealing with hard-crashing kernels - almost nothing
helps in terms of debugging, so I am having to try various tricks to
isolate the instability. These instabilities, in theory, exist on other
Linux releases - but I will only know when I have gotten to the bottom
of the issue.



