Monday, 23 February 2015

dtrace update ...

Still delaying the dtrace release. Having gotten 3.16 kernels to work,
I started working backwards on random 3.x kernels, to validate it
still worked there. I fixed a number of issues there, and then headed
into RedHat 5.6 / Centos 5.6 land (2.6.18+ kernel).

I spent some time trying to get execve() syscall tracing to work - and
am still working on that.

Along my journey, I noticed a few things. Firstly dtrace4linux is too
complicated - trying to support 32+64b kernels, along the entire
path back to 2.6.18 or earlier, is painful. I cannot easily automate
regression testing (not without a lot more hard-disk space, and not
worthwhile whilst I am aware of obvious bugs to fix). I could simplify
testing by picking any release, and just rebooting with different kernels -
rather than full ISO images of RedHat/Centos/Ubuntu/Arch and so on.

I also noticed that the mechanism dtrace4linux uses to find addresses in
the kernel is slightly overkill. It hooks into the kernel to find symbols
which cannot be resolved at link time. The mechanism I have is pretty
interesting - relying on a Perl script to locate the things it needs.
I found a case where one of the items I need is not visible at all
in user space - its solely in the kernel - part of the syscall interrupt
code (the per-cpu area). Despite what latest kernels do, some older
kernels *dont*. And catering for them is important. In one case
I have had to go searching the interrupt code to find this value.
I ended up writing a C program to run in user space, prior to the build,
and really, it would have been better to generalise this so that everything
we need is simply defined in a table compiled in to the code, rather than
the /dev/fbt code to read from the input stream. This would ensure
that a build compiles and works. Today, sometimes I debug issues with
old kernels because a required symbol is missing and we end up
dereferencing a null pointer (not a nice thing to do in the kernel).

One problem I had with the above, was that gdb on the older distro releases
cannot be used to read kernel memory due to a bug in the kernel
precluding reading from /proc/kcore. Fortunately, I include a script
in the release which emits a vmlinux.o, complete with symbol table,
from the distribution vmlinuz file.

I havent reverified the ARM port of dtrace, but thats something for a
different rainy or snowy day.

Post created by CRiSP v12.0.3a-b6808


No comments:

Post a Comment