Sunday 1 March 2015

another dtrace delay

Everything was looking promising for releasing a new dtrace sometime last
week. It was working on the 3.16 kernel, 3.8, 3.4 and then on to RH5.6 (2.6.18).
I ran into a lot of issues on 2.6.18 - not surprising, given the code
mutations. Much of the last 2 weeks was spent on the execve() system call,
which would panic the kernel. Despite a lot of experiments and reading
of the assembler and kernel code, I kept doing silly things. It
really doesn't help that the 2.6.18 kernel will hard panic on a stray
GPF - that made it very difficult to figure out what was going on.

Eventually I got every line of the assembler working and sorted out the
register issues in the C code.

Along the way I had an issue with the "old_rsp" symbol. This is not exposed
in /proc/kallsyms, and not even in the /boot/System.map file. I had
to write a tool to extract this from inside the kernel. But this ran into
complications because /proc/kcore is broken on the RH/CentOS kernels. I
had to create a new device driver ("/proc/dtrace_kmem"), which has to be
loaded into the kernel prior to the build of dtrace. It's a very simple
driver, designed only to handle the scenario of building dtrace.
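
As a rough illustration of the kind of check involved, here is a minimal
Python sketch (not the actual tool used in the dtrace build) that looks a
symbol up in kallsyms/System.map-style text. The function names, the default
path list and the sample addresses in the usage note are all assumptions
made up for the example. When the symbol is missing, as "old_rsp" was here,
the lookup simply returns None.

```python
def find_symbol(symtab_text, name):
    """Return the address of `name` from kallsyms/System.map-style text, or None."""
    for line in symtab_text.splitlines():
        fields = line.split()
        # Assumed layout: <hex address> <type letter> <symbol name> [module]
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


def lookup(name, paths=("/proc/kallsyms",)):
    """Try each symbol file in turn; files that cannot be read are skipped."""
    for path in paths:
        try:
            with open(path) as fh:
                addr = find_symbol(fh.read(), name)
        except OSError:
            continue
        if addr is not None:
            return addr
    return None
```

For example, find_symbol("ffffffff8100a000 T do_execve\n", "old_rsp")
returns None (the address is invented), which is the case where something
like the extra driver becomes necessary.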

Having got this to work, the next roadblock was the rt_sigreturn() syscall,
which panicked the kernel. Careful investigation showed a missing line
of assembler (for the 2.6.18 kernel). Now that works.

Now everything is looking good on RH5/CentOS5, but before going on the
trawl of later kernels to prove I didn't break anything, I have an
issue with x_call.c. Either I use the native smp_call_function() interface -
which works great, until we panic the kernel - or I use my implementation,
which doesn't seem to be broadcasting to the cpus, which means
certain probes get "lost".

So, hopefully a release this week or next weekend - depending on the xcall issues.

Post created by CRiSP v12.0.3a-b6808


3 comments:

  1. This comment has been removed by the author.

    1. Having trouble finding the *right* place to reply to this post...

      I got past the 'old_rsp' issue without the dtrace_kmem driver & thought I'd share (perhaps it may be useful in diagnosis). Running the 'make all' I received the 'old_rsp' error. Thinking I might post the output to Issues on GitHub, I ran the following to display the kernel version and error messages at the same time:

      uname -a;make all | egrep -e 'BUILD_DIR|ERROR'

      The build completed successfully, as did the 'make load', and 'make install'. I tested the output on a little program I wrote:

      [root@lenx100e dtrace-20151106]# dtrace -n 'syscall:::entry /execname == "filecopy"/ { @[probefunc] = count(); }'
      dtrace: description 'syscall:::entry ' matched 652 probes
      ^C

      exit_group 1
      close 2
      write 1597794
      read 1597795
      [root@lenx100e dtrace-20151106]#

      OS = CentOS 6.7
      Kernel version = 2.6.32-573.8.1.el6.x86_64.

      Thanks for working on this port! This is excellent work.

  2. Quite interesting post...if you do not mind my asking, does dtrace support the latest kernel 4?
