Monday, 6 August 2012

DTrace update

I put out a new release today. This addresses a problem with
syscall::rt_sigsuspend: on CentOS/RedHat kernels. It's strange
that even doing a blatant:

$ dtrace -n syscall:::

didn't pick up this issue before, and no matter how hard I tried,
I couldn't reproduce the error on Ubuntu (all the way back to Ubuntu 8).

What is interesting about the CentOS 5 series of kernels is how
bastardised they are. They are based on the 2.6.18 kernel, but carry many
backported upstream patches. This means the normal conditional compilation
on kernel version won't work without also taking the RedHat major
version number into account, and even then, the sheer number of kernels
makes them problematic to support. Anyway, the rt_sigsuspend issue came
down to something I had forgotten to do for 32-bit binaries running on
64-bit kernels.
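
A minimal sketch of the kind of version check described above. The
feature flag HAVE_NEW_API, the 2.6.24 cutoff, and the claim that RedHat
backported it are all made up for illustration; KERNEL_VERSION and
LINUX_VERSION_CODE are the usual <linux/version.h> macros, and RHEL_MAJOR
is assumed to be supplied by RedHat-derived kernel headers. The stand-in
definitions exist only so the fragment compiles outside a kernel tree.

```c
/* User-space stand-ins (assumption: pretend this is a CentOS 5 build).
 * In a module build these come from the kernel headers instead. */
#ifndef __KERNEL__
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 18)
#define RHEL_MAJOR 5
#endif

/* HAVE_NEW_API is a hypothetical flag: upstream from 2.6.24 (made-up
 * cutoff), but also present in the patched 2.6.18 vendor kernel. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 24)
#  define HAVE_NEW_API 1
#elif defined(RHEL_MAJOR) && RHEL_MAJOR >= 5
#  define HAVE_NEW_API 1   /* vendor backport: version number alone lies */
#else
#  define HAVE_NEW_API 0
#endif

/* Report which code path the preprocessor selected. */
static const char *api_variant(void)
{
    return HAVE_NEW_API ? "new" : "old";
}
```

With the stand-in values, the plain version test fails (2.6.18 is older
than 2.6.24) and only the RHEL_MAJOR branch selects the "new" path, which
is the trap a version-only check falls into.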

That *appears* to be resolved. I did have a hard-to-narrow-down
regression which may persist in the RedHat/CentOS kernels (occasional
CPU lockups; I can't get the info out of the locked kernel to determine
what it is, and need to get kdump or kgdb working properly).

I am back to playing with the PID provider. Tracing every instruction
in the CRiSP executable works... for a while, and then we jump off to
location 0 for some reason. It is difficult to track down (despite lots
of debug output in /proc/dtrace/trace), since everything looks right,
but we just decide location zero is a good place to go.

I wish CPUs had some form of trace buffer to see where we had
jumped from. (Please? Pretty please? Maybe it exists).

The fasttrap instruction emulation is very clever stuff. There's a
performance cost for dtrace on Linux since, on some kernels, the NX
(no-execute) bit is turned on for stacks, which means we have to fudge
the page table entry to ensure the trampoline instruction works OK.
This potentially involves a TLB flush, which is not nice for performance.
(There's still quite a lot of printk() debug in the fasttrap code,
so for now the TLB misses don't hurt as much as the extra debugging code.)

Here's an example of the debug output:

$ cat /proc/dtrace/trace
1468.820705312 #0 2343-ffff81001789be28: 4c 89 64 24 e0 ff 25 00 00 00 00 56 bd 4e 00 00
1468.820705312 #0 2343-ffff81001789be38: 00 00 00 4c 89 64 24 e0 cd 7f
1468.820705312 #0 2343-COMMON: 00000000004ebd56
1468.820705312 #0 2343-ffff81001789be28: 49 89 f5
1468.823704856 #0 2343-fasttrap_isa: 1710: pc=00000000004ebd59
1468.823704856 #0 2343-ffff81001789be28: 49 89 f5 ff 25 00 00 00 00 59 bd 4e 00 00 00 00
1468.823704856 #0 2343-ffff81001789be38: 00 49 89 f5 cd 7f
1468.823704856 #0 2343-copyout: line 1723 ffff81001789be28 00007fff775df638 c=22 0
1468.823704856 #0 2343-dtrace_linux.c:rw_exit:1925: TODO:please fill me in
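
Reading the hex dumps above, the scratch buffer appears to hold: a copy
of the traced instruction, a `jmp *0(%rip)` (ff 25 00 00 00 00), the
8-byte little-endian address to resume at, then the instruction again
followed by `int $0x7f` (cd 7f). That layout is inferred from the dump,
not taken from the driver source, so treat the following decoder as a
sketch under that assumption. The names scratch_info and decode_scratch
are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: [insn][ff 25 00 00 00 00][addr64 LE][insn][cd 7f] */
struct scratch_info {
    size_t   insn_len;    /* length of the copied instruction */
    uint64_t jmp_target;  /* address the jmp *0(%rip) lands on */
};

/* Returns 0 and fills *out if buf matches the assumed layout, else -1. */
static int decode_scratch(const uint8_t *buf, size_t len,
                          struct scratch_info *out)
{
    static const uint8_t jmp_rip[6] = { 0xff, 0x25, 0x00, 0x00, 0x00, 0x00 };
    size_t i, k;

    for (i = 1; i + sizeof(jmp_rip) + 8 <= len; i++) {
        const uint8_t *addr, *tail;

        if (memcmp(buf + i, jmp_rip, sizeof(jmp_rip)) != 0)
            continue;
        /* Total must be insn + jmp + address + insn + int $0x7f. */
        if (len != 2 * i + sizeof(jmp_rip) + 8 + 2)
            continue;
        addr = buf + i + sizeof(jmp_rip);
        tail = addr + 8;
        if (memcmp(tail, buf, i) != 0 || tail[i] != 0xcd || tail[i + 1] != 0x7f)
            continue;

        out->insn_len = i;
        out->jmp_target = 0;
        for (k = 0; k < 8; k++)   /* assemble the little-endian address */
            out->jmp_target |= (uint64_t)addr[k] << (8 * k);
        return 0;
    }
    return -1;
}
```

Fed the first 26-byte dump, this yields an instruction length of 5 and
a jump target of 0x4ebd56, matching the COMMON line; the second 22-byte
dump gives length 3 and target 0x4ebd59, matching the pc= value and the
c=22 in the copyout line.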

Until this is resolved, please take care when doing instruction-level
tracing.

I also have a report of a compile issue with Arch Linux, but I have
not been able to get the distro release (for i386) to survive
more than a minute or two in a VM, so I cannot easily debug the issue.

Post created by CRiSP v11.0.10a-b6436
