Sunday, 1 March 2015

another dtrace delay

Everything was looking promising to release a new dtrace sometime last
week. It was working on the 3.16 kernel, 3.8, 3.4 and then onto RH5.6 (2.6.18).
I ran into a lot of issues on 2.6.18 - not surprising, given the code
mutations. Much of the last two weeks was spent on the execve() system call.
It would panic the kernel. Despite a lot of experiments and reading
of the assembler and kernel code, I kept doing silly things. It
really doesn't help that the 2.6.18 kernel will hard panic on a stray
GPF - which made it very difficult to figure out what was going on.

Eventually I worked through every line of assembler, and the register issues
in the C code, and got it all to work.

Along the way I had an issue with the "old_rsp" symbol. This is not exposed
in /proc/kallsyms, and not even in the /boot/System.map file. I had
to write a tool to extract this from inside the kernel. But this ran into
complications because /proc/kcore is broken on the RH/Centos kernels. I
had to create a new device driver, which has to be loaded into the kernel
prior to the build of dtrace ("/proc/dtrace_kmem"). It's a very simple
driver, designed only to handle the scenario of building dtrace.
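
As a rough idea of what such a driver involves, here is a minimal sketch
against the 2.6.18-era proc API - not the actual dtrace_kmem source. The
read handler treats the file offset as a kernel virtual address, so a
user-space build tool can lseek() to an address it has located and read
the bytes it needs.

/*
 * Minimal sketch only (not the real dtrace_kmem driver): expose kernel
 * memory via /proc/dtrace_kmem so a user-space build tool can read
 * values such as old_rsp when /proc/kcore is unusable.  It does no
 * validation of the address - fine for a throwaway build-time helper,
 * and nothing else.
 */
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>

static ssize_t
dkmem_read(struct file *fp, char __user *buf, size_t len, loff_t *off)
{
        /* Treat the file offset as a kernel virtual address. */
        void *addr = (void *) (unsigned long) *off;

        if (copy_to_user(buf, addr, len))
                return -EFAULT;
        *off += len;
        return len;
}

static struct file_operations dkmem_fops = {
        .owner = THIS_MODULE,
        .read  = dkmem_read,
};

static int __init dkmem_init(void)
{
        struct proc_dir_entry *ent;

        ent = create_proc_entry("dtrace_kmem", 0400, NULL);
        if (ent == NULL)
                return -ENOMEM;
        ent->proc_fops = &dkmem_fops;
        return 0;
}

static void __exit dkmem_exit(void)
{
        remove_proc_entry("dtrace_kmem", NULL);
}

module_init(dkmem_init);
module_exit(dkmem_exit);
MODULE_LICENSE("GPL");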

Having got this to work, the next roadblock was the rt_sigreturn() syscall,
which panicked the kernel. Careful investigation showed a missing line
of assembler (for the 2.6.18 kernel). Now that works.

Now everything is looking good on RH5/Centos5, but before going on the
trawl of later kernels and proving I didn't break anything, I have an
issue with x_call.c. Either I use the native smp_call_function() interface -
which works great, until it panics the kernel - or I use my own implementation,
which doesn't seem to be broadcasting to all the CPUs; this means
certain probes get "lost".
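
For reference, the "native" path is roughly the sketch below - hedged, and
not the actual x_call.c code. Recent kernels use the three-argument
smp_call_function(); 2.6.18-era kernels take an extra argument; and the
function must not be called with interrupts disabled or from interrupt
context, which is exactly the restriction that makes it awkward to use
from probe context.

#include <linux/smp.h>

/* Work to do on each CPU: runs in IPI context, must not sleep. */
static void
xcall_func(void *arg)
{
        /* e.g. sync per-cpu probe state */
}

static void
xcall_all_cpus(void *arg)
{
        /*
         * Run xcall_func on every *other* CPU and wait for completion.
         * (Older kernels pass an extra "nonatomic"/"retry" argument.)
         */
        smp_call_function(xcall_func, arg, 1);

        /* ...and on the calling CPU. */
        xcall_func(arg);
}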

So, hopefully this week or next weekend - depending on the xcall issues.

Post created by CRiSP v12.0.3a-b6808


Monday, 23 February 2015

dtrace update ...

Still delaying the dtrace release. Having gotten 3.16 kernels to work,
I started working backwards on random 3.x kernels, to validate that dtrace
still worked on them. I fixed a number of issues, and then headed
into RedHat 5.6 / Centos 5.6 land (2.6.18+ kernel).

I spent some time trying to get execve() syscall tracing to work - and
am still working on that.

Along my journey, I noticed a few things. Firstly, dtrace4linux is too
complicated - trying to support 32- and 64-bit kernels, along the entire
path back to 2.6.18 or earlier, is painful. I cannot easily automate
regression testing (not without a lot more hard-disk space, and it is not
worthwhile whilst I am aware of obvious bugs to fix). I could simplify
testing by picking one release and just rebooting it with different kernels -
rather than keeping full ISO images of RedHat/Centos/Ubuntu/Arch and so on.

I also noticed that the mechanism dtrace4linux uses to find addresses in
the kernel is slight overkill. It hooks into the kernel to find symbols
which cannot be resolved at link time. The mechanism I have is pretty
interesting - relying on a Perl script to locate the things it needs.
I found a case where one of the items I need is not visible at all
in user space - it's solely in the kernel - part of the syscall interrupt
code (the per-cpu area). Despite what the latest kernels do, some older
kernels *don't* expose it. And catering for them is important. In one case
I have had to go searching the interrupt code to find this value.
I ended up writing a C program to run in user space, prior to the build.
Really, it would have been better to generalise this so that everything
we need is simply defined in a table compiled into the code, rather than
having the /dev/fbt code read it from the input stream. That would ensure
that a build which compiles also works. Today, I sometimes have to debug
issues with old kernels because a required symbol is missing and we end up
dereferencing a null pointer (not a nice thing to do in the kernel).
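
Something along the following lines is what I mean by a compiled-in table.
This is an illustrative sketch only, with hypothetical names (ksym_t,
ksyms[], the generated addresses) - not the current dtrace4linux build:
every unexported symbol the driver needs is listed in one place, the
build tool fills in the addresses, and the driver refuses to load if
anything is missing, instead of oopsing later.

#include <linux/kernel.h>
#include <linux/errno.h>

typedef struct ksym {
        const char    *name;    /* e.g. "old_rsp" */
        unsigned long  addr;    /* filled in by the build-time tool */
        void         **target;  /* driver pointer to patch */
} ksym_t;

static void *k_old_rsp;

static ksym_t ksyms[] = {
        /* addresses here would come from the generated header */
        { "old_rsp", 0, &k_old_rsp },
        { NULL, 0, NULL }
};

static int
ksyms_resolve(void)
{
        ksym_t *kp;

        for (kp = ksyms; kp->name; kp++) {
                if (kp->addr == 0) {
                        printk(KERN_ERR "dtrace: symbol %s not located\n",
                            kp->name);
                        return -EINVAL;   /* fail the load, not a probe */
                }
                *kp->target = (void *) kp->addr;
        }
        return 0;
}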

One problem I had with the above was that gdb on the older distro releases
cannot be used to read kernel memory, due to a bug in the kernel
precluding reads from /proc/kcore. Fortunately, I include a script
in the release which emits a vmlinux.o, complete with symbol table,
from the distribution vmlinuz file.

I haven't reverified the ARM port of dtrace, but that's something for a
different rainy or snowy day.

Post created by CRiSP v12.0.3a-b6808


Friday, 20 February 2015

new dtrace .. small update

The next release of dtrace is hopefully this weekend. Having
resolved the issues I had previously, I have been doing more
testing - so far only really on the 3.16 kernel - and found that
some of the syscalls were behaving badly due to reimplementation
in the kernel. Hopefully, when I have fixed the last two or three,
I can finish my merges and push out the latest release. I will
do a cursory check on some of the older kernels - it is likely I
have made a mistake somewhere and broken them, but that will
be easier to fix having made some internal changes.

Note that no new functionality is in here - the issues with
libdwarf remain (I may try again to solve that issue), and
"dtrace -p" is still a long way off from being functional.

Given that 3.20 is now the current kernel, I may need to see
if that works, and pray that 3.17-3.20 didn't affect how dtrace works -
or, if they did, that the work to make it compile is much less
than what 3.16 raised.

Post created by CRiSP v12.0.3a-b6801


Thursday, 19 February 2015

Why is gcc/gdb so bad?

When gcc 0.x came out, it was so refreshing: a free C compiler.
GCC evolved over the years, got slower and used more memory. I used
to use gcc on a 4MB RAM system (no typo), and wished I had 5MB of RAM.
Today, memory is cheap, and a few GB to compile code is acceptable.
(The worst I have seen is 30+GB to compile a piece of C++ code - not
mine!)

One of the powerful features of gcc was that "gcc -g" and "gcc -O"
were not mutually exclusive. And gdb came about as a free debugger,
complementing gcc.

Over recent years, gdb has become close to useless. It is a powerful,
complex and featureful debugger. But I am fed up with single stepping
my code and watching the line of execution bounce back and forth,
because the compiler emits strange debug info in which we move back
and forth over lines of code and declarations.

Today, while debugging fcterm, my attempt to place a breakpoint
on a line of code puts the breakpoint *miles* away from the place I
am trying to intercept. This renders "gcc -g" close to useless, unless
I turn off all optimisations and pray the compiler isn't inlining code.
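
A tiny, hypothetical example (not from fcterm) shows the effect: build it
with "gcc -O2 -g" and try to break on the marked line. With optimisation
on, the call is inlined and the loop re-ordered, so the breakpoint often
lands somewhere surprising and single stepping bounces between lines.

#include <stdio.h>

static int
scale(int x)
{
        return x * 3;           /* with -O2 this is inlined away */
}

int
main(void)
{
        int i, total = 0;

        for (i = 0; i < 10; i++)
                total += scale(i);   /* <-- intended breakpoint */
        printf("%d\n", total);
        return 0;
}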

Shame on gcc. Maybe I should switch to clang/llvm.

Post created by CRiSP v12.0.3a-b6801


Sunday, 15 February 2015

address: 0000f00000000000

Strange. I keep finding reasons why dtrace is not passing my tests.
I have narrowed it down to a strange exception. If the user script
accesses an invalid address, we either get a page fault or a GPF.
DTrace handles this and stubs out the offending memory access. Here's
a script:


build/dtrace -n '
BEGIN {
        cnt = 0;
        tstart = timestamp;
}
syscall::: {
        this->pid = pid;
        this->ppid = ppid;
        this->execname = execname;
        this->arg0 = stringof(arg0);
        this->arg1 = stringof(arg1);
        this->arg2 = stringof(arg2);
        cnt++;
}
tick-1s { printf("count so far: %d", cnt); }
tick-500s { exit(0); }
'


This script will examine all syscalls and try to access the string
for arg0/1/2 - and for most syscalls, there isn't one. So we end up
dereferencing a bad pointer. But only some pointers cause me pain.
Most are handled properly. The address in the title is one such address.
I *think* what we have is the difference between a page fault and a GPF.
Despite a lot of hacking on the code, I cannot easily debug this, since
once this exception happens the kernel doesn't recover. I have modified
the script above to only trace syscall::chdir:, which means I can manually
test via a shell, doing a "cd" command. On my 3-CPU VM, I lose one of the
CPUs and the machine behaves erratically. Now I need to figure out if
we are getting a GPF or some other exception.
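
For context, the usual way DTrace survives these accesses is roughly the
pattern below - a sketch following the illumos dtrace_trap() idea, with an
illustrative function name and regs layout (older x86_64 kernels call the
field rip rather than ip): if the fault arrives while a probe is executing
with CPU_DTRACE_NOFAULT set, the access is flagged and skipped rather than
handed to the kernel's normal fault handler. The question is which
exception path my problem address is taking around this.

#include <linux/ptrace.h>
#include <linux/smp.h>
/* cpu_core_t, CPU_DTRACE_* and dtrace_instr_size() come from the
   dtrace driver's own headers. */

/*
 * Sketch only; dtrace_trap_handler is an illustrative name, not the
 * dtrace4linux entry point.  Returns 1 if the fault was consumed on
 * behalf of a probe.
 */
int
dtrace_trap_handler(struct pt_regs *regs, unsigned long addr)
{
        cpu_core_t *cpup = &cpu_core[smp_processor_id()];

        if ((cpup->cpuc_dtrace_flags & CPU_DTRACE_NOFAULT) == 0)
                return 0;       /* not in probe context: not ours */

        /* Record the bad address so the D program gets an error, not a panic. */
        cpup->cpuc_dtrace_flags |= CPU_DTRACE_BADADDR;
        cpup->cpuc_dtrace_illval = addr;

        /* Skip the faulting instruction and resume. */
        regs->ip += dtrace_instr_size((uchar_t *) regs->ip);
        return 1;
}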

I tried memory addresses 0x00..00f, 0x00..0f0, 0x00..f00, ... in order
to find this. I suspect there is no page table mapping here, or it's
special in some other way. I may need to dig into the kernel GDT or
page tables to see what is causing this.

UPDATE: 20150215

After a bunch of digging, I found that the GPF interrupt handler had
been commented out. There was a bit more to it than that, because
even when I re-enabled it, I was getting other spurious issues.
All in all, various bits of hack code and debugging had got in the way
of a clear picture.

I have been updating the sources to merge back in the fixes
for the 3.16 kernel, but I have a regression in syscall tracing which
can cause spurious panics. I need to fix that before I do the next
release.

Post created by CRiSP v12.0.3a-b6801


Monday, 9 February 2015

no dtrace updates

People have been questioning why there are no dtrace updates.
I hope to be in a position to respond properly shortly. Just before
Christmas, I started work on Debian Jessie (3.16 kernel) and hit a number
of issues. Although I made good progress fixing issues with x32 syscalls on
an x64 system, and systematically fixing other issues, I had to hack the
driver tremendously. These hacks are experiments to figure out why
I could so easily crash the kernel. The usual rules for panicking the
kernel did not hold - normally a stray issue causes a kernel message
and I can debug around the issue to isolate the cause.

The issues I hit were all very low level - the cross-CPU calls, the
worker interrupt thread, and the current issue, relating to invalid
pointers when accessed via a D script. I have a "hard" test which won't
pass without crashing the kernel - crashing the kernel really hard,
requiring a VM reboot. This is nearly impossible to debug. The first
thing I had to do was increase the console mode terminal size - when
the panic occurs, the system is totally unresponsive and all I have
is the console output to look at, with no scrolling ability. Having
a bigger console helps - but it seems like the GPF or page fault interrupt,
when occurring inside the kernel, does not work the same way as
it has on all prior Linux kernels. Looking closely at the
interrupt routines shows some changes in the way this works - enough
to potentially cause a panicking interrupt to take out the whole
kernel; this makes life tough to debug.

If I am lucky, the area of concern is related to the interrupt
from kernel space. If I am unlucky, it is not this, but something else.
(I am hypothesising that the kernel stacks may be too small.)

I have been holding off putting out any updates, despite some
pull requests from people, because I am not happy that the driver is in
a consistent enough state to release. When I have finished this area
of debugging, I can cross-check the other/older kernels, and see if
I have broken anything.

It is very painful dealing with hard-crashing kernels - almost nothing
helps in terms of debugging, so I am having to try various tricks to
isolate the instability. These instabilities, in theory, exist on other
Linux releases - but I will only know when I have gotten to the bottom
of the issue.


Post created by CRiSP v12.0.3a-b6801


Monday, 1 December 2014

DTrace & Debian/Jessie

Someone reported a bug in dtrace whereby execve() wasn't tracing.
I created a VM and started testing, and can confirm this. It looks
like dtrace is getting confused by the newly rewritten syscall assembler.
I have a working version for this in my testbed, but I found that
changes to the IPI code in the kernel are making any dtrace probes
extremely unreliable - it looks like a 1-in-N chance of seeing output
(where N is the number of CPUs you have).

I have some similar issues on Ubuntu 14.04 - hopefully with the same root cause.

Hope to have a new release in a few days.

Post created by CRiSP v12.0.3a-b6801