Saturday 11 June 2011

dtrace progress

I've been continuing work to make dtrace more resilient. One thing
I found is that some syscalls (fork, clone, sigreturn, execve and a
few others) have a different calling sequence from the rest.

Bear in mind that when we think of "the kernel", there are really
several views of it:


- 64b kernel running 64b apps
- 32b kernel running 32b apps
- 64b kernel running 32b apps


Apps get into the kernel via system calls, which are implemented
in a variety of ways depending on the kernel version and the
CPU. (Some older CPUs, such as the i386 and i486, don't support
instructions like SYSCALL or SYSENTER, and have to use the older
int 0x80 software interrupt instead.)

So dtrace traps system calls by patching the system call table.
The code is mostly the same for a 32b and a 64b kernel, but subtly
different.
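To make the patching idea concrete, here is a minimal user-space sketch. It is not the actual dtrace code: the table, the two toy handlers and the helper names (`patch_table`, `probe_wrapper`, `do_syscall`) are all hypothetical, and a real kernel handler gets its arguments from saved registers rather than two longs. Patching saves each original entry and swaps in a wrapper that fires an entry probe, calls the original handler, and fires a return probe.

```c
#include <stddef.h>

/* Hypothetical model: a syscall table is an array of function pointers
 * indexed by syscall number. Real handlers take register-frame args. */
typedef long (*syscall_fn)(long a, long b);

#define NR_SYSCALLS 4

static long sys_add(long a, long b) { return a + b; } /* toy handler */
static long sys_mul(long a, long b) { return a * b; } /* toy handler */

static syscall_fn sys_call_table[NR_SYSCALLS] = { sys_add, sys_mul, NULL, NULL };

/* Original entries, saved so the probes can be removed again. */
static syscall_fn saved_table[NR_SYSCALLS];

/* Counters standing in for firing entry/return probes. */
static int entry_fired[NR_SYSCALLS];
static int return_fired[NR_SYSCALLS];

/* Models the syscall number the entry code leaves in a register. */
static int current_nr;

/* Generic wrapper installed in place of every patched entry. */
static long probe_wrapper(long a, long b)
{
    int nr = current_nr;
    entry_fired[nr]++;                /* "entry" probe fires */
    long rc = saved_table[nr](a, b);  /* call the original handler */
    return_fired[nr]++;               /* "return" probe fires */
    return rc;
}

/* Save every entry and redirect the non-empty ones to the wrapper. */
static void patch_table(void)
{
    for (int i = 0; i < NR_SYSCALLS; i++) {
        saved_table[i] = sys_call_table[i];
        if (sys_call_table[i] != NULL)
            sys_call_table[i] = probe_wrapper;
    }
}

/* Restore the original entries. */
static void unpatch_table(void)
{
    for (int i = 0; i < NR_SYSCALLS; i++)
        sys_call_table[i] = saved_table[i];
}

/* Models the kernel's syscall entry path dispatching through the table. */
static long do_syscall(int nr, long a, long b)
{
    if (nr < 0 || nr >= NR_SYSCALLS || sys_call_table[nr] == NULL)
        return -38; /* -ENOSYS */
    current_nr = nr;
    return sys_call_table[nr](a, b);
}
```

A generic wrapper like this puts its own stack frame between the entry code and the real handler. That is harmless for ordinary syscalls, but it hints at why the special syscalls are awkward: a handler that expects the caller's saved registers at a fixed place on the stack would find the wrapper's frame there instead.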

But when a 32b app is running on a 64b kernel, the app doesn't know
any different; the kernel does. The kernel has two system call
tables, and a given system call, e.g. "open", has a different index
in each (5 in the i386 table, 2 in the x86_64 one). The two ABIs
evolved differently: i386 kernels have had to maintain backwards
compatibility, whereas the amd64 kernel did not, and started afresh
when these CPUs became available.
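A small sketch of the two numbering schemes, using a handful of entries from the Linux i386 and x86_64 syscall tables. The enum, struct and function names are made up for illustration; the point is that a probe for "open" has to know which table, and therefore which index, it is patching, and that a 32b app on a 64b kernel goes through a compat table that keeps the i386 numbering.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for which table serves a call. */
enum syscall_abi { ABI_I386, ABI_X86_64, ABI_IA32_COMPAT, ABI_NONE };

/* Which table a call goes through depends on both kernel and app width. */
static enum syscall_abi table_for(int kernel_bits, int app_bits)
{
    if (kernel_bits == 32 && app_bits == 32) return ABI_I386;
    if (kernel_bits == 64 && app_bits == 64) return ABI_X86_64;
    if (kernel_bits == 64 && app_bits == 32) return ABI_IA32_COMPAT; /* i386 numbering */
    return ABI_NONE; /* a 32b kernel cannot run 64b apps */
}

struct sysent { const char *name; int nr; };

/* A few entries from each table (Linux i386 and x86_64 numbering). */
static const struct sysent i386_table[] = {
    { "fork", 2 }, { "open", 5 }, { "execve", 11 },
    { "sigreturn", 119 }, { "clone", 120 },
};
/* x86_64 has no plain sigreturn, only rt_sigreturn. */
static const struct sysent x86_64_table[] = {
    { "open", 2 }, { "rt_sigreturn", 15 }, { "clone", 56 },
    { "fork", 57 }, { "execve", 59 },
};

/* Return the syscall number of `name` in the given table, or -1. */
static int lookup_nr(enum syscall_abi abi, const char *name)
{
    const struct sysent *t;
    size_t n;

    if (abi == ABI_X86_64) {
        t = x86_64_table;
        n = sizeof x86_64_table / sizeof x86_64_table[0];
    } else if (abi == ABI_I386 || abi == ABI_IA32_COMPAT) {
        t = i386_table;
        n = sizeof i386_table / sizeof i386_table[0];
    } else {
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0)
            return t[i].nr;
    return -1;
}
```

So `lookup_nr(table_for(64, 64), "open")` gives 2, while the same call from a 32b app, `lookup_nr(table_for(64, 32), "open")`, gives 5.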

Dtrace handles that.

Except that it didn't handle the special syscalls: when a 32b app
invoked fork(), clone(), etc., we usually ended up panicking the kernel.

Most Linux distros are "pure": a 64b distro ships 64b apps, so you
rarely see a 32b app running, and hence rarely hit this problem.

Linux/dtrace has a nice interface for system calls. The probe description, e.g.


$ dtrace -n syscall:::


matches all system calls, 32b and 64b alike. But the 32b and 64b calls
are separate probes, so you can intercept just the 32b syscalls on a
64b system:


$ dtrace -n syscall:x32::


which is useful, for instance, when you only want to see what the
32b apps on a 64b system are doing.
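Probe descriptions have four colon-separated fields, provider:module:function:name, and an empty field matches anything. Here is a simplified matcher to show why `syscall:::` catches both kinds of probe while `syscall:x32::` catches only the 32b ones. It is a sketch, not dtrace's own code: the names are hypothetical, and it only handles full four-field descriptions with exact or empty fields (real dtrace also accepts shorter descriptions and glob patterns).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical probe tuple: provider:module:function:name. */
struct probe { const char *provider, *module, *function, *name; };

/* An empty pattern field is a wildcard; otherwise require an exact match. */
static int field_matches(const char *pat, size_t len, const char *value)
{
    return len == 0 || (strlen(value) == len && strncmp(pat, value, len) == 0);
}

/* Match a four-field description against a probe; anything else fails. */
static int probe_matches(const char *desc, const struct probe *p)
{
    const char *fields[4] = { p->provider, p->module, p->function, p->name };
    const char *start = desc;

    for (int i = 0; i < 4; i++) {
        const char *colon = strchr(start, ':');
        size_t len = colon ? (size_t)(colon - start) : strlen(start);

        if (!field_matches(start, len, fields[i]))
            return 0;
        if (!colon)
            return i == 3; /* ran out of fields: only OK after the fourth */
        start = colon + 1;
    }
    return 0; /* more than four fields */
}
```

For example, a probe `{ "syscall", "x32", "open", "entry" }` matches both `syscall:::` and `syscall:x32::`, while a 64b probe (its module name is assumed here to be `x64`) matches only the first.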

I have nearly fixed these special syscalls on the 64b kernel; only
clone() remains. Without the fix, the symptom is a cascade of
kernel oopses and panics, because the kernel stack layout is not
what it should be.

I hope to release a fix for this problem later today.

Post created by CRiSP v10.0.10a-b6012

