Sunday, 15 July 2012

DTrace PID Provider

I've recently been working on the PID provider, because I had
been putting it off and people were asking about it.
Most of the code is there, courtesy of the original DTrace code, but
it doesn't work.

As I dive in (and it's a deep and scary place to investigate), it's slowly
becoming clearer to me how it works.

The PID provider essentially does for processes what normal dtrace
probes do for the kernel (it is very similar to the FBT provider).

There are a number of ways to come at the PID provider. You could
be launching an executable from inside dtrace, or you could be targeting
a specific running process, or you could be looking for any process which
hits, for example, a libc call.

This is all very expensive. Instead of dealing with the kernel's symbol
table, which, although large, is generally smaller than most executables
and relatively unchanging, we have to deal with the symbols of each target
process. Getting the correct symbol table of a running process involves
examining /proc/pid/maps to find the mapped libraries, and then examining
the process memory to find the symbols of interest.
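
To make the first step concrete, here is a rough Python sketch of pulling
the executable, file-backed mappings out of /proc/pid/maps text. It is
not the libdtrace code; the names (Mapping, parse_maps_line,
executable_files) and the SAMPLE text are made up for illustration. Only
the maps line layout (address range, perms, offset, dev, inode, optional
pathname) is the real format.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int            # first address of the mapping
    end: int              # one past the last address
    perms: str            # e.g. "r-xp"
    offset: int           # offset into the backing file
    path: Optional[str]   # backing file, or None for anonymous memory


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps (hypothetical helper)."""
    # Fields: address-range perms offset dev inode [pathname]
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else None
    return Mapping(int(start_s, 16), int(end_s, 16), fields[1],
                   int(fields[2], 16), path)


def executable_files(maps_text: str) -> List[Mapping]:
    """Keep executable mappings backed by a real file (candidate code)."""
    result = []
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        m = parse_maps_line(line)
        # Skip anonymous memory and pseudo-entries such as [vdso].
        if "x" in m.perms and m.path and m.path.startswith("/"):
            result.append(m)
    return result


# Made-up sample in the /proc/<pid>/maps layout, for illustration only.
SAMPLE = """\
00400000-0040c000 r-xp 00000000 08:01 131 /bin/cat
0060b000-0060c000 rw-p 0000b000 08:01 131 /bin/cat
7f3a10000000-7f3a101b5000 r-xp 00000000 08:01 655 /lib/libc-2.15.so
7fff5a1ff000-7fff5a200000 r-xp 00000000 00:00 0 [vdso]
"""

if __name__ == "__main__":
    for m in executable_files(SAMPLE):
        print(hex(m.start), hex(m.end), m.path)
```

Finding the symbols inside each mapping (reading the ELF symbol tables)
is the expensive part and is not attempted here.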

Let's take an example:

$ dtrace -n pid1234::malloc:entry

We locate the process (pid 1234), find the library where malloc is located,
find the address of malloc() inside the library, and then we *patch it*.
The malloc entry instruction is replaced by a breakpoint instruction.
This is very similar to what is done for kernel probes.

But, before we do this, we need to tell the kernel that this breakpoint
is a DTrace probe, which is handled by the fasttrap.c and fasttrap_isa.c
code. Whilst the above dtrace is running, you can see this "on-demand"
probe by examining "dtrace -l" or looking at /proc/dtrace/fbt.

Now - a couple of things can happen: dtrace terminates or the process
terminates. If the process terminates, we need to rip out the probe, since
dtrace has played with and knows this probe exists. The fasttrap provider
intercepts fork/exec/exit system calls and should undo the placed probes.

If dtrace exits, it should undo the patch to the process binary,
restore the original instruction, and remove the fasttrap/pid provider
probes. (Confusingly, the fasttrap.c code contains both the USDT and PID
providers - they are nearly identical, the difference being that for
USDT, the process places its own probes, but for the PID provider, a
copy of dtrace [i.e., another process] is doing it).
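
The bookkeeping behind "patch, then undo the patch" can be sketched on a
plain byte buffer. This is a toy model only: ToyTracepoints and its
methods are hypothetical names, the bytearray stands in for process
memory, and the real work in fasttrap.c is considerably more involved.
The one thing it tries to show is that the original byte has to be saved
exactly once, so that removal puts back the true instruction.

```python
class ToyTracepoints:
    """Remembers the byte each breakpoint overwrote so it can be put back."""

    INT3 = 0xCC  # x86 one-byte breakpoint opcode

    def __init__(self, memory: bytearray):
        self.memory = memory   # stand-in for the target process memory
        self.saved = {}        # address -> original byte

    def install(self, addr: int) -> None:
        if addr in self.saved:
            return  # already patched; do not save 0xCC as the "original"
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = self.INT3

    def remove(self, addr: int) -> None:
        # Restore the original instruction byte and forget the tracepoint.
        self.memory[addr] = self.saved.pop(addr)

    def remove_all(self) -> None:
        # What should happen when dtrace exits.
        for addr in list(self.saved):
            self.remove(addr)


if __name__ == "__main__":
    # push %rbp; mov %rsp,%rbp; ret
    mem = bytearray([0x55, 0x48, 0x89, 0xE5, 0xC3])
    tp = ToyTracepoints(mem)
    tp.install(0)
    print(mem.hex())
    tp.remove_all()
    print(mem.hex())
```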

So far, so good. My DTrace has had a number of bugs in the libdtrace code
(not quite fixed, but getting there), which affected the ability to find
and place the probes. We can now place the probes, and it's possible to see
this happen. In the above example we used "entry", but we could have
used "return", or just left the last field blank. (In which case every
instruction of the function is defined as a probe point - a very
good, but eventual, test case).
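
For reference, the probe description in the example breaks down into the
usual provider:module:function:name fields. A small sketch of pulling
one apart (parse_pid_probe and PidProbeSpec are hypothetical names, not
anything in libdtrace, and only the full four-field form is handled):

```python
from typing import NamedTuple, Optional


class PidProbeSpec(NamedTuple):
    pid: int
    module: str     # "" means the field was left blank
    function: str   # "" means the field was left blank
    name: str       # "entry", "return", or "" when left blank


def parse_pid_probe(spec: str) -> Optional[PidProbeSpec]:
    """Split 'pid<N>:module:function:name'; None if it is not a pid probe."""
    parts = spec.split(":")
    if len(parts) != 4:
        return None  # abbreviated descriptions are not handled in this sketch
    provider, module, function, name = parts
    if not provider.startswith("pid") or not provider[3:].isdigit():
        return None
    return PidProbeSpec(int(provider[3:]), module, function, name)


if __name__ == "__main__":
    print(parse_pid_probe("pid1234::malloc:entry"))
    print(parse_pid_probe("pid1234::malloc:"))
```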

So, eventually the application will hit the breakpoint, and the INT3
breakpoint handler will ask the fasttrap provider to handle the breakpoint.

At this point, things get a little confusing. DTrace is *not* using
INT3, but INT 0x7E. INT 0x7E is a two-byte instruction, whereas INT3 is
a one-byte one. DTrace (in libdtrace and fasttrap_isa.c) goes to great
lengths to handle this by emulating the instruction which was overwritten.
(This in fact was my original approach to single stepping the kernel,
but I gave it up as being too hard and a pain to debug; INT3 is a single
byte instruction so it's easier to step over the instruction. But
we mustn't temporarily reset the instruction to step over it, because
another thread might hit the same instruction whilst we have a
temporary instruction in place and miss the probe).

Let's go over this again: if we overwrite a user instruction, then
we need to do this with a single byte trap (INT3), because if we don't
and the target instruction is one byte long, then we can corrupt the
subsequent instruction.
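
A tiny illustration of that corruption, again on a plain byte buffer
rather than a real process. The helper names (patch, changed_offsets)
are made up; the opcodes are the standard x86 encodings (0xCC for INT3,
0xCD followed by an immediate byte for INT n).

```python
INT3 = bytes([0xCC])           # one-byte breakpoint
INT_7E = bytes([0xCD, 0x7E])   # "int $0x7e": opcode 0xCD plus an immediate

# push %rbp (1 byte), mov %rsp,%rbp (3 bytes), ret (1 byte)
CODE = bytes([0x55, 0x48, 0x89, 0xE5, 0xC3])


def patch(code: bytes, addr: int, trap: bytes) -> bytearray:
    """Return a copy of code with the trap written at addr."""
    patched = bytearray(code)
    patched[addr:addr + len(trap)] = trap
    return patched


def changed_offsets(before: bytes, after: bytes) -> list:
    """Offsets at which the two buffers differ."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


if __name__ == "__main__":
    # INT3 only touches the one-byte "push %rbp" at offset 0.
    print(changed_offsets(CODE, patch(CODE, 0, INT3)))
    # The two-byte trap also clobbers offset 1, the first byte of
    # "mov %rsp,%rbp", which a thread could be about to execute.
    print(changed_offsets(CODE, patch(CODE, 0, INT_7E)))
```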

We have to be careful, because this process may have multiple threads
running on different CPUs at the same time, any of which may hit the
affected probe point.

I note that the Solaris dtrace code distinguishes an entry point from
a return point and uses different INT traps to effect this. At present
DTrace for Linux isn't supporting these traps, but now that I have
uncovered them, I need to understand more about *why* different trap types
are used. From my work on the original kernel code, INT3 seems sufficient
for all types of traps. (There is a potential issue that if we attach
to a process in user space which is being debugged, we can get
confusion about whether the INT3 is for dtrace or for the debugger).

There are some other problematic areas; dtrace locks the process
at certain key points, to avoid race conditions which could cause
trouble (e.g. we mustn't allow a "kill -9" from someone else to kill
the process we are trying to instrument). DTrace for Linux is
keeping shadow data structures for processes, which the real kernel
knows nothing about. So, again we have to be careful that we keep
the "mirage" effect of security and safety.

I am going to fix the known areas at issue - I have already
demonstrated (to myself!) that we can take a PID provider trap;
releases up until today nearly do that, but they are missing a few
fixes which I will release when I am happy the next release is
better than what is available on my site and github. Hopefully,
a few days away.

First, I need to fix /proc/dtrace/fasttrap - I want to dump
out the key internal data structures, mostly to prove to myself
I understand them. The output will show the tracepoints
and PIDs being monitored, but at the moment it is deficient since
it only shows the USDT-placed probes.

Post created by CRiSP v11.0.10a-b6436
