Sunday, 22 July 2012

PID Provider: Did you call? #2

So, we can place a PID-specific probe in an arbitrary process. Doing so
modifies the instruction at the target address: a breakpoint is written
over it. When that address is executed, the breakpoint traps into the kernel.

Fasttrap (the PID provider) looks up the trapping address in its data
structures, maps it to a probe, and logs a record for the originating dtrace
process.

But we still have to get past the breakpoint. The code in fasttrap does
something similar to the in-kernel code: we single-step the instruction
which was patched. This is quite complex because, remember, the
original instruction has a breakpoint placed on top of it. So fasttrap
arranges to single-step a copy of the original instruction in a scratch buffer.

It took me ages to "get this". What scratch buffer? For in-kernel
probes, dtrace4linux has a per-CPU scratch buffer for this purpose.
But we cannot use it here, for two reasons. Reason 1: it's not visible to the
process in user space. Reason 2: processes may be preempted, so we
cannot guarantee that the scratch buffer would remain unscathed long enough
for one process to complete the step before another process steps on the same
buffer.

I spent a long time looking at this, trying to figure out how
Apple/FreeBSD/Solaris do it. On Solaris, each thread in the system
has a scratch buffer in the in-kernel lwp_t structure. Ah! We cannot
force this on Linux without rebuilding the kernel, but all we need is
a private area to dump the scratch instruction into. I looked
at the idea of jamming a 4K page into the address space of the process,
or leveraging the VDSO system call page, but both require some thought
because of the need to garbage collect, or to avoid problems with other
users of the area. In the end I decided that the current thread stack
is a good place to do this. Most threads have 1-10MB of stack, and rarely
use more than a small fraction of it. In fact, nobody uses the bottom area
of the stack, since doing so might expose the application to random
segmentation violations as it runs out of stack. So stack space is
allocated to be much larger than a process normally needs.

So, we can just use the area below the stack. This isn't ideal, but
it's simple. It's not ideal because a process which does not
obey the normal stack frame rules might be perturbed by what we are
doing. Tough. :-)

Another long-running problem I had was figuring out what actually happens
during a PID probe. When you use dtrace to plant a probe, the PID provider
dynamically constructs probes for you. (You can see the probes, e.g.
pidNNN:::.) These probes disappear when either the target process or
the probing dtrace exits.

But how? I spent ages looking at the code, and added debug output to
/proc/dtrace/fasttrap. While the PID provider is in action, you can see three
tables which fasttrap maintains:
  1. A table of the probes themselves (tracepoints), used to map a trapping
    address back to its probe and the owning process.

  2. A table of the processes being probed. When a target
    process terminates, its tracepoints are dismantled.

  3. A "provider" table. When you attach to a process to probe it,
    probes are created, but the probes belong to a provider (e.g. "fbt",
    "syscall", or "pidNNN"). Each process you attach to is effectively a
    brand-new provider.

I also found out some other interesting things. If you probe a process, and
that process dies, your dtrace does not terminate (unless you make provision
for this in your D script). The dtrace hangs, and it's up to you to ^C it.

Here's an example of the trace tables in fasttrap:

cat /proc/dtrace/fasttrap
tpoints=1024 procs=256 provs=256 total=9
# PID VirtAddr
TRCP 5748 000000000040087d
TRCP 5748 000000000040087e
TRCP 5748 0000000000400881
TRCP 5748 0000000000400886
TRCP 5748 000000000040088b
TRCP 5748 0000000000400890
TRCP 5748 0000000000400891
PROV 5748 pid 0 0 9 0 0
PROC 5748 1 1

"5748" is the target PID is was tracing. The TRCP entries show the virtual
address in the target process where probes lay in waiting (each instruction
of the "do_nothing2" function I attached to). The other fields in the tables
are not really interesting (look at the source code to see what they are;
I may fix the output to make it more self-describing). The output from
/proc/dtrace/fasttrap is three table dumps (the header line above does
not reflect that).

Once I had this "view" of what the provider was doing, I could immediately
go fix another issue.

I had a lot of trouble with the kernel panicking when I killed the dtrace.

Continued in the next blog entry...

Post created by CRiSP v11.0.10a-b6436
