Sunday, 22 July 2012

PID Provider: Did you call? #3

When I ^C'ed the dtrace process, I would often panic the kernel ... badly.
A slurry of messages scrolled on the screen telling me that atomicity
had been violated.


Tearing down the potentially many probes in a process takes long
enough that various windows of vulnerability exist. If you kill -9
the dtrace process, the kernel tears down the probes, and does so
with various mutexes held. If fasttrap then tries to dismantle its
own view of the probes, a deadlock can result.

So, fasttrap code, during the teardown, uses an optimistic timer
to take out the probes (tracepoints). The mechanism is a classic
kernel function - timeout(). Up until 2 weeks ago, timeout() in
dtrace4linux was a stub implementation.

I had to implement timeout() and quickly knocked up some code,
based on the hrtimer mechanisms in the kernel.
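
Roughly, the shape of an hrtimer-based timeout() is something like
the sketch below. This is illustrative only -- the names and structure
are mine for this post, not the actual dtrace4linux source -- but it
shows the key property: the callback fires in interrupt context,
interrupting whatever happened to be running.

```c
/* Sketch of a timeout() built on Linux hrtimers. Illustrative
 * names only; not the real dtrace4linux code. */
#include <linux/hrtimer.h>
#include <linux/jiffies.h>
#include <linux/ktime.h>

static struct hrtimer dt_timer;
static void (*dt_func)(void *);
static void *dt_arg;

static enum hrtimer_restart
dt_fire(struct hrtimer *t)
{
	/* Runs in interrupt context, on top of the current process --
	 * which is exactly where the deadlock described below comes from. */
	dt_func(dt_arg);
	return HRTIMER_NORESTART;
}

void
timeout(void (*func)(void *), void *arg, unsigned long ticks)
{
	dt_func = func;
	dt_arg = arg;
	hrtimer_init(&dt_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	dt_timer.function = dt_fire;
	hrtimer_start(&dt_timer, ms_to_ktime(jiffies_to_msecs(ticks)),
	    HRTIMER_MODE_REL);
}
```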

This caused me no end of issues, and took me ages to understand
what was going on.

As dtrace closed the /dev/dtrace device, tearing down the probes it
had set, the timer would fire and interrupt the closing dtrace. The
timeout function would dismantle the fasttrap probes, and try to
acquire a mutex held by the terminating dtrace process.
Classic deadlock. It also showed up a potential problem
in my code (driver/mutex.c), which attempts to call the scheduler if
a mutex appears stuck (which led to the kernel issues, since
calling the scheduler from a timer interrupt is not the correct thing to do).

I checked the Solaris code, to remind myself how timeout() works.
What I found was interesting. A timer interrupt in Solaris doesn't
just fire, interrupting the current process. It fires from
a special context, effectively interrupting a dummy process. This
resolves the deadlock: a timer can never interrupt a mutex-protected
block of code, because it interrupts in the context of another process. So
the original process can make progress, release the lock, and allow
the timeout to make progress.

We are almost done.

There is a piece of code in driver/dtrace.c, which I never understood,
and had commented out:

dtrace_taskq = taskq_create("dtrace_taskq", 1, maxclsyspri,
        1, INT_MAX, 0);

Having that commented out hasn't harmed dtrace4linux. The reason
I was looking was that /proc/dtrace/fasttrap let me see what a PID
probe looked like. When the target process and dtrace
terminated, these entries were not being cleaned up. fasttrap.c does
garbage collection, but it wasn't clear how that happens when locks prevent
progress. Function dtrace_unregister() calls this function to
actually remove one of these fasttrap probes:

(void) taskq_dispatch(dtrace_taskq,
        (task_func_t *)dtrace_enabling_reap, NULL, TQ_SLEEP);

What does that mean? I didn't know. I searched Google, and was none
the wiser. But it slowly dawned on me.

Ever done a ps on Linux and seen entries like this?

$ ps ax
1 ? Ss 0:01 /sbin/init
2 ? S 0:00 [kthreadd]
3 ? S 0:02 [ksoftirqd/0]
6 ? S 0:00 [migration/0]
7 ? S 0:00 [watchdog/0]

Those processes in square brackets are kernel processes, and that is what
taskq_create() is doing: creating a kernel process. This is a tiny
part of dtrace, but a very, VERY important part! To stop timers
deadlocking with user processes due to mutex contention, we need the
timers to fire from a process which cannot possibly be running dtrace.
So taskq_create() creates a kernel process, and when the kernel
cannot free a probe (because it looks like it is in use), a timer is fired
to retry the cancellation of the probe.

So, I now needed to implement taskq_create() on Linux. A quick Google
search found what I wanted: workqueues. This is the mechanism for
creating a kernel process and handling callbacks asynchronously. A quick
piece of coding, and it was looking good.

Continued in part 4....

Post created by CRiSP v11.0.10a-b6436
