Saturday, 28 July 2012

Driver unloading

Sometimes, assumptions hurt. I know how Linux works: you can load
a driver (insmod/modprobe), and remove the driver (rmmod).

Pretty simple. "lsmod" can show drivers loaded (usually a lot), and
many are unused.

It makes sense that you cannot unload a device driver if it's in use,
especially if the device driver implements a filesystem.

Sometimes, the physical world is cruel. No amount of clever coding
in the kernel can prevent you physically/forcibly removing a floppy
disk or memory card. And the modern kernel can handle such horrible
brute force actions.

Let's switch to dtrace. We don't want to unload dtrace whilst it's in use.
All the code is there (thank you Sun). The Linux port makes various
checks before allowing the driver to be unloaded.

[Most people don't care about this, but as a developer, I want to load
and unload the driver frequently, without having to reboot the kernel].

So, there are two types of "in-use" actions in dtrace: you are running
dtrace waiting for probes, and you have used a PID/USDT provider to
place probes in a user process.

In the first case, I can make sure I am not running dtrace when reloading.

In the second, well, things can go awry. If you use:

$ dtrace -c cmd -n ....

Then there are two parts of dtrace which are active: the dtrace process
itself, and the cmd or process being traced. (Remember, user level
probes will trap to the kernel).

If we somehow kill -9 the dtrace, then dtrace will leave the probes
intact until the process exits. If the process then hits a probe, the
probe is redundant: nothing is left to consume it.

In reality, we can unload dtrace whilst probes are active in a user
process - the process will terminate with SIGTRAP when the first probe
is hit.
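
To see what "terminate with SIGTRAP" looks like from the outside, here is
a small userspace sketch. It assumes raise(SIGTRAP) is a fair stand-in for
a process running into a probe site with nothing left to service the trap;
it does not touch dtrace at all, and the function name is made up for
illustration.

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that takes a trap with nobody left to handle it.
 * raise(SIGTRAP) is only a stand-in for hitting a patched probe site.
 * Returns the signal number that killed the child, 0 if the child
 * exited normally, or -1 on error.
 */
int signal_that_killed_untraced_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);  /* avoid leaving a core file */
        signal(SIGTRAP, SIG_DFL);          /* drop any inherited disposition */
        raise(SIGTRAP);
        _exit(0);                          /* not reached with the default action */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling signal_that_killed_untraced_child() from a test program and
comparing the result with SIGTRAP shows the default action: the child is
killed rather than carrying on.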

What I *didn't realise* (because I am definitely stupid), is that the
module unload code in a driver is a "void" function. It cannot prevent
itself being unloaded. Once the kernel wants to unload you, it will
happen. And if you don't clean up properly, your kernel is likely
to have a problem (GPF or panic).

Dtrace will crash if it is unloaded while the device driver is open/in-use,
because although it tries to prevent unloading of the driver, nobody is
listening. Duh!

Ok, so we can probably just let dtrace unload and stop worrying.

Or we could prevent unloading whilst active probes exist. After some
investigation, the kernel function try_module_get() is the function
that implements the driver's in-use count (as seen by lsmod).
Interestingly, it is rarely called. It is *not* called simply because
you opened the device, e.g.

$ sleep 100

It's typically incremented for executables coming from the filesystem. I
don't think it's called because a file is open. (Maybe we can panic the
system if we hold on to devices in the system which are unloaded?)
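
The difference between the in-use count and the void unload function can
be sketched as a toy userspace model. None of this is kernel code: the
toy_* names, the struct layout and the choice of -EBUSY are all made up
for illustration. The only point is that a refusal has to come from a
count that is checked *before* the exit hook runs, because the hook itself
has no way to say no.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_module {
    int  refcnt;               /* plays the role of the in-use count */
    bool going;                /* set once an unload has been committed */
    void (*exit_hook)(void);   /* void: it cannot report "busy" */
};

/* Take a reference, unless the module is already on its way out. */
bool toy_try_get(struct toy_module *m)
{
    if (m->going)
        return false;
    m->refcnt++;
    return true;
}

/* Drop a reference taken with toy_try_get(). */
void toy_put(struct toy_module *m)
{
    if (m->refcnt > 0)
        m->refcnt--;
}

/*
 * Unload: the only veto is the count check up front. Once past it,
 * the exit hook is called and whatever it thinks is ignored.
 */
int toy_unload(struct toy_module *m)
{
    if (m->refcnt > 0)
        return -EBUSY;
    m->going = true;
    if (m->exit_hook)
        m->exit_hook();
    return 0;
}
```

In this model, a toy_try_get() for every active probe and a toy_put()
when it goes away would make toy_unload() fail while probes exist, which
is roughly the behaviour wanted from the real driver.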

(It might be possible to modify the module reference count on open + close,
but this is almost certainly impossible to get right; consider what happens
on a fork or dup system call - file descriptors can be cloned, but the
underlying driver will never know that).
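
The cloning is easy to see from userspace. The sketch below is only an
illustration with a made-up function name: it opens one temporary file,
clones the descriptor with dup() and fork(), and checks that a seek made
through each clone shows up on the original descriptor. No second open
ever takes place.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Open one file, then clone the descriptor with dup() and fork().
 * Returns 1 if an lseek() made through each clone is visible through
 * the original descriptor (all of them share one open file), 0 if
 * not, and -1 on error.
 */
int clones_share_one_open(void)
{
    FILE *tmp = tmpfile();          /* the only open that ever happens */
    if (tmp == NULL)
        return -1;

    int fd = fileno(tmp);
    int fd2 = dup(fd);              /* clone 1: no new open */
    int result = -1;

    if (fd2 >= 0 && lseek(fd2, 10, SEEK_SET) == 10) {
        int dup_shared = (lseek(fd, 0, SEEK_CUR) == 10);

        pid_t pid = fork();         /* clone 2: the child inherits fd */
        if (pid == 0) {
            lseek(fd, 20, SEEK_SET);
            _exit(0);
        }
        if (pid > 0 && waitpid(pid, NULL, 0) == pid) {
            int fork_shared = (lseek(fd, 0, SEEK_CUR) == 20);
            result = dup_shared && fork_shared;
        }
    }

    if (fd2 >= 0)
        close(fd2);
    fclose(tmp);
    return result;
}
```

One thing worth keeping in mind when counting on open + close: on Linux
a driver's release callback runs once, when the last reference to the
shared open file goes away, not once per cloned descriptor.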

[And, why do I care? Because as I play with the PID provider and add
new probes, I keep crashing the kernel if dtrace is running or hung. Normal
users shouldn't care about reloading dtrace].

Post created by CRiSP v11.0.10a-b6436