Saturday, 31 December 2011

Happy New Year Dtrace...You ruined my Christmas

I have just spent the last few days tracking down a strange
issue. Having fixed up dtrace to work on Ubuntu 11.10/i386, I found
that reloading the driver would hang/crash the kernel.

This was related to the page-fault interrupt vector. If that was
disabled, then all was well.

Strange. I don't recall seeing this error before...

Although most recent work has been on the 64b version of dtrace,
I had assumed the 32b was in sync and all-was-well. But not so.

It's an interesting trail... I thought the driver reload/reload/...
cycle was fixed. It works well on the 64b kernel, but not on the 32b one.

After a lot of searching, and narrowing the problem down to the
page fault interrupt vector, I checked on my rock-solid Ubuntu 8.04
(2.6.28 kernel). And hit the same problem: namely a double (or even
single) driver reload would hang the system.

I spent a lot of time on the Ubuntu 11.10 kernel, where the driver
would hang the kernel on the first load after bootup. Eventually,
whilst tinkering with GRUB and turning off the splash screen, I got
to a point where the first load would work and the second would hang.

Prior to this point, I had no way to debug the code. Any attempt
to leave the page fault vector modification in place would hang the
kernel, or cause a panic in printk(). I eventually suspected that
the segment registers were not set up properly.
(The kernel uses the segment registers to access kernel data and
per-cpu items, so if these are incorrect on a page fault, you aren't
going very far.) I even cut and pasted the existing kernel assembler code
for page_fault, but had a lot of problems getting anything repeatable.

Whilst investigating this, I had to do a lot of thought experiments: what
was the CPU up to? Why was it having a hard time?

Well, I realised a number of things:

On an SMP system, each CPU has its own IDT register, all set to the same
location in memory. We might patch the interrupt vector table, but
there is no guarantee the other CPUs will see these changes atomically.
In addition, CPU caching might cause the other CPUs to see the old
values of these interrupt vectors until enough time has passed to
force cache line flushing.

Bear in mind, we are patching vectors in a table, so the CPU may not
know we have done this. For all we know, the CPU may have cached the
page_fault vector and may not notice our changes. Ditto for the other
CPUs.

So, Google to the rescue. After a short while, I found these two
links:


  • http://stackoverflow.com/questions/2497919/changing-the-interrupt-descriptor-table

  • http://www.codeproject.com/KB/system/soviet_direct_hooking.aspx



The second link hinted at my problem: if you randomly change the interrupt
vector table, then expect problems. The codeproject link didn't offer
an explanation, but hinted that the way forward was to create a new IDT,
copy the old table to the temporary one, switch the CPUs over to the
copy, make the updates, then switch back.

The first link confirmed this. (Interestingly, the second link is for
info on a Windows kernel, but the first link echoed the same sentiment
on Linux).

After a lot of fine-tuning and cleaning up of the code, it now works!

The trick is to switch the CPUs away whilst updating the vector
table, and switch back when the updates are done. The same has to
be done in the driver teardown code, so we can load/unload
repeatedly.
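
To make the sequence concrete, here is a rough user-space simulation of
copy / switch away / patch / switch back. It is only a sketch and not
the actual dtrace driver code: the names (live_idt, temp_idt, cpu_idtr,
patch_vector, the two dummy handlers, NCPUS) are all made up for
illustration, and the "IDT register" is just a pointer per simulated
CPU. In a real driver the switch would have to run on every CPU and
load the IDT register there.

```c
#include <assert.h>
#include <string.h>

#define IDT_ENTRIES 256
#define NCPUS       4    /* hypothetical CPU count for the simulation */
#define PF_VECTOR   14   /* page fault is vector 14 on x86 */

typedef void (*handler_t)(void);

/* Stand-in for a gate descriptor: only the handler address matters here. */
typedef struct { handler_t handler; } gate_t;

gate_t live_idt[IDT_ENTRIES];  /* the table the CPUs normally use */
gate_t temp_idt[IDT_ENTRIES];  /* scratch copy used whilst patching */
gate_t *cpu_idtr[NCPUS];       /* stand-in for each CPU's IDT register */

/* Dummy handlers (hypothetical names). */
void kernel_page_fault(void) {}
void dtrace_page_fault(void) {}

/* Simulated "load the IDT register on every CPU". */
void point_all_cpus_at(gate_t *table)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        cpu_idtr[cpu] = table;
}

/* Put the simulation into its boot-time state. */
void sim_init(void)
{
    memset(live_idt, 0, sizeof(live_idt));
    live_idt[PF_VECTOR].handler = kernel_page_fault;
    point_all_cpus_at(live_idt);
}

/* Patch one vector; returns the old handler so teardown can restore it. */
handler_t patch_vector(int vec, handler_t new_handler)
{
    assert(vec >= 0 && vec < IDT_ENTRIES);

    memcpy(temp_idt, live_idt, sizeof(live_idt)); /* 1. copy the table    */
    point_all_cpus_at(temp_idt);                  /* 2. switch CPUs away  */

    handler_t old = live_idt[vec].handler;        /* 3. patch whilst no   */
    live_idt[vec].handler = new_handler;          /*    CPU is using it   */

    point_all_cpus_at(live_idt);                  /* 4. switch back       */
    return old;
}
```

Driver load would call patch_vector(PF_VECTOR, dtrace_page_fault) and
keep the returned handler; teardown would call patch_vector() again
with that saved handler, going through the same switch-away and
switch-back steps.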

On the way, I added a /proc/dtrace/idt driver so it's easier to
see the raw interrupt descriptor table.
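
For anyone wanting to read such a dump, this is roughly how a raw
32-bit IDT entry decodes. The 8-byte gate layout is the standard x86
one; the function names and the output format below are my own
invention for this sketch and are not the actual /proc/dtrace/idt
format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit x86 interrupt/trap gate: 8 bytes per IDT entry. */
struct gate32 {
    uint16_t offset_lo;  /* handler address, bits 0..15 */
    uint16_t selector;   /* code segment selector */
    uint8_t  reserved;   /* zero for interrupt/trap gates */
    uint8_t  type_attr;  /* P (bit 7), DPL (bits 6-5), type (bits 3-0) */
    uint16_t offset_hi;  /* handler address, bits 16..31 */
};

/* Reassemble the handler address from its two halves. */
uint32_t gate32_offset(const struct gate32 *g)
{
    assert(g != NULL);
    return ((uint32_t)g->offset_hi << 16) | g->offset_lo;
}

/* Format one entry as text (hypothetical format); returns snprintf's result. */
int gate32_format(char *buf, size_t len, int vec, const struct gate32 *g)
{
    return snprintf(buf, len, "%02x: %04x:%08x p=%d dpl=%d type=%x",
                    (unsigned)vec,
                    (unsigned)g->selector,
                    (unsigned)gate32_offset(g),
                    (g->type_attr >> 7) & 1,     /* present bit */
                    (g->type_attr >> 5) & 3,     /* privilege level */
                    (unsigned)(g->type_attr & 0xf)); /* gate type */
}
```

A type of 0xe is a 32-bit interrupt gate and 0xf is a 32-bit trap gate,
so a patched page fault entry shows up as a changed address on the
vector 0e line.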

One interesting question is why the 64b driver didn't suffer the same
problems. It feels like we hit a CPU bug/erratum in this area, and
the 64b CPU mode does not suffer from it (or the size of the
64b IDT vector entries "moves" the problem around).
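
For reference, the entry size difference is real: a 64b gate is 16
bytes against 8 bytes in 32b mode, so a full 256-entry table is 4 KiB
rather than 2 KiB. The sketch below only shows the standard long-mode
layout (the struct and function names are mine); it does not explain
why the 64b driver behaves differently.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit (long mode) interrupt gate: 16 bytes per IDT entry. */
struct gate64 {
    uint16_t offset_lo;   /* handler address, bits 0..15 */
    uint16_t selector;    /* code segment selector */
    uint8_t  ist;         /* interrupt stack table index (bits 2-0) */
    uint8_t  type_attr;   /* P, DPL, type: same bit positions as 32b */
    uint16_t offset_mid;  /* handler address, bits 16..31 */
    uint32_t offset_hi;   /* handler address, bits 32..63 */
    uint32_t reserved;
};

/* Reassemble the handler address from its three pieces. */
uint64_t gate64_offset(const struct gate64 *g)
{
    assert(g != 0);
    return ((uint64_t)g->offset_hi << 32) |
           ((uint64_t)g->offset_mid << 16) |
           g->offset_lo;
}

/* Bytes needed for a full 256-entry table in each mode. */
enum {
    IDT32_BYTES = 256 * 8,
    IDT64_BYTES = 256 * sizeof(struct gate64)
};
```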

Now I just need to tidy up the code and release ... the last release
of 2011.

Happy new year.

Post created by CRiSP v10.0.21a-b6142

