Thursday, 1 November 2012

DTrace and Xen...continued

Work continues on DTrace for the Xen guest Linux kernel. Progress is "middling".

As I recounted in a prior blog entry, there are a number of steps
to getting this to work, and it mostly works, but the quality is not
what I want in a usable driver...although I may release it sub-par.

Firstly, the syscall provider works. This took some work to get
the page tables to be writable - using the correct page table APIs,
which in turn map down to the Xen hypervisor calls. Running as a Xen guest
is significantly different from running on a genuine CPU.

A Xen hypervisor call is like a system call: it uses a special
gateway into the hypervisor, and allows the hypervisor and guest
to make RPC-like calls. Things like page table modifications, APIC
and privileged instruction emulation go through this layer.

This in turn presents a couple of issues. Firstly, the fbt provider
is having difficulty doing "fbt:::", where we trap every function
in the kernel - the paravirt/hypercall functions must not be intercepted
since they are (possibly) needed while a probe trap is being handled. In theory
this can be worked around by either excluding them from being probe
points (which would be a shame), or by detecting the recursion
and auto-disabling them (which would allow some hypercalls to be
monitored).
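
The second option (detect the recursion, then auto-disable the offending probe) can be sketched in plain user-space C. This is only an illustration of the idea, not the driver code: struct probe, probe_fire(), hypercall_demo() and the in_probe_handler flag are all hypothetical names, and a real kernel version would need the flag to be per-CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe record; not the real DTrace/fbt structure. */
struct probe {
    const char *name;
    bool enabled;        /* cleared when the probe is auto-disabled */
    unsigned long hits;  /* number of times the probe really fired */
};

/* Stand-in for a per-CPU "already inside the probe handler" flag. */
static int in_probe_handler;

static struct probe hypercall_probe = { "hypercall_demo", true, 0 };
static struct probe ordinary_probe  = { "ordinary_function", true, 0 };

static void probe_fire(struct probe *p);

/* Pretend hypercall wrapper. It is instrumented like any other
 * function, so entering it fires its own probe. */
static void hypercall_demo(void)
{
    probe_fire(&hypercall_probe);
}

static void probe_fire(struct probe *p)
{
    if (!p->enabled)
        return;

    if (in_probe_handler) {
        /* We were re-entered while handling another probe, so this
         * function is on the trap path: stop probing it. */
        p->enabled = false;
        return;
    }

    in_probe_handler = 1;
    p->hits++;
    /* Assumption for the demo: handling a trap needs a hypercall. */
    hypercall_demo();
    in_probe_handler = 0;
}
```

Firing ordinary_probe once records one hit and, as a side effect, disables hypercall_probe because it was reached from inside the handler. Hypercall wrappers that are never needed on the trap path would stay enabled, which is the attraction of this approach over excluding them all up front.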

The other problem area is multiple CPUs. When we have multiple CPUs,
dtrace invokes the APIC inter-CPU calls to do RPCs to synchronise
the CPUs. There is no APIC in a Xen guest, or rather there is a very
fake one. My DTrace code implements IPI calls in parallel to the kernel's,
rather than relying on the kernel support, so that we don't
deadlock and so that we can trace the kernel's use of these calls.
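
To make the shape of a private cross-CPU call concrete, here is a minimal single-threaded simulation in user-space C. It is a sketch under stated assumptions, not the actual dtrace implementation: xcall_slot, xcall_all_but(), send_ipi() and xcall_interrupt() are hypothetical names, and "delivering the interrupt" is just a direct function call where the real thing would go through the APIC (or, in a Xen guest, the hypervisor).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU 4

typedef void (*xcall_fn)(int cpu, void *arg);

/* Hypothetical per-CPU mailbox: what to run, and whether it is pending. */
struct xcall_slot {
    xcall_fn fn;
    void *arg;
    atomic_int pending;
};

static struct xcall_slot slots[NCPU];
static atomic_int acks;

/* What the target CPU would run from its interrupt handler. */
static void xcall_interrupt(int cpu)
{
    struct xcall_slot *s = &slots[cpu];

    /* Claim the request exactly once; a spurious interrupt is a no-op. */
    if (atomic_exchange(&s->pending, 0)) {
        s->fn(cpu, s->arg);
        atomic_fetch_add(&acks, 1);
    }
}

/* Stand-in for raising the IPI. The simulation calls the handler
 * directly instead of interrupting another CPU. */
static void send_ipi(int cpu)
{
    xcall_interrupt(cpu);
}

/* Ask every CPU except 'self' to run fn(cpu, arg), then wait for them. */
static int xcall_all_but(int self, xcall_fn fn, void *arg)
{
    int sent = 0;

    atomic_store(&acks, 0);
    for (int cpu = 0; cpu < NCPU; cpu++) {
        if (cpu == self)
            continue;
        slots[cpu].fn = fn;
        slots[cpu].arg = arg;
        atomic_store(&slots[cpu].pending, 1);
        send_ipi(cpu);
        sent++;
    }

    /* Rendezvous: real code would need a timeout here to avoid hanging. */
    while (atomic_load(&acks) < sent)
        ;
    return sent;
}

/* Example callback: record which CPUs were visited. */
static void count_visit(int cpu, void *arg)
{
    int *visited = arg;
    visited[cpu]++;
}
```

Calling xcall_all_but(0, count_visit, visited) with a zeroed int visited[NCPU] array bumps every entry except index 0. Because the mailbox and acknowledgement counter are private, nothing here depends on the kernel's own cross-call machinery, which is the property that matters for avoiding deadlock and for being able to trace the kernel's calls.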

With IPI calls in a Xen guest, there is a lot of reliance on function
calls to handle the hypervisor communication. The IPI calls in a standard
kernel are the lowest level of operation of the kernel and CPU, implemented
using the NMI interrupt.

The standard smp_call_function() family of functions can be used
in the Xen dtrace, but it possibly exposes a race condition (I have
yet to torture test, but it seems to be easily exposed without
torture testing).

So, it's a bit like porting to a totally different CPU architecture,
and I need to understand these pieces a little more.

Once the above issues are resolved, I need to validate that it
isn't broken on older/pre-Xen kernels.

But the end result is being able to use DTrace in the cloud (e.g.
Amazon EC2), so it's definitely an interesting project.

Post created by CRiSP v11.0.12a-b6455

