Thursday 29 July 2010

What the heck is CTF? A dead end?

The CTF part of dtrace is a small library of functions that ships
with the dtrace package. To date, I have largely ignored it: with
a few tweaks it compiles nicely on Linux.


Recently, I have been playing with SDT probes - specifically, formulating
a plan to get static probes into the kernel. These static probes
act like high level macros, compared to FBT, which is to do with
planting probes on functions. When an FBT probe fires, you have
access to the raw arguments (arg0, arg1, ...), but you don't really
have access to the structures these arguments may represent.


Let's look at "io:::start". In the Solaris kernel, these probes are
placed in various places to indicate when a file system driver is about
to do I/O. This is a high level probe - you don't care what filesystem is
in effect (UFS, NFS, ZFS), and you don't have to work out which are the
relevant functions to probe - nice and sweet.


But when these probes fire, args[0], args[1] and args[2] are defined to be
pointers to structures representing the buffer, device and file info
behind the I/O request.


But *how does this work*?


Beats me!


What happens is that the SDT provider knows about these high level probes
and the structures to be passed to the user space dtrace application.
These structures (struct buf *, fileinfo_t, devinfo_t) are "created"
by grabbing fields from the relevant internal structures. DTrace has a thing
called a "translator", which maps from the internal representation
to the D style structure. This avoids the problem of trying to make the
real structures visible to the D application. (One would need kernel
level knowledge to get the #includes correct to even make the structures
visible).
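
To make the translator idea a bit more concrete, here is a minimal sketch in C of what a translator boils down to: copy a handful of fields out of an internal structure into a small, stable structure that a script is allowed to see. Real translators are written in D, not C, and every name below (internal_buf, pub_bufinfo_t, translate_buf) is made up for illustration; none of it comes from the Solaris or Linux sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kernel-internal structure. A real one has
 * many more fields, and its layout can change between releases. */
struct internal_buf {
    int         flags;
    size_t      byte_count;
    const char *dev_name;
    void       *private_state;   /* something a script should never see */
};

/* Hypothetical stable structure that would be handed to the script. */
typedef struct pub_bufinfo {
    int    flags;
    size_t bcount;
    char   devname[32];
} pub_bufinfo_t;

/* The "translator": pick out the interesting fields, ignore the rest. */
void translate_buf(const struct internal_buf *in, pub_bufinfo_t *out)
{
    assert(in != NULL && out != NULL);

    memset(out, 0, sizeof(*out));
    out->flags  = in->flags;
    out->bcount = in->byte_count;

    /* Bounded copy; snprintf always NUL-terminates the destination. */
    snprintf(out->devname, sizeof(out->devname), "%s",
             in->dev_name ? in->dev_name : "");
}
```

The point of the indirection is that the script only ever depends on the small public structure, so the internal one is free to change without breaking anything.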


What dtrace does is scan /usr/lib/dtrace/*.d and preload various "include-files"
as your script runs, to make certain constants and structures visible to you.


But how and where does a fileinfo_t structure get created?


I *think* this is done via the CTF (Compact C Type Format) library. CTF is
a simple way to describe structures and members without the full complexity
of DWARF debugging information. So, what Sun has done is make sure all
libraries in the system have a special ELF section (.SUNW_ctf). This section
is read from the libraries (for user space apps) or from the kernel (for
kernel probes) to find out what structures exist.
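
As a rough mental model of what such type data buys you, here is a small C sketch: a table recording the name, offset and size of each member of a structure, plus a lookup that resolves "pointer->member" for a raw pointer. This is only my illustration of the idea. It is not the real CTF on-disk format or the libctf API, and demo_devinfo, member_desc, find_member and member_addr are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure whose layout we want to describe. */
struct demo_devinfo {
    int  major;
    int  minor;
    char statname[16];
};

/* One record per member: what it is called, where it lives, how big it is. */
struct member_desc {
    const char *name;
    size_t      offset;
    size_t      size;
};

#define MEMBER(type, m) { #m, offsetof(type, m), sizeof(((type *)0)->m) }

static const struct member_desc demo_devinfo_members[] = {
    MEMBER(struct demo_devinfo, major),
    MEMBER(struct demo_devinfo, minor),
    MEMBER(struct demo_devinfo, statname),
};

/* Look a member up by name; returns NULL if the type has no such member. */
const struct member_desc *find_member(const char *name)
{
    size_t n = sizeof(demo_devinfo_members) / sizeof(demo_devinfo_members[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(demo_devinfo_members[i].name, name) == 0)
            return &demo_devinfo_members[i];
    }
    return NULL;
}

/* Resolve "raw->name" using only the description table. */
const void *member_addr(const void *raw, const char *name)
{
    assert(raw != NULL);

    const struct member_desc *m = find_member(name);
    return m ? (const char *)raw + m->offset : NULL;
}
```

With a table like this, a tool that only has a raw pointer and a member name can still find the right bytes, which is the kind of service I expect the type data to provide for args[n]->member.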


Alas, we don't have this ELF section in the executables on Linux.
So we are going to have to be a bit more clever to get access to the
internal structures.


To illustrate what I mean, consider this:


$ cat io.d
#pragma D option quiet

BEGIN
{
printf("%10s %58s %2s\n", "DEVICE", "FILE", "RW");
}

io:::start
{

printf("%10s %58s %2s\n", args[1]->dev_statname,
args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W");
}


Consider:


  1. Where does B_READ come from? (Answer: /usr/lib/dtrace/io.d)

  2. Where does "dev_statname" come from?

  3. How does dtrace know that args[1] is convertible to a structure containing dev_statname?



The answers to the last two questions, I believe, lie with the
CTF scanner.


And that is where I am heading off to -- to see how we can do this on Linux.





2 comments:

  1. I'm pretty sure they have a dwarf2ctf converter, although I can't vouch for its correctness.

    Cheers!

    Colin

  2. I've been studying the way args[n]->member works. I have been looking at the ctfmerge tool, but this seems to be more aligned with accessing FBT probes. (Solaris/Apple dtrace allows you to place a probe on any function and look at the function arguments, along with the appropriate types; I need to decide how to handle this for Linux. It is definitely useful even if we don't have 100% coverage.)

    But at the moment, I am trying to understand how the args[n] stuff works. I am playing with translators, which are OK, but there's something in the driver returning a structure memory block. I need this to emulate the io:::start probe's access to the fileinfo_t, devinfo_t and buf_t fields.

    I'll send out a blog posting when I have something figured out.
