Sunday 3 November 2013

libdwarf vs libdw - revisited

I wrote a while back about the problems with ELF/DWARF debug
info. It truly is a mess. ELF executables contain debug info in
the DWARF format. This is a good specification and describes how
the debugger can gain access to structures and types, location
information for placing breakpoints, and how to find variables on the
stack, especially when no frame pointer is available.

Unfortunately, there are two libraries for reading it - libdwarf and
libdw - and your head will swim trying to figure out the difference.

Over the weekend I upgraded to Ubuntu 13.10 (gcc-4.8), and dtrace
would no longer compile. Specifically, the tail part of the build,
which creates the ctfconvert tool (mkctf.sh), would abort with an
error that is really intractable.

I looked at the code causing the error, but the error is raised inside
a library function (dwarf_loclist). There are two families of DWARF
libs: one has this function, the other doesn't.

I spent some time perusing the FreeBSD source code, the libdwarf
source code, and hacking on ctfconvert. All to no avail. Trying to use
the FreeBSD version of dwarf.c did not work, because it relies on the
specific version of the DWARF libs on FreeBSD. On top of the libdwarf
vs libdw confusion, FreeBSD has its own (later?) version of libdw.
It's all so confusing.

From what I read on the web, libdw is the "new" version to replace
libdwarf. Red Hat helped build this. Some distros have one or the other.
So, there's no "correct" thing to do - we currently use libdwarf, but
if we use libdw, we may find issues across other distros and
older and future versions. It truly is a mess. Added to which there's
near zero documentation, except in the comments.

This annoys me; users who download dtrace complain about the build error
(which is actually not really an error - I have modified the warning
to let people know that if ctfconvert/mkctf.sh won't run, they can
still use dtrace).

I realised that there is a way to create a portable libdwarf library
and solve this nicely. I created a very basic tool (tools/readelf.pl)
which parses the output of:


$ readelf --debug-dump=info ...


This dumps out the DWARF debug info in a way that exposes the records
in a DWARF section, and allows a simple parser to spit out all the
struct/union definitions. Instead of directly linking with libdwarf,
I could use (something like) this script as a pipe, to read the
output of readelf. This moves the whole portability issue to that of
readelf itself. Assuming readelf works, dtrace would be immune
to the libdwarf/libdw confusion.
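
As a rough illustration of the pipe idea (this is not the actual
tools/readelf.pl, just a minimal sketch), something like the following
could list the named struct/union types. The function name list_types
is made up for this example, and it assumes the usual binutils text
layout, where each record starts with an "Abbrev Number:" line carrying
the DW_TAG_* name, followed by its DW_AT_* attribute lines.

```shell
# Hypothetical helper: read "readelf --debug-dump=info" text on stdin
# and print one "struct NAME" or "union NAME" line per named type.
list_types() {
    awk '
        # Every record (DIE) starts with an "Abbrev Number:" line.
        # Remember whether this one is a struct or a union.
        /Abbrev Number:/ {
            kind = ""
            if ($0 ~ /DW_TAG_structure_type/) kind = "struct"
            else if ($0 ~ /DW_TAG_union_type/) kind = "union"
            next
        }
        # The first DW_AT_name after a struct/union record is its name.
        # The name is the last field, whether inline or an indirect string.
        kind != "" && /DW_AT_name/ {
            print kind, $NF
            kind = ""
        }
    ' | sort -u
}
```

It would be used as "readelf --debug-dump=info some_object.o | list_types".
Anonymous structs are skipped, since the next "Abbrev Number:" line
resets the state before any name is seen. A real replacement for the
libdwarf code would also need the members, sizes and offsets.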

I started looking at this - readelf.pl could grow into a more
concrete "list-the-types" tool, but that's dirty. I started looking
at the failing code, but I hit the issue of having the
right source code for my Ubuntu libdwarf tool and/or understanding
the impact on prior versions of libdwarf (i.e. we may break dtrace for
older distros).

Eventually, I realised that the error is being caused because
gcc-4.8 is defaulting to DWARF-4 format (I have only briefly
looked to spot the differences). Since the error is caused by
one file (driver/ctf_struct.c), which is used to spit out all
the kernel struct/unions for D scripts, it's actually a much simpler
fix to force gcc to use the DWARF-2 spec - the one we have all used
for many years. This fixes the build error on Ubuntu-13.10. It's a
suboptimal hack (though not really a hack, since nobody cares how
dtrace is compiled - certainly not most users, unless they might
use some future kernel debugger).
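
For anyone wanting to check which DWARF version their compiler emits,
here is a small sketch along the same readelf-as-a-pipe lines. gcc's
-gdwarf-2 option is what requests the older format. The helper name
dwarf_versions is made up for this example, and it assumes readelf
prints a "Version:" line in each compilation unit header.

```shell
# Hypothetical helper: read "readelf --debug-dump=info" text on stdin
# and print the distinct DWARF versions of the compilation units.
dwarf_versions() {
    # Each compilation unit header carries a line like "Version:  4".
    awk '$1 == "Version:" { print $2 }' | sort -un
}
```

For example, after "gcc -gdwarf-2 -c file.c -o file.o" (file.c being
any source file), "readelf --debug-dump=info file.o | dwarf_versions"
should print 2, whereas a default gcc-4.8 build with -g would be
expected to print 4.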

So, it is fixed. I still dislike libdwarf and the total confusion,
and maybe one day I will finish off proper support for DWARF-4, or
cut my own libdwarf, or ... whatever.




