Wednesday 13 November 2013

strace strace ...

Dtrace...strace...ptrace....

Long before I ever started on dtrace, I used truss, on Solaris.
Brilliant tool. (My hobby/interest has always been tools
to help debug programs or systems). truss has some great
features, but it was very beholden to Solaris.

I wrote my own truss - I called it ptrace; this was in the days
before Linux was a successful and broad operating system. It did
what I would call "hacky" things to enhance what truss did. And
it worked on the various Unix flavors. By the time Linux became
successful and prominent, strace appeared. Strace for many years
was much reduced in functionality and reliability - on Linux - compared
to truss or ptrace (my tool).

I had run out of ideas for ptrace, and I note my last change
was back in 2007.

In 2010 or so I picked up the baton on dtrace.

strace has evolved and so has the Linux kernel; last I looked at the
strace source, it was disappointing - reflecting some deficiencies in
the Linux kernel (in terms of process control and debugging).

More recently I have had a chance to look at strace, and my ptrace, to
reassess the state of the art in user space tracing. (Dtrace, as
good as it is, is somewhat crude for doing what strace/ptrace and truss
do - dtrace doesn't make it easy to create a standalone application
that doesn't need full privs and has a good quality command line
parser; dtrace comes with a truss emulation, but it's not refined
and it's not good at decoding the arguments to syscalls...but I digress).

Looking at strace, it lacks a facility I was interested in, and
which ptrace has: stack dumps of the syscalls. I found a package
on Google called strace+ but it won't build, and I gave up trying
to figure out what was wrong, such was the brokenness of the build.

So, I re-evaluated my ptrace. I had last touched it in 2007, which was
a while ago, and it just didn't work reliably. It didn't acknowledge
64 bit processors/processes or a mixed 64/32 bit world. And it didn't
compile anymore.

After a couple of days, I got it up and running again; I started adding
the best bits of strace and other enhancements, and now it's pretty good.
It's reliable (thanks to bug fixes, and also to the kernel ptrace(2)
enhancements, which make it much more resilient). On older Linux kernels,
kill -9 of strace or ptrace could hang or kill the processes being traced.
On modern Linux kernels, this horrific situation is resolved.
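
For illustration only, here is a minimal sketch of the ptrace(2) loop that
strace-style tools are built around. It is not the source of the ptrace tool
described in this post; the function name count_syscall_stops is made up for
the example, and it assumes Linux with glibc. It runs a command under trace
and counts how many syscall stops (entry and exit are separate stops) the
kernel reports.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the author's tool): run argv under
 * ptrace(2) and return the number of syscall stops seen, or -1 if the
 * command could not be started under trace. */
long count_syscall_stops(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced by the parent, then exec the target.
         * A successful exec stops the child with SIGTRAP. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;                      /* child never reached the exec stop */

    /* Report syscall stops as SIGTRAP|0x80 so they can be told apart
     * from a genuine SIGTRAP sent to the child. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    long stops = 0;
    int sig = 0;                        /* signal to pass on when resuming */
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
            break;
        if (waitpid(pid, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                      /* target is gone */
        sig = 0;
        if (WIFSTOPPED(status)) {
            if (WSTOPSIG(status) == (SIGTRAP | 0x80))
                stops++;                /* syscall entry or exit */
            else
                sig = WSTOPSIG(status); /* ordinary signal: deliver it */
        }
    }
    return stops;
}
```

A real tracer would read the registers at each stop to decode the syscall
number and arguments, which is where the 64/32 bit handling mentioned above
comes in.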

I have an arsenal of LD_PRELOAD bits - which are useful for
debugging or monitoring specific scenarios, but strace/ptrace/dtrace are
great for pure unobtrusive debugging.
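
As a rough idea of what an LD_PRELOAD "bit" can look like, here is a minimal
sketch of an interposer that fakes the return value of gethostid(). It is an
assumption-laden example, not one of the libraries mentioned above: the
FAKE_HOSTID environment variable and the fallback value are invented for
illustration.

```c
#include <stdlib.h>
#include <unistd.h>

/* Interposed gethostid(): when this object is preloaded, the dynamic
 * linker resolves calls to gethostid() here instead of in libc.
 * FAKE_HOSTID is a hypothetical environment variable holding the
 * value to report, written in hex. */
long gethostid(void)
{
    const char *s = getenv("FAKE_HOSTID");

    if (s != NULL)
        return strtol(s, NULL, 16);
    return 0x12345678L;                 /* arbitrary made-up default */
}
```

Assuming the file is saved as fakehostid.c (a made-up name), it could be built
with "cc -shared -fPIC -o fakehostid.so fakehostid.c" and tried with
"LD_PRELOAD=./fakehostid.so hostid". Unlike a tracer, the code runs inside the
target process, so a bug in the preloaded library can take the target down
with it.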

I used the strace source to help fix/understand some of the issues in
my own code. I am now considering adding more functionality - much
more than truss/strace have. FYI, here's the help/usage for ptrace as
it currently stands (some features are broken - I need to fix them -
especially the i386 specific code).

I may or may not release ptrace as source - I don't necessarily have
an interest in maintaining it - as it potentially changes at a fast pace
to suit whatever situation I am debugging. (ptrace gives
me the luxury of writing C code rather than D code, in user space,
to do very specific things - similar to LD_PRELOAD, but in a way
that can rarely accidentally kill the target; and ptrace is more
portable than Dtrace to systems where you don't have root access to
debug scenarios).


ptrace: Trace process execution. (C) 1990-2014 PD Fox, Foxtrot Systems Ltd
Usage: trace [-delay nn] [-d nn] [-gethostid nnn] [-trap] [-gethostname name]
             [-llib ...] [-o output] [-fchnt] [-p pid] [-s size]
             [-v [!]syscall,...]
             [-r [!]fd,...] [-w [!]fd,...] [-size nn] [-stack:nn] [-regs] [command]

-a              Print argv on execv calls.
-flush          Flush output as we go along.
-func           List of functions to trace
-gethostid      Intercept gethostid() system call and fake return value.
-gethostname    Intercept gethostname() system call and fake return value.
-hex            Dump ASCII strings in 1-byte hex
-hex2           Dump ASCII strings in 2-byte hex
-hex4           Dump ASCII strings in 4-byte hex
-name           Sort by name.
-nest           Allow for nested functions
-time           Sort by syscall time.
-tee file       Write output to specified file and stdout.
-trap           Map SIGTRAP signal
-trace          Trace with -l switch
-pc             Show PC of system call

-c              Display system call counts
-delay nn       Sleep for nn msec before each syscall
-d nn           Detach after nn calls to gethostid()
-e              Dump out exec() functions
-f              Follow child processes
-h              Display strings in hex/asc (read()/write()).
-llib           Preload shared library.
-m              Intercept page faults.
-multiline      When printing certain arguments, use
                multilines to make pretty printing.
-n              Print network addresses numerically.
-nosyscalls     Don't print syscalls (monitor page faults only)
-o file         Write output to specified file.
-p pid          Trace specified process.
-ptr            Show pointers for arguments.
-q              Quiet mode -- dont print output.
-r [!]fd,...    Dump read buffers for specified file descriptors.
-regs           Dump registers.
-s              Trace list of signals.
-size size      Specify size of strings to print out.
-stack:nn       Dump call stack (depth of nn).
-t              Print timestamps. (msec accuracy)
-tt             Print timestamps. (usec accuracy)
-v [!]syscall,...
                Specify syscalls to [ignore]/trace.
-verbose        Add extra detail for some args.
-w [!]fd,...    Dump write buffers for specified file descriptors.
-warp YYYYMMDD-HH:MM:SS Warp clock system calls.

Advanced switches:

-nouse_process_vm Avoid Linux 3.4 dependency

Set PTRACE_OPTS to pass in command line arguments.

Version: b6


Post created by CRiSP v11.0.21a-b6648