
This page is relevant to you if you can access a system that supports dtrace (currently (Open)Solaris, Mac OS X Leopard, (Free?)BSD, and maybe someday Linux).

Dtrace is a system-wide tool for exploring a system. System-wide means that we can trace anything on the system that can be probed, not just one process as strace or truss do. Multicore and multiprocessor machines will soon be the norm, and dtrace is the only tool that lets us figure out the complex interactions between processes. When used correctly, dtrace is very lightweight, so it is commonly used on production systems to identify bottlenecks. You can run dtrace while you develop without affecting what you are doing.

Dtrace can gather statistics at both the user level and the kernel level, but kernel-level tracing depends heavily on which kernel you run and on its version. Fortunately, the Parrot team works at the user level.

There are many articles and resources on dtrace, but few show the actual process of investigation, where one dtrace request leads directly to another. The exception is the great screencast of Bryan Cantrill's 78-minute presentation at a Google Tech Talk: http://www.youtube.com/watch?v=6chLw2aodYQ . Do yourself a favor and start by watching it. Bryan Cantrill is the dtrace architect. His talk is very critical of the common characterisations of software. He gives the best definition of the special nature of software that I have ever heard. He talks about the difficulty of debugging software that dtrace addresses, then gives a live demonstration of dtrace. Many of his dtrace one-liners are Solaris and X Window specific, so I can't reproduce them on my MacBook.

Interactive investigation is more useful, but more difficult, than running canned scripts. So I will describe such investigations as I learn to do more and more sophisticated ones.

First one liner

Dtrace is a tool like awk, so it should not be totally foreign to a Perl programmer. A dtrace script is a sequence of clauses of the form:

probes_spec /predicate/ { block }

In such a clause, we first specify the probes, then a predicate that selects the conditions we are interested in for the matching probes, and finally a block that gathers the statistics. There are three special probes that don't need much explanation: BEGIN, END and ERROR. BEGIN fires when tracing starts, END when it stops, and ERROR when a run-time error occurs while executing a clause (for example, an invalid memory access). Events that dtrace drops because too much information must be gathered in a short period are reported separately, as drop messages. Here is the first one-liner:

sudo dtrace -n 'syscall:::entry /execname == "parrot"/ { @[probefunc] = count() }'

I must be root to run dtrace, and -n means that the probe description, with its optional predicate and block, follows on the command line. Once dtrace is started, I run a few Parrot tests to feed events to dtrace. When the tests are finished, I press Ctrl-C and dtrace prints the gathered statistics.
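The BEGIN and END probes mentioned earlier can frame the same one-liner. The following is only a sketch, not part of the original investigation: the function name trace_parrot_syscalls and the printed messages are made up for illustration, and -q merely silences dtrace's default output so that only the printf and the aggregate are shown.

```shell
# Hypothetical helper: count the system calls made by parrot, with
# BEGIN and END clauses that announce the start and end of tracing.
trace_parrot_syscalls() {
    sudo dtrace -q -n '
        BEGIN { printf("Tracing parrot system calls, press Ctrl-C to stop\n"); }
        syscall:::entry /execname == "parrot"/ { @[probefunc] = count(); }
        END { printf("Tracing stopped\n"); }'
}
```

Calling trace_parrot_syscalls in one terminal and running the Parrot tests in another should give the same kind of counts as the plain one-liner.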

Specifying the probes to match

A probe or set of probes to match is specified by four names: a provider name, a module name, a function name and a probe name. Each name is separated from the next by a colon. A missing name is a wildcard, and a name may also contain one or more stars, as in shell globbing. We are interested only in the system calls made by the parrot process, so the provider is syscall. I don't list the module name; maybe there isn't any for the syscall provider anyway. For the syscall provider, the function name is the system call name. A system call has an entry and a return, which correspond to the probe names entry and return.
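To check what a probe description matches before tracing anything, dtrace can list the matching probes with -l. A small sketch follows; the helper name list_stat_probes is invented for illustration, and the probes printed depend on the system.

```shell
# Hypothetical helper: list the probes matched by a description.
# -l only lists probes, it does not enable them.
# Provider "syscall", module left empty, function globbed with a star,
# probe name "entry".
list_stat_probes() {
    sudo dtrace -l -n 'syscall::stat*:entry'
}
```

Each listed line shows the probe's provider, module, function and name, which makes it easy to see which of the four fields a given description left open.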

The predicate returns true for "interesting" events

The predicate is /execname == "parrot"/. This means we are interested only in the processes running the parrot executable. My tests run one parrot instance after another.

The block

Now, what do we do? We count system calls indexed by their names (we get the name with probefunc), and we do that with an anonymous aggregate (yes, you thought that only Damian could get away with that sort of thing). Think of an aggregate as a sort of hash, except that it can also have composite keys.

{ @[probefunc] = count() }
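As an aside on composite keys, here is a variant of the one-liner that drops the predicate and indexes the aggregate by both the executable name and the system call name. It is a sketch for illustration only; the helper name count_syscalls_by_process is hypothetical.

```shell
# Hypothetical helper: count system calls for every process,
# keyed by the pair (executable name, system call name).
count_syscalls_by_process() {
    sudo dtrace -n 'syscall:::entry { @[execname, probefunc] = count(); }'
}
```

Without the predicate this traces the whole system, so it produces a much longer listing than the parrot-only version.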

The resulting output

dtrace: description 'syscall:::entry ' matched 427 probes
...
  open                 574
  fcntl                686
  mmap                2545
  sigaltstack         2799
  sigprocmask         2805
  stat

I get all the system calls ordered by increasing number of calls; I skipped the start of the listing. We see that we make quite a lot of stat() calls, and stat() is quite an expensive system call. So the next obvious questions are which paths are stat()ed and whether these stat() calls succeed.

Copying a \0 terminated string with copyinstr

We have seen that we make many stat() system calls. Each system call involves switches between user and kernel mode: one to enter the kernel and one to return to userland, and further context switches if other processes are scheduled in the meantime.

So it may be interesting to check that we don't make more system calls than we need. Let's focus on the stat() system call and see how many times each file is stat()ed. arg0 is the first argument of the thing probed, here the first argument to stat(), which is a filename. Because the string lives in the process's address space, we need to copy it explicitly with copyinstr().

sudo dtrace -n 'syscall::stat:entry /execname == "parrot"/ { @[copyinstr(arg0)] = count() }'
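To also see whether each stat() succeeds, one possible approach is to remember the path pointer at entry in a thread-local variable and to aggregate on the path and errno at return. This is a sketch under assumptions, not something I ran for this page: the helper name stat_results_by_path and the variable self->path are made up, and it assumes the process calls the probe named stat (some systems use a variant such as stat64 instead).

```shell
# Hypothetical helper: count stat() calls made by parrot, keyed by
# (path, errno). An errno of 0 means the call succeeded.
stat_results_by_path() {
    sudo dtrace -n '
        syscall::stat:entry /execname == "parrot"/ { self->path = arg0; }
        syscall::stat:return /self->path/ {
            @[copyinstr(self->path), errno] = count();
            self->path = 0;
        }'
}
```

Saving only the pointer at entry and calling copyinstr() at return keeps the two clauses paired per thread; resetting self->path to 0 frees the thread-local variable.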

Using ustack()

We have seen which files are stat()ed. It would be nice to see the context(s) in which stat() is called for a given file. Well, ustack() lists the C call stack(s) that lead to the probe firing. But, too bad, it gives us the addresses of the routines and not their names. The documentation says that this happens with static functions. I tried to do the same with ruby and it works fine.
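A sketch of how ustack() can be combined with the previous query: the aggregate is keyed by both the path and the user-level stack, so each distinct call path to stat() for each file gets its own count. The helper name stat_stacks_by_path is hypothetical.

```shell
# Hypothetical helper: count stat() calls made by parrot, keyed by
# (path, user-level call stack).
stat_stacks_by_path() {
    sudo dtrace -n 'syscall::stat:entry /execname == "parrot"/ { @[copyinstr(arg0), ustack()] = count(); }'
}
```

The output can get long; ustack() also accepts a frame count, for example ustack(5), to keep only the innermost frames.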

TBD: instrumenting Parrot

Being able to see C routine calls is nice, but people designing languages on Parrot also want to see the Parrot routine calls. We need to instrument Parrot for that. Also, we would like to print the Parrot call stack. Well, if there is any, because I read that Parrot is stackless. dtrace is currently able to walk C stacks and Java call stacks. Getting other call stacks would apparently mean modifying dtrace.