This is Part - 2 in series of articles we are writing on Dtrace. If you haven't already - do read Part 1.
Running Dtrace scripts
Before we delve deeper into guts of Dtrace, one of the questions which I often get asked is, How can I profile one off programs (i.e not a application running in background) with Dtrace.
There are basically couple of ways you can fire Dtrace.
-
One of them is obviously straight up
sudo dtrace -[fmp] [name]
which we kind of covered in last tutorial. Needless to say, you can't run dtrace scripts this way and hence is not very useful. -
Second one is, you can supply Dtrace script as argument. This is useful for profiling long running applications (such as unicorn, thin)
- Third one is, which I am very fond is:
What is a predicate?
In last part of tutorial we described what a probe description looks like. For a refresh:
probe description
/predicate/
{
action;
}
Predicate is a way to filter the number of probes fired based on certain critirea. For example, in last tutorial we wrote a Dtrace script to keep track of all system calls being made by all currently running applications.
Lets say, now we want to restrict those probes only to system calls that are being made by Ruby applications. We can do this by introducing a predicate in our Dtrace script.
syscall:::entry
/execname=="ruby"/
{
printf("%s(%d) called %s\n", execname, pid, probefunc);
}
Or you can use pid
predicate:
/*
* Print system calls made by pid 61465
*
*/
syscall:::entry
/pid == 61465/
{
printf("%s(%d) called %s\n", execname, pid, probefunc);
}
Now when I was learning Dtrace, I was mightly confused from where these things called execname
, pid
, probefunc
are coming from? Turns out, Dtrace automatically provides built-in variable
definitions for inspecting processes. They make Dtrace scripts quite powerful.
Built-in Variables
Here is a list of important built-in variables available in your DTrace scripts:
execname
- Name of current process.probeprov
- Name of the probe provider (Read previous post about probes)probemod
- Name of the module in which probe is defined.probefunc
- Name of the functionprobename
- Name of the probearg0..arg9
- The function you are trying to probe likely takes arguments. For example, if you are profiling a Ruby program, you may be interesting in knowing about what kind of objects we are creating. When probes fire arguments accepted by the function (for which probe fired) are automatiaclly available as variables arg0 to arg9 in your DTrace script.cpu
- CPU identifiertid
- Thread identifiercwd
- Name of current working directoryuid
- User IDgid
- Groupd IDpid
- Process IDtimestamp
- Current Timestamperrno
- Set by system call errors
Now, I believe previous scripts would have started making sense. Above list does not include all built-in variables though.
Lets do an exercise and try to print number of bytes being read (from network or file system) by process with pid 61465. The script will look like:
syscall::read:entry
/pid == 61465/
{
printf("%s read %d\n", execname, arg2);
}
If you crack open man 2 read
you will find out that read
system call's third argument contains
number of bytes being read.
Having just ability to print numbers as probes fire is useful but you can easily imagine it would be nice, if we can somehow aggregate the data. For example, how much of your program's memory is taken by string allocation or how much time is being spent in total in the function foo
. Ability to aggregate such data would be really useful.
Aggregates and Collections
Agrregates, Collections and variables are most important tools DTrace has to offer for data gathering, analyzing and printing. We will start with different types of variables available to us.
Variable types
D scripting language has 3 types of variables which you can use to keep track of things.
Block local
You can define a block local variable with :
syscall::read:entry
{
this->currentTime = timestamp;
}
Scope of a block local variable is limited to the block in which it is defined.
Global scalar variables
You can define a global variable like:
BEGIN {
startTime = timestamp;
}
Variables defined like this are global and they are available in all the probe descriptions. Also, if you haven't
guessed it - BEGIN
and END
[2] are two special probes that fire at the beginning of a DTrace script and at the end of it. They can be very
useful for initializing things or printing summary results at the end of execution.
Thread local variables
If you are trying to profile a multi-threaded application you will want to keep DTrace variables separate for different threads. You can do that with thread local variables:
syscall::read:entry
{
self->currentTime = timestamp;
}
Variables declared like this won't be shared between threads and hence can be used reliably.
Associative Arrays
If you are coming from Ruby, think of associative array as a Hash where that keys can be a single value or a tuple of values. In either case once declared type of key shouldn't change. For example, following script will calcuate time spent by each process in a system call.
syscall:::entry
{
callTime[execname, probefunc] = timestamp;
}
syscall:::return {
this->timeElapsed = timestamp - callTime[execname,probefun];
}
You can learn more about these different kind of variables at[1].
Aggregates
Armed with knowledge of variables and collections, we can finally dip our toes in aggregates. Aggregates is a way of gathering data from individual probes without you the programmer doing actual work of summing things up,calculating the median, frequency of occurence etc.
For example, lets write a DTrace script that prints number of top 20 objects Rails creates when it boots up. You are obviously going to need Ruby 2.0 for following script to work.
/* rails_boot.d
* object-create is the name of the probe that fires
* when any new ruby object is created.
* Corresponding function called is - rb_obj_alloc
*/
ruby*:::object-create
{
/* If you look up code of rb_obj_alloc, first argument to
* function contains class which is being instantiated
*/
@objects[copyinstr(arg0)] = count();
}
END {
/* Truncate the aggregate to top 20 results */
trunc(@objects,20);
/* Print the aggregate */
printa(@objects);
}
The output should look something like:
Mime::Type 22
Psych::Nodes::Scalar 27
Bundler::LazySpecification 39
Set 54
Sass::Script::Functions::Signature 55
Rails::Paths::Path 71
Class 73
Pathname 74
Bundler::DepProxy 77
Rails::Initializable::Collection 93
Regexp 95
Rails::Initializable::Initializer 143
Range 332
Gem::Specification 410
Hash 579
Gem::Dependency 1660
Gem::Version 2416
Gem::Requirement 3061
String 4723
Array 4948
Keep in mind, running above script isn't as simple as running sudo dtrace -qs rails_boot.d -c 'rails s'
. Depending on whether
you are using RVM or rbenv all those shell scripts shims create a problem when running dtrace scripts. What I ended up doing is,
created a file called load_env.rb
that had just one line require_relative config/environment
and then ran the Dtrace script with:
Now there are some new keywords in above script and we will to get them in a bit, but first let us talk about aggregates. DTrace allows you to store result of aggregate functions in objects called aggregates, they are defined like:
@name[ keys ] = aggfunc ( args );
Aggregate object name starts with @
and behave very similar to associative arrays and hence can use tuples as keys. Using aggregate functions
frees you up from keeping counter yourself. In above script we are using count
function which keeps track of number of times that particular probe fired.
There are other aggergate functions available as well:
-
count
- The number of times called. -
sum
- The total value of the specified expressions. -
avg
- The arithmetic average of the specified expressions. -
min
- The smallest value among the specified expressions. -
max
- The largest value among the specified expressions. -
lquantize
- A linear frequency distribution, sized by the specified range, of the values of the specified expressions. Increments the value in the highest bucket that is less than the specified expression. -
quantize
- A power-of-two frequency distribution of the values of the specified expressions. Increments the value in the highest power-of-two bucket that is less than the specified expression.
You can find out more about aggregates from [3].
That about sums up content for part 2 of DTrace tutorial. In the next part we are going to cover in built functions, printing and profiling of real world Ruby applications. You should also follow us on twitter @codemancershq if you don't want to miss out Part 3.