Version 2 (modified by cotto, 13 years ago)

add annotated body of a profile

Introduction

Callgrind's output format is very general because it's targeted by a number of different tools. This page describes Callgrind's output format as it pertains to PIR-level profiling for Parrot.

Terms

  • Process under test - This is the process of the program being profiled, i.e. a PIR program of some sort.
  • Profile - This refers to the output file containing profiling information for the process under test.

Format Description

The format consists of a header and a body. The header consists of several key value pairs which describe the purpose of the profile. KCachegrind ignores lines it doesn't know about. In the example below, this feature is used to add comments to an example profile. See the bottom of this page for information about how this example was generated.

Example

Example C Code

The following C code was used to generate a (nearly) trivial profile:

#include <stdio.h>
#include <valgrind/callgrind.h>

void func1();
void func2();
void func3();

int main() {
    CALLGRIND_START_INSTRUMENTATION;
    func1();
    func2();
    func3();
    CALLGRIND_STOP_INSTRUMENTATION;
}

void func1() {
    int i;
    for (i = 0; i < 1000000; i++);
}

void func2() {
    int i;
    for (i = 0; i < 1000000; i++);
    func1();
}

void func3() {
    int i;
    for (i = 0; i < 1000000; i++);
    func2();
    func1();
}

The program was compiled and profiled as follows:

gcc callgrind_test.c -g -o callgrind_test
valgrind --tool=callgrind --compress-strings=no --compress-pos=no --instr-atstart=no ./callgrind_test

Annotated Header from the Profile of above C Code

The above valgrind invocation produced a profile in callgrind.out.PID, where PID is the process ID of the process under test. Below is the header of the profile, annotated with a description of the purpose of each line.

# version of the Callgrind format
version: 1

# what created this profile, informational only
creator: 3.4.1-Debian

# PID of the process under test
pid: 5751

# the full CLI invocation that started the process under test
cmd:  ./callgrind_test

# used when multiple dumps are produced, starts from 1 and increments
part: 1

# parameters used for the cache simulator; not relevant for function profiling
desc: I1 cache:
desc: D1 cache:
desc: L2 cache:

# not really sure
desc: Timerange: Basic block 0 - 7000029

# Why did the process under test stop?  (informational only)
desc: Trigger: Program termination

# (optional) what do positions mean?
# "line" - (default) positions are line numbers in a file
# "instr" - positions are offsets in a binary representing instructions
positions: line

# (required) event types logged by this file
# "Ir" - instruction read access
# "I1mr" - instruction L1 read cache miss
# "I2mr" - instruction L2 read cache miss
events: Ir

# (recommended) the total number of events covered by this profile
# This is intended to allow postprocessing tools to know the total cost in advance.
# "totals" appears at the end of the file and has the same value.
summary: 21000073

Annotated Body from the Profile of above C Code

# "ob" means object file, i.e. the ELF executable where the next set of costs occur
ob=/usr/src/parrot/parrot-svn-committable/callgrind_test
# source file responsible for the next set of costs
fl=/usr/src/parrot/parrot-svn-committable/callgrind_test.c
# name of the function responsible for the next set of costs is "func2"
fn=func2
# on line 21 of the current file, 6 cpu cycles passed
21 6
# on line 23, 6008 cpu cycles passed
23 6008
# on line 24 (before calling), 2 cpu cycles passed
24 2
# the function (in the same file) "func1" was called
cfn=func1
# 'func1' was called nonrecursively from "func2" 2 times during profiling.  Calling the function cost 16 cpu cycles.
calls=2 16
# on line 24, the function itself cost 6018 cpu cycles
24 6018
# on line 25, 4 cpu cycles passed 
25 4

# the next costs apply to the function "main"
fn=main
# on line 9 of the current file (callgrind_test.c), 3 cpu cycles passed
9 3
# on line 10, 1 cycle passed before...
10 1
# the function (in the same file) "func1" was called
cfn=func1
# the function was called 1 time; calling the function cost 16 cycles
calls=1 16
# the call happened on line 10 and cost 3009 cycles
10 3009
# on line 11, 1 cycle passed before...
11 1
# the function (in the same file) "func2" was called
cfn=func2
# the call happened once; calling the function cost 21 cycles
calls=1 21
# the call happened on line 11; the function cost 6019 cycles
11 6019
# on line 12, 1 cycle passed before...
12 1
# the function "func3" was called
cfn=func3
# it was called once and the call cost 27 cycles
calls=1 27
#on line 12 (the function call), 12039 cycles passed
12 12039

#now we're talking about "func1"
fn=func1
# on line 16 of this function, 12 cpu cycles passed over the course of the process
16 12
# on line 18, 12016 cycles passed
18 12016
# on line 19, 8 cycles passed
19 8
# now we're talking about "func3"
fn=func3
# on line 27, 3 cycles passed
27 3
# on line 29, 3004 cycles passed
29 3004
# on line 30, 1 cycle passed...
30 1
# "func2" was called
cfn=func2
# it was called once with the call costing 21 cycles
calls=1 21
# the function cost 6019 cycles
30 6019
# on line 31, 1 cycle passed...
31 1
# "func1" was called
cfn=func1
# it was called once and the call cost 16 cycles
calls=1 16
# the function was called on line 31 and cost 3009 cycles
31 3009
# etc
32 2

# 21073 cycles passed during the part of the process under test that was instrumented
totals: 21073