7 OProfile: system-wide profiler
OProfile is a profiler for dynamic program analysis. It observes the behavior of running programs and collects information about them, such as where they spend their time. You can view this information in reports that give hints for further optimization.
You do not need to recompile your code, use wrapper libraries, or patch the kernel to use OProfile. When profiling an application, expect a small overhead that depends on the workload and the sampling frequency.
7.1 Conceptual overview
OProfile uses the hardware performance counters provided on many processors; operf accesses them through the Linux kernel's performance events subsystem. OProfile is capable of profiling all code, including the kernel, kernel modules, kernel interrupt handlers, system shared libraries, and other applications.
Modern processors support profiling in hardware through performance counters. Depending on the processor, there can be several counters, each of which can be programmed with an event to count. Each counter also has a count value that determines how often a sample is taken: a sample is recorded each time the counter has seen that many events. The lower the value, the more often samples are taken.
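As a rough worked example of this relationship, a counter programmed with a count of N records about one sample every N events, so the sample rate is the event rate divided by N. The following shell sketch does that arithmetic. The function name samples_per_sec is hypothetical, and the figures assume a fully busy core running at a fixed 3 GHz, so that CPU_CLK_UNHALTED advances about 3,000,000,000 times per second.

```shell
# Hypothetical helper: approximate samples per second for a counter
# that records one sample every COUNT events.
samples_per_sec() {
    events_per_sec=$1   # assumed event rate, e.g. core cycles per second
    count=$2            # count value given to operf for this event
    echo $(( events_per_sec / count ))   # integer division
}

# Assumed fully busy core at a fixed 3 GHz, event CPU_CLK_UNHALTED:
samples_per_sec 3000000000 100000   # prints 30000
samples_per_sec 3000000000 6000     # prints 500000
```

Lowering the count from 100000 to 6000 raises the sample rate more than tenfold in this example, which is why very low counts add noticeable overhead.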
In the post-processing step, the collected samples are aggregated and their instruction addresses are mapped to function names.
7.2 Installation and requirements
To use OProfile, install the oprofile package.
OProfile works on AMD64/Intel 64, IBM Z, and POWER processors.
It is useful to install the *-debuginfo package for the application you want to profile. To profile the kernel, you need the kernel's debuginfo package as well.
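After installing the package, a quick way to confirm that the utilities described in this chapter are available is to look them up on the PATH. This is a minimal sketch: missing_oprofile_tools is a hypothetical helper name, and the script relies only on the POSIX command -v builtin.

```shell
# Hypothetical check: print the name of every OProfile utility used in
# this chapter that cannot be found on PATH.
missing_oprofile_tools() {
    for tool in opannotate operf ophelp opimport opreport; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_oprofile_tools)
if [ -n "$missing" ]; then
    echo "Not found on PATH (is the oprofile package installed?):"
    echo "$missing"
else
    echo "All OProfile utilities found."
fi
```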
7.3 Available OProfile utilities
OProfile contains several utilities for controlling the profiling process and processing the collected data. The following list summarizes the programs used in this chapter:
opannotate
Outputs annotated source or assembly listings mixed with profile information. An annotated report can be used in combination with addr2line to identify the source file and line where hotspots potentially exist. See man addr2line for more information.
operf
Profiler tool. After profiling stops, the data, which is stored in CUR_DIR/oprofile_data/samples/current by default, can be processed by opreport, for example.
ophelp
Lists available events with short descriptions.
opimport
Converts sample database files from a foreign binary format to the format specific to the platform.
opreport
Generates reports from profiled data.
7.4 Using OProfile
With OProfile, you can profile both the kernel and applications. When profiling the kernel, tell OProfile where to find the vmlinux* file with the --vmlinux option. The file is generally available in /boot, and operf requires it in uncompressed form. OProfile profiles kernel modules by default; however, make sure you read https://oprofile.sourceforge.net/doc/kernel-profiling.html.
Most applications do not need to profile the kernel. In that case, omit the --vmlinux option: operf then attributes all kernel samples to a single pseudo-binary named no-vmlinux, which reduces the amount of information.
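Because operf needs an uncompressed kernel image, the decompression step can be wrapped in a small helper. This sketch assumes, as in the procedure in the next section, that the image is shipped as /boot/vmlinux-VERSION.gz. prepare_vmlinux is a hypothetical name; it writes the uncompressed copy to /tmp unless another directory is given.

```shell
# Hypothetical helper: decompress a gzip-compressed vmlinux image into
# DESTDIR (default /tmp) and print the path of the uncompressed copy.
prepare_vmlinux() {
    src=$1
    destdir=${2:-/tmp}
    dest="$destdir/$(basename "$src" .gz)"
    gzip -dc "$src" > "$dest" || return 1
    echo "$dest"
}

# Example (as root), assuming /boot/vmlinux-VERSION.gz exists:
#   vmlinux=$(prepare_vmlinux "/boot/vmlinux-$(uname -r).gz") &&
#       operf --vmlinux="$vmlinux" COMMAND
```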
7.4.1 Creating a report
The following procedure collects profiling data for the application COMMAND and creates a report.
1. Open a shell and log in as root.
2. Decide whether to profile with or without the Linux kernel:
   - Profile with the Linux kernel. Execute the following commands, because operf can only work with uncompressed images:
     # cp /boot/vmlinux-`uname -r`.gz /tmp
     # gunzip /tmp/vmlinux*.gz
     # operf --vmlinux=/tmp/vmlinux-`uname -r` COMMAND
   - Profile without the Linux kernel. Use the following command:
     # operf COMMAND
     To see which functions call other functions in the output, additionally use the --callgraph option. operf records the full call chain, so the option takes no maximum depth:
     # operf --callgraph COMMAND
3. operf writes its data to CUR_DIR/oprofile_data/samples/current. After the operf command has finished (or has been aborted with Ctrl–C), the data can be analyzed with opreport:
   # opreport
   Overflow stats not available
   CPU: CPU with timer interrupt, speed 0 MHz (estimated)
   Profiling through timer interrupt
             TIMER:0|
     samples|      %|
   ------------------
       84877 98.3226 no-vmlinux
   ...
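To pick out the images that account for most of the samples, you can filter the opreport summary. This awk-based sketch assumes a report with a single event, so that data rows have the form SAMPLES PERCENT IMAGE as in the output above. top_images is a hypothetical helper name, and the default threshold of 1 percent is arbitrary.

```shell
# Hypothetical filter: read a single-event opreport summary on stdin and
# print "PERCENT IMAGE" for every image at or above THRESHOLD percent
# (default 1). Header and separator lines are skipped because their
# first two fields are not numbers.
top_images() {
    awk -v min="${1:-1}" '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ && $2 + 0 >= min + 0 {
            print $2, $3
        }'
}

# Usage:
#   opreport | top_images 5
```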
7.4.2 Getting event configurations
The general procedure for event configuration is as follows:
1. First, use the events CPU_CLK_UNHALTED and INST_RETIRED to find optimization opportunities.
2. Then use specific events to find bottlenecks. To list them, use the command perf list.
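One common way to combine these two events is to estimate cycles per instruction (CPI): each sample represents COUNT events, so the number of events is roughly the number of samples multiplied by the count. This sketch assumes both events were recorded in the same operf run, for example with --events=CPU_CLK_UNHALTED:100000,INST_RETIRED:100000, and that you read the two sample totals from opreport. estimate_cpi is a hypothetical helper, and the result is only an approximation because sampling is statistical.

```shell
# Hypothetical helper: estimate cycles per instruction from sample totals.
# Arguments: cycle samples, CPU_CLK_UNHALTED count,
#            instruction samples, INST_RETIRED count.
estimate_cpi() {
    awk -v cs="$1" -v cc="$2" -v ins="$3" -v ic="$4" \
        'BEGIN { printf "%.2f\n", (cs * cc) / (ins * ic) }'
}

# Illustrative numbers, both events sampled with a count of 100000:
estimate_cpi 4000 100000 2000 100000   # prints 2.00
```

A high CPI suggests the code spends many cycles stalled per instruction, which is where the more specific events in the next step become useful.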
If you need to profile certain events, first check which events your processor supports with the ophelp command (example output generated on an Intel Core i5 CPU):
# ophelp
oprofile: available events for CPU type "Intel Architectural Perfmon"
See Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B (Document 253669) Chapter 18 for architectural perfmon events
This is a limited set of fallback events because oprofile does not know your CPU
CPU_CLK_UNHALTED: (counter: all))
    Clock cycles when not halted (min count: 6000)
INST_RETIRED: (counter: all))
    number of instructions retired (min count: 6000)
LLC_MISSES: (counter: all))
    Last level cache demand requests from this core that missed the LLC (min count: 6000)
    Unit masks (default 0x41)
    ----------
    0x41: No unit mask
LLC_REFS: (counter: all))
    Last level cache demand requests from this core (min count: 6000)
    Unit masks (default 0x4f)
    ----------
    0x4f: No unit mask
BR_MISS_PRED_RETIRED: (counter: all))
    number of mispredicted branches retired (precise) (min count: 500)
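Each event in the ophelp output lists a minimum count. The following sketch extracts that value for one event and checks a proposed count against it before you pass it to operf. It assumes the layout shown above: a line starting with the event name and a colon, followed by a description containing (min count: N). min_count and count_ok are hypothetical helper names.

```shell
# Hypothetical helper: read ophelp output on stdin and print the
# "min count" of event NAME (prints nothing if NAME is not listed).
min_count() {
    awk -v name="$1" '
        # An event line starts with the event name followed by a colon.
        /^[A-Z0-9_]+:/ { event = substr($1, 1, length($1) - 1) }
        # "min count: " is 11 characters long; print the digits after it.
        event == name && match($0, /min count: [0-9]+/) {
            print substr($0, RSTART + 11, RLENGTH - 11)
            exit
        }'
}

# Hypothetical check: succeed if COUNT is at least the min count of EVENT.
# Also reads ophelp output on stdin.
count_ok() {
    min=$(min_count "$1")
    [ -n "$min" ] && [ "$2" -ge "$min" ]
}

# Usage:
#   ophelp | count_ok CPU_CLK_UNHALTED 100000 && echo "count accepted"
```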
Specify the performance counter events with the --events option. To profile several events at once, pass them as a comma-separated list. Each event needs an event name (from ophelp) and a count value that sets the sample rate, separated by a colon, for example:
# operf --events CPU_CLK_UNHALTED:100000 COMMAND
The count value controls the sampling frequency. A low count (frequent sampling) can seriously impair system performance and can disturb the system to such a degree that the data becomes useless, while a very high count yields too few samples for meaningful results. It is recommended to measure the performance metric you are monitoring both with and without OProfile, and to determine experimentally the count value that disturbs performance the least while still providing enough samples.
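Comparing runs with and without OProfile comes down to computing the relative slowdown. A minimal sketch, assuming you have measured the wall-clock times yourself (for example with the shell's time keyword around COMMAND and around operf COMMAND); overhead_pct is a hypothetical helper name.

```shell
# Hypothetical helper: relative slowdown, in percent, of a profiled run
# (second argument) compared with an unprofiled baseline (first argument).
overhead_pct() {
    awk -v base="$1" -v prof="$2" \
        'BEGIN { printf "%.1f\n", (prof - base) * 100 / base }'
}

# Illustrative timings: 12.0 s without profiling, 12.6 s under operf.
overhead_pct 12.0 12.6   # prints 5.0
```

Repeating the measurement for a few count values and keeping the lowest count whose overhead you can tolerate is one straightforward way to carry out the experiment described above.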
7.5 Generating reports
Before generating a report, make sure operf has stopped. Unless you have provided an output directory with --session-dir, operf has written its data to CUR_DIR/oprofile_data/samples/current, and the reporting tools opreport and opannotate look there by default.
Calling opreport without any options gives a complete summary. Pass an executable as an argument to retrieve profile data only for that executable. If you analyze applications written in C++, use the --demangle smart option.
The opannotate command generates output with annotations from the source code. Run it with the following options:
# opannotate --source \
    --base-dirs=BASEDIR \
    --search-dirs=SEARCHDIR \
    --output-dir=annotated/ \
    /lib/libfoo.so
The --base-dirs option takes a comma-separated list of path prefixes that are stripped from the source file paths recorded in the debug information. These paths are searched before the directories given with --search-dirs. The --search-dirs option also takes a comma-separated list of directories to search for source files.
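Both options take a single comma-separated string, which is easy to build from a list of directories in a script. This is a minimal sketch; join_dirs and the example paths are hypothetical.

```shell
# Hypothetical helper: join its arguments with commas, the list format
# used by opannotate --base-dirs and --search-dirs. The body runs in a
# subshell, so changing IFS does not affect the caller.
join_dirs() (
    IFS=,
    printf '%s\n' "$*"
)

# Example with hypothetical paths:
#   opannotate --source \
#       --base-dirs="$(join_dirs /usr/src/build /home/user/build)" \
#       --search-dirs="$(join_dirs /usr/src/foo /home/user/foo)" \
#       --output-dir=annotated/ \
#       /lib/libfoo.so
```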
Because of compiler optimization, code can disappear and appear in a different place. Use the information in https://oprofile.sourceforge.net/doc/debug-info.html to fully understand its implications.
7.6 More information
This chapter only provides a short overview. Refer to the following links for more information:
- https://oprofile.sourceforge.net
The project home page.
- Manpages
Detailed descriptions of the options of the different tools.
- /usr/share/doc/packages/oprofile/oprofile.html
Contains the OProfile manual.
- https://developer.intel.com/
Architecture reference for Intel processors.
- https://www.ibm.com/support/knowledgecenter/ssw_aix_71/assembler/idalangref_arch_overview.html
Architecture reference for PowerPC64 processors in IBM iSeries, pSeries, and Blade server systems.