6 Hardware-based performance monitoring with Perf
Perf is an interface to access the performance monitoring unit (PMU) of a processor and to record and display software events such as page faults. It supports system-wide, per-thread, and KVM virtualization guest monitoring.
You can store the resulting information in a report. This report contains, for example, information about instruction pointers or the code a thread was executing.
Perf consists of two parts:
- Code integrated into the Linux kernel that instructs the hardware.
- The perf user space utility that allows you to use the kernel code and helps you analyze gathered data.
6.1 Hardware-based monitoring
Performance monitoring means collecting information related to how an application or system performs. This information can be obtained either through software-based means or from the CPU or chipset. Perf integrates both of these methods.
Many modern processors contain a performance monitoring unit (PMU). The design and functionality of a PMU are CPU-specific. For example, the number of registers, counters, and supported features varies by CPU implementation.
Each PMU model consists of a set of registers: the performance monitor configuration (PMC) and the performance monitor data (PMD). Both can be read, but only PMCs are writable. These registers store configuration information and data.
6.2 Sampling and counting
Perf supports several profiling modes:
- Counting. Count the number of occurrences of an event.
- Event-based sampling. A less exact way of counting: a sample is recorded whenever a certain threshold number of events has occurred.
- Time-based sampling. A less exact way of counting: a sample is recorded at a defined frequency.
- Instruction-based sampling (AMD64 only). The processor follows instructions appearing in a given interval and samples which events they produce. This allows following up on individual instructions and seeing which of them is critical to performance.
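The difference between counting and event-based sampling can be illustrated with a small simulation. The following shell sketch is not part of Perf: the function name simulate_profiling and its two parameters are hypothetical, and it only models the idea that a sampler records one sample per threshold number of events, so the total has to be estimated from the sample count.

```shell
#!/bin/sh
# Toy model of two profiling modes (illustrative only, does not call perf).
# simulate_profiling is a hypothetical helper, not a Perf command.
#   $1 = total number of events that occur
#   $2 = sampling threshold (one sample per this many events)
simulate_profiling() {
    total=$1
    period=$2
    counted=0
    samples=0
    i=0
    while [ "$i" -lt "$total" ]; do
        i=$((i + 1))
        # Counting mode: every single event increments the counter.
        counted=$((counted + 1))
        # Event-based sampling: record a sample only when the
        # threshold number of events has occurred.
        if [ $((i % period)) -eq 0 ]; then
            samples=$((samples + 1))
        fi
    done
    # A sampler can only estimate the total from the sample count.
    echo "counted=$counted samples=$samples estimated=$((samples * period))"
}
```

For example, `simulate_profiling 1050 100` prints `counted=1050 samples=10 estimated=1000`: the 50 events after the last threshold do not show up in the estimate. With the real tool, `perf record -c COUNT` sets such an event period, and `perf record -F FREQ` requests a target number of samples per second instead.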
6.3 Installing Perf
The Perf kernel code is already included with the default kernel. To be able to use the user space utility, install the package perf.
6.4 Perf subcommands
To gather the required information, the perf tool has several subcommands. This section gives an overview of the most often used commands.
To see help in the form of a man page for any of the subcommands, use either perf help SUBCOMMAND or man perf-SUBCOMMAND.
perf stat
Start a program and create a statistical overview that is displayed after the program quits. perf stat is used to count events.
perf record
Start a program and create a report with performance counter information. The report is stored as perf.data in the current directory. perf record is used to sample events.
perf report
Display a report that was previously created with perf record.
perf annotate
Display a report file and an annotated version of the executed code. If debug symbols are installed, the source code is also displayed.
perf list
List event types that Perf can report with the current kernel and with your CPU. You can filter event types by category. For example, to see hardware events only, use perf list hw.
The man page for perf_event_open has short descriptions for the most important events. For example, to find a description of the event branch-misses, search for BRANCH_MISSES (note the spelling differences):
> man perf_event_open | grep -A5 BRANCH_MISSES
Sometimes, events may be ambiguous. The lowercase hardware event names are not the names of raw hardware events but instead the names of aliases created by Perf. These aliases map to differently named but similarly defined hardware events on each supported processor.
For example, the cpu-cycles event is mapped to the hardware event UNHALTED_CORE_CYCLES on Intel processors. On AMD processors, however, it is mapped to the hardware event CPU_CLK_UNHALTED.
Perf also allows measuring raw events specific to your hardware. To look up their descriptions, see the Architecture Software Developer's Manual of your CPU vendor. The relevant documents for AMD64/Intel 64 processors are linked to in Section 6.7, “More information”.
perf top
Display system activity as it happens.
perf trace
This command behaves similarly to strace. With this subcommand, you can see which system calls are executed by a particular thread or process and which signals it receives.
6.5 Counting particular types of event
To count the number of occurrences of an event, such as those displayed by perf list, use:
# perf stat -e EVENT -a
To count multiple types of events at once, list them separated by commas.
For example, to count cpu-cycles and instructions, use:
# perf stat -e cpu-cycles,instructions -a
To stop the session, press Ctrl–C.
You can also count the number of occurrences of an event within a particular time:
# perf stat -e EVENT -a -- sleep TIME
Replace TIME with a value in seconds.
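Counts of cpu-cycles and instructions are often combined into an instructions-per-cycle (IPC) ratio. The sketch below assumes the machine-readable output that perf stat produces with the -x option, where the first field is the counter value and the third field is the event name. The exact field layout can differ between Perf versions and options, so check the perf-stat man page on your system. The function name ipc_from_csv is hypothetical.

```shell
#!/bin/sh
# ipc_from_csv (hypothetical helper): read comma-separated perf stat
# output on standard input and print instructions per cycle.
# Assumed field layout: 1 = counter value, 3 = event name.
ipc_from_csv() {
    awk -F, '
        # Accept only lines whose first field is a plain number; this skips
        # comment lines and "<not counted>" / "<not supported>" entries.
        $1 ~ /^[0-9]+$/ && $3 ~ /cycles/       { cycles = $1 + 0 }
        $1 ~ /^[0-9]+$/ && $3 ~ /instructions/ { insns  = $1 + 0 }
        END {
            if (cycles > 0) printf "%.2f\n", insns / cycles
            else            print "n/a"
        }'
}
```

A possible invocation, run as root, is `perf stat -x, -e cpu-cycles,instructions -a -- sleep 5 2>&1 | ipc_from_csv`. The redirection is needed because perf stat writes its statistics to standard error by default.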
6.6 Recording events specific to particular commands
There are several ways to sample events specific to a particular command:
- To create a report for a newly invoked command, use:
  # perf record COMMAND
  Then, use the started process normally. When you quit the process, the Perf session also stops.
- To create a report for the entire system while a newly invoked command is running, use:
  # perf record -a COMMAND
  Then, use the started process normally. When you quit the process, the Perf session also stops.
- To create a report for an already running process, use:
  # perf record -p PID
  Replace PID with a process ID. To stop the session, press Ctrl–C.
Now you can view the gathered data (perf.data) using:
> perf report
This opens a pseudo-graphical interface. To receive help, press H. To quit, press Q.
If you prefer a graphical interface, try the GTK+ interface of Perf:
> perf report --gtk
However, the GTK+ interface is limited in functionality.
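When no interactive terminal is available, for example in a script, the record and report steps can be chained and the report printed as plain text with the --stdio option of perf report. The following is a minimal sketch: the function name record_and_report and the PERF variable are hypothetical conveniences, and setting PERF=echo prints the commands instead of running them.

```shell
#!/bin/sh
# record_and_report (hypothetical helper): sample a command and print
# the report as plain text.
#   $1   = file to store the recorded data in
#   $2.. = command to run and sample
# PERF is an assumed override for the perf binary, e.g. PERF=echo for a dry run.
record_and_report() {
    data=$1
    shift
    # -o selects the output file instead of the default perf.data.
    # Stop here if the recording step reports a failure.
    "${PERF:-perf}" record -o "$data" -- "$@" || return 1
    # -i reads that file; --stdio prints text instead of the interactive view.
    "${PERF:-perf}" report -i "$data" --stdio
}
```

For example, `record_and_report /tmp/demo.data sleep 1` samples the sleep command and prints the report to standard output, while `PERF=echo record_and_report /tmp/demo.data sleep 1` only shows the two perf command lines that would be run.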
6.7 More information
This chapter only provides a short overview. Refer to the following links for more information:
- https://perf.wiki.kernel.org/index.php/Main_Page
The project home page. It also features a tutorial on using perf.
- https://www.brendangregg.com/perf.html
Unofficial page with many one-line examples of how to use perf.
- https://web.eece.maine.edu/~vweaver/projects/perf_events/
Unofficial page with several resources, primarily relating to the Linux kernel code of Perf and its API. This page includes, for example, a CPU compatibility table and a programming guide.
- https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
The Intel Architectures Software Developer's Manual, Volume 3B.
- https://support.amd.com/TechDocs/24593.pdf
The AMD Architecture Programmer's Manual, Volume 2.
- Chapter 7, OProfile—system-wide profiler
Consult this chapter for other performance optimizations.