Applies to SUSE Linux Enterprise Server 15 SP6

6 Hardware-based performance monitoring with Perf

Perf is an interface to access the performance monitoring unit (PMU) of a processor and to record and display software events such as page faults. It supports system-wide, per-thread, and KVM virtualization guest monitoring.

You can store resulting information in a report. This report contains information about, for example, instruction pointers or what code a thread was executing.

Perf consists of two parts:

  • Code integrated into the Linux kernel that instructs the hardware.

  • The perf user space utility that allows you to use the kernel code and helps you analyze gathered data.

6.1 Hardware-based monitoring

Performance monitoring means collecting information related to how an application or system performs. This information can be obtained either through software-based means or from the CPU or chipset. Perf integrates both of these methods.

Many modern processors contain a performance monitoring unit (PMU). The design and functionality of a PMU are CPU-specific. For example, the number of registers and counters and the set of supported features vary by CPU implementation.

Each PMU model provides two sets of registers: performance monitor configuration (PMC) registers, which hold configuration information, and performance monitor data (PMD) registers, which hold the collected data. Both can be read, but only PMCs are writable.

6.2 Sampling and counting

Perf supports several profiling modes:

  • Counting.  Count the number of occurrences of an event.

  • Event-based sampling.  A less exact way of counting: a sample is recorded whenever a certain threshold number of events has occurred.

  • Time-based sampling.  A less exact way of counting: a sample is recorded at a defined frequency.

  • Instruction-based sampling (AMD64 only).  The processor follows instructions appearing in a given interval and samples which events they produce. This allows following up on individual instructions and seeing which of them is critical to performance.
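
The first three modes can be sketched with the perf utility as follows. This is a minimal illustration, not a tuned setup: the workload (sleep 1), the values of PERIOD and FREQ, the output file names and the try helper are assumptions chosen for this example. Instruction-based sampling is not shown because it depends on the CPU.

```shell
# Sketch: the generic profiling modes applied to one placeholder workload.
PERIOD=100000   # event-based sampling: one sample per PERIOD events (assumed value)
FREQ=99         # time-based sampling: samples per second (assumed value)

# Hypothetical helper: report failures (perf missing, insufficient
# privileges, event not supported) without aborting the script.
try() { "$@" || echo "failed: $*" >&2; }

# Counting: print the total number of cpu-cycles events.
try perf stat -e cpu-cycles -- sleep 1

# Event-based sampling: record a sample every PERIOD cpu-cycles events.
try perf record -e cpu-cycles -c "$PERIOD" -o perf-event.data -- sleep 1

# Time-based sampling: record samples at a frequency of FREQ Hz.
try perf record -F "$FREQ" -o perf-time.data -- sleep 1
```

Counting yields a single total per event, whereas both sampling modes write data files that can be inspected later to see where the events occurred.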

6.3 Installing Perf

The Perf kernel code is already included with the default kernel. To be able to use the user space utility, install the package perf.
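
A quick way to check whether the user space utility is already available might look like this. The PERF_INSTALLED variable is a name made up for this sketch, and the suggested zypper call assumes root privileges.

```shell
# Check for the perf user space utility and print its version if present.
if command -v perf >/dev/null 2>&1; then
    PERF_INSTALLED=yes
    perf --version
else
    PERF_INSTALLED=no
    # The package name "perf" is the one given in this section.
    echo "perf not found; as root, run: zypper install perf" >&2
fi
```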

6.4 Perf subcommands

To gather the required information, the perf tool has several subcommands. This section gives an overview of the most often used commands.

To see help in the form of a man page for any of the subcommands, use either perf help SUBCOMMAND or man perf-SUBCOMMAND.

perf stat

Start a program and create a statistical overview that is displayed after the program quits. perf stat is used to count events.

perf record

Start a program and create a report with performance counter information. The report is stored as perf.data in the current directory. perf record is used to sample events.

perf report

Display a report that was previously created with perf record.

perf annotate

Display a report file and an annotated version of the executed code. If debug symbols are installed, the source code is also displayed.

perf list

List event types that Perf can report with the current kernel and with your CPU. You can filter event types by category. For example, to see hardware events only, use perf list hw.

The man page for perf_event_open has short descriptions for the most important events. For example, to find a description of the event branch-misses, search for BRANCH_MISSES (note the spelling differences):

> man perf_event_open | grep -A5 BRANCH_MISSES

Sometimes, events may be ambiguous. The lowercase hardware event names are not the names of raw hardware events but instead the names of aliases created by Perf. These aliases map to differently named but similarly defined hardware events on each supported processor.

For example, the cpu-cycles event is mapped to the hardware event UNHALTED_CORE_CYCLES on Intel processors. On AMD processors, however, it is mapped to the hardware event CPU_CLK_UNHALTED.

Perf also allows measuring raw events specific to your hardware. To look up their descriptions, see the Architecture Software Developer's Manual of your CPU vendor. The relevant documents for AMD64/Intel 64 processors are linked in Section 6.7, “More information”.

perf top

Display system activity as it happens.

perf trace

This command behaves similarly to strace. With this subcommand, you can see which system calls are executed by a particular thread or process and which signals it receives.
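
The non-interactive subcommands above can be tried in sequence on a throwaway workload. The following is only a sketch: the file name in OUT, the workload (sleep 1) and the try helper are assumptions made for this example, and some steps may fail without root privileges or on systems where the PMU is not exposed, for example inside some virtual machines. perf top is left out because it runs interactively.

```shell
# A short tour of the subcommands on a placeholder workload.
OUT=perf-tour.data   # hypothetical output file name

# Hypothetical helper: keep going if a step fails.
try() { "$@" || echo "failed (missing perf or privileges?): $*" >&2; }

try perf list hw                      # list hardware event aliases
try perf stat -- sleep 1              # count default events while sleep runs
try perf record -o "$OUT" -- sleep 1  # sample the workload, store the result
try perf report -i "$OUT" --stdio     # print the recorded report as text
try perf annotate -i "$OUT" --stdio   # print annotated code for the samples
try perf trace sleep 1                # show system calls made by the workload
```

A workload as short as sleep 1 produces very few samples, so the report and annotation output will be sparse. Replace it with the program you actually want to analyze.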

6.5 Counting particular types of event

To count the number of occurrences of an event, such as those displayed by perf list, use:

# perf stat -e EVENT -a

To count multiple types of events at once, list them separated by commas. For example, to count cpu-cycles and instructions, use:

# perf stat -e cpu-cycles,instructions -a

To stop the session, press Ctrl+C.

You can also count the number of occurrences of an event within a particular time:

# perf stat -e EVENT -a -- sleep TIME

Replace TIME with a value in seconds.
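
Counts of several events are often combined into a ratio. As a hedged example, the sketch below counts cpu-cycles and instructions system-wide for a fixed time and derives instructions per cycle (IPC) from the result. It assumes the CSV layout written by perf stat -x, in which the first field is the count and the third field is the event name. The function name ipc_from_csv and the variables TIME and STAT_FILE are made up for this example.

```shell
# Compute instructions per cycle (IPC) from perf stat CSV output on stdin.
# Assumed field layout: value,unit,event,...
ipc_from_csv() {
    awk -F, '
        $3 ~ /cycles/       { cycles = $1 + 0 }   # "+ 0" maps non-numeric
        $3 ~ /instructions/ { insns  = $1 + 0 }   # values such as
        END {                                     # "<not supported>" to 0
            if (cycles > 0) printf "IPC: %.2f\n", insns / cycles
            else            print  "IPC: n/a"
        }'
}

TIME=2                # measurement time in seconds (assumed value)
STAT_FILE=$(mktemp)   # temporary file for the counts

# -a counts system-wide and usually requires root privileges;
# -x, selects comma-separated output; -o writes the counts to a file.
if command -v perf >/dev/null 2>&1 &&
   perf stat -x, -o "$STAT_FILE" -e cpu-cycles,instructions -a -- sleep "$TIME"
then
    ipc_from_csv < "$STAT_FILE"
else
    echo "perf stat did not run (missing perf or insufficient privileges)" >&2
fi
rm -f "$STAT_FILE"
```

On processors that report the same event once per core type, the last matching line determines the result, so treat the printed value as a rough indicator only.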

6.6 Recording events specific to particular commands

There are several ways to sample events specific to a particular command:

  • To create a report for a newly invoked command, use:

    # perf record COMMAND

    Then, use the started process normally. When you quit the process, the Perf session also stops.

  • To create a report for the entire system while a newly invoked command is running, use:

    # perf record -a COMMAND

    Then, use the started process normally. When you quit the process, the Perf session also stops.

  • To create a report for an already running process, use:

    # perf record -p PID

    Replace PID with a process ID. To stop the session, press Ctrl+C.
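
When attaching to a running process from a script, pressing a key to stop is not practical. One common approach, sketched below, is to pass a short sleep as the command so that the recording ends by itself. The background sleep 10 only stands in for a real workload, and the file name perf-pid.data is an assumption for this example.

```shell
# Start a placeholder workload in the background and remember its PID.
sleep 10 &
PID=$!

# Attach to the process and sample it for 2 seconds. The trailing
# "sleep 2" only bounds the recording time; the samples come from PID.
if command -v perf >/dev/null 2>&1; then
    perf record -p "$PID" -o perf-pid.data -- sleep 2 ||
        echo "perf record failed (insufficient privileges?)" >&2
else
    echo "perf is not installed" >&2
fi

# Clean up the placeholder workload.
kill "$PID" 2>/dev/null
```

An idle process such as sleep produces almost no samples. With a real workload, the resulting file can be viewed by passing it to perf report with the -i option.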

Now you can view the gathered data (perf.data) using:

> perf report

This opens a pseudo-graphical interface. To receive help, press H. To quit, press Q.

If you prefer a graphical interface, try the GTK+ interface of Perf:

> perf report --gtk

However, the GTK+ interface is limited in functionality.
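
For scripts or remote sessions without an interactive terminal, the report can also be printed as plain text. The following sketch assumes that a perf.data file already exists in the current directory. The sort keys and the limit of 40 lines are arbitrary choices for this example.

```shell
# Print the report as plain text instead of opening an interface.
DATA=perf.data   # default file name used by perf record

if command -v perf >/dev/null 2>&1 && [ -r "$DATA" ]; then
    # --stdio selects text output; --sort chooses the grouping columns.
    perf report -i "$DATA" --stdio --sort comm,dso,symbol | head -n 40
else
    echo "no readable $DATA found or perf is not installed" >&2
fi
```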

6.7 More information

This chapter only provides a short overview. Refer to the following links for more information:

https://perf.wiki.kernel.org/index.php/Main_Page

The project home page. It also features a tutorial on using perf.

https://www.brendangregg.com/perf.html

Unofficial page with many one-line examples of how to use perf.

https://web.eece.maine.edu/~vweaver/projects/perf_events/

Unofficial page with several resources, primarily relating to the Linux kernel code of Perf and its API. This page includes, for example, a CPU compatibility table and a programming guide.

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf

The Intel Architectures Software Developer's Manual, Volume 3B.

https://support.amd.com/TechDocs/24593.pdf

The AMD Architecture Programmer's Manual, Volume 2.

Chapter 7, OProfile—system-wide profiler

Consult this chapter for other performance optimizations.