16 Tracing Tools #
SUSE Linux Enterprise Server comes with several tools that help you obtain useful information about your system. You can use the information for various purposes, for example, to debug and find problems in your program, to discover places causing performance drops, or to trace a running process to find out what system resources it uses. Most of the tools are part of the installation media. In some cases, they need to be installed from the SUSE Software Development Kit, which is a separate download.
While a running process is being monitored for system or library calls, the performance of the process is heavily reduced. You are advised to use tracing tools only for the time you need to collect the data.
16.1 Tracing System Calls with strace #
The strace
command traces system calls of a process
and signals received by the process. strace
can either
run a new command and trace its system calls, or you can attach
strace
to an already running command. Each line of the
command's output contains the system call name, followed by its arguments
in parentheses and its return value.
To run a new command and start tracing its system calls, enter the
command to be monitored as you normally do, and add
strace
at the beginning of the command line:
tux >
strace ls
execve("/bin/ls", ["ls"], [/* 52 vars */]) = 0
brk(0) = 0x618000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7f9848667000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7f9848666000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT \
(No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=200411, ...}) = 0
mmap(NULL, 200411, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9848635000
close(3) = 0
open("/lib64/librt.so.1", O_RDONLY) = 3
[...]
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7fd780f79000
write(1, "Desktop\nDocuments\nbin\ninst-sys\n", 31Desktop
Documents
bin
inst-sys
) = 31
close(1) = 0
munmap(0x7fd780f79000, 4096) = 0
close(2) = 0
exit_group(0) = ?
To attach strace
to an already running process, you
need to specify the -p
with the process ID
(PID
) of the process that you want to monitor:
tux >
strace -p `pidof cron`
Process 1261 attached
restart_syscall(<... resuming interrupted call ...>) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
select(5, [4], NULL, NULL, {0, 0}) = 0 (Timeout)
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 5
connect(5, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = 0
sendto(5, "\2\0\0\0\0\0\0\0\5\0\0\0root\0", 17, MSG_NOSIGNAL, NULL, 0) = 17
poll([{fd=5, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=5, revents=POLLIN|POLLHUP}])
read(5, "\2\0\0\0\1\0\0\0\5\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\6\0\0\0"..., 36) = 36
read(5, "root\0x\0root\0/root\0/bin/bash\0", 28) = 28
close(5) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x7f772b9ea890, [], SA_RESTORER|SA_RESTART, 0x7f772adf7880}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({60, 0}, 0x7fff87d8c580) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
select(5, [4], NULL, NULL, {0, 0}) = 0 (Timeout)
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 5
connect(5, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = 0
sendto(5, "\2\0\0\0\0\0\0\0\5\0\0\0root\0", 17, MSG_NOSIGNAL, NULL, 0) = 17
poll([{fd=5, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=5, revents=POLLIN|POLLHUP}])
read(5, "\2\0\0\0\1\0\0\0\5\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\6\0\0\0"..., 36) = 36
read(5, "root\0x\0root\0/root\0/bin/bash\0", 28) = 28
close(5)
[...]
The -e
option understands several sub-options and
arguments. For example, to trace all attempts to open or write to a
particular file, use the following:
tux >
strace -e trace=open,write ls ~
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/librt.so.1", O_RDONLY) = 3
open("/lib64/libselinux.so.1", O_RDONLY) = 3
open("/lib64/libacl.so.1", O_RDONLY) = 3
open("/lib64/libc.so.6", O_RDONLY) = 3
open("/lib64/libpthread.so.0", O_RDONLY) = 3
[...]
open("/usr/lib/locale/cs_CZ.utf8/LC_CTYPE", O_RDONLY) = 3
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
write(1, "addressbook.db.bak\nbin\ncxoffice\n"..., 311) = 311
To trace only network related system calls, use -e
trace=network
:
tux >
strace -e trace=network -p 26520
Process 26520 attached - interrupt to quit
socket(PF_NETLINK, SOCK_RAW, 0) = 50
bind(50, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(50, {sa_family=AF_NETLINK, pid=26520, groups=00000000}, \
[12]) = 0
sendto(50, "\24\0\0\0\26\0\1\3~p\315K\0\0\0\0\0\0\0\0", 20, 0,
{sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
[...]
The -c
calculates the time the kernel spent on each
system call:
tux >
strace -c find /etc -name xorg.conf
/etc/X11/xorg.conf
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.38 0.000181 181 1 execve
22.00 0.000123 0 576 getdents64
19.50 0.000109 0 917 31 open
19.14 0.000107 0 888 close
4.11 0.000023 2 10 mprotect
0.00 0.000000 0 1 write
[...]
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 1 arch_prctl
0.00 0.000000 0 3 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 4 fadvise64
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.000559 3633 33 total
To trace all child processes of a process, use -f
:
tux >
strace -f rcapache2 status
execve("/usr/sbin/rcapache2", ["rcapache2", "status"], [/* 81 vars */]) = 0
brk(0) = 0x69e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7f3bb553b000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7f3bb553a000
[...]
[pid 4823] rt_sigprocmask(SIG_SETMASK, [], <unfinished ...>
[pid 4822] close(4 <unfinished ...>
[pid 4823] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 4822] <... close resumed> ) = 0
[...]
[pid 4825] mprotect(0x7fc42cbbd000, 16384, PROT_READ) = 0
[pid 4825] mprotect(0x60a000, 4096, PROT_READ) = 0
[pid 4825] mprotect(0x7fc42cde4000, 4096, PROT_READ) = 0
[pid 4825] munmap(0x7fc42cda2000, 261953) = 0
[...]
[pid 4830] munmap(0x7fb1fff10000, 261953) = 0
[pid 4830] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 4830] open("/dev/tty", O_RDWR|O_NONBLOCK) = 3
[pid 4830] close(3)
[...]
read(255, "\n\n# Inform the caller not only v"..., 8192) = 73
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
exit_group(0)
If you need to analyze the output of strace
and the
output messages are too long to be inspected directly in the console
window, use -o
. In that case, unnecessary messages, such
as information about attaching and detaching processes, are suppressed.
You can also suppress these messages (normally printed on the standard
output) with -q
. To add time stamps at the beginning of each line
with a system call, use -t
:
tux >
strace -t -o strace_sleep.txt sleep 1; more strace_sleep.txt
08:44:06 execve("/bin/sleep", ["sleep", "1"], [/* 81 vars */]) = 0
08:44:06 brk(0) = 0x606000
08:44:06 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, \
-1, 0) = 0x7f8e78cc5000
[...]
08:44:06 close(3) = 0
08:44:06 nanosleep({1, 0}, NULL) = 0
08:44:07 close(1) = 0
08:44:07 close(2) = 0
08:44:07 exit_group(0) = ?
The behavior and output format of strace can be largely controlled. For more information, see the relevant manual page (man 1 strace).
16.2 Tracing Library Calls with ltrace #
ltrace
traces dynamic library calls of a process. It
is used in a similar way to strace
, and most of their
parameters have a very similar or identical meaning. By default,
ltrace
uses /etc/ltrace.conf
or
~/.ltrace.conf
configuration files. You can,
however, specify an alternative one with the -F
CONFIG_FILE
option.
In addition to library calls, ltrace
with the
-S
option can trace system calls as well:
tux >
ltrace -S -o ltrace_find.txt find /etc -name \
xorg.conf; more ltrace_find.txt
SYS_brk(NULL) = 0x00628000
SYS_mmap(0, 4096, 3, 34, 0xffffffff) = 0x7f1327ea1000
SYS_mmap(0, 4096, 3, 34, 0xffffffff) = 0x7f1327ea0000
[...]
fnmatch("xorg.conf", "xorg.conf", 0) = 0
free(0x0062db80) = <void>
__errno_location() = 0x7f1327e5d698
__ctype_get_mb_cur_max(0x7fff25227af0, 8192, 0x62e020, -1, 0) = 6
__ctype_get_mb_cur_max(0x7fff25227af0, 18, 0x7f1327e5d6f0, 0x7fff25227af0,
0x62e031) = 6
__fprintf_chk(0x7f1327821780, 1, 0x420cf7, 0x7fff25227af0, 0x62e031
<unfinished ...>
SYS_fstat(1, 0x7fff25227230) = 0
SYS_mmap(0, 4096, 3, 34, 0xffffffff) = 0x7f1327e72000
SYS_write(1, "/etc/X11/xorg.conf\n", 19) = 19
[...]
You can change the type of traced events with the -e
option. The following example prints library calls related to
fnmatch
and strlen
functions:
tux >
ltrace -e fnmatch,strlen find /etc -name xorg.conf
[...]
fnmatch("xorg.conf", "xorg.conf", 0) = 0
strlen("Xresources") = 10
strlen("Xresources") = 10
strlen("Xresources") = 10
fnmatch("xorg.conf", "Xresources", 0) = 1
strlen("xorg.conf.install") = 17
[...]
To display only the symbols included in a specific library, use
-l /path/to/library
:
tux >
ltrace -l /lib64/librt.so.1 sleep 1
clock_gettime(1, 0x7fff4b5c34d0, 0, 0, 0) = 0
clock_gettime(1, 0x7fff4b5c34c0, 0xffffffffff600180, -1, 0) = 0
+++ exited (status 0) +++
You can make the output more readable by indenting each nested call by
the specified number of space with the -n
NUM_OF_SPACES
.
16.3 Debugging and Profiling with Valgrind #
Valgrind is a set of tools to debug and profile your programs so that they can run both faster and with less errors. Valgrind can detect problems related to memory management and threading, or can also serve as a framework for building new debugging tools. It is well known that this tool can incur high overhead, causing, for example, higher runtimes or changing the normal program behavior under concurrent workloads based on timing.
16.3.1 Installation #
Valgrind is not shipped with standard SUSE Linux Enterprise Server distribution. To install it on your system, you need to obtain SUSE Software Development Kit, and either install it and run
zypper
install
VALGRIND
or browse through the SUSE Software Development Kit directory tree, locate the Valgrind package and install it with
rpm
-i
valgrind-VERSION_ARCHITECTURE.rpm
The SDK is a module for SUSE Linux Enterprise and is available via an online channel from
the SUSE Customer Center. Alternatively download it from http://download.suse.com/. (Search for SUSE Linux Enterprise
Software Development Kit
). Refer to 第 14 章 “安装模块、扩展和第三方附加产品” for details.
16.3.2 Supported Architectures #
SUSE Linux Enterprise Server supports Valgrind on the following architectures:
AMD64/Intel 64
POWER
IBM Z
16.3.3 General Information #
The main advantage of Valgrind is that it works with existing compiled executables. You do not need to recompile or modify your programs to use it. Run Valgrind like this:
valgrind
VALGRIND_OPTIONS
your-prog YOUR-PROGRAM-OPTIONS
Valgrind consists of several tools, and each provides specific
functionality. Information in this section is general and valid
regardless of the used tool. The most important configuration option is
--tool
. This option tells Valgrind which tool to run.
If you omit this option, memcheck
is selected
by default. For example, to run find ~
-name
.bashrc with Valgrind's
memcheck
tools, enter the following in the
command line:
valgrind
--tool
=memcheck find ~ -name
.bashrc
A list of standard Valgrind tools with a brief description follows:
memcheck
Detects memory errors. It helps you tune your programs to behave correctly.
cachegrind
Profiles cache prediction. It helps you tune your programs to run faster.
callgrind
Works in a similar way to
cachegrind
but also gathers additional cache-profiling information.exp-drd
Detects thread errors. It helps you tune your multi-threaded programs to behave correctly.
helgrind
Another thread error detector. Similar to
exp-drd
but uses different techniques for problem analysis.massif
A heap profiler. Heap is an area of memory used for dynamic memory allocation. This tool helps you tune your program to use less memory.
lackey
An example tool showing instrumentation basics.
16.3.4 Default Options #
Valgrind can read options at start-up. There are three places which Valgrind checks:
The file
.valgrindrc
in the home directory of the user who runs Valgrind.The environment variable
$VALGRIND_OPTS
The file
.valgrindrc
in the current directory where Valgrind is run from.
These resources are parsed exactly in this order, while later given
options take precedence over earlier processed options. Options specific
to a particular Valgrind tool must be prefixed with the tool name and a
colon. For example, if you want cachegrind
to
always write profile data to the
/tmp/cachegrind_PID.log
,
add the following line to the .valgrindrc
file in
your home directory:
--cachegrind:cachegrind-out-file=/tmp/cachegrind_%p.log
16.3.5 How Valgrind Works #
Valgrind takes control of your executable before it starts. It reads debugging information from the executable and related shared libraries. The executable's code is redirected to the selected Valgrind tool, and the tool adds its own code to handle its debugging. Then the code is handed back to the Valgrind core and the execution continues.
For example, memcheck
adds its code, which
checks every memory access. As a consequence, the program runs much
slower than in the native execution environment.
Valgrind simulates every instruction of your program. Therefore, it not
only checks the code of your program, but also all related libraries
(including the C library), libraries used for graphical environment, and
so on. If you try to detect errors with Valgrind, it also detects errors
in associated libraries (like C, X11, or GTK libraries). Because you
probably do not need these errors, Valgrind can selectively, suppress
these error messages to suppression files. The
--gen-suppressions=yes
tells Valgrind to report these
suppressions which you can copy to a file.
You should supply a real executable (machine code) as a Valgrind
argument. If your application is run, for example, from a shell or Perl
script, you will by mistake get error reports related to
/bin/sh
(or /usr/bin/perl
). In
such cases, you can use
--trace-children=yes
to work
around this issue. However, using the executable itself will avoid any
confusion over this issue.
16.3.6 Messages #
During its runtime, Valgrind reports messages with detailed errors and important events. The following example explains the messages:
tux >
valgrind --tool=memcheck find ~ -name .bashrc
[...]
==6558== Conditional jump or move depends on uninitialised value(s)
==6558== at 0x400AE79: _dl_relocate_object (in /lib64/ld-2.11.1.so)
==6558== by 0x4003868: dl_main (in /lib64/ld-2.11.1.so)
[...]
==6558== Conditional jump or move depends on uninitialised value(s)
==6558== at 0x400AE82: _dl_relocate_object (in /lib64/ld-2.11.1.so)
==6558== by 0x4003868: dl_main (in /lib64/ld-2.11.1.so)
[...]
==6558== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==6558== malloc/free: in use at exit: 2,228 bytes in 8 blocks.
==6558== malloc/free: 235 allocs, 227 frees, 489,675 bytes allocated.
==6558== For counts of detected errors, rerun with: -v
==6558== searching for pointers to 8 not-freed blocks.
==6558== checked 122,584 bytes.
==6558==
==6558== LEAK SUMMARY:
==6558== definitely lost: 0 bytes in 0 blocks.
==6558== possibly lost: 0 bytes in 0 blocks.
==6558== still reachable: 2,228 bytes in 8 blocks.
==6558== suppressed: 0 bytes in 0 blocks.
==6558== Rerun with --leak-check=full to see details of leaked memory.
The ==6558==
introduces Valgrind's messages and
contains the process ID number (PID). You can easily distinguish
Valgrind's messages from the output of the program itself, and decide
which messages belong to a particular process.
To make Valgrind's messages more detailed, use -v
or
even -v -v
.
You can make Valgrind send its messages to three different places:
By default, Valgrind sends its messages to the file descriptor 2, which is the standard error output. You can tell Valgrind to send its messages to any other file descriptor with the
--log-fd=FILE_DESCRIPTOR_NUMBER
option.The second and probably more useful way is to send Valgrind's messages to a file with
--log-file=FILENAME
. This option accepts several variables, for example,%p
gets replaced with the PID of the currently profiled process. This way you can send messages to different files based on their PID.%q{env_var}
is replaced with the value of the relatedenv_var
environment variable.The following example checks for possible memory errors during the Apache Web server restart, while following children processes and writing detailed Valgrind's messages to separate files distinguished by the current process PID:
tux >
valgrind -v --tool=memcheck --trace-children=yes \ --log-file=valgrind_pid_%p.log rcapache2 restartThis process created 52 log files in the testing system, and took 75 seconds instead of the usual 7 seconds needed to run
sudo systemctl restart apache2
without Valgrind, which is approximately 10 times more.tux >
ls -1 valgrind_pid_*log valgrind_pid_11780.log valgrind_pid_11782.log valgrind_pid_11783.log [...] valgrind_pid_11860.log valgrind_pid_11862.log valgrind_pid_11863.logYou may also prefer to send the Valgrind's messages over the network. You need to specify the
aa.bb.cc.dd
IP address andport_num
port number of the network socket with the--log-socket=AA.BB.CC.DD:PORT_NUM
option. If you omit the port number, 1500 will be used.It is useless to send Valgrind's messages to a network socket if no application is capable of receiving them on the remote machine. That is why
valgrind-listener
, a simple listener, is shipped together with Valgrind. It accepts connections on the specified port and copies everything it receives to the standard output.
16.3.7 Error Messages #
Valgrind remembers all error messages, and if it detects a new error, the error is compared against old error messages. This way Valgrind checks for duplicate error messages. In case of a duplicate error, it is recorded but no message is shown. This mechanism prevents you from being overwhelmed by millions of duplicate errors.
The -v
option will add a summary of all reports (sorted
by their total count) to the end of the Valgrind's execution output.
Moreover, Valgrind stops collecting errors if it detects either 1000
different errors, or 10 000 000 errors in total. If you want to suppress
this limit and wish to see all error messages, use
--error-limit=no
.
Some errors usually cause other ones. Therefore, fix errors in the same order as they appear and re-check the program continuously.
16.4 For More Information #
For a complete list of options related to the described tracing tools, see the corresponding man page (
man 1 strace
,man 1 ltrace
, andman 1 valgrind
).To describe advanced usage of Valgrind is beyond the scope of this document. It is very well documented, see Valgrind User Manual. These pages are indispensable if you need more advanced information on Valgrind or the usage and purpose of its standard tools.