6 Monitoring and logging #
Obtaining and maintaining an overview over the status and health of a cluster's compute nodes helps to ensure a smooth operation. This chapter describes tools that give an administrator an overview of the current cluster status, collect system logs, and gather information on certain system failure conditions.
6.1 ConMan — the console manager #
ConMan is a serial console management program designed to support many console devices and simultaneous users. It supports:
- local serial devices
- remote terminal servers (via the telnet protocol)
- IPMI Serial-Over-LAN (via FreeIPMI)
- Unix domain sockets
- external processes (for example, using expect scripts for telnet, ssh, or ipmi-sol connections)
ConMan can be used for monitoring, logging, and optionally timestamping console device output.
To install ConMan, run zypper in conman.
Important: conmand sends unencrypted data
The daemon conmand sends unencrypted data over the network, and its connections are not authenticated. Therefore, it should only be used locally, listening on localhost. However, the IPMI console does offer encryption. This makes conman a good tool for monitoring many such consoles.
ConMan provides expect scripts in the directory /usr/lib/conman/exec.
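The following is a minimal sketch of an /etc/conman.conf, assuming one local serial console and one IPMI Serial-Over-LAN console. The host name, device path, and credentials are placeholders, not defaults; see the conman.conf man page for the full list of directives:
SERVER  logdir="/var/log/conman"
GLOBAL  log="console.%N"
CONSOLE name="node01" dev="/dev/ttyS0" seropts="115200,8n1"
CONSOLE name="node02" dev="ipmi:192.168.1.102" ipmiopts="U:ADMIN,P:PASSWORD"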
Input to conman is not echoed in interactive mode. This can be changed by entering the escape sequence &E.
When pressing Enter in interactive mode, no line feed is generated. To generate a line feed, press Ctrl–L.
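For example, assuming a console named node01 is defined in /etc/conman.conf, the configured consoles can be listed and attached to as follows:
> conman -q         # list the configured console names
> conman -m node01  # monitor the console in read-only mode
> conman -j node01  # join the console interactively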
For more information about options, see the ConMan man page.
6.2 Monitoring HPC clusters with Prometheus and Grafana #
Monitor the performance of HPC clusters using Prometheus and Grafana.
Prometheus collects metrics from exporters running on cluster nodes and stores the data in a time series database. Grafana provides data visualization dashboards for the metrics collected by Prometheus. Preconfigured dashboards are available on the Grafana website.
The following Prometheus exporters are useful for High Performance Computing:
- Slurm exporter
Extracts job and job queue status metrics from the Slurm workload manager. Install this exporter on a node that has access to the Slurm command line interface.
- Node exporter
Extracts hardware and kernel performance metrics directly from each compute node. Install this exporter on every compute node you want to monitor.
It is recommended that the monitoring data only be accessible from within a trusted environment (for example, using a login node or VPN). It should not be accessible from the internet without additional security hardening measures for access restriction, access control, and encryption.
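For example, if the monitoring node can only be reached via a login node, one option is to forward the Grafana port through an SSH tunnel and then browse to http://localhost:3000 on the workstation (USER and LOGINNODE are placeholders):
> ssh -L 3000:MNTRNODE:3000 USER@LOGINNODE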
Grafana: https://grafana.com/docs/grafana/latest/getting-started/
Grafana dashboards: https://grafana.com/grafana/dashboards
Prometheus: https://prometheus.io/docs/introduction/overview/
Prometheus exporters: https://prometheus.io/docs/instrumenting/exporters/
Slurm exporter: https://github.com/vpenso/prometheus-slurm-exporter
Node exporter: https://github.com/prometheus/node_exporter
6.2.1 Installing Prometheus and Grafana #
Install Prometheus and Grafana on a management server, or on a separate monitoring node.
You have an installation source for Prometheus and Grafana:
- The packages are available from SUSE Package Hub. To install SUSE Package Hub, see https://packagehub.suse.com/how-to-use/.
- If you have a subscription for SUSE Manager, the packages are available from the SUSE Manager Client Tools repository.
In this procedure, replace MNTRNODE with the host name or IP address of the server where Prometheus and Grafana are installed.
1. Install the Prometheus and Grafana packages:
   monitor# zypper in golang-github-prometheus-prometheus grafana
2. Enable and start Prometheus:
   monitor# systemctl enable --now prometheus
3. Verify that Prometheus works. In a browser, navigate to MNTRNODE:9090/config, or run the following command in a terminal:
   > wget MNTRNODE:9090/config --output-document=-
   Either of these methods should show the default contents of the /etc/prometheus/prometheus.yml file.
4. Enable and start Grafana:
   monitor# systemctl enable --now grafana-server
5. Log in to the Grafana web server at MNTRNODE:3000. Use admin for both the user name and the password, then change the password when prompted.
6. Open Grafana's data source configuration, find Prometheus, and select it.
7. In the URL field, enter http://localhost:9090. The default settings for the other fields can remain unchanged. If Prometheus and Grafana are installed on different servers, replace localhost with the host name or IP address of the server where Prometheus is installed.
8. Save and test the new data source.
You can now configure Prometheus to collect metrics from the cluster, and add dashboards to Grafana to visualize those metrics.
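After exporters have been added in the following sections, a quick way to check which targets Prometheus is scraping is to query the built-in up metric through the Prometheus HTTP API; every healthy target should be listed with a value of 1:
> wget 'http://MNTRNODE:9090/api/v1/query?query=up' --output-document=-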
6.2.2 Monitoring cluster workloads #
To monitor the status of the nodes and jobs in an HPC cluster, install the Prometheus Slurm exporter to collect workload data, then import a custom Slurm dashboard from the Grafana website to visualize the data. For more information about this dashboard, see https://grafana.com/grafana/dashboards/4323.
You must install the Slurm exporter on a node that has access to the Slurm command line interface. In the following procedure, the Slurm exporter will be installed on a management server.
- Section 6.2.1, “Installing Prometheus and Grafana” is complete.
- The Slurm workload manager is fully configured.
- You have internet access and policies that allow you to download the dashboard from the Grafana website.
In this procedure, replace MGMTSERVER with the host name or IP address of the server where the Slurm exporter is installed, and replace MNTRNODE with the host name or IP address of the server where Grafana is installed.
1. Install the Slurm exporter:
   management# zypper in golang-github-vpenso-prometheus_slurm_exporter
2. Enable and start the Slurm exporter:
   management# systemctl enable --now prometheus-slurm_exporter
   Important: Slurm exporter fails when GPU monitoring is enabled
   In Slurm 20.11, the Slurm exporter fails when GPU monitoring is enabled. This feature is disabled by default. Do not enable it for this version of Slurm.
3. Verify that the Slurm exporter works. In a browser, navigate to MGMTSERVER:8080/metrics, or run the following command in a terminal:
   > wget MGMTSERVER:8080/metrics --output-document=-
   Either of these methods should show output similar to the following:
   # HELP go_gc_duration_seconds A summary of the GC invocation durations.
   # TYPE go_gc_duration_seconds summary
   go_gc_duration_seconds{quantile="0"} 1.9521e-05
   go_gc_duration_seconds{quantile="0.25"} 4.5717e-05
   go_gc_duration_seconds{quantile="0.5"} 7.8573e-05
   ...
4. On the server where Prometheus is installed, edit the scrape_configs section of the /etc/prometheus/prometheus.yml file to add a job for the Slurm exporter:
   - job_name: slurm-exporter
     scrape_interval: 30s
     scrape_timeout: 30s
     static_configs:
     - targets: ['MGMTSERVER:8080']
   Set the scrape_interval and scrape_timeout to 30s to avoid overloading the server.
5. Restart the Prometheus service:
   monitor# systemctl restart prometheus
6. Log in to the Grafana web server at MNTRNODE:3000.
7. On the dashboard import screen, enter the dashboard ID 4323, then load it.
8. From the data source drop-down box, select the Prometheus data source added in Procedure 6.1, “Installing Prometheus and Grafana”, then import the dashboard.
9. Review the Slurm dashboard. The data might take some time to appear.
10. If you made any changes, save the dashboard when prompted, optionally describing your changes.
The Slurm dashboard is now available from the dashboard list in Grafana.
6.2.3 Monitoring compute node performance #
To monitor the performance and health of each compute node, install the Prometheus node exporter to collect performance data, then import a custom node dashboard from the Grafana website to visualize the data. For more information about this dashboard, see https://grafana.com/grafana/dashboards/405.
- Section 6.2.1, “Installing Prometheus and Grafana” is complete.
- You have internet access and policies that allow you to download the dashboard from the Grafana website.
- To run commands on multiple nodes at once, pdsh must be installed on the system your shell is running on, and SSH key authentication must be configured for all of the nodes. For more information, see Section 3.2, “pdsh — parallel remote shell program”.
In this procedure, replace the example node names with the host names or IP addresses of the nodes, and replace MNTRNODE with the host name or IP address of the server where Grafana is installed.
1. Install the node exporter on each compute node. You can do this on multiple nodes at once by running the following command:
   management# pdsh -R ssh -u root -w "NODE1,NODE2" \
     "zypper in -y golang-github-prometheus-node_exporter"
2. Enable and start the node exporter. You can do this on multiple nodes at once by running the following command:
   management# pdsh -R ssh -u root -w "NODE1,NODE2" \
     "systemctl enable --now prometheus-node_exporter"
3. Verify that the node exporter works. In a browser, navigate to NODE1:9100/metrics, or run the following command in a terminal:
   > wget NODE1:9100/metrics --output-document=-
   Either of these methods should show output similar to the following:
   # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
   # TYPE go_gc_duration_seconds summary
   go_gc_duration_seconds{quantile="0"} 2.3937e-05
   go_gc_duration_seconds{quantile="0.25"} 3.5456e-05
   go_gc_duration_seconds{quantile="0.5"} 8.1436e-05
   ...
4. On the server where Prometheus is installed, edit the scrape_configs section of the /etc/prometheus/prometheus.yml file to add a job for the node exporter:
   - job_name: node-exporter
     static_configs:
     - targets: ['NODE1:9100']
     - targets: ['NODE2:9100']
   Add a target for every node that has the node exporter installed.
5. Restart the Prometheus service:
   monitor# systemctl restart prometheus
6. Log in to the Grafana web server at MNTRNODE:3000.
7. On the dashboard import screen, enter the dashboard ID 405, then load it.
8. From the data source drop-down box, select the Prometheus data source added in Procedure 6.1, “Installing Prometheus and Grafana”, then import the dashboard.
9. Review the node dashboard. Use the node selection drop-down box to choose the nodes you want to view. The data might take some time to appear.
10. If you made any changes, save the dashboard when prompted. To keep the currently selected nodes the next time you access the dashboard, activate the option to save the current variable values. Optionally describe your changes, then save.
The node dashboard is now available from the dashboard list in Grafana.
6.3 Ganglia — system monitoring #
Ganglia is a scalable, distributed monitoring system for high-performance computing systems, such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters.
6.3.1 Using Ganglia #
To use Ganglia, install ganglia-gmetad on the management server, then start the Ganglia meta-daemon with rcgmetad start. To make sure the service is started after a reboot, run systemctl enable gmetad. On each cluster node which you want to monitor, install ganglia-gmond, start the service with rcgmond start, and make sure it is enabled to start automatically after a reboot with systemctl enable gmond. To test whether the gmond daemon has connected to the meta-daemon, run gstat -a and check that each node to be monitored is present in the output.
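In summary, the setup could look like the following sketch, where the node# prompt stands for a shell on a compute node to be monitored and systemctl enable --now combines the start and enable steps:
management# zypper in ganglia-gmetad
management# systemctl enable --now gmetad
node# zypper in ganglia-gmond
node# systemctl enable --now gmond
management# gstat -a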
6.3.2 Ganglia on Btrfs #
When using the Btrfs file system, the monitoring data will be lost after a rollback, and the service gmetad will fail to start. To fix this issue, either install the package ganglia-gmetad-skip-bcheck or create the file /etc/ganglia/no_btrfs_check.
6.3.3 Using the Ganglia Web interface #
Install ganglia-web on the management server.
Enable PHP in Apache2 with a2enmod php7. Then start Apache2 on this machine with rcapache2 start, and make sure it is started automatically after a reboot: systemctl enable apache2. The Ganglia Web interface is accessible from http://MANAGEMENT_SERVER/ganglia.
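A possible command sequence on the management server is shown below; it uses systemctl enable --now to combine the start and enable steps described above:
management# zypper in ganglia-web
management# a2enmod php7
management# systemctl enable --now apache2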
6.4 rasdaemon — utility to log RAS error tracings #
rasdaemon is a RAS (Reliability, Availability and Serviceability) logging tool. It records memory errors using EDAC (Error Detection and Correction) tracing events. EDAC drivers in the Linux kernel handle detection of ECC (Error Correction Code) errors from memory controllers.
rasdaemon can be used on large memory systems to track, record, and localize memory errors and how they evolve over time to detect hardware degradation. Furthermore, it can be used to localize a faulty DIMM on the mainboard.
To check whether the EDAC drivers are loaded, run the following command:
# ras-mc-ctl --status
The command should return ras-mc-ctl: drivers are loaded. If it indicates that the drivers are not loaded, EDAC may not be supported on your board.
To start rasdaemon, run systemctl start rasdaemon.service. To start rasdaemon automatically at boot time, run systemctl enable rasdaemon.service. The daemon logs information to /var/log/messages and to an internal database. A summary of the stored errors can be obtained with the following command:
# ras-mc-ctl --summary
The errors stored in the database can be viewed with:
# ras-mc-ctl --errors
Optionally, you can load the DIMM labels silk-screened on the system board to more easily identify the faulty DIMM. To do so, before starting rasdaemon, run:
# systemctl start ras-mc-ctl
For this to work, you need to set up a layout description for the board. There are no descriptions supplied by default. To add a layout description, create a file with an arbitrary name in the directory /etc/ras/dimm_labels.d/. The format is:
Vendor: MOTHERBOARD-VENDOR-NAME
  Model: MOTHERBOARD-MODEL-NAME
    LABEL: MC.TOP.MID.LOW
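A hypothetical layout description could look as follows. The vendor and model strings must match the mainboard as reported by dmidecode, and the labels must match the silk-screen on the board; all names and locations below are illustrative only, and the exact syntax is described in the ras-mc-ctl man page:
Vendor: Example Vendor Inc.
  Model: Example Board X100
    DIMM_A1: 0.0.0;
    DIMM_A2: 0.0.1;
After the labels have been loaded, running ras-mc-ctl --layout shows the resulting memory layout with the configured labels.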