SAP Monitoring
SUSE® Linux Enterprise Server for SAP Applications · SUSE Linux Enterprise High Availability
This article describes monitoring solutions that help SAP administrators monitor their SAP systems efficiently. The solutions described here work for SUSE® Linux Enterprise Server 12 SP3 to 15 SP2.
1 Conceptual overview
To improve the user experience, SUSE engineering developed ways to monitor High Availability clusters that manage SAP workloads (SAP HANA and SAP NetWeaver).
The exporters described here expose metrics that can be collected by Prometheus and visualized in Grafana to build complex dashboards.
SUSE supports Prometheus and Grafana through SUSE Manager 4.0. SUSE also provides Grafana dashboards for SAP HANA, SAP S/4HANA, SAP NetWeaver, and cluster monitoring through the Grafana community dashboards.
2 Terminology
- Grafana
An interactive visualization and analytics Web application. It provides methods to visualize, explore, and query your metrics, and trigger alerts.
- Prometheus
A systems monitoring and alerting toolkit. It collects and evaluates metrics, displays the result, and triggers possible alerts when an observed condition is true. Metrics can be collected from different targets at given intervals.
3 Installing exporters
Installing an exporter always follows the same pattern. Execute the following steps:
1. Install the package. All packages are available in SUSE Linux Enterprise Server for SAP Applications.
2. (Optional) Copy the configuration file to /etc/EXPORTER_DIR. The exact directory name differs for each exporter, and whether this step is needed depends on the exporter. If you skip this step, the default configuration is used.
3. Start the daemon:
systemctl start NAME_OF_DAEMON
Each of the Salt formulas described in Article “SAP Automation” performs this procedure automatically.
4 SAP HANA database exporter
The SAP HANA database exporter exports SAP HANA database metrics. It can export metrics from more than one database and tenant if the multi_tenant option is enabled in the configuration file (it is enabled by default).
The labels sid (system identifier), insnr (instance number), database_name (database name), and host (machine hostname) are exported for all metrics.
4.1 Prerequisites
- A running and reachable SAP HANA database (single or multi-container). It is recommended to run the exporter on the same machine as the SAP HANA database. Ideally, each database should be monitored by one exporter.
- One of the following SAP HANA connectors: dbapi (the SAP HANA Python client, hdbcli) or PyHDB.
- Certain metrics are collected in the SAP HANA monitoring views by the SAP Host Agent. To have access to all monitoring metrics, make sure that the SAP Host Agent is installed and running.
4.2 Metrics file
The exporter relies on a metrics file to determine which metrics to export. In the JSON metrics file, each key is an SQL query, and its value can use the options listed below.
- enabled
(boolean, optional) Determines whether the query is executed. If set to false, the query is not executed and its metrics are not exported.
- hana_version_range
(list, optional) The range of SAP HANA database versions in which the query is available ([1.0.0] by default). If the current database version is not within the specified range, the query is not executed. If the list has only one element, the query runs on all versions from the specified one onward (including the specified version).
- metrics
(list) A list of metrics for the query. Each metric supports the following options:
  - name
  (string) A name for the exported metric.
  - description
  (string) A description of the metric.
  - labels
  (list) A list of labels used to split the records.
  - value
  (string) The name of the column that holds the exported value (must match one of the columns of the query).
  - unit
  (string) The unit of the exported value (`mb`, for example).
  - type
  (enum{gauge}) The type of the exported metric (gauge is the only available option).
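The hana_version_range rule can be expressed as a small comparison. The following Python sketch is illustrative only and is not taken from hanadb_exporter: the helpers parse_version and in_version_range are hypothetical, version strings are assumed to be dot-separated numbers (such as 2.00.040), and a two-element list is assumed to be an inclusive lower and upper bound.

```python
from typing import Sequence, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Convert a dot-separated version such as '2.00.040' into (2, 0, 40)."""
    return tuple(int(part) for part in text.split("."))


def _padded(left: Tuple[int, ...], right: Tuple[int, ...]):
    """Pad the shorter tuple with zeros so that (1, 0) compares like (1, 0, 0)."""
    width = max(len(left), len(right))
    return left + (0,) * (width - len(left)), right + (0,) * (width - len(right))


def in_version_range(db_version: str, version_range: Sequence[str] = ("1.0.0",)) -> bool:
    """Return True if a query with this hana_version_range should run.

    Hypothetical helper: one element means "this version or newer";
    two elements are treated as an inclusive [lower, upper] range (assumption).
    """
    current, lower = _padded(parse_version(db_version), parse_version(version_range[0]))
    if current < lower:
        return False
    if len(version_range) == 1:
        return True  # open-ended range: every later version qualifies
    current, upper = _padded(parse_version(db_version), parse_version(version_range[1]))
    return current <= upper
```

For example, in_version_range("2.00.040", ["1.0"]) returns True, while in_version_range("1.00.122", ["2.0"]) returns False.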
Below is an example of a metrics file:
{
  "SELECT TOP 10 host, LPAD(port, 5) port, SUBSTRING(REPLACE_REGEXPR('\n' IN statement_string WITH ' ' OCCURRENCE ALL), 1,30) sql_string, statement_hash sql_hash, execution_count, total_execution_time + total_preparation_time total_elapsed_time FROM sys.m_sql_plan_cache ORDER BY total_elapsed_time, execution_count DESC;": {
    "enabled": true,
    "hana_version_range": ["1.0"],
    "metrics": [
      {
        "name": "hanadb_sql_top_time_consumers",
        "description": "Top statements time consumers. Sum of the time consumed in all executions in Microseconds",
        "labels": ["HOST", "PORT", "SQL_STRING", "SQL_HASH"],
        "value": "TOTAL_ELAPSED_TIME",
        "unit": "mu",
        "type": "gauge"
      },
      {
        "name": "hanadb_sql_top_time_consumers",
        "description": "Top statements time consumers. Number of total executions of the SQL Statement",
        "labels": ["HOST", "PORT", "SQL_STRING", "SQL_HASH"],
        "value": "EXECUTION_COUNT",
        "unit": "count",
        "type": "gauge"
      }
    ]
  }
}
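Because a single syntax error makes the whole metrics file unusable, it can help to check a file before deploying it. The Python sketch below is a hypothetical pre-flight check (validate_metrics_file is not part of the exporter) that tests only the options listed above. It assumes hana_version_range holds one or two versions, and the check that the value column appears in the query text is a rough substring heuristic, not real SQL parsing.

```python
import json
from typing import List

REQUIRED_METRIC_KEYS = ("name", "description", "labels", "value", "unit", "type")
SUPPORTED_TYPES = {"gauge"}  # gauge is the only documented metric type


def validate_metrics_file(text: str) -> List[str]:
    """Return a list of problems; an empty list means no problem was detected."""
    try:
        queries = json.loads(text)
    except json.JSONDecodeError as exc:  # for example, a missing comma
        return [f"invalid JSON: {exc}"]
    if not isinstance(queries, dict):
        return ["the top level must map SQL queries to their options"]

    problems = []
    for query, options in queries.items():
        where = query[:40]  # shortened query text for messages
        if not isinstance(options, dict):
            problems.append(f"{where}: options must be a JSON object")
            continue
        if "enabled" in options and not isinstance(options["enabled"], bool):
            problems.append(f"{where}: 'enabled' must be true or false")
        version_range = options.get("hana_version_range", ["1.0.0"])
        if not isinstance(version_range, list) or len(version_range) not in (1, 2):
            problems.append(f"{where}: 'hana_version_range' must list one or two versions")
        metrics = options.get("metrics")
        if not isinstance(metrics, list) or not metrics:
            problems.append(f"{where}: 'metrics' must be a non-empty list")
            continue
        for index, metric in enumerate(metrics):
            missing = [key for key in REQUIRED_METRIC_KEYS if key not in metric]
            if missing:
                problems.append(f"{where}: metric {index} lacks {', '.join(missing)}")
            if metric.get("type") not in SUPPORTED_TYPES:
                problems.append(f"{where}: metric {index} has unsupported type {metric.get('type')!r}")
            # Heuristic only: the value column should appear in the query text.
            value = metric.get("value")
            if isinstance(value, str) and value.lower() not in query.lower():
                problems.append(f"{where}: value column {value!r} not found in the query")
    return problems
```

A missing comma between two options, for instance, is reported as a single "invalid JSON" problem instead of a failing exporter start.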
4.3 Installing the SAP HANA database exporter
To install the exporter, run the zypper install prometheus-hanadb_exporter command.
You can find the latest development repositories at SUSE's Open Build Service.
To install the exporter from the source code, make sure you have Git and Python 3 installed on your system. Run the following commands to install the exporter with the PyHDB SAP HANA connector:
git clone https://github.com/SUSE/hanadb_exporter
cd hanadb_exporter # project root folder
virtualenv virt
source virt/bin/activate
pip install pyhdb
pip install .
4.4 Configuring the exporter
Use the following example of the config.json configuration file as a starting point.
{
  "listen_address": "0.0.0.0",
  "exposition_port": 9668,
  "multi_tenant": true,
  "timeout": 30,
  "hana": {
    "host": "localhost",
    "port": 30013,
    "user": "SYSTEM",
    "password": "PASSWORD",
    "ssl": false,
    "ssl_validate_cert": false
  },
  "logging": {
    "config_file": "./logging_config.ini",
    "log_file": "hanadb_exporter.log"
  }
}
Below is a list of key configuration options.
- listen_address
IP address on which the Prometheus exporter listens (0.0.0.0 by default).
- exposition_port
Port through which the Prometheus exporter is accessible (9968 by default).
- multi_tenant
Export the metrics from other tenants. This requires a connection to the system database (port 30013).
- timeout
Timeout for connecting to the database. The exporter fails if the connection is not established within the specified time (even in daemon mode).
- hana.host
Address of the SAP HANA database.
- hana.port
Port through which the SAP HANA database is accessible.
- hana.userkey
Stored user key (see Section 4.5, “Using the stored user key”). Use this option if you do not want to store the password in the configuration file. The userkey and user/password options are mutually exclusive. If both are set, hana.userkey takes priority.
- hana.user
Existing user with access rights to the SAP HANA database.
- hana.password
Password of the existing user.
- hana.ssl
Enable SSL connection (false by default). Only available for the dbapi connector.
- hana.ssl_validate_cert
Enable SSL certificate validation. This option is required by SAP HANA Cloud. Only available for the dbapi connector.
- hana.aws_secret_name
Name of the secret containing the username and password (see Section 4.6, “Using AWS Secrets Manager”). Use this option when the SAP HANA database runs on AWS. The aws_secret_name and user/password options are mutually exclusive. If both are set, aws_secret_name takes priority.
- logging.config_file
Python logging system configuration file (by default, WARN and ERROR level messages are sent to the syslog).
- logging.log_file
Logging file (/var/log/hanadb_exporter.log by default).
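The precedence rules between the login options can be summarized in a few lines. In this hypothetical Python sketch, credential_source and load_config are illustrative names, not exporter functions. The list above only states that hana.userkey and hana.aws_secret_name each take priority over user/password; checking hana.userkey before hana.aws_secret_name is an assumption.

```python
import json
from typing import Any, Dict


def credential_source(config: Dict[str, Any]) -> str:
    """Report which login method a config.json would use (assumed order)."""
    hana = config.get("hana", {})
    if hana.get("userkey"):
        return "userkey"  # takes priority over user/password
    if hana.get("aws_secret_name"):
        return "aws_secret_name"  # takes priority over user/password
    if hana.get("user") and hana.get("password"):
        return "user/password"
    raise ValueError("set hana.userkey, hana.aws_secret_name, or hana.user and hana.password")


def load_config(path: str) -> Dict[str, Any]:
    """Read config.json and warn when mutually exclusive options are combined."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    hana = config.get("hana", {})
    if (hana.get("userkey") or hana.get("aws_secret_name")) and hana.get("password"):
        print(f"warning: user/password is ignored, using {credential_source(config)}")
    return config
```

Running load_config("/etc/hanadb_exporter/my-exporter.json") before starting the service would surface a leftover password that is silently ignored.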
The logging configuration file follows the Python standard logging system style.
The default logging configuration file redirects the logs to the file set in the JSON configuration file and to the syslog (only messages of level WARNING and above are sent to the syslog).
4.5 Using the stored user key
Use this option to keep the database secure (for development, you can use user/password with the SYSTEM user, as it is faster to set up). To use the userkey option, the dbapi connector must be installed (its package is normally located at /hana/shared/SID/hdbclient/hdbcli-N.N.N.tar.gz and can be installed with pip3). The key is stored in the client itself. To use a different client, you must create a new stored user key for the user running Python. To do that, use the following command (note that the hdbclient is the same as the dbapi Python package):
/hana/shared/PRD/hdbclient/hdbuserstore set USER_KEY host:30013@SYSTEMDB hanadb_exporter pass
4.6 Using AWS Secrets Manager
Use AWS Secrets Manager to store the login credentials outside the configuration file when the SAP HANA database runs on an AWS EC2 instance.
1. Create a JSON secret that contains two key-value pairs. The first pair has the username key and the database user as the value. The second pair has the password key and the password as the value. For example:
{ "username": "DATABASE_USER", "password": "DATABASE_PASSWORD" }
2. Pass the name of the secret as the value of the aws_secret_name entry in the configuration file.
3. Configure read-only access from the EC2 IAM role to the secret by attaching a resource-based policy to the secret. For example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:role/EC2RoleToAccessSecrets"},
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "*"
    }
  ]
}
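When aws_secret_name is set, the exporter retrieves the secret itself. The Python sketch below only illustrates the secret layout described above and a manual read of the secret, for example to test the IAM policy from the EC2 instance. It assumes the third-party boto3 package is installed and that region names the AWS region holding the secret; both function names are hypothetical.

```python
import json
from typing import Tuple


def credentials_from_secret(secret_string: str) -> Tuple[str, str]:
    """Return (username, password) from a secret shaped like the example above."""
    data = json.loads(secret_string)
    missing = {"username", "password"} - set(data)
    if missing:
        raise KeyError(f"secret lacks key(s): {', '.join(sorted(missing))}")
    return data["username"], data["password"]


def fetch_secret_string(secret_name: str, region: str) -> str:
    """Read the raw secret with boto3 (third-party, imported only when called)."""
    import boto3  # needs boto3 and AWS credentials, for example from the EC2 role

    client = boto3.client("secretsmanager", region_name=region)
    return client.get_secret_value(SecretId=secret_name)["SecretString"]
```

On the instance, credentials_from_secret(fetch_secret_string("hana-secret", "eu-central-1")) would fail with an access error if the resource-based policy is missing; "hana-secret" and the region are placeholders.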
Tips and recommendations:
- Set SYSTEMDB as the default database for the exporter so that it can get the tenants' data.
- Do not use the stored user key created for the backup, because that key is created using the sidadm user.
- Instead of the SYSTEM user, use an account limited to accessing the monitoring tables only.
- If you use a user account with the monitoring role, this user must exist in all databases (SYSTEMDB and tenants).
4.7 Creating a new user with the monitoring role
Run the following commands to create a user with the monitoring role (the commands must be executed in all databases):
su - prdadm
hdbsql -u SYSTEM -p pass -d SYSTEMDB # use -d PRD for the tenant in this example
CREATE USER HANADB_EXPORTER_USER PASSWORD MyExporterPassword NO FORCE_FIRST_PASSWORD_CHANGE;
CREATE ROLE HANADB_EXPORTER_ROLE;
GRANT MONITORING TO HANADB_EXPORTER_ROLE;
GRANT HANADB_EXPORTER_ROLE TO HANADB_EXPORTER_USER;
4.8 Running the exporter
Start the exporter with the hanadb_exporter -c config.json -m metrics.json command.
If the config.json configuration file is stored in the /etc/hanadb_exporter directory, the exporter can be started with the following command (the identifier is the name of the config.json file without the extension):
hanadb_exporter --identifier config
4.9 Running as a service
To run the hanadb_exporter as a systemd service, install the exporter using the RPM package as described in Section 4.3, “Installing the SAP HANA database exporter”.
Next, create the configuration file as /etc/hanadb_exporter/my-exporter.json. You can use the example file above as a starting point (the example file is also available in the /usr/etc/hanadb_exporter directory).
You can use the example /usr/etc/hanadb_exporter/metrics.json metrics file.
Adjust the default logging configuration file /usr/etc/hanadb_exporter/logging_config.ini as needed.
Start the exporter as a daemon. Because multiple hanadb_exporter instances can run on one machine, you must specify the name of the created configuration file, for example:
# systemctl start prometheus-hanadb_exporter@my-exporter
# systemctl status prometheus-hanadb_exporter@my-exporter
# systemctl enable prometheus-hanadb_exporter@my-exporter
The exporter does not push data to the Prometheus server; it only exposes the metrics on a port. This means that the Prometheus server must be configured to pull the data from the exporter periodically, either by adding a hanadb_exporter job to the Prometheus server configuration or by adding hanadb_exporter as a target of an existing job. For example:
- job_name: hana_db
  static_configs:
    - targets:
        - "HOSTNAME:PORT"
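Conceptually, each scrape is an HTTP GET request to the exporter followed by parsing of the Prometheus text exposition format. The Python sketch below imitates a single scrape with the standard library so that you can check what Prometheus will receive. It is a simplified parser (label values are not unescaped, and HELP and TYPE comments are skipped), the function names are hypothetical, and HOSTNAME and PORT are the same placeholders as in the job above.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# A sample line: metric_name{label="value",...} 1.5 [optional timestamp]
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            samples.append((name, dict(_LABEL.findall(raw_labels or "")), float(value)))
    return samples


def scrape(host: str, port: int, path: str = "/metrics", timeout: float = 10.0) -> List[Sample]:
    """Pull the metrics once over plain HTTP, as a Prometheus scrape would."""
    with urllib.request.urlopen(f"http://{host}:{port}{path}", timeout=timeout) as response:
        return parse_exposition(response.read().decode("utf-8"))
```

An empty result from scrape("HOSTNAME", PORT) usually points to a configuration problem on the exporter side, while a connection error points to the firewall settings described next.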
Use the following commands to open the port for hanadb_exporter:
# firewall-cmd --zone=ZONE --add-port=PORT/tcp --permanent
# firewall-cmd --reload
# firewall-cmd --list-all --zone=ZONE
Replace ZONE with the firewalld zone of the network interface used by the exporter, and PORT with the actual port number of hanadb_exporter (default is 9968).
5 High Availability cluster exporter
The High Availability cluster exporter enables monitoring of Pacemaker, Corosync, SBD, DRBD, and other components of High Availability clusters. It collects metrics to easily monitor cluster status and health.
Link: https://github.com/ClusterLabs/ha_cluster_exporter.
The collected metrics include:
- Pacemaker cluster summary, node and resource statistics
- Corosync ring errors and quorum votes
- Health status of SBD devices
- DRBD resource and connection status
5.1 Installation
To install the High Availability cluster exporter on SUSE Linux Enterprise, run the zypper install prometheus-ha_cluster_exporter command.
5.1.1 Enabling the systemd service
The High Availability cluster exporter RPM package comes with the prometheus-ha_cluster_exporter.service systemd service. To enable and start it, use the following command:
systemctl --now enable prometheus-ha_cluster_exporter
5.2 Using High Availability cluster exporter
You can run the exporter on any of the cluster nodes. Although it is not strictly required, it is advisable to run the exporter on all nodes.
The exporter serves the generated metrics at the /metrics path. By default, the metrics can be accessed through the web interface on port 9664.
Although the exporter can run outside a High Availability cluster node, it can only export the metrics it is able to collect. In this case, the exporter displays a warning message.
5.3 Configuring High Availability cluster exporter
Before you proceed, make sure that the Prometheus server and the firewall are configured as described in “Important: Configure the Prometheus server” and “Important: Configure firewall”.
The provided default configuration is designed specifically for the latest version of SUSE Linux Enterprise.
If necessary, any of the supported parameters can be modified either via command-line flags or via a configuration file. Use the ha_cluster_exporter --help command for more details on configuring parameters from the command line. Refer to the ha_cluster_exporter.yaml file for an example configuration.
It is also possible to specify CLI flags in the /etc/sysconfig/prometheus-ha_cluster_exporter file.
- web.listen-address
Address to listen on for the web interface and telemetry (default 9664).
- web.telemetry-path
Path under which the metrics are exposed (default /metrics).
- web.config.file
Path to the web configuration file (default /etc/ha_cluster_exporter.web.yaml).
- log.level
Logging verbosity (default info).
- version
Print version information.
- crm-mon-path
Path to the crm_mon executable (default /usr/sbin/crm_mon).
- cibadmin-path
Path to the cibadmin executable (default /usr/sbin/cibadmin).
- corosync-cfgtool-path
Path to the corosync-cfgtool executable (default /usr/sbin/corosync-cfgtool).
- corosync-quorumtool-path
Path to the corosync-quorumtool executable (default /usr/sbin/corosync-quorumtool).
- sbd-path
Path to the sbd executable (default /usr/sbin/sbd).
- sbd-config-path
Path to the SBD configuration file (default /etc/sysconfig/sbd).
- drbdsetup-path
Path to the drbdsetup executable (default /sbin/drbdsetup).
- drbdsplitbrain-path
Path to the temporary files created by the DRBD split-brain hooks (default /var/run/drbd/splitbrain).
5.4 TLS and basic authentication
The High Availability cluster exporter supports TLS and basic authentication. To use TLS or basic authentication, specify a configuration file using the --web.config.file parameter. The format of the file is described in https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md.
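Once TLS and basic authentication are enabled, every client, including Prometheus, must use HTTPS and send credentials. The Python sketch below shows what such a request looks like using only the standard library. The host name node1 and the credentials are placeholders, ca_file is assumed to point to the CA certificate that signed the exporter's certificate, and the function names are hypothetical.

```python
import base64
import ssl
import urllib.request
from typing import Optional


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Attach an HTTP basic authentication header to a metrics request."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_metrics(url: str, username: str, password: str,
                  ca_file: Optional[str] = None, timeout: float = 10.0) -> str:
    """Fetch the metrics over HTTPS, verifying the server certificate against ca_file."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(build_request(url, username, password),
                                timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

A certificate verification error from fetch_metrics typically means that ca_file does not match the certificate configured in the web configuration file, while an HTTP 401 response means the credentials were rejected.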
5.5 Metrics specification
This section provides an overview of the metrics generated by the High Availability cluster exporter.
Pacemaker. The Pacemaker subsystem collects an atomic snapshot of the High Availability cluster directly from the XML CIB of Pacemaker using crm_mon.
- ha_cluster_pacemaker_config_last_change
A Unix timestamp in seconds, converted to a floating-point number, corresponding to the last time the Pacemaker configuration changed.
- ha_cluster_pacemaker_fail_count
The fail count per node and resource ID.
- ha_cluster_pacemaker_location_constraints
Resource location constraints.
Labels:
  - constraint: A unique string identifier of the constraint
  - node: The node the constraint applies to
  - resource: The resource the constraint applies to
  - role: The resource role the constraint applies to (if any)
- ha_cluster_pacemaker_migration_threshold
The migration threshold for each node and resource ID, as set in the Pacemaker cluster.
- ha_cluster_pacemaker_nodes
The status of each node in the cluster (one line per node and status). 1 indicates the node is in the status specified by the status label, 0 means it is not.
Labels:
  - node: The name of the node (normally the hostname)
  - status: Possible values: standby, standby_onfail, maintenance, pending, unclean, shutdown, expected_up, dc
  - type: Possible values: member, ping, remote
- ha_cluster_pacemaker_node_attributes
This metric exposes, in its labels, raw and opaque cluster metadata called node attributes, which are often leveraged by resource agents. The value of each line is always 1.
Labels:
  - node: The name of the node (normally the hostname)
  - name: The name of the attribute
  - value: The value of the attribute
- ha_cluster_pacemaker_resources
The status of each resource in the cluster (one line per resource and status). 1 means the resource is in the status specified by the role label, 0 means that it is not.
Labels:
  - agent: The name of the resource agent for the resource
  - clone: The name of the clone this resource belongs to (if any)
  - group: The name of the group this resource belongs to (if any)
  - managed: Either true or false
  - node: The name of the node hosting the resource
  - resource: The unique resource name
  - role: Possible values: started, stopped, master, slave, or one of starting, stopping, migrating, promoting, demoting
- ha_cluster_pacemaker_stonith_enabled
Whether STONITH is enabled in the cluster. The value is either 1 or 0.
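The Pacemaker metrics can be combined to answer common questions, such as which node is the DC or which resources can no longer run on a node. This hypothetical Python sketch works on (labels, value) pairs that have already been parsed from the exporter output. It assumes that ha_cluster_pacemaker_fail_count and ha_cluster_pacemaker_migration_threshold carry node and resource labels, and it applies the Pacemaker rule that a node becomes ineligible for a resource once the fail count reaches the migration threshold (a threshold of 0 disables the rule).

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value) of one exposed line
NodeResource = Tuple[str, str]  # (node, resource); the label names are assumptions


def nodes_in_status(node_samples: Iterable[Sample], status: str) -> List[str]:
    """Nodes whose ha_cluster_pacemaker_nodes line for `status` has the value 1."""
    return sorted(
        labels["node"]
        for labels, value in node_samples
        if labels.get("status") == status and value == 1
    )


def by_node_and_resource(samples: Iterable[Sample]) -> Dict[NodeResource, float]:
    """Index fail count or migration threshold samples by (node, resource)."""
    return {(labels["node"], labels["resource"]): value for labels, value in samples}


def reached_migration_threshold(
    fail_counts: Dict[NodeResource, float],
    thresholds: Dict[NodeResource, float],
) -> List[NodeResource]:
    """(node, resource) pairs whose fail count has reached the migration threshold."""
    return sorted(
        key
        for key, fails in fail_counts.items()
        if thresholds.get(key, 0) > 0 and fails >= thresholds[key]
    )
```

In this sketch, nodes_in_status(samples, "dc") returns the current designated coordinator, and a non-empty result from reached_migration_threshold is a hint to inspect and clean up the failed resource.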
Corosync. The Corosync subsystem collects cluster quorum votes and ring status by parsing the output of corosync-quorumtool and corosync-cfgtool.
- ha_cluster_corosync_member_votes
The number of votes each member node has contributed to the current quorum.
Labels:
  - node_id: The internal Corosync identifier associated with the node
  - node: The name of the node (normally the hostname)
  - local: Indicates whether the node is local
- ha_cluster_corosync_quorate
Indicates whether the cluster is quorate. The value is either 1 or 0.
- ha_cluster_corosync_quorum_votes
Cluster quorum votes (one line per type).
Labels:
  - type: Possible values: expected_votes, highest_expected, total_votes, quorum
- ha_cluster_corosync_ring_errors
The total number of faulty Corosync rings.
- ha_cluster_corosync_rings
The status of each Corosync ring. 1 is healthy, 0 is faulty.
Labels:
  - ring_id: The internal Corosync ring identifier (normally corresponds to the first member node to join)
  - node_id: The internal Corosync identifier of the local node
  - number: The ring number
  - address: The IP address locally linked to this ring
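Corosync considers the cluster quorate while the total votes are at least the quorum value, so the quorum votes can be turned into a safety margin. The hypothetical Python sketch below works on already-parsed (labels, value) pairs from ha_cluster_corosync_quorum_votes and ha_cluster_corosync_rings; the function names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value) of one exposed line


def spare_votes(quorum_votes: Iterable[Sample]) -> float:
    """Votes that can be lost before quorum is lost; a negative result means inquorate."""
    votes = {labels["type"]: value for labels, value in quorum_votes}
    return votes["total_votes"] - votes["quorum"]


def faulty_rings(ring_samples: Iterable[Sample]) -> List[str]:
    """Numbers of the rings that ha_cluster_corosync_rings reports as faulty (0)."""
    return sorted(labels["number"] for labels, value in ring_samples if value == 0)
```

A margin of 0 means the cluster is still quorate but cannot afford to lose another vote.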
SBD. The SBD subsystem collects statistics of each device by parsing its configuration and the output of sbd --dump.
- ha_cluster_sbd_devices
The SBD devices in the cluster (one line per device). The line is either absent or has the value of 1.
Labels:
  - device: The path of the SBD device
  - status: Possible values: healthy, unhealthy
- ha_cluster_sbd_timeouts
The SBD timeouts for each SBD device.
Labels:
  - device: The path of the SBD device
  - type: Possible values: watchdog, msgwait
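The two SBD metrics can be checked together. In this hypothetical Python sketch, the input is already-parsed (labels, value) pairs. Besides unhealthy devices, it flags devices whose msgwait timeout is less than twice the watchdog timeout; that is the usual SBD sizing rule of thumb, not something the exporter enforces.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value) of one exposed line


def sbd_findings(device_samples: Iterable[Sample], timeout_samples: Iterable[Sample]) -> List[str]:
    """Report unhealthy SBD devices and msgwait values below twice the watchdog timeout."""
    findings = [
        f"{labels['device']}: unhealthy"
        for labels, value in device_samples
        if labels.get("status") == "unhealthy" and value == 1
    ]
    # Group the timeouts per device: {device: {"watchdog": ..., "msgwait": ...}}
    timeouts: Dict[str, Dict[str, float]] = {}
    for labels, value in timeout_samples:
        timeouts.setdefault(labels["device"], {})[labels["type"]] = value
    for device, values in sorted(timeouts.items()):
        watchdog, msgwait = values.get("watchdog"), values.get("msgwait")
        if watchdog is not None and msgwait is not None and msgwait < 2 * watchdog:
            findings.append(f"{device}: msgwait {msgwait:g} < 2 x watchdog {watchdog:g}")
    return findings
```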
DRBD. The DRBD subsystem runs the drbdsetup command to get the current status of the DRBD resources in JSON format.
- ha_cluster_drbd_connections
The DRBD resource connections (one line per resource and per peer_node_id). The line is either absent or has the value of 1.
Labels:
  - resource: The resource the connection is for
  - peer_node_id: The ID of the node this connection is for
  - peer_role: Possible values: primary, secondary, unknown
  - volume: The volume number
  - peer_disk_state: Possible values: attaching, failed, negotiating, inconsistent, outdated, unknown, consistent, uptodate
The total number of lines for this metric is the cardinality of resource multiplied by the cardinality of peer_node_id.
- ha_cluster_drbd_connections_sync
The percentage of the DRBD disk connection that is in sync. Values are floating-point numbers between 0 and 100.00.
Labels:
  - resource: The resource the connection is for
  - peer_node_id: The ID of the node this connection is for
  - volume: The volume number
- ha_cluster_drbd_connections_received
Volume of net data received from the partner via the network connection, in KiB (one line per resource and per peer_node_id). The value is an integer greater than or equal to 0.
Labels:
  - resource: The resource the connection is for
  - peer_node_id: The ID of the node this connection is for
  - volume: The volume number
- ha_cluster_drbd_connections_pending
Number of requests sent to the partner that have not yet been received (one line per resource and per peer_node_id). The value is an integer greater than or equal to 0.
Labels:
  - resource: The resource the connection is for
  - peer_node_id: The ID of the node this connection is for
  - volume: The volume number
- ha_cluster_drbd_connections_unacked
Number of requests received by the partner that have not yet been acknowledged (one line per resource and per peer_node_id). The value is an integer greater than or equal to 0.
Labels:
  - resource: The resource the connection is for
  - peer_node_id: The ID of the node this connection is for
  - volume: The volume number
- ha_cluster_drbd_resources
The DRBD resources (one line per resource and per volume). The line is either absent or has the value of 1.
Labels:
  - resource: The name of the resource
  - role: Possible values: primary, secondary, unknown
  - volume: The volume number
  - disk_state: Possible values: attaching, failed, negotiating, inconsistent, outdated, unknown, consistent, uptodate
The total number of lines for this metric is the cardinality of resource multiplied by the cardinality of volume.
- ha_cluster_drbd_written
Amount of data in KiB written to the DRBD resource (one line per resource and per volume). The value is an integer greater than or equal to 0.
Labels:
  - resource: The name of the resource
  - volume: The volume number
- ha_cluster_drbd_read
Amount of data in KiB read from the DRBD resource (one line per resource and per volume). The value is an integer greater than or equal to 0.
Labels:
  - resource: The name of the resource
  - volume: The volume number
- ha_cluster_drbd_al_writes
Number of updates of the activity log area of the metadata (one line per resource and per volume). The value is an integer greater than or equal to 0.
Labels:
  - resource: The name of the resource
  - volume: The volume number
- ha_cluster_drbd_bm_writes
Number of updates of the bitmap area of the metadata (one line per resource and per volume). The value is an integer greater than or equal to 0.
Labels:
  - resource: The name of the resource
  - volume: The volume number
- ha_cluster_drbd_upper_pending
Number of block I/O requests forwarded to DRBD but not yet answered by DRBD (one line per resource and per volume). The value is an integer greater than or equal to 0.
Labels:
  - resource: The name of the resource
  - volume: The volume number
- ha_cluster_drbd_lower_pending
Number of open requests to the local I/O subsystem issued by DRBD (one line per resource and per volume). The value is an integer greater than or equal to 0.
Labels:
  - resource: The name of the resource
  - volume: The volume number
- ha_cluster_drbd_quorum
Quorum status of the DRBD resource according to the configured quorum policies (one line per resource and per volume). The value is 1 when quorate, or 0 when inquorate.
Labels:
  - resource: The name of the resource
  - volume: The volume number
- ha_cluster_drbd_split_brain
Signals a split brain on a resource and volume. The line is either absent or has the value of 1. For this metric to work, you must set up a custom DRBD split-brain handler.
Labels:
  - resource: The name of the resource
  - volume: The volume number
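A quick DRBD health summary can be derived from the sync percentage and split-brain metrics. The Python sketch below is hypothetical and works on already-parsed (labels, value) pairs; it also expresses the line-count rule for ha_cluster_drbd_connections as simple arithmetic.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value) of one exposed line


def drbd_findings(sync_samples: Iterable[Sample], split_brain_samples: Iterable[Sample]) -> List[str]:
    """Summarize connections that are not fully in sync and volumes in split brain."""
    findings = []
    for labels, value in sync_samples:  # ha_cluster_drbd_connections_sync
        if value < 100:
            findings.append(
                f"{labels['resource']}/{labels['volume']} -> peer {labels['peer_node_id']}: "
                f"{value:.1f}% in sync"
            )
    for labels, value in split_brain_samples:  # ha_cluster_drbd_split_brain
        if value == 1:
            findings.append(f"{labels['resource']}/{labels['volume']}: split brain")
    return findings


def expected_connection_lines(resource_count: int, peer_count: int) -> int:
    """Lines of ha_cluster_drbd_connections: resources multiplied by peer node IDs."""
    return resource_count * peer_count
```

For two resources replicated to one peer, expected_connection_lines(2, 1) gives 2; fewer lines than that suggests a missing connection.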
Scrape. The scrape subsystem is a generic namespace dedicated to internal instrumentation of the exporter itself.
- ha_cluster_scrape_duration_seconds
The duration of a collector scrape in seconds.
Labels:
  - collector: The name of the collector, which corresponds to the subsystem it collects metrics from
Example:
# TYPE ha_cluster_scrape_duration_seconds gauge
ha_cluster_scrape_duration_seconds{collector="pacemaker"} 1.234
- ha_cluster_scrape_success
Indicates whether a collector succeeded. Collectors can fail gracefully without preventing the exporter from running. If a collector cannot scrape its metrics, the value of this metric is 0; in this case, check the exporter logs for more details.
Labels:
  - collector: The name of the collector, which corresponds to the subsystem it collects metrics from
Example:
# TYPE ha_cluster_scrape_success gauge
ha_cluster_scrape_success{collector="pacemaker"} 1
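Because a failing collector does not stop the exporter, it is worth watching ha_cluster_scrape_success explicitly. The hypothetical Python sketch below reads the two scrape metrics from raw exporter output in the format shown in the examples above, lists collectors that reported 0, and orders collectors by scrape duration.

```python
import re
from typing import Dict, List

# Matches lines such as: ha_cluster_scrape_success{collector="pacemaker"} 1
_SCRAPE_LINE = re.compile(
    r'^ha_cluster_scrape_(success|duration_seconds)\{collector="([^"]*)"\}\s+(\S+)',
    re.MULTILINE,
)


def scrape_report(exposition_text: str) -> Dict[str, List[str]]:
    """List failed collectors and order collectors from slowest to fastest."""
    success: Dict[str, float] = {}
    duration: Dict[str, float] = {}
    for kind, collector, value in _SCRAPE_LINE.findall(exposition_text):
        target = success if kind == "success" else duration
        target[collector] = float(value)
    failed = sorted(name for name, ok in success.items() if ok == 0)
    slowest_first = sorted(duration, key=lambda name: duration[name], reverse=True)
    return {"failed": failed, "slowest_first": slowest_first}
```

In Prometheus itself, a comparable alert condition could be written as the PromQL expression ha_cluster_scrape_success == 0.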
6 SAP host exporter
The SAP host exporter enables the monitoring of SAP NetWeaver, SAP HANA, and other applications. The gathered metrics are the data that can be obtained by running the sapcontrol command.
Link: https://github.com/SUSE/sap_host_exporter.
The collected metrics include:
- SAP start service process list
- SAP enqueue server metrics
- SAP application server dispatcher metrics
- SAP internal alerts
7 For more information
Some RPM packages include .md files that contain documentation from the upstream sources. These files can be helpful in isolated data centers without an Internet connection.
8 Legal notice
Copyright © 2006–2024 SUSE LLC and contributors. All rights reserved.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.
For SUSE trademarks, see https://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.