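Applies to SUSE Enterprise Storage 6

7 Monitoring and Alerting

In SUSE Enterprise Storage 6, DeepSea no longer deploys a monitoring and alerting stack on the Salt master. Users have to define the Prometheus role for Prometheus and Alertmanager, and the Grafana role for Grafana. When multiple nodes are assigned the Prometheus or Grafana role, a highly available setup is deployed.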

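  • Prometheus is the monitoring and alerting toolkit.

  • Alertmanager handles alerts sent by the Prometheus server.

  • Grafana is the visualization and alerting software.

  • prometheus-node_exporter is the service running on all Salt minions.

DeepSea automatically sets up the Prometheus configuration and the scrape targets (exporting daemons). It also deploys a list of default alerts, for example health error, 10% OSDs down, or pgs inactive.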

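7.1 Pillar Variables

The Salt pillar is a key-value store that provides information and configuration values to minions. It is available to all minions, each with differing content. The Salt pillar is pre-populated with default values, which you can customize in two different ways:

  • /srv/pillar/ceph/stack/global.yml: changes pillar values for all nodes.

  • /srv/pillar/ceph/stack/CLUSTER_NAME/minions/HOST: changes the configuration of a specific minion.

The following pillar variables are available for all nodes by default: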

  monitoring:
    alertmanager:
      config: salt://path/to/config
      additional_flags: ''
    grafana:
      ssl_cert: False # self-signed certs are created by default
      ssl_key: False # self-signed certs are created by default
    prometheus:
      # pass additional configuration to prometheus
      additional_flags: ''
      alert_relabel_config: []
      rule_files: []
      # per exporter config variables
      scrape_interval:
        ceph: 10
        node_exporter: 10
        prometheus: 10
        grafana: 10
      relabel_config:
        alertmanager: []
        ceph: []
        node_exporter: []
        prometheus: []
        grafana: []
      metric_relabel_config:
        ceph: []
        node_exporter: []
        prometheus: []
        grafana: []
      target_partition:
        ceph: '1/1'
        node_exporter: '1/1'
        prometheus: '1/1'
        grafana: '1/1'

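7.2 Grafana

All traffic through Grafana is encrypted. You can either supply your own SSL certificates or create a self-signed one.

Grafana uses the following variables:

  • ssl_cert

  • ssl_key

For details about supplying your own SSL certificates, see Section 22.9.1.2, “Certificates Signed by CA”. To create your own certificate, see Section 22.9.1.1, “Self-signed Certificates”.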

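7.3 Prometheus

Exporter-based configuration can be passed through the pillar. The groups map to the exporters that provide data. The node exporter is present on all nodes, Ceph is exported by the Ceph Manager nodes, and Prometheus and Grafana are exported by the respective Prometheus and Grafana nodes.

Prometheus uses the following variables:

  • scrape_interval: changes the scrape interval, that is, how often an exporter is scraped.

  • target_partition: partitions the scrape targets when multiple Prometheus instances are deployed, so that some Prometheus instances scrape only a part of all exporter instances.

  • relabel_config: dynamically rewrites the label set of a target before it is scraped. Multiple relabeling steps can be configured per scrape configuration.

  • metric_relabel_config: applied to samples as the last step before ingestion.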

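7.4 Alertmanager

Alertmanager handles alerts sent by the Prometheus server. It takes care of deduplicating and grouping them, and of routing them to the correct receiver. It also takes care of silencing alerts. Alertmanager is configured via command-line flags and a configuration file that defines inhibition rules, notification routing, and notification receivers.

7.4.1 Configuration File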

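The Alertmanager configuration is different for each deployment. Therefore, DeepSea does not ship any related defaults. You need to provide your own alertmanager.yml configuration file. By default, the alertmanager package installs the /etc/prometheus/alertmanager.yml configuration file, which can serve as an example configuration. If you prefer to have your Alertmanager configuration managed by DeepSea, add the following key to your pillar, for example to the /srv/pillar/ceph/stack/ceph/minions/YOUR_SALT_MASTER_MINION_ID.sls file:

monitoring:
 alertmanager_config:
   /path/to/your/alertmanager/config.yml

For a complete example of the Alertmanager configuration file, see Appendix B, “Default Alerts for SUSE Enterprise Storage 6”.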

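The Alertmanager configuration file is written in YAML format. It follows the scheme described below. Parameters in brackets are optional. For non-list parameters, the value is set to the specified default. The following generic placeholders are used in the scheme:

DURATION

A duration matching the regular expression [0-9]+(ms|[smhdwy])

LABELNAME

A string matching the regular expression [a-zA-Z_][a-zA-Z0-9_]*

LABELVALUE

A string of Unicode characters.

FILEPATH

A valid path in the current working directory.

BOOLEAN

A Boolean that can take the values “true” or “false”.

STRING

A regular string.

SECRET

A regular string that is a secret, for example a password.

TMPL_STRING

A string that is template-expanded before use.

TMPL_SECRET

A secret string that is template-expanded before use.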
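
Before deploying a configuration file, it can help to check values against the DURATION and LABELNAME patterns given above. The following is a minimal Python sketch of such a check. The function names is_duration and is_labelname are hypothetical helpers, not part of Alertmanager or DeepSea, and the sketch assumes that the whole value has to match the pattern.

```python
import re

# Patterns copied from the DURATION and LABELNAME placeholder definitions.
DURATION_RE = re.compile(r"[0-9]+(ms|[smhdwy])")
LABELNAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_duration(value):
    """Return True if the complete string is a valid DURATION (assumed full match)."""
    return DURATION_RE.fullmatch(value) is not None


def is_labelname(value):
    """Return True if the complete string is a valid LABELNAME (assumed full match)."""
    return LABELNAME_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("5m", "250ms", "5 minutes"):
        print(candidate, is_duration(candidate))
    for candidate in ("severity", "9lives"):
        print(candidate, is_labelname(candidate))
```

A check like this only validates the syntax of a value. It does not tell you whether the chosen duration is sensible for your deployment.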

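Example 7.1: Global Configuration

Parameters in the global: section are valid in all other configuration contexts. They also serve as defaults for other configuration sections.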

global:
# the time after which an alert is declared resolved if it has not been updated
[ resolve_timeout: DURATION | default = 5m ]

# The default SMTP From header field.
[ smtp_from: TMPL_STRING ]
# The default SMTP smarthost used for sending emails, including port number.
# Port number usually is 25, or 587 for SMTP over TLS
# (sometimes referred to as STARTTLS).
# Example: smtp.example.org:587
[ smtp_smarthost: STRING ]
# The default host name to identify to the SMTP server.
[ smtp_hello: STRING | default = "localhost" ]
[ smtp_auth_username: STRING ]
# SMTP Auth using LOGIN and PLAIN.
[ smtp_auth_password: SECRET ]
# SMTP Auth using PLAIN.
[ smtp_auth_identity: STRING ]
# SMTP Auth using CRAM-MD5.
[ smtp_auth_secret: SECRET ]
# The default SMTP TLS requirement.
[ smtp_require_tls: BOOLEAN | default = true ]

# The API URL to use for Slack notifications.
[ slack_api_url: STRING ]
[ victorops_api_key: STRING ]
[ victorops_api_url: STRING | default = "https://victorops.example.com/integrations/alert/" ]
[ pagerduty_url: STRING | default = "https://pagerduty.example.com/v2/enqueue" ]
[ opsgenie_api_key: STRING ]
[ opsgenie_api_url: STRING | default = "https://opsgenie.example.com/" ]
[ hipchat_api_url: STRING | default = "https://hipchat.example.com/" ]
[ hipchat_auth_token: SECRET ]
[ wechat_api_url: STRING | default = "https://wechat.example.com/cgi-bin/" ]
[ wechat_api_secret: SECRET ]
[ wechat_api_corp_id: STRING ]

# The default HTTP client configuration
[ http_config: HTTP_CONFIG ]

# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
templates:
[ - FILEPATH ... ]

# The root node of the routing tree.
route: ROUTE

# A list of notification receivers.
receivers:
- RECEIVER ...

# A list of inhibition rules.
inhibit_rules:
[ - INHIBIT_RULE ... ]
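Example 7.2: ROUTE

A ROUTE block defines a node in the routing tree. Unspecified parameters are inherited from its parent node. Every alert enters the routing tree at the configured top-level route, which needs to match all alerts. The alert then traverses the child nodes. If the continue option is set to “false”, the traversal stops after the first matching child. If the option is set to “true” on a matching node, the alert continues to be matched against subsequent siblings. If an alert does not match any children of a node, the alert is handled based on the configuration parameters of the current node.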

[ receiver: STRING ]
[ group_by: '[' LABELNAME, ... ']' ]

# If an alert should continue matching subsequent sibling nodes.
[ continue: BOOLEAN | default = false ]

# A set of equality matchers an alert has to fulfill to match a node.
match:
 [ LABELNAME: LABELVALUE, ... ]

# A set of regex-matchers an alert has to fulfill to match a node.
match_re:
 [ LABELNAME: REGEX, ... ]

# Time to wait before sending a notification for a group of alerts.
[ group_wait: DURATION | default = 30s ]

# Time to wait before sending a notification about new alerts
# added to a group of alerts for which an initial notification has
# already been sent.
[ group_interval: DURATION | default = 5m ]

# Time to wait before re-sending a notification
[ repeat_interval: DURATION | default = 4h ]

# Possible child routes.
routes:
 [ - ROUTE ... ]
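
The traversal described for the ROUTE block can be illustrated with a small Python model. This is a simplified sketch under stated assumptions, not Alertmanager's own code: routes are plain dictionaries using the keys from the scheme above, only the receiver parameter is inherited, match_re patterns are assumed to match the whole label value, and the receiver names are made up for illustration.

```python
import re


def node_matches(route, labels):
    """True if the alert labels satisfy the node's match and match_re sets."""
    for name, value in route.get("match", {}).items():
        if labels.get(name, "") != value:
            return False
    for name, pattern in route.get("match_re", {}).items():
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True


def resolve_receivers(route, labels, inherited=None):
    """Return the receivers that handle an alert, in traversal order."""
    if not node_matches(route, labels):
        return []
    # Unspecified parameters are inherited from the parent node.
    receiver = route.get("receiver", inherited)
    found = []
    for child in route.get("routes", []):
        hits = resolve_receivers(child, labels, receiver)
        found.extend(hits)
        # Stop at the first matching child unless it sets continue: true.
        if hits and not child.get("continue", False):
            break
    # No child matched: the current node handles the alert.
    return found or [receiver]


# Hypothetical routing tree; the top-level route has no matchers,
# so it matches all alerts.
TREE = {
    "receiver": "default",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "pager", "continue": True},
        {"match_re": {"type": "ses_.*"}, "receiver": "storage-team"},
        {"match": {"severity": "warning"}, "receiver": "mail"},
    ],
}

if __name__ == "__main__":
    print(resolve_receivers(TREE, {"severity": "critical", "type": "ses_default"}))
    print(resolve_receivers(TREE, {"severity": "info"}))
```

In this model, a critical alert of type ses_default reaches both the first and the second child, because the first child sets continue to true. An alert that matches no child falls back to the receiver of the top-level route.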
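Example 7.3: INHIBIT_RULE

An inhibition rule mutes a target alert that matches one set of matchers when a source alert exists that matches another set of matchers. Both alerts need to share the same label values for the label names in the equal list.

An alert can match both sets and thereby inhibit itself. Do not write inhibition rules in which an alert matches both the source and the target.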

# Matchers that need to be fulfilled for the alerts to be muted.
target_match:
 [ LABELNAME: LABELVALUE, ... ]
target_match_re:
 [ LABELNAME: REGEX, ... ]

# Matchers for which at least one alert needs to exist so that the
# inhibition occurs.
source_match:
 [ LABELNAME: LABELVALUE, ... ]
source_match_re:
 [ LABELNAME: REGEX, ... ]

# Labels with an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: '[' LABELNAME, ... ']' ]
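
The effect of an inhibition rule can be sketched in a few lines of Python. This is a simplified model for illustration only: it covers the equality matchers (target_match, source_match) and the equal list, leaves out the regular expression variants, and uses made-up alert labels. The function names are hypothetical.

```python
def matches(matchers, labels):
    """True if every LABELNAME: LABELVALUE pair is present in the labels."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(rule, target, active_alerts):
    """True if a source alert exists that mutes the target alert."""
    if not matches(rule.get("target_match", {}), target):
        return False
    for source in active_alerts:
        if not matches(rule.get("source_match", {}), source):
            continue
        # Source and target must share the values of all labels in 'equal'.
        if all(source.get(n) == target.get(n) for n in rule.get("equal", [])):
            return True
    return False


# Hypothetical rule: a critical alert mutes warnings from the same cluster.
RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["cluster"],
}

CRITICAL = {"alertname": "health error", "severity": "critical", "cluster": "ceph"}
WARNING = {"alertname": "pgs inactive", "severity": "warning", "cluster": "ceph"}
OTHER = {"alertname": "pgs inactive", "severity": "warning", "cluster": "other"}

if __name__ == "__main__":
    print(is_inhibited(RULE, WARNING, [CRITICAL, WARNING]))
    print(is_inhibited(RULE, OTHER, [CRITICAL, OTHER]))
```

The model deliberately does not exclude the target from the list of possible sources. A rule whose source and target matchers both fit the same alert would therefore mute that alert, which is the situation the text above warns against.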
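Example 7.4: HTTP_CONFIG

HTTP_CONFIG configures the HTTP client that the receiver uses to communicate with API services.

Note that the basic_auth, bearer_token, and bearer_token_file options are mutually exclusive.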

# Sets the 'Authorization' header with the user name and password.
basic_auth:
[ username: STRING ]
[ password: SECRET ]

# Sets the 'Authorization' header with the bearer token.
[ bearer_token: SECRET ]

# Sets the 'Authorization' header with the bearer token read from a file.
[ bearer_token_file: FILEPATH ]

# TLS settings.
tls_config:
# CA certificate to validate the server certificate with.
[ ca_file: FILEPATH ]
# Certificate and key files for client cert authentication to the server.
[ cert_file: FILEPATH ]
[ key_file: FILEPATH ]
# ServerName extension to indicate the name of the server.
# http://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: STRING ]
# Disable validation of the server certificate.
[ insecure_skip_verify: BOOLEAN | default = false ]

# Optional proxy URL.
[ proxy_url: STRING ]
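Example 7.5: RECEIVER

A receiver is a named configuration of one or more notification integrations.

We do not recommend adding new receivers. Instead, implement custom notification integrations through the webhook receiver (see Example 7.15, “WEBHOOK_CONFIG”).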

# The unique name of the receiver.
name: STRING

# Configurations for several notification integrations.
email_configs:
[ - EMAIL_CONFIG, ... ]
hipchat_configs:
[ - HIPCHAT_CONFIG, ... ]
pagerduty_configs:
[ - PAGERDUTY_CONFIG, ... ]
pushover_configs:
[ - PUSHOVER_CONFIG, ... ]
slack_configs:
[ - SLACK_CONFIG, ... ]
opsgenie_configs:
[ - OPSGENIE_CONFIG, ... ]
webhook_configs:
[ - WEBHOOK_CONFIG, ... ]
victorops_configs:
[ - VICTOROPS_CONFIG, ... ]
wechat_configs:
[ - WECHAT_CONFIG, ... ]
Example 7.6: EMAIL_CONFIG
# Whether to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = false ]

# The email address to send notifications to.
to: TMPL_STRING

# The sender address.
[ from: TMPL_STRING | default = global.smtp_from ]

# The SMTP host through which emails are sent.
[ smarthost: STRING | default = global.smtp_smarthost ]

# The host name to identify to the SMTP server.
[ hello: STRING | default = global.smtp_hello ]

# SMTP authentication details.
[ auth_username: STRING | default = global.smtp_auth_username ]
[ auth_password: SECRET | default = global.smtp_auth_password ]
[ auth_secret: SECRET | default = global.smtp_auth_secret ]
[ auth_identity: STRING | default = global.smtp_auth_identity ]

# The SMTP TLS requirement.
[ require_tls: BOOLEAN | default = global.smtp_require_tls ]

# The HTML body of the email notification.
[ html: TMPL_STRING | default = '{{ template "email.default.html" . }}' ]
# The text body of the email notification.
[ text: TMPL_STRING ]

# Further email header key/value pairs. Overrides any headers
# previously set by the notification implementation.
[ headers: { STRING: TMPL_STRING, ... } ]
Example 7.7: HIPCHAT_CONFIG
# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = false ]

# The HipChat Room ID.
room_id: TMPL_STRING
# The authentication token.
[ auth_token: SECRET | default = global.hipchat_auth_token ]
# The URL to send API requests to.
[ api_url: STRING | default = global.hipchat_api_url ]

# A label to be shown in addition to the sender's name.
[ from:  TMPL_STRING | default = '{{ template "hipchat.default.from" . }}' ]
# The message body.
[ message:  TMPL_STRING | default = '{{ template "hipchat.default.message" . }}' ]
# Whether this message will trigger a user notification.
[ notify:  BOOLEAN | default = false ]
# Determines how the message is treated by the alertmanager and rendered inside HipChat. Valid values are 'text' and 'html'.
[ message_format:  STRING | default = 'text' ]
# Background color for message.
[ color:  TMPL_STRING | default = '{{ if eq .Status "firing" }}red{{ else }}green{{ end }}' ]

# Configuration of the HTTP client.
[ http_config: HTTP_CONFIG | default = global.http_config ]
Example 7.8: PAGERDUTY_CONFIG

The routing_key and service_key options are mutually exclusive.

# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = true ]

# The PagerDuty integration key (when using 'Events API v2').
routing_key: TMPL_SECRET
# The PagerDuty integration key (when using 'Prometheus').
service_key: TMPL_SECRET

# The URL to send API requests to.
[ url: STRING | default = global.pagerduty_url ]

# The client identification of the Alertmanager.
[ client:  TMPL_STRING | default = '{{ template "pagerduty.default.client" . }}' ]
# A backlink to the notification sender.
[ client_url:  TMPL_STRING | default = '{{ template "pagerduty.default.clientURL" . }}' ]

# The incident description.
[ description: TMPL_STRING | default = '{{ template "pagerduty.default.description" .}}' ]

# Severity of the incident.
[ severity: TMPL_STRING | default = 'error' ]

# A set of arbitrary key/value pairs that provide further details.
[ details: { STRING: TMPL_STRING, ... } | default = {
 firing:       '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
 resolved:     '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
 num_firing:   '{{ .Alerts.Firing | len }}'
 num_resolved: '{{ .Alerts.Resolved | len }}'
} ]

# The HTTP client's configuration.
[ http_config: HTTP_CONFIG | default = global.http_config ]
Example 7.9: PUSHOVER_CONFIG
# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = true ]

# The recipient user key.
user_key: SECRET

# Registered application’s API token.
token: SECRET

# Notification title.
[ title: TMPL_STRING | default = '{{ template "pushover.default.title" . }}' ]

# Notification message.
[ message: TMPL_STRING | default = '{{ template "pushover.default.message" . }}' ]

# A supplementary URL displayed together with the message.
[ url: TMPL_STRING | default = '{{ template "pushover.default.url" . }}' ]

# Priority.
[ priority: TMPL_STRING | default = '{{ if eq .Status "firing" }}2{{ else }}0{{ end }}' ]

# How often the Pushover servers will send the same notification (at least 30 seconds).
[ retry: DURATION | default = 1m ]

# How long your notification will continue to be retried (unless the user
# acknowledges the notification).
[ expire: DURATION | default = 1h ]

# Configuration of the HTTP client.
[ http_config: HTTP_CONFIG | default = global.http_config ]
Example 7.10: SLACK_CONFIG
# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = false ]

# The Slack webhook URL.
[ api_url: SECRET | default = global.slack_api_url ]

# The channel or user to send notifications to.
channel: TMPL_STRING

# API request data as defined by the Slack webhook API.
[ icon_emoji: TMPL_STRING ]
[ icon_url: TMPL_STRING ]
[ link_names: BOOLEAN | default = false ]
[ username: TMPL_STRING | default = '{{ template "slack.default.username" . }}' ]
# The following parameters define the attachment.
actions:
[ ACTION_CONFIG ... ]
[ color: TMPL_STRING | default = '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}' ]
[ fallback: TMPL_STRING | default = '{{ template "slack.default.fallback" . }}' ]
fields:
[ FIELD_CONFIG ... ]
[ footer: TMPL_STRING | default = '{{ template "slack.default.footer" . }}' ]
[ pretext: TMPL_STRING | default = '{{ template "slack.default.pretext" . }}' ]
[ short_fields: BOOLEAN | default = false ]
[ text: TMPL_STRING | default = '{{ template "slack.default.text" . }}' ]
[ title: TMPL_STRING | default = '{{ template "slack.default.title" . }}' ]
[ title_link: TMPL_STRING | default = '{{ template "slack.default.titlelink" . }}' ]
[ image_url: TMPL_STRING ]
[ thumb_url: TMPL_STRING ]

# Configuration of the HTTP client.
[ http_config: HTTP_CONFIG | default = global.http_config ]
Example 7.11: ACTION_CONFIG for SLACK_CONFIG
# Provide a button to tell Slack you want to render a button.
type: TMPL_STRING
# Label for the button.
text: TMPL_STRING
# http or https URL to deliver users to. If you specify invalid URLs, the message will be posted with no button.
url: TMPL_STRING
#  If set to 'primary', the button will be green, indicating the best forward action to take
#  'danger' turns the button red, indicating a destructive action.
[ style: TMPL_STRING | default = '' ]
Example 7.12: FIELD_CONFIG for SLACK_CONFIG
# A bold heading without markup above the value text.
title: TMPL_STRING
# The text of the field. It can span across several lines.
value: TMPL_STRING
# A flag indicating if value is short enough to be displayed together with other values.
[ short: BOOLEAN | default = slack_config.short_fields ]
Example 7.13: OPSGENIE_CONFIG
# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = true ]

# The API key to use with the OpsGenie API.
[ api_key: SECRET | default = global.opsgenie_api_key ]

# The host to send OpsGenie API requests to.
[ api_url: STRING | default = global.opsgenie_api_url ]

# Alert text (maximum is 130 characters).
[ message: TMPL_STRING ]

# A description of the incident.
[ description: TMPL_STRING | default = '{{ template "opsgenie.default.description" . }}' ]

# A backlink to the sender.
[ source: TMPL_STRING | default = '{{ template "opsgenie.default.source" . }}' ]

# A set of arbitrary key/value pairs that provide further detail.
[ details: { STRING: TMPL_STRING, ... } ]

# Comma-separated list of teams responsible for notifications.
[ teams: TMPL_STRING ]

# Comma separated list of tags attached to the notifications.
[ tags: TMPL_STRING ]

# Additional alert note.
[ note: TMPL_STRING ]

# Priority level of alert, one of P1, P2, P3, P4, and P5.
[ priority: TMPL_STRING ]

# Configuration of the HTTP client.
[ http_config: HTTP_CONFIG | default = global.http_config ]
Example 7.14: VICTOROPS_CONFIG
# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = true ]

# The API key for talking to the VictorOps API.
[ api_key: SECRET | default = global.victorops_api_key ]

# The VictorOps API URL.
[ api_url: STRING | default = global.victorops_api_url ]

# A key used to map the alert to a team.
routing_key: TMPL_STRING

# Describes the behavior of the alert (one of 'CRITICAL', 'WARNING', 'INFO').
[ message_type: TMPL_STRING | default = 'CRITICAL' ]

# Summary of the alerted problem.
[ entity_display_name: TMPL_STRING | default = '{{ template "victorops.default.entity_display_name" . }}' ]

# Long explanation of the alerted problem.
[ state_message: TMPL_STRING | default = '{{ template "victorops.default.state_message" . }}' ]

# The monitoring tool the state message is from.
[ monitoring_tool: TMPL_STRING | default = '{{ template "victorops.default.monitoring_tool" . }}' ]

# Configuration of the HTTP client.
[ http_config: HTTP_CONFIG | default = global.http_config ]
Example 7.15: WEBHOOK_CONFIG

You can use the webhook receiver to configure a generic receiver.

# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = true ]

# The endpoint for sending HTTP POST requests.
url: STRING

# Configuration of the HTTP client.
[ http_config: HTTP_CONFIG | default = global.http_config ]

Alertmanager sends HTTP POST requests in the following JSON format:

{
 "version": "4",
 "groupKey": STRING, // identification of the group of alerts (to deduplicate)
 "status": "<resolved|firing>",
 "receiver": STRING,
 "groupLabels": OBJECT,
 "commonLabels": OBJECT,
 "commonAnnotations": OBJECT,
 "externalURL": STRING, // backlink to Alertmanager.
 "alerts": [
   {
     "status": "<resolved|firing>",
     "labels": OBJECT,
     "annotations": OBJECT,
     "startsAt": "<rfc3339>",
     "endsAt": "<rfc3339>",
     "generatorURL": STRING // identifies the entity that caused the alert
   },
   ...
 ]
}
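
A custom integration behind the webhook receiver mainly has to decode this payload. The following Python sketch shows one way to do that with the standard library. The sample payload, its host names, and the summarize helper are assumptions made for illustration; a real integration would read the request body from the HTTP server it runs in.

```python
import json

# Hypothetical sample payload that follows the structure shown above.
SAMPLE = json.dumps({
    "version": "4",
    "groupKey": "example-group",
    "status": "firing",
    "receiver": "webhook",
    "groupLabels": {"alertname": "health error"},
    "commonLabels": {"severity": "critical"},
    "commonAnnotations": {},
    "externalURL": "http://alertmanager.example.com:9093",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "health error", "severity": "critical"},
            "annotations": {"description": "example description"},
            "startsAt": "2020-01-01T00:00:00Z",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://prometheus.example.com:9090/graph",
        },
    ],
})


def summarize(body):
    """Return one human-readable line per alert in a webhook request body."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append("%s: %s (severity=%s)" % (
            alert.get("status", "unknown"),
            labels.get("alertname", "unknown"),
            labels.get("severity", "none"),
        ))
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

Using .get() with fallbacks keeps the sketch tolerant of alerts that carry no severity or alertname label.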

The webhook receiver allows integration with the following notification mechanisms:

  • DingTalk (https://github.com/timonwong/prometheus-webhook-dingtalk)

  • IRC Bot (https://github.com/multimfi/bot)

  • JIRAlert (https://github.com/free/jiralert)

  • Phabricator / Maniphest (https://github.com/knyar/phalerts)

  • prom2teams: forwards notifications to Microsoft Teams (https://github.com/idealista/prom2teams)

  • SMS: supports multiple providers (https://github.com/messagebird/sachet)

  • Telegram bot (https://github.com/inCaller/prometheus_bot)

  • SNMP trap (https://github.com/SUSE/prometheus-webhook-snmp)

Example 7.16: WECHAT_CONFIG
# Whether or not to notify about resolved alerts.
[ send_resolved: BOOLEAN | default = false ]

# The API key to use for the WeChat API.
[ api_secret: SECRET | default = global.wechat_api_secret ]

# The WeChat API URL.
[ api_url: STRING | default = global.wechat_api_url ]

# The corp id used to authenticate.
[ corp_id: STRING | default = global.wechat_api_corp_id ]

# API request data as defined by the WeChat API.
[ message: TMPL_STRING | default = '{{ template "wechat.default.message" . }}' ]
[ agent_id: STRING | default = '{{ template "wechat.default.agent_id" . }}' ]
[ to_user: STRING | default = '{{ template "wechat.default.to_user" . }}' ]
[ to_party: STRING | default = '{{ template "wechat.default.to_party" . }}' ]
[ to_tag: STRING | default = '{{ template "wechat.default.to_tag" . }}' ]

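7.4.2 Custom Alerts

You can define your own alert conditions to send notifications to an external service. Prometheus uses its own expression language for defining custom alerts. The following is an example of a rule with an alert: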

groups:
- name: example
  rules:
  # alert on high deviation from average PG count
  - alert: high pg count deviation
    expr: abs(((ceph_osd_pgs > 0) - on (job) group_left avg(ceph_osd_pgs > 0) by (job)) / on (job) group_left avg(ceph_osd_pgs > 0) by (job)) > 0.35
    for: 5m
    labels:
      severity: warning
      type: ses_default
    annotations:
      description: >
        OSD {{ $labels.osd }} deviates by more than 30% from average PG count

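The optional for clause specifies how long Prometheus waits between first encountering a new expression output vector element and counting the alert as firing. In this case, Prometheus checks that the alert stays active for 5 minutes before it fires the alert. Elements in the pending state are active, but not firing yet.

The labels clause specifies a set of additional labels to attach to the alert. Conflicting labels are overwritten. Labels can be templated (see Section 7.4.2.1, “Templates”, for more details on templating).

The annotations clause specifies informational labels. You can use annotations to store additional information, for example alert descriptions or runbook links. Annotations can be templated (see Section 7.4.2.1, “Templates”, for more details on templating).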
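
The effect of the for clause can be pictured as a small state function. The following Python sketch is a simplified model and not Prometheus code: it assumes that times are given in seconds, that active_since is the time at which the expression first returned the element (or None if it currently returns nothing), and that hold_seconds corresponds to the for value, for example 300 for 5m.

```python
def alert_state(active_since, now, hold_seconds):
    """Classify an alert as inactive, pending, or firing (simplified model)."""
    if active_since is None:
        # The expression currently returns no element for this alert.
        return "inactive"
    if now - active_since >= hold_seconds:
        # The condition has held for the whole 'for' duration.
        return "firing"
    # Active, but the 'for' duration has not elapsed yet.
    return "pending"


if __name__ == "__main__":
    print(alert_state(None, 100, 300))
    print(alert_state(0, 120, 300))
    print(alert_state(0, 300, 300))
```

With for: 5m, an element that appeared two minutes ago is classified as pending, and it only counts as firing once the five minutes have passed without interruption.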

To add your custom alerts to SUSE Enterprise Storage 6, either

  • place your YAML files with custom alerts in the /etc/prometheus/alerts directory

or

  • provide a list of paths to your custom alert files in the pillar under the monitoring:custom_alerts key. DeepSea Stage 2 or the salt SALT_MASTER state.apply ceph.monitoring.prometheus command adds your alert files in the right place.

    Example 7.17: Adding Custom Alerts to SUSE Enterprise Storage

    A file with custom alerts is stored in /root/my_alerts/my_alerts.yml on the Salt master. If you add

    monitoring:
     custom_alerts:
       - /root/my_alerts/my_alerts.yml

    to the /srv/pillar/ceph/cluster/YOUR_SALT_MASTER_MINION_ID.sls file, DeepSea creates the /etc/prometheus/alerts/my_alerts.yml file and restarts Prometheus.

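7.4.2.1 Templates

You can use templates for label and annotation values. The $labels variable holds the label key/value pairs of an alert instance, while $value holds the evaluated value of an alert instance.

The following example inserts the label and value of a firing element: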

{{ $labels.LABELNAME }}
{{ $value }}

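7.4.2.2 Inspecting Alerts at Runtime

If you need to verify which alerts are active, you have several options:

  • Navigate to the Alerts tab of Prometheus. It shows the exact label sets for which the defined alerts are active. Prometheus also stores synthetic time series for pending and firing alerts. They have the following form:

    ALERTS{alertname="ALERT_NAME", alertstate="pending|firing", ADDITIONAL_ALERT_LABELS}

    The sample value is 1 if the alert is active (pending or firing). The series is marked “stale” when the alert is inactive.

  • In the Prometheus Web interface at http://PROMETHEUS_HOST_IP:9090/alerts, inspect alerts and their state (inactive, pending, or firing).

  • In the Alertmanager Web interface at http://PROMETHEUS_HOST_IP:9093/#/alerts, inspect alerts and silence them if needed.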
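
The synthetic ALERTS series can also be read from a script through the Prometheus HTTP query API (/api/v1/query). The following Python sketch builds such a query URL and decodes a response without contacting a server. The helper names, the host name, and the sample response are assumptions for illustration. Fetching the URL, for example with urllib.request, is left out so that the sketch runs offline.

```python
import json
from urllib.parse import urlencode


def alerts_query_url(host, state=None):
    """Build an instant-query URL for the ALERTS series on port 9090."""
    selector = 'ALERTS{alertstate="%s"}' % state if state else "ALERTS"
    return "http://%s:9090/api/v1/query?%s" % (host, urlencode({"query": selector}))


def parse_alerts(body):
    """Return (alertname, alertstate) pairs from a query response body."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise ValueError("query failed: %s" % data.get("error", "unknown error"))
    return [
        (series["metric"].get("alertname"), series["metric"].get("alertstate"))
        for series in data["data"]["result"]
    ]


# Hypothetical response body in the format of the Prometheus query API.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "ALERTS",
                    "alertname": "high pg count deviation",
                    "alertstate": "firing",
                    "severity": "warning",
                },
                "value": [1577836800, "1"],
            },
        ],
    },
})

if __name__ == "__main__":
    print(alerts_query_url("prometheus.example.com", "firing"))
    print(parse_alerts(SAMPLE_RESPONSE))
```

Replace the host name with the address of one of your Prometheus nodes before using the URL.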

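7.4.3 SNMP Trap Receiver

If you want to get notified about Prometheus alerts via SNMP traps, you can install the Prometheus Alertmanager SNMP trap receiver via DeepSea. To do so, enable it in the pillar under the monitoring:alertmanager_receiver_snmp:enabled key. The configuration of the receiver must be set under the monitoring:alertmanager_receiver_snmp:config key. DeepSea Stage 2 or the salt SALT_MASTER state.apply ceph.monitoring.alertmanager command installs and configures the receiver in the appropriate location.

Example 7.18: SNMP Trap Configuration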
monitoring:
 alertmanager:
   receiver:
      snmp:
        enabled: True
        config:
          host: localhost
          port: 9099
          snmp_host: snmp.foo-bar.com
          snmp_community: private
          metrics: True

For more details about the configuration options, see the receiver manual at https://github.com/SUSE/prometheus-webhook-snmp#global-configuration-file.
