Applies to SUSE OpenStack Cloud Crowbar 9

4 SUSE OpenStack Cloud Monitoring

As more applications are deployed on cloud systems and those systems grow in complexity, managing the cloud infrastructure becomes increasingly difficult. SUSE OpenStack Cloud Crowbar Monitoring helps you master this challenge by providing a Monitoring as a Service solution that operates on top of OpenStack-based cloud computing platforms.

The component architecture of OpenStack provides high flexibility, but it also increases the burden of system operation because multiple services must be handled. SUSE OpenStack Cloud Crowbar Monitoring offers an integrated view of all services and presents related metrics and log data in one access point. It is flexible and scalable enough to reflect changes in the OpenStack platform quickly, and it provides the means required to ensure multi-tenancy, high availability, and data security. Its high availability architecture reduces the impact of component failures and provides for reliable failover.

SUSE OpenStack Cloud Crowbar Monitoring covers all aspects of a Monitoring as a Service solution:

  • Central management of monitoring and log data from medium and large-size OpenStack deployments.

  • Storage of metrics and log data in a resilient way.

  • Multi-tenancy architecture to ensure the secure isolation of metrics and log data.

  • Horizontal and vertical scalability to support constantly evolving cloud infrastructures. When physical and virtual servers are scaled up or down to varying loads, the monitoring and log management solution can be adapted accordingly.

4.1 About SUSE OpenStack Cloud Crowbar Monitoring

The monitoring solution of SUSE OpenStack Cloud Crowbar Monitoring addresses the requirements of large-scale public and private clouds where high numbers of physical and virtual servers need to be monitored and huge amounts of monitoring data need to be managed. SUSE OpenStack Cloud Crowbar Monitoring consolidates metrics, alarms, and notifications, as well as health and status information from multiple systems, thus reducing the complexity and allowing for a higher level analysis of the monitoring data.

SUSE OpenStack Cloud Crowbar Monitoring covers all aspects of a Monitoring as a Service solution:

  • Storage of monitoring data in a resilient way.

  • Multi-tenancy architecture for submitting and streaming metrics. The architecture ensures the secure isolation of metrics data.

  • Horizontal and vertical scalability to support constantly evolving cloud infrastructures. When physical and virtual servers are scaled up or down to varying loads, the monitoring solution can be adapted accordingly.

SUSE OpenStack Cloud Crowbar Monitoring offers various features which support you in proactively managing your cloud resources. A large number of metrics in combination with early warnings about problems and outages assists you in analyzing and troubleshooting any issue you encounter in your environment.

The monitoring features include:

  • A monitoring overview which allows you to access all monitoring information.

  • Metrics dashboards for visualizing your monitoring data.

  • Alerting features for monitoring.

In the following sections, you will find information on the monitoring overview and the metrics dashboards as well as details on how to define and handle alarms and notifications.

4.1.1 Accessing SUSE OpenStack Cloud Crowbar Monitoring

To access SUSE OpenStack Cloud Crowbar Monitoring and perform monitoring tasks, you must have access to the OpenStack platform as a user with the monasca-user or monasca-read-only-user role in the monasca tenant.

Log in to OpenStack horizon with your user name and password. The functions you can use in OpenStack horizon depend on your access permissions. To access logs and metrics, switch to the monasca tenant in horizon. This allows you to access all monitoring data for SUSE OpenStack Cloud Crowbar Monitoring.

Figure 4.1: SUSE OpenStack Cloud horizon Dashboard—Monitoring

4.1.2 Overview

SUSE OpenStack Cloud Crowbar Monitoring provides one convenient access point to your monitoring data. Use Monitoring > Overview to keep track of your services and servers and quickly check their status. The overview also indicates any irregularities in the log data of the system components you are monitoring.

On the Overview page, you can:

  • View the status of your services, servers, and log data at a glance. Status information appears on the Overview page as soon as you have defined an alarm for a service, a server, or log data, and the corresponding metrics data has been received. Different colors indicate the different statuses.

4.2 Architecture

The following illustration provides an overview of the main components of SUSE OpenStack Cloud Crowbar Monitoring and their interaction:

cmm-architecture.png
OpenStack

SUSE OpenStack Cloud Crowbar Monitoring relies on OpenStack as technology for building cloud computing platforms for public and private clouds. OpenStack consists of a series of interrelated projects delivering various components for a cloud infrastructure solution and allowing for the deployment and management of Infrastructure as a Service (IaaS) platforms.

Monitoring Service

The Monitoring Service is the central SUSE OpenStack Cloud Crowbar Monitoring component. It is responsible for receiving, persisting, and processing monitoring and log data, as well as providing the data to the users.

The Monitoring Service relies on monasca, an open source Monitoring as a Service solution. It uses monasca for high-speed metrics querying and integrates the Threshold Engine (streaming alarm engine) and the Notification Engine of monasca.

The Monitoring Service consists of the following components:

Monitoring API

A RESTful API for monitoring. It is primarily focused on the following areas:

  • Metrics: Store and query massive amounts of metrics in real time.

  • Statistics: Provide statistics for metrics.

  • Alarm Definitions: Create, update, query, and delete alarm definitions.

  • Alarms: Query and delete the alarm history.

  • Notification Methods: Create and delete notification methods and associate them with alarms. Users can be notified directly when alarms are triggered, for example, via email.
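
As a rough illustration of the Metrics area, the following sketch builds a JSON body for a single measurement. The field names (name, dimensions, timestamp, value) follow the upstream monasca API as we understand it. The helper build_metric, the metric name, and the dimension values are hypothetical and chosen only for illustration. Check the monasca API reference for your release before relying on the exact format.

```python
import json
import time


def build_metric(name, value, dimensions, timestamp_ms=None):
    """Return a dict shaped like one measurement for the Monitoring API.

    build_metric is a hypothetical helper, not part of monasca.
    """
    if timestamp_ms is None:
        # Assumption: timestamps are milliseconds since the epoch.
        timestamp_ms = int(time.time() * 1000)
    return {
        "name": name,
        "dimensions": dict(dimensions),
        "timestamp": timestamp_ms,
        "value": float(value),
    }


# Illustrative values only; none of them come from a real deployment.
metric = build_metric(
    "cpu.user_perc",
    42.5,
    {"hostname": "compute-01", "service": "monitoring"},
    timestamp_ms=1700000000000,
)
body = json.dumps(metric, sort_keys=True)
print(body)
```

The sketch only constructs the request body. Sending it additionally requires a keystone token and the endpoint of your Monitoring API, which are not shown here.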

Message Queue

A component that primarily receives published metrics from the Monitoring API, alarm state transition messages from the Threshold Engine, and log data from the Log API. The data is consumed by other components, such as the Persister, the Notification Engine, and the Log Persister. The Message Queue is also used to publish and consume other events in the system. It is based on Kafka, a high-performance, distributed, fault-tolerant, and scalable message queue with durability built-in. For administrating the Message Queue, SUSE OpenStack Cloud Crowbar Monitoring uses Zookeeper, a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Persister

A monasca component that consumes metrics and alarm state transitions from the Message Queue and stores them in the Metrics and Alarms Database (InfluxDB).

Notification Engine

A monasca component that consumes alarm state transition messages from the Message Queue and sends notifications for alarms, such as emails.

Threshold Engine

A monasca component that computes thresholds on metrics and publishes alarms to the Message Queue when they are triggered. The Threshold Engine is based on Apache Storm, a free and open distributed real-time computation system.

Metrics and Alarms Database

An InfluxDB database used for storing metrics and the alarm history.

Config Database

A MariaDB database used for storing configuration information, alarm definitions, and notification methods.

Log API

A RESTful API for log management. It gathers log data from the Log Agents and forwards it to the Message Queue.

The SUSE OpenStack Cloud Crowbar Monitoring log management is based on Logstash, a tool for receiving, processing, and publishing all kinds of logs. It provides a powerful pipeline for querying and analyzing logs. Elasticsearch is used as the back-end datastore, and Kibana as the front-end tool for retrieving and visualizing the log data.

Log Transformer

A Logstash component that consumes the log data from the Message Queue, performs transformation and aggregation operations on the data, and publishes the data that it creates back to the Message Queue.

Log Metrics

A monasca component that consumes log data from the Message Queue, filters the data according to severity, and generates metrics for specific severities, for example, for errors or warnings. The generated metrics are published to the Message Queue and can be further processed by the Threshold Engine like any other metrics.
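
The idea behind Log Metrics can be sketched in a few lines: filter log entries by severity, count them, and emit one metric per severity. This is a simplified illustration, not the monasca implementation. The set of tracked severities, the entry layout, and the metric names (log.error, log.warning) are assumptions.

```python
from collections import Counter

# Severities that should produce a metric; this set is an assumption.
TRACKED_SEVERITIES = {"ERROR", "WARNING"}


def count_by_severity(log_entries, tracked=TRACKED_SEVERITIES):
    """Count log entries per tracked severity, ignoring all others."""
    counts = Counter()
    for entry in log_entries:
        severity = entry.get("severity", "").upper()
        if severity in tracked:
            counts[severity] += 1
    return counts


def to_metrics(counts, dimensions):
    """Turn severity counts into metric-like dicts (names are illustrative)."""
    return [
        {"name": f"log.{severity.lower()}", "value": float(n), "dimensions": dimensions}
        for severity, n in sorted(counts.items())
    ]


entries = [
    {"severity": "info", "message": "service started"},
    {"severity": "error", "message": "connection refused"},
    {"severity": "warning", "message": "disk usage at 85%"},
    {"severity": "error", "message": "timeout"},
]
metrics = to_metrics(count_by_severity(entries), {"hostname": "compute-01"})
for m in metrics:
    print(m["name"], m["value"])
```

Because the result has the same shape as any other metric, a threshold can be defined on it in the same way as on a system metric.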

Log Persister

A Logstash component that consumes the transformed and aggregated log data from the Message Queue and stores it in the Log Database.

Kibana Server

A Web browser-based analytics and search interface to the Log Database.

Log Database

An Elasticsearch database for storing the log data.

horizon Plugin

SUSE OpenStack Cloud Crowbar Monitoring comes with a plugin for the OpenStack horizon dashboard. The plugin extends the main dashboard in OpenStack with a view for monitoring. This enables SUSE OpenStack Cloud Crowbar Monitoring users to access the monitoring functions from a central Web-based graphical user interface. For details, refer to the OpenStack horizon documentation.

Based on OpenStack horizon, the monitoring data is visualized on a comfortable and easy-to-use dashboard which fully integrates with the following applications:

Grafana (for metrics data)

An open source application for visualizing large-scale measurement data.

Kibana (for log data)

An open source analytics and visualization platform designed to work with Elasticsearch.

Metrics Agent

A Metrics Agent is required for retrieving metrics data from the host on which it runs and sending the metrics data to the Monitoring Service. The agent supports metrics from a variety of sources as well as a number of built-in system and service checks.

A Metrics Agent can be installed on each virtual or physical server to be monitored.

The agent functionality is fully integrated into the source code base of the monasca project. For details, refer to the monasca Wiki.

Log Agent

A Log Agent is needed for collecting log data from the host on which it runs and forwarding the log data to the Monitoring Service for further processing. It can be installed on each virtual or physical server from which log data is to be retrieved.

The agent functionality is fully integrated into the source code base of the monasca project. For details, refer to the monasca Wiki.

4.2.1 Agents and Services

Service Name                       Description
zookeeper                          Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
storm-nimbus                       Storm is a distributed real-time computation system for processing large volumes of high-velocity data. The Storm Nimbus daemon is responsible for distributing code around a cluster, assigning tasks to machines, and monitoring for failures.
storm-supervisor                   The Storm supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it.
mariadb                            MariaDB database service. SUSE OpenStack Cloud Crowbar Monitoring stores configuration information in this database.
kafka                              Message queue service.
influxdb                           InfluxDB database service. SUSE OpenStack Cloud Crowbar Monitoring stores metrics and alarms in this database.
elasticsearch                      Elasticsearch database service. SUSE OpenStack Cloud Crowbar Monitoring stores the log data in this database.
memcached                          Memcached service. SUSE OpenStack Cloud Crowbar Monitoring uses it for caching authentication and authorization information required for the communication between the Log API and OpenStack keystone.
openstack-monasca-notification     Notification Engine.
openstack-monasca-thresh           Threshold Engine.
openstack-monasca-log-transformer  Log Transformer.
apache2                            Log and monitoring API.
openstack-monasca-persister        Persister.
openstack-monasca-agent            Metrics Agent.
kibana                             Kibana server.
openstack-monasca-log-persister    Log Persister.
openstack-monasca-log-metrics      Log Metrics.
openstack-monasca-log-agent        Log Agent.

4.3 Basic Usage Scenario

The monitoring solution of SUSE OpenStack Cloud Crowbar Monitoring addresses the requirements of large-scale public and private clouds where high numbers of physical and virtual servers need to be monitored and huge amounts of monitoring data need to be managed. SUSE OpenStack Cloud Crowbar Monitoring consolidates metrics, alarms, and notifications, as well as health and status information from multiple systems, thus reducing the complexity and allowing for a higher level analysis of the monitoring data.

SUSE OpenStack Cloud Crowbar Monitoring covers all aspects of a Monitoring as a Service solution:

  • Storage of monitoring data in a resilient way.

  • Multi-tenancy architecture for submitting and streaming metrics. The architecture ensures the secure isolation of metrics data.

  • Horizontal and vertical scalability to support constantly evolving cloud infrastructures. When physical and virtual servers are scaled up or down to varying loads, the monitoring solution can be adapted accordingly.

The basic usage scenario of setting up and using the monitoring features of SUSE OpenStack Cloud Crowbar Monitoring looks as follows:

MetricsLogs.png

The Monitoring Service operator is responsible for providing the monitoring features to the application operators and the OpenStack operator. This enables the application operators and the OpenStack operator to focus on operation and ensure the quality of their services without having to carry out the tedious tasks implied by setting up and administrating their own system monitoring software. The Monitoring Service operator uses the features for monitoring the operation of SUSE OpenStack Cloud Crowbar Monitoring.

As the Monitoring Service operator, you have the following responsibilities:

  • Deploying the Monitoring Service, thus providing the monitoring features to the application operators, and the monitoring and log management features to the OpenStack operator.

  • Regular maintenance of the components and services for SUSE OpenStack Cloud Crowbar Monitoring.

  • Backup of the SUSE OpenStack Cloud Crowbar Monitoring databases, configuration files, and customized dashboards.

  • Monitoring of SUSE OpenStack Cloud Crowbar Monitoring for quality assurance.

Application operators monitor the virtual machines on which they provide services to end users or services they need for their development activities. They ensure that the physical and virtual servers on which their services are provided are up and running as required.

The OpenStack operator is responsible for administrating and maintaining the underlying OpenStack platform. The monitoring and log management services of SUSE OpenStack Cloud Crowbar Monitoring enable you to ensure the availability and quality of your platform. You use SUSE OpenStack Cloud Crowbar Monitoring for:

  • Monitoring physical and virtual servers, hypervisors, and OpenStack services.

  • Monitoring middleware components, for example, database services.

  • Retrieving and analyzing the log data of the OpenStack services and servers, the middleware components, and the operating system.

4.3.1 Metrics

A Metrics Agent can be installed and configured on each physical and virtual server where cloud resources are to be monitored. The agent is responsible for querying metrics and sending the data to the Monitoring Service for further processing.

Metrics are self-describing data structures that are uniquely identified by a name and a set of dimensions. Each dimension consists of a key/value pair that allows for a flexible and concise description of the data to be monitored, for example, region, availability zone, service tier, or resource ID.
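
To make the identification rule concrete, the following minimal sketch derives a hashable key from a metric name and its dimensions. The function metric_key and the sample names and values are hypothetical. The sketch only illustrates that the order of dimensions is irrelevant, while any differing value yields a different metric.

```python
def metric_key(name, dimensions):
    """Return a hashable identity for a metric: name plus sorted dimensions."""
    return (name, tuple(sorted(dimensions.items())))


a = metric_key("disk.space_used_perc", {"hostname": "node-1", "device": "sda1"})
b = metric_key("disk.space_used_perc", {"device": "sda1", "hostname": "node-1"})
c = metric_key("disk.space_used_perc", {"hostname": "node-2", "device": "sda1"})

print(a == b)  # same name and dimensions, regardless of key order
print(a == c)  # different hostname, so a different metric
```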

The Metrics Agent supports various types of metrics including the following:

  • System metrics, for example, CPU usage, consumed disk space, or network traffic.

  • Host alive checks. The agent can perform active checks on a host to determine whether it is alive using ping (ICMP) or SSH.

  • Process checks. The agent can check and monitor a process, for example, the number of instances, memory size, or number of threads.

  • HTTP endpoint checks. The agent can perform up/down checks on HTTP endpoints by sending an HTTP request and reporting success or failure to the Monitoring Service.

  • Service checks. The agent can check middleware services, for example, MySQL, Kafka, or RabbitMQ.

  • OpenStack services. The agent can perform specific checks on each process that is part of an OpenStack service.

  • Log metrics. The agent can check and monitor the number of critical log entries in the log data retrieved from the cloud resources.

4.3.2 Data Visualization and Analysis

All SUSE OpenStack Cloud Crowbar Monitoring user groups work with a graphical user interface that is seamlessly integrated into their cloud infrastructure. Based on OpenStack horizon, the user interface enables access to all monitoring functionality and the resulting large-scale monitoring data.

A dashboard visualizes the health and status of the cloud resources. It allows SUSE OpenStack Cloud Crowbar Monitoring users to analyze the performance of their cloud resources in real time in many different ways. They can not only view but also share and explore visualizations of their monitoring data.

4.3.3 Alarms and Notifications

SUSE OpenStack Cloud Crowbar Monitoring supports GUI-based alarm and notification management. Template-based alarm definitions allow for monitoring a dynamically changing set of resources without the need for reconfiguration. This ensures the efficient monitoring of scalable cloud services, for example, while the number of underlying virtual machines is changing. Alarm definitions allow you to specify expressions that are evaluated against the metrics data that is received. Alarm definitions can be combined to form compound alarms, which allow you to track and process more complex events. Notifications can be configured to inform SUSE OpenStack Cloud Crowbar Monitoring users when an alarm is triggered.
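
The following sketch illustrates how an alarm expression and a compound alarm can be thought of. It is a simplified model, not the Threshold Engine: each sub-expression applies a statistical function to recent measurements and compares the result with a threshold, and a compound alarm combines sub-expressions with boolean operators. All names, sample values, and the handling of missing data are assumptions.

```python
import operator
from statistics import mean

FUNCTIONS = {"avg": mean, "max": max, "min": min, "sum": sum}
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate(function, values, comparator, threshold):
    """Evaluate one sub-expression such as avg(values) > threshold."""
    if not values:
        # Assumption for this sketch: no data means the expression is not met.
        return False
    return COMPARATORS[comparator](FUNCTIONS[function](values), threshold)


cpu_samples = [91.0, 95.5, 97.2]
disk_samples = [40.0, 41.5, 39.8]

cpu_high = evaluate("avg", cpu_samples, ">", 90)
disk_high = evaluate("max", disk_samples, ">", 80)

# A compound alarm combines sub-expressions with boolean operators.
both = cpu_high and disk_high
either = cpu_high or disk_high
print(cpu_high, disk_high, both, either)
```

With these sample values, only the CPU sub-expression is met, so a compound alarm that requires both conditions stays quiet while one that requires either condition triggers.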

4.4 Key Features

SUSE OpenStack Cloud Crowbar Monitoring is an out-of-the-box solution for monitoring OpenStack-based cloud environments. It is provided as a cloud service to users. SUSE OpenStack Cloud Crowbar Monitoring meets different challenges, ranging from small-scale deployments to high-availability deployments and deployments with high levels of scalability.

The core of SUSE OpenStack Cloud Crowbar Monitoring is monasca, an open source Monitoring as a Service solution that integrates with OpenStack. The key features of SUSE OpenStack Cloud Crowbar Monitoring form an integral part of the monasca project. SUSE OpenStack Cloud Crowbar Monitoring extends the source code base of the project through active contributions.

Compared to the monasca community edition, SUSE OpenStack Cloud Crowbar Monitoring provides the following added value:

  • Packaging as a commercial enterprise solution

  • Enterprise-level support

The key features of SUSE OpenStack Cloud Crowbar Monitoring address public as well as private cloud service providers. They include:

  • Monitoring

  • Metrics

  • Log management

  • Integration with OpenStack

4.4.1 Monitoring

SUSE OpenStack Cloud Crowbar Monitoring is a highly scalable and fault tolerant monitoring solution for OpenStack-based cloud infrastructures.

The system operator of the cloud infrastructure and the service providers no longer need to set up and maintain their own system monitoring software. They use SUSE OpenStack Cloud Crowbar Monitoring to check whether their services and servers are working appropriately.

SUSE OpenStack Cloud Crowbar Monitoring provides comprehensive and configurable metrics with reasonable defaults for monitoring the status, capacity, throughput, and latency of cloud systems. SUSE OpenStack Cloud Crowbar Monitoring users can set their own warnings and critical thresholds and can combine multiple warnings and thresholds to support the processing of complex events. Combined with a notification system, these alerting features enable them to quickly analyze and resolve problems in the cloud infrastructure.

4.4.2 Metrics

The Metrics Agent is responsible for querying metrics and sending them to the Monitoring Service for further processing.

Metrics are self-describing data structures that are uniquely identified by a name and a set of dimensions. Each dimension consists of a key/value pair that allows for a flexible and concise description of the data to be monitored, for example, region, availability zone, service tier, or resource ID.

The Metrics Agent supports various types of metrics including the following:

  • System metrics, for example, CPU usage, consumed disk space, or network traffic.

  • Host alive checks. The agent can perform active checks on a host to determine whether it is alive using ping (ICMP) or SSH.

  • Process checks. The agent can check and monitor a process, for example, the number of instances, memory size, or number of threads.

  • HTTP endpoint checks. The agent can perform up/down checks on HTTP endpoints by sending an HTTP request and reporting success or failure to the Monitoring Service.

  • Service checks. The agent can check middleware services, for example, MySQL, Kafka, or RabbitMQ.

  • OpenStack services. The agent can perform specific checks on each process that is part of an OpenStack service.

  • Log metrics. The agent can check and monitor the number of critical log entries in the log data retrieved from the cloud resources.

Your individual agent configuration determines which metrics are available for monitoring your services and servers. For details on installing and configuring a Metrics Agent, see Deployment Guide using Crowbar.

As soon as an agent is available, you have access to the monitoring features of SUSE OpenStack Cloud Crowbar Monitoring. You work with a graphical user interface that is seamlessly integrated into your cloud infrastructure. Based on OpenStack horizon, the user interface enables access to all monitoring functionality and the resulting large-scale monitoring data. A dashboard visualizes the health and status of your cloud resources.

SUSE OpenStack Cloud Crowbar Monitoring provides functions for alarm and notification management. Template-based alarm definitions allow for monitoring a dynamically changing set of resources without the need for reconfiguration. This ensures the efficient monitoring of scalable cloud services, for example, while the number of underlying virtual machines is changing. Alarm definitions allow you to specify expressions that are evaluated against the metrics data that is received. Alarm definitions can be combined to form compound alarms, which allow you to track and process more complex events. Notifications can be configured to inform SUSE OpenStack Cloud Crowbar Monitoring users when an alarm is triggered.

4.4.3 Log Management

With the increasing complexity of cloud infrastructures, it is becoming more and more difficult and time-consuming for the system operator to gather, store, and query the large amounts of log data manually. To cope with these problems, SUSE OpenStack Cloud Crowbar Monitoring provides centralized log management features.

SUSE OpenStack Cloud Crowbar Monitoring stores the log data in a central database. This forms the basis for visualizing the log data for the SUSE OpenStack Cloud Crowbar Monitoring users. Advanced data analysis and visualization of the log data is supported in a variety of charts, tables, and maps. Visualizations can easily be combined in dynamic dashboards that display changes to search queries in real time.

The log data from a large number of sources can be accessed from a single dashboard. Integrated search, filter, and graphics options enable system operators to isolate problems and narrow down potential root causes. SUSE OpenStack Cloud Crowbar Monitoring thus provides valuable insights into the log data, even with large amounts of data resulting from highly complex environments.

Based on OpenStack horizon, the customizable dashboards are seamlessly integrated into your cloud infrastructure. They enable user access to all log management functionality.

GUI-based alarm and notification management is also supported for log data. Based on a template mechanism, you can configure alarms and notifications to monitor the number of critical log events over time. Compound alarms can be created to analyze more complex log events. This automation of log handling helps you identify problems in your infrastructure early and find the root cause quickly.

4.4.4 Integration with OpenStack

SUSE OpenStack Cloud Crowbar Monitoring is integrated with OpenStack core services. These include:

  • OpenStack horizon dashboard for visualizing monitoring metrics and log data

  • OpenStack user management

  • OpenStack security and access control

4.5 Components

The following illustration provides an overview of the main components of SUSE OpenStack Cloud Crowbar Monitoring:

structure_new.png

SUSE OpenStack Cloud Crowbar Monitoring relies on OpenStack as technology for building cloud computing platforms for public and private clouds. OpenStack consists of a series of interrelated projects delivering various components for a cloud infrastructure solution and allowing for the deployment and management of Infrastructure as a Service (IaaS) platforms.

For details on OpenStack, refer to the OpenStack documentation.

4.5.1 Monitoring Service

The Monitoring Service is the central SUSE OpenStack Cloud Crowbar Monitoring component. It is responsible for receiving, persisting, and processing metrics and log data, as well as providing the data to the users.

The Monitoring Service relies on monasca. It uses monasca for high-speed metrics querying and integrates the streaming alarm engine and the notification engine of monasca. For details, refer to the monasca Wiki.

4.5.2 Horizon Plugin

SUSE OpenStack Cloud Crowbar Monitoring comes with a plugin for the OpenStack horizon dashboard. The horizon plugin extends the main dashboard in OpenStack with a view for monitoring. This enables SUSE OpenStack Cloud Crowbar Monitoring users to access the monitoring and log management functions from a central Web-based graphical user interface. Metrics and log data are visualized on a comfortable and easy-to-use dashboard.

For details, refer to the OpenStack horizon documentation.

Based on OpenStack horizon, the monitoring data is visualized on a comfortable and easy-to-use dashboard which fully integrates with the following applications:

  • Grafana (for metrics data). An open source application for visualizing large-scale measurement data.

  • Kibana (for log data). An open source analytics and visualization platform designed to work with Elasticsearch.

4.5.3 Metrics Agent

A Metrics Agent is required for retrieving metrics data from the host on which it runs and sending the metrics data to the Monitoring Service. The agent supports metrics from a variety of sources as well as a number of built-in system and service checks.

A Metrics Agent can be installed on each virtual or physical server to be monitored.

The agent functionality is fully integrated into the source code base of the monasca project. For details, refer to the monasca Wiki.

4.5.4 Log Agent

A Log Agent is needed for collecting log data from the host on which it runs and forwarding the log data to the Monitoring Service for further processing. It can be installed on each virtual or physical server from which log data is to be retrieved.

The agent functionality is fully integrated into the source code base of the monasca project. For details, refer to the monasca Wiki.

4.6 Users and Roles

SUSE OpenStack Cloud Crowbar Monitoring users can be grouped by their role. The following user roles are distinguished:

  • An application operator acts as a service provider in the OpenStack environment. Application operators book virtual machines in OpenStack to provide services to end users or to host services that they need for their own development activities. SUSE OpenStack Cloud Crowbar Monitoring helps application operators ensure the quality of their services in the cloud.

    For details on the tasks of the application operator, refer to the Application Operator's Guide.

  • The OpenStack operator is a special application operator who is responsible for administrating and maintaining the underlying OpenStack platform and who ensures the availability and quality of the OpenStack services (for example, heat, nova, cinder, swift, glance, or keystone).

    For details on the tasks of the OpenStack operator, refer to the OpenStack Operator's Guide.

  • The Monitoring Service operator is responsible for administrating and maintaining SUSE OpenStack Cloud Crowbar Monitoring. This operator provides the cloud monitoring services to the other users and ensures the quality of the Monitoring Service.

    For details on the tasks of the Monitoring Service operator, refer to the Monitoring Service Operator's Guide.

4.6.1 User Management

SUSE OpenStack Cloud Crowbar Monitoring is fully integrated with keystone, the identity service which serves as the common authentication and authorization system in OpenStack.

The integration with keystone requires every SUSE OpenStack Cloud Crowbar Monitoring user to be registered as an OpenStack user. All authentication and authorization is done through keystone. If a user requests monitoring data, for example, SUSE OpenStack Cloud Crowbar Monitoring verifies that the user is a valid user in OpenStack and is allowed to access the requested metrics.
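
The authorization step can be pictured as a lookup from the roles that keystone reports for a user to the actions those roles permit. The sketch below is an illustration only. The role names are the ones mentioned for accessing the monitoring features in this guide, but the mapping of roles to actions and the function is_allowed are assumptions, not the product's actual policy.

```python
# Role names are taken from the access section of this guide; the mapping of
# roles to permitted actions is an assumption made for illustration.
ROLE_PERMISSIONS = {
    "monasca-user": {"read", "write"},
    "monasca-read-only-user": {"read"},
}


def is_allowed(user_roles, action):
    """Return True if any of the user's roles permits the action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


print(is_allowed(["monasca-read-only-user"], "read"))
print(is_allowed(["monasca-read-only-user"], "write"))
print(is_allowed(["member"], "read"))
```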

SUSE OpenStack Cloud Crowbar Monitoring users are created and administrated in OpenStack:

  • Each user assumes a role in OpenStack to perform a specific set of operations. The OpenStack role specifies a set of rights and privileges.

  • Each user is assigned to at least one project in OpenStack. A project is an organizational unit that defines a set of resources which can be accessed by the assigned users.

    Application operators in SUSE OpenStack Cloud Crowbar Monitoring can monitor the set of resources that is defined for the projects to which they are assigned.

For details on user management, refer to the OpenStack documentation.

4.7 Operation and Maintenance

Regular operation and maintenance includes:

  • Configuring data retention for the InfluxDB database. This can be configured in the monasca barclamp. For details, see Deployment Guide using Crowbar.

  • Configuring data retention for the Elasticsearch database. This can be configured in the monasca barclamp. For details, see Deployment Guide using Crowbar.

  • Removing metrics data from the InfluxDB database.

  • Removing log data from the Elasticsearch database.

  • Handling log files of agents and services.

  • Backup and recovery of databases, configuration files, and dashboards.

4.7.1 Removing Metrics Data

Metrics data is stored in the Metrics and Alarms InfluxDB Database. InfluxDB features an SQL-like query language for querying data and performing aggregations on that data.

The Metrics Agent configuration defines the metrics and types of measurement for which data is stored. For each measurement, a so-called series is written to the InfluxDB database. A series consists of a timestamp, the metrics, and the value measured.

Every series can be assigned key tags. In the case of SUSE OpenStack Cloud Crowbar Monitoring, this is the _tenant_id tag. This tag identifies the OpenStack project for which the metrics data has been collected.

From time to time, you may want to delete outdated or unnecessary metrics data from the Metrics and Alarms Database, for example, to save space or remove data for metrics you are no longer interested in. To delete data, you use the InfluxDB command line interface, the interactive shell that is provided for the InfluxDB database.

Proceed as follows to delete metrics data from the database:

  1. Create a backup of the database.

  2. Determine the ID of the OpenStack project for the data to be deleted:

    Log in to the OpenStack dashboard and go to Identity > Projects. The monasca project initially provides all metrics data related to SUSE OpenStack Cloud Crowbar Monitoring.

    In the course of the productive operation of SUSE OpenStack Cloud Crowbar Monitoring, additional projects may be created, for example, for application operators.

    The Project ID field shows the relevant tenant ID.

  3. Log in to the host where the Monitoring Service is installed.

  4. Go to the directory where InfluxDB is installed:

    cd /usr/bin
  5. Connect to InfluxDB using the InfluxDB command line interface as follows:

    ./influx -host <host_ip>

    Replace <host_ip> with the IP address of the machine on which SUSE OpenStack Cloud Crowbar Monitoring is installed.

    The output of this command is, for example, as follows:

    Connected to http://localhost:8086 version 1.1.1
    InfluxDB shell version: 1.1.1
  6. Connect to the InfluxDB database of SUSE OpenStack Cloud Crowbar Monitoring (mon):

    > show databases
    name: databases
    name
    ----
    mon
    _internal
    
    > use mon
    Using database mon
  7. Check the outdated or unnecessary data to be deleted.

    • You can view all measurements for a specific project as follows:

      SHOW MEASUREMENTS WHERE _tenant_id = '<project ID>'
    • You can view the series for a specific metrics and project, for example, as follows:

      SHOW SERIES FROM "cpu.user_perc" WHERE _tenant_id = '<project ID>'
  8. Delete the desired data.

    • When a project is no longer relevant or a specific tenant is no longer used, delete all series for the project as follows:

      DROP SERIES WHERE _tenant_id = '<project ID>'

      Example:

      DROP SERIES WHERE _tenant_id = '27620d7ee6e948e29172f1d0950bd6f4'
    • When a metrics is no longer relevant for a project, delete all series for the specific project and metrics as follows:

      DROP SERIES FROM "<metrics>" WHERE _tenant_id = '<project ID>'

      Example:

      DROP SERIES FROM "cpu.user_perc" WHERE _tenant_id = '27620d7ee6e948e29172f1d0950bd6f4'
  9. Restart the influxdb service, for example, as follows:

    sudo systemctl restart influxdb
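
The statements in steps 7 and 8 differ only in the project ID and the metric name, so they can be generated rather than typed by hand. The following Python sketch shows one possible way to do this. The helper names are hypothetical and not part of InfluxDB or monasca, and the check that a project ID consists of 32 hexadecimal characters is an assumption based on the example ID used above. The sketch only builds the statement text; you still paste it into the influx shell after use mon.

```python
import re

# Assumption: OpenStack project IDs are 32 lowercase hexadecimal characters,
# like the example ID used in this section.
PROJECT_ID = re.compile(r"[0-9a-f]{32}")


def _where(project_id):
    """Return the WHERE clause, refusing IDs that do not look complete."""
    if not PROJECT_ID.fullmatch(project_id):
        raise ValueError("unexpected project ID: %r" % (project_id,))
    return "WHERE _tenant_id = '%s'" % project_id


def _from(metric):
    """Return the optional FROM clause for a single metric name."""
    if metric is None:
        return ""
    if '"' in metric:
        raise ValueError("metric name must not contain double quotes")
    return 'FROM "%s" ' % metric


def show_series(project_id, metric=None):
    """Statement for step 7: list the series before deleting anything."""
    return "SHOW SERIES %s%s" % (_from(metric), _where(project_id))


def drop_series(project_id, metric=None):
    """Statement for step 8: delete the series of a project or one metric."""
    return "DROP SERIES %s%s" % (_from(metric), _where(project_id))
```

Printing show_series(...) and drop_series(...) for the same arguments gives a matching pair of statements, so the series can be reviewed first and then deleted with an identical filter.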

4.7.2 Removing Log Data

Log data is stored in the Elasticsearch database. Elasticsearch stores the data in indices. One index per day is created for every OpenStack project.

By default, the indices are stored in the following directory on the host where the Monitoring Service is installed:

/var/data/elasticsearch/<cluster-name>/nodes/<node-name>

Example:

/var/data/elasticsearch/elasticsearch/nodes/0

Note

If your system uses a different directory, look up the path.data parameter in the Elasticsearch configuration file, /etc/elasticsearch/elasticsearch.yml.

If you want to delete outdated or unnecessary log data from the Elasticsearch database, proceed as follows:

  1. Make sure that curl is installed. If this is not the case, install the package with

    sudo zypper in curl
  2. Create a backup of the Elasticsearch database.

  3. Determine the ID of the OpenStack project for the data to be deleted:

    Log in to the OpenStack dashboard and go to Identity > Projects. The monasca project initially provides all metrics data related to SUSE OpenStack Cloud Crowbar Monitoring.

    In the course of the productive operation of SUSE OpenStack Cloud Crowbar Monitoring, additional projects may be created.

    The Project ID field shows the relevant ID.

  4. Log in to the host where the Monitoring Service is installed.

  5. Make sure that the data you want to delete exists by executing the following command:

    curl -XHEAD -i 'http://localhost:<port>/<projectID-date>'

    For example, if Elasticsearch is listening at port 9200 (default), the ID of the OpenStack project is abc123, and you want to check the index of July 1, 2015, the command is as follows:

    curl -XHEAD -i 'http://localhost:9200/abc123-2015-07-01'

    If the HTTP response is 200, the index exists; if the response is 404, it does not exist.

  6. Delete the index as follows:

    curl -XDELETE -i 'http://localhost:<port>/<projectID-date>'

    Example:

    curl -XDELETE -i 'http://localhost:9200/abc123-2015-07-01'

    This command either returns an error, such as IndexMissingException, or acknowledges the successful deletion of the index.

Note

Be aware that the -XDELETE command immediately deletes the index file!

For both -XHEAD and -XDELETE, you can use wildcards to process several indices at once. For example, you can delete all indices of a specific project for the whole month of July 2015:

curl -XDELETE -i 'http://localhost:9200/abc123-2015-07-*'
Note

Take extreme care when using wildcards for the deletion of indices. You could delete all existing indices with one single command!
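
Because a wildcard can match more indices than intended, one alternative is to spell out the index names for an explicit date range. The following Python sketch only builds the names and URLs according to the <projectID-date> pattern used above; it does not contact Elasticsearch. The function names are hypothetical, and localhost and port 9200 are taken from the examples in this section.

```python
from datetime import date, timedelta


def daily_indices(project_id, first_day, last_day):
    """Return one index name per day, first_day and last_day inclusive."""
    if last_day < first_day:
        raise ValueError("last_day is before first_day")
    days = (last_day - first_day).days + 1
    return [
        "%s-%s" % (project_id, (first_day + timedelta(days=n)).isoformat())
        for n in range(days)
    ]


def index_url(index, host="localhost", port=9200):
    """Return the URL of a single index (defaults follow the examples)."""
    return "http://%s:%d/%s" % (host, port, index)


# Print one explicit delete command per day instead of using a wildcard.
for name in daily_indices("abc123", date(2015, 7, 1), date(2015, 7, 3)):
    print("curl -XDELETE -i '%s'" % index_url(name))
```

Reviewing the printed commands before running them shows exactly which indices will be removed.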

4.7.3 Log File Handling

In case of trouble with the SUSE OpenStack Cloud Crowbar Monitoring services, you can study their log files to find the reason. The log files are also useful if you need to contact your support organization. For storing the log files, the default installation uses the /var/log directory on the hosts where the agents or services are installed.

You can use systemd, a system and session manager for Linux, and journald, a Linux logging interface, for addressing dispersed log files.

The SUSE OpenStack Cloud Crowbar Monitoring installer automatically puts all SUSE OpenStack Cloud Crowbar Monitoring services under the control of systemd. journald provides a centralized management solution for the logging of all processes that are controlled by systemd. The logs are collected and managed in a so-called journal controlled by the journald daemon.

For details on the systemd and journald utilities, refer to https://documentation.suse.com/sles/15-SP1/single-html/SLES-admin/#part-system.

4.7.4 Backup and Recovery

Typical tasks of the Monitoring Service operator are to make regular backups, particularly of the data created during operation.

At regular intervals, you should make a backup of all:

  • Databases.

  • Configuration files of the individual agents and services.

  • Monitoring and log dashboards you have created and saved.

SUSE OpenStack Cloud Crowbar Monitoring does not offer integrated backup and recovery mechanisms. Instead, use the mechanisms and procedures of the individual components.

4.7.4.1 Databases

You need to create regular backups of the following databases on the host where the Monitoring Service is installed:

  • Elasticsearch database for historic log data.

  • InfluxDB database for historic metrics data.

  • MariaDB database for historic configuration information.

It is recommended that backup and restore operations for databases are carried out by experienced operators only.

Preparations

Before backing up and restoring a database, we recommend stopping the Monitoring API and the Log API on the monasca-server node and checking that all data has been processed. This ensures that no data is written to a database during a backup and restore operation. After backing up and restoring a database, restart the APIs.

To stop the Monitoring API and the Log API, use the following command:

systemctl stop apache2

To check that all Kafka queues are empty, list the existing consumer groups and check the LAG column for each group. It should be 0. For example:

 kafka-consumer-groups.sh --zookeeper 192.168.56.81:2181 --list
 kafka-consumer-groups.sh --zookeeper 192.168.56.81:2181 --describe \
  --group 1_metrics | column -t -s ','
 kafka-consumer-groups.sh --zookeeper 192.168.56.81:2181 --describe \
  --group transformer-logstash-consumer | column -t -s ','
 kafka-consumer-groups.sh --zookeeper 192.168.56.81:2181 --describe \
  --group thresh-metric | column -t -s ','
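
Checking the LAG column by eye becomes tedious with several consumer groups. The following Python sketch parses the text printed by kafka-consumer-groups.sh --describe and reports whether every row has a lag of 0. It assumes comma-separated output with a header line that contains a LAG column, which is what the column -t -s ',' filter above suggests. The exact output format depends on the Kafka version, the function name is hypothetical, and the sample text is invented for illustration.

```python
def all_lags_zero(describe_output):
    """Return True if every data row of the output has LAG equal to 0.

    Assumption: the first non-empty line is a comma-separated header that
    contains a column named LAG, and all following lines are data rows.
    """
    lines = [ln for ln in describe_output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no output to parse")
    header = [col.strip().upper() for col in lines[0].split(",")]
    if "LAG" not in header:
        raise ValueError("no LAG column found in header")
    lag_index = header.index("LAG")
    for row in lines[1:]:
        cells = [cell.strip() for cell in row.split(",")]
        # int() raises ValueError for values such as "unknown".
        if int(cells[lag_index]) != 0:
            return False
    return True


# Invented sample in the assumed format, not real command output.
SAMPLE = """GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
1_metrics, metrics, 0, 1200, 1200, 0, none
1_metrics, metrics, 1, 1180, 1185, 5, none
"""
```

For the sample, the second partition still has five unprocessed messages, so the backup should wait until the lag has dropped to 0.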

To restart the Monitoring API and the Log API, use the following command:

systemctl start apache2
Elasticsearch Database

For backing up and restoring your Elasticsearch database, use the Snapshot and Restore module of Elasticsearch.

To create a backup of the database, proceed as follows:

  1. Make sure that curl is installed. If this is not the case, install the package with zypper in curl.

  2. Log in to the host where the Monitoring Service is installed.

  3. Create a snapshot repository. You need the Elasticsearch bind address for all commands. Run grep network.bind_host /etc/elasticsearch/elasticsearch.yml to find the bind address, and replace IP in the following commands with this address. For example:

    curl -XPUT http://IP:9200/_snapshot/my_backup -d '{
       "type": "fs",
       "settings": {
            "location": "/mount/backup/elasticsearch1/my_backup",
            "compress": true
       }
     }'

    The example registers a shared file system repository ("type": "fs") that uses the /mount/backup/elasticsearch1 directory for storing snapshots.

    Note

    The directory for storing snapshots must be configured in the elasticsearch/repo_dir setting in the monasca barclamp (see Section 12.6, “Deploying monasca (Optional)”). The directory must be manually mounted before creating the snapshot. The elasticsearch user must be specified as the owner of the directory.

    compress is turned on to compress the metadata files.

  4. Check whether the repository was created successfully:

    curl -XGET http://IP:9200/_snapshot/my_backup

    This example response shows a successfully created repository:

    {
       "my_backup": {
         "type": "fs",
         "settings": {
           "compress": "true",
           "location": "/mount/backup/elasticsearch1/my_backup"
         }
       }
     }
  5. Create a snapshot of your database that contains all indices. A repository can contain multiple snapshots of the same database. The name of a snapshot must be unique within the snapshots created for your database, for example:

    curl -XPUT 'http://IP:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

    The example creates a snapshot named snapshot_1 for all indices in the my_backup repository.

To restore the database instance, proceed as follows:

  1. Close all indices of your database, for example:

    curl -XPOST http://IP:9200/_all/_close
  2. Restore all indices from the snapshot you have created, for example:

    curl -XPOST http://IP:9200/_snapshot/my_backup/snapshot_1/_restore

    The example restores all indices from snapshot_1 that is stored in the my_backup repository.

For additional information on backing up and restoring an Elasticsearch database, refer to the Elasticsearch documentation.
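
The request body and URLs used in the backup and restore steps can also be generated, which helps to keep repository and snapshot names consistent across commands. The following Python sketch only builds these values and sends nothing. The function names are hypothetical, the repository name and location are the example values from this section, and the timestamp-based naming scheme is just one possible way to keep snapshot names unique.

```python
import json
from datetime import datetime


def repository_body(location, compress=True):
    """Build the JSON body that registers a shared file system repository."""
    body = {"type": "fs", "settings": {"location": location, "compress": compress}}
    return json.dumps(body, indent=2)


def snapshot_name(when, prefix="snapshot"):
    """Return a snapshot name that is unique per second, for example
    snapshot_20150701_120000."""
    return "%s_%s" % (prefix, when.strftime("%Y%m%d_%H%M%S"))


def snapshot_url(bind_address, repository, snapshot, port=9200):
    """Return the URL of one snapshot in a repository."""
    return "http://%s:%d/_snapshot/%s/%s" % (bind_address, port, repository, snapshot)


# Example: values for a new snapshot in the my_backup repository.
# 192.0.2.10 is a placeholder for the Elasticsearch bind address.
name = snapshot_name(datetime(2015, 7, 1, 12, 0, 0))
print(repository_body("/mount/backup/elasticsearch1/my_backup"))
print(snapshot_url("192.0.2.10", "my_backup", name))
```

Appending /_restore to the printed snapshot URL gives the address used in the restore step.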

InfluxDB Database

For backing up and restoring your InfluxDB database, you can use the InfluxDB shell. The shell is part of your InfluxDB distribution. If you installed InfluxDB via a package manager, the shell is, by default, installed in the /usr/bin directory.

To create a backup of the database, proceed as follows:

  1. Switch to a user who is allowed to run the influxdb service, for example:

    su influxdb -s /bin/bash
  2. Back up the database, for example:

    influxd backup -database mon /mount/backup/mysnapshot

    monasca uses mon as the name of the database. The example creates the backup for the database in /mount/backup/mysnapshot.

Before restoring the database, make sure that all database processes are shut down. To restore the database, you can then proceed as follows:

  1. If required, delete all files not included in the backup by dropping the database before you carry out the restore operation. A restore operation restores all files included in the backup. Files created or merged at a later point in time are not affected. For example:

    influx -host IP -execute 'drop database mon;'

    Replace IP with the IP address that the database is listening to. You can run influxd config and look up the IP address in the [http] section.

  2. Stop the InfluxDB database service:

     systemctl stop influxdb
  3. Switch to a user who is allowed to run the influxdb service:

    su influxdb -s /bin/bash
  4. Restore the metastore:

    influxd restore -metadir /var/opt/influxdb/meta /mount/backup/mysnapshot
  5. Restore the database, for example:

    influxd restore -database mon -datadir /var/opt/influxdb/data /mount/backup/mysnapshot

    The example restores the backup from /mount/backup/mysnapshot to the data directory /var/opt/influxdb/data.

  6. Ensure that the file permissions for the restored database are set correctly:

    chown -R influxdb:influxdb /var/opt/influxdb
  7. Start the InfluxDB database service:

    systemctl start influxdb

For additional information on backing up and restoring an InfluxDB database, refer to the InfluxDB documentation.

MariaDB Database

For backing up and restoring your MariaDB database, you can use the mysqldump utility program. mysqldump performs a logical backup that produces a set of SQL statements. These statements can later be executed to restore the database.

To back up your MariaDB database, you must be the owner of the database or a user with superuser privileges, for example:

mysqldump -u root -p mon > dumpfile.sql

In addition to the name of the database, you have to specify the name and location of the file in which mysqldump stores its output.

To restore your MariaDB database, proceed as follows:

  1. Log in to the host where the Monitoring Service is installed as a user with root privileges.

  2. Make sure that the mariadb service is running:

    systemctl start mariadb
  3. Log in to the database you have backed up as a user with root privileges, for example:

    mysql -u root -p mon
  4. Remove and then re-create the database:

     DROP DATABASE mon;
     CREATE DATABASE mon;
  5. Exit mariadb:

    \q
  6. Restore the database, for example:

    mysql -u root -p mon < dumpfile.sql

For additional information on backing up and restoring a MariaDB database with mysqldump, refer to the MariaDB documentation.

4.7.4.2 Configuration Files

Below you find a list of the configuration files of the agents and the individual services included in the Monitoring Service. Back up these files at least after you have installed and configured SUSE OpenStack Cloud Crowbar Monitoring and after each change in the configuration.

/etc/influxdb/influxdb.conf
/etc/kafka/server.properties
/etc/my.cnf
/etc/my.cnf.d/client.cnf
/etc/my.cnf.d/mysql-clients.cnf
/etc/my.cnf.d/server.cnf
/etc/monasca/agent/agent.yaml
/etc/monasca/agent/conf.d/*
/etc/monasca/agent/supervisor.conf
/etc/monasca/api-config.conf
/etc/monasca/log-api-config.conf
/etc/monasca/log-api-config.ini
/etc/monasca-log-persister/monasca-log-persister.conf
/etc/monasca-log-transformer/monasca-log-transformer.conf
/etc/monasca-log-agent/agent.conf
/etc/monasca-notification/monasca-notification.yaml
/etc/monasca-persister/monasca-persister.yaml
/etc/monasca-thresh/thresh.yaml
/etc/elasticsearch/elasticsearch.yml
/etc/elasticsearch/logging.yml
/etc/kibana/kibana.yml

Recovery

If you need to recover the configuration of one or more agents or services, the recommended procedure is as follows:

  1. If necessary, uninstall the agents or services, and install them again.

  2. Stop the agents or services.

  3. Copy the backup of your configuration files to the correct location according to the list above.

  4. Start the agents or services again.
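
Backing up the configuration files listed above can be scripted. The following Python sketch collects all existing files that match a list of path patterns into a compressed tar archive. It is a minimal illustration, not a SUSE-provided tool: the function name and the archive path are hypothetical, and only three of the paths from the list are shown. Files that do not exist on a given host are skipped silently.

```python
import glob
import os
import tarfile


def backup_config(patterns, archive_path):
    """Archive all existing files matching the patterns into a tar.gz file
    and return the list of archived paths."""
    archived = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for pattern in patterns:
            # glob expands entries such as /etc/monasca/agent/conf.d/*
            for path in sorted(glob.glob(pattern)):
                if os.path.isfile(path):
                    archive.add(path)
                    archived.append(path)
    return archived


# A few of the paths from the list above; extend as required.
CONFIG_PATTERNS = [
    "/etc/influxdb/influxdb.conf",
    "/etc/monasca/agent/agent.yaml",
    "/etc/monasca/agent/conf.d/*",
]
```

A call such as backup_config(CONFIG_PATTERNS, "/mount/backup/config.tar.gz"), run as a user who can read the files, returns the paths that were actually archived, so missing files are easy to spot.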

4.7.4.3 Dashboards

Kibana can persist customized log dashboard designs to the Elasticsearch database, and allows you to recall them. For details on saving, loading, and sharing log management dashboards, refer to the Kibana documentation.

Grafana allows you to export a monitoring dashboard to a JSON file, and to re-import it when necessary. For backing up and restoring the exported dashboards, use the standard mechanisms of your file system. For details on exporting monitoring dashboards, refer to the Getting Started tutorial of Grafana.

4.8 Working with Data Visualizations

The user interface for monitoring your services, servers, and log data integrates with Grafana, an open source application for visualizing large-scale monitoring data. Use the options at the top border of the Overview page to access Grafana.

CMM ships with preconfigured metrics dashboards. You can instantly use them for monitoring your environment. You can also use them as a starting point for building your own dashboards.

4.8.1 Preconfigured Metrics Dashboard for OpenStack

As an OpenStack operator, you use the Dashboard option on the Overview page to view the metrics data on the OpenStack services. The Monitoring Service operator uses the Dashboard option to view the metrics data on the Monitoring Service.


To monitor OpenStack, the preconfigured dashboard shows the following:

  • Status of the main OpenStack Services (UP or DOWN). Information on nova, neutron, glance, cinder, swift, and keystone is displayed.

  • Information on system resources.

    The dashboard shows metrics data on CPU usage: the percentage of time the CPU is used in total (cpu.percent), at user level (cpu.user_perc), and at system level (cpu.system_perc), as well as the percentage of time the CPU is idle while at least one I/O request is in progress (cpu.wait_perc).

    The dashboard shows metrics data on memory usage: the number of megabytes of total memory (mem.total_mb), used memory (mem.used_mb), total swap memory (mem.swap_total_mb), and used swap memory (mem.swap_used_mb), as well as the number of megabytes used for the page cache (mem.used_cache).

    The dashboard shows metrics data on the percentage of disk space that is being used on a device (disk.space_used_perc).

    The dashboard shows metrics data on the CMM system load over different periods (load.avg_1_min, load.avg_5_min, and load.avg_15_min).

  • The network usage of CMM.

    The dashboard shows the number of network bytes received and sent per second (net.in_bytes_sec and net.out_bytes_sec).

4.8.2 Building Dashboards

Each metrics dashboard is composed of one or more panels that are arranged in one or more rows. A row serves as a logical divider within a dashboard. It organizes your panels in groups. The panel is the basic building block for visualizing your metrics data.

For building dashboards, you have two options:

  • Start from scratch and create a new dashboard.

  • Take the dashboard that is shipped with CMM as a starting point and customize it.

The following sections provide introductory information on dashboards, rows, and panels, and make you familiar with the first steps involved in building a dashboard. For additional information, you can also refer to the Grafana documentation.

4.8.3 Creating Dashboards

To create a new dashboard, you use Open Dashboard in the top right corner of your dashboard window. The option provides access to various features for administrating dashboards. Use New to create an empty dashboard that serves as a starting point for adding rows and panels.


On the left side of an empty dashboard, a green rectangle is displayed. Hover over this rectangle to access a Row menu. To insert your first panel, you can use the options in the Add Panel submenu. See below for details on the available panel types.

As soon as you have inserted an empty panel, you can add additional rows. For this purpose, use the Add Row option on the right side of the dashboard.

4.8.4 Editing Rows

Features for editing rows can be accessed via the green rectangle that is displayed to the left of each row.

In addition to adding panels to a row, you can collapse or remove a row, move the position of the row within your dashboard, or set the row height. Row settings allows you, for example, to insert a row title or to hide the Row menu so that the row can no longer be edited.

4.8.5 Editing Panels

Grafana distinguishes between three panel types:


Panels of type Graph are used to visualize metrics data. A query editor is provided to define the data to be visualized. The editor allows you to combine multiple queries. This means that any number of metrics and data series can be visualized in one panel.

Panels of type Singlestat are also used to visualize metrics data, yet they reduce a single query to a single number. The single number can be, for example, the minimum, maximum, average, or sum of values of the data series. The single number can be translated into a text value, if required.

Panels of type Text are used to insert static text. The text may, for example, provide information for the dashboard users. Text panels are not connected to any metrics data.

As soon as you have added a panel to your dashboard, you can access the options for editing the panel content. For this purpose, click the panel title and use Edit:

  • For panels of type Text, a simple text editor is displayed for entering text. Plain text, HTML, and markdown format are supported.

  • For panels of type Graph and Singlestat, a query editor is displayed to define which data is to be shown. You can add multiple metrics, and apply functions to the metrics. The query results will be visualized in your panel in real time.

A large number of display and formatting features are provided to customize how the content is presented in a panel. Click the panel title to access the corresponding options. The menu that is displayed also allows you to duplicate or remove a panel. To change the size of a panel, click the + and - icons.

You can move panels on your dashboard by simply dragging and dropping them within and between rows.

By default, the time range for panels is controlled by dashboard settings. Use the time picker in the top right corner of your dashboard window to define relative or absolute time ranges. You can also set an auto-refresh interval, or manually refresh the data that is displayed.

4.8.6 Saving and Sharing Dashboards

CMM allows you to save your metrics dashboards locally. Saving a dashboard means exporting it to a JSON file. The JSON file can be edited, it can be shared with other users, and it can be imported to CMM again.

To save a dashboard, use Save in the top right corner of your dashboard window. The option allows you to directly view the JSON syntax and export the dashboard to a JSON file. The JSON file can be forwarded to other users, if required. To import a JSON file, use Open dashboard in the top left corner of the dashboard window.

4.9 Defining Alarms

You have to define alarms to monitor your cloud resources. An alarm definition specifies the metrics to be collected and the threshold at which an alarm is to be triggered for a cloud resource. If the specified threshold is reached or exceeded, the alarm is triggered and notifications can be sent to inform users. By default, an alarm definition is evaluated every minute.

To handle a large variety of monitoring requirements, you can create either simple alarm definitions that refer to one metrics only, or compound alarm definitions that combine multiple metrics and allow you to track and process more complex events.

Example for a simple alarm definition that checks whether the system-level load of the CPU exceeds a threshold of 90 percent:

cpu.system_perc{hostname=monasca} > 90

Example for a simple alarm definition that checks the average system-level load of the CPU over a period of 480 seconds, that is, over four consecutive periods of 120 seconds each. The alarm is triggered only if the average is greater than 95 percent in each of these periods:

avg(cpu.system_perc{hostname=monasca}, 120) > 95 times 4

Example for a compound alarm definition that evaluates two metrics. The alarm is triggered if either the system-level load of the CPU exceeds a threshold of 90 percent, or if the disk space that is used by the specified service exceeds a threshold of 90 percent:

avg(cpu.system_perc{hostname=monasca}) > 90 OR
max(disk.space_used_perc{service=monitoring}) > 90
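
To make the period and times arguments of the second example above concrete, the following Python sketch mimics how an expression such as avg(cpu.system_perc{hostname=monasca}, 120) > 95 times 4 could be evaluated over a list of measurements. It is a simplified model for illustration only, not the evaluation logic of the monasca threshold engine. The function name is hypothetical, measurements are given as (timestamp in seconds, value) pairs, and a period without data is treated as not exceeding the threshold, which is an assumption.

```python
def avg_exceeds(measurements, period, threshold, times, now):
    """Return True if the average value was greater than the threshold in
    each of the last `times` periods of `period` seconds ending at `now`."""
    for n in range(times):
        end = now - n * period
        start = end - period
        values = [v for (t, v) in measurements if start <= t < end]
        if not values:
            # Assumption: a period without data does not count as exceeded.
            return False
        if sum(values) / len(values) <= threshold:
            return False
    return True


# One measurement every 30 seconds over 480 seconds, all at 97 percent.
steady = [(t, 97.0) for t in range(0, 480, 30)]
# Same series, but the first 120 seconds are at 90 percent.
dip = [(t, 90.0 if t < 120 else 97.0) for t in range(0, 480, 30)]
```

With these series, the expression holds for steady but not for dip, because one of the four periods in dip averages below the threshold.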

To create, edit, and delete alarms, use Monitoring > Alarm Definitions.

The elements that define an alarm are grouped into Details, Expression, and Notifications. They are described in the following sections.

4.9.1 Details

For an alarm definition, you specify the following details:

  • Name. Mandatory identifier of the alarm. The name must be unique within the project for which you define the alarm.

  • Description. Optional. A short description that depicts the purpose of the alarm.

  • Severity. The following severities for an alarm are supported: Low (default), Medium, High, or Critical.

    The severity affects the status information on the Overview page. If an alarm that is defined as Critical is triggered, the corresponding resource is displayed in a red box. If an alarm that is defined as Low, Medium, or High is triggered, the corresponding resource is displayed in a yellow box only.

    The severity level is subjective. Choose a level that is appropriate for prioritizing the alarms in your environment.

Figure 4.2: Creating an Alarm Definition
Expression

The expression defines how to evaluate a metrics. The expression syntax is based on a simple expressive grammar. For details, refer to the monasca API documentation.


To define an alarm expression, proceed as follows:

  1. Select the metrics to be evaluated.

  2. Select a statistical function for the metrics: min to monitor the minimum values, max to monitor the maximum values, sum to monitor the sum of the values, count for the monitored number, or avg for the arithmetic average.

  3. Enter one or multiple dimensions in the Add a dimension field to further qualify the metrics.

    Dimensions filter the data to be monitored. They narrow down the evaluation to specific entities. Each dimension consists of a key/value pair that allows for a flexible and concise description of the data to be monitored, for example, region, availability zone, service tier, or resource ID.

    The dimensions available for the selected metrics are displayed in the Matching Metrics section. Type the name of the key you want to associate with the metrics in the Add a dimension field. You are offered a select list for adding the required key/value pair.

  4. Enter the threshold value at which an alarm is to be triggered, and combine it with a relational operator <, >, <=, or >=.

    The unit of the threshold value is related to the metrics for which you define the threshold, for example, the unit is percentage for cpu.idle_perc or MB for disk.total_used_space_mb.

  5. Switch on the Deterministic option if you evaluate a metrics for which data is received only sporadically. The option should be switched on, for example, for all log metrics. This ensures that the alarm status is OK and displayed as a green box on the Overview page although metrics data has not yet been received.

    Do not switch on the option if you evaluate a metrics for which data is received regularly. This ensures that you instantly notice, for example, that a host machine is offline and that there is no metrics data for the agent to collect. On the Overview page, the alarm status therefore changes from OK to UNDETERMINED and is displayed as a gray box.

  6. Enter one or multiple dimensions in the Match by field if you want these dimensions to be taken into account for triggering alarms.

    Example: If you enter hostname as dimension, individual alarms will be created for each host machine on which metrics data is collected. The expression you have defined is not evaluated as a whole but individually for each host machine in your environment.

    If Match by is set to a dimension, the number of alarms depends on the number of dimension values on which metrics data is received. An empty Match by field results in exactly one alarm.

    To enter a dimension, you can simply type the name of the dimension in the Match by field. The dimensions you enter cannot be changed once the alarm definition is saved.

  7. Build a compound alarm definition to combine multiple metrics in one expression. Using the logical operators AND or OR, any number of sub-expressions can be combined.

    Use the Add button to create a second expression, and choose either AND or OR as Operator to connect it to the one you have already defined. Proceed with the second expression as described in Step 1 to Step 6 above.

    The following options are provided for creating and organizing compound alarm definitions:

    • Create additional sub-expressions using the Add button.

    • Finish editing a sub-expression using the Submit button.

    • Delete a sub-expression using the Remove button.

    • Change the position of a sub-expression using the Up or Down button.

Note

You can also edit the expression syntax directly. For this purpose, save your alarm definition and update it using the Edit Alarm Definition option.

By default, an alarm definition is evaluated every minute. When updating the alarm definition, you can change this interval. For syntax details, refer to the monasca API documentation on Alarm Definition Expressions.
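
The effect of the Match by field described in step 6 can be pictured as grouping the incoming metrics by the listed dimensions, with one alarm per group. The following Python sketch illustrates this model; it is not monasca code, the function name is hypothetical, and the host and device names are invented sample values.

```python
def alarm_groups(metrics, match_by):
    """Group metric dimension dictionaries by the values of the match_by
    keys. Each resulting group stands for one alarm."""
    groups = {}
    for dimensions in metrics:
        key = tuple(dimensions.get(name) for name in match_by)
        groups.setdefault(key, []).append(dimensions)
    return groups


# Invented sample: disk metrics reported by two hosts.
metrics = [
    {"hostname": "node1", "device": "sda"},
    {"hostname": "node1", "device": "sdb"},
    {"hostname": "node2", "device": "sda"},
]

print(len(alarm_groups(metrics, ["hostname"])))  # one alarm per host
print(len(alarm_groups(metrics, [])))            # empty Match by: one alarm
```

With an empty Match by list, all metrics fall into a single group, which corresponds to exactly one alarm.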

4.9.2 Notifications

You can enable notifications for an alarm definition. As soon as an alarm is triggered, the enabled notifications will be sent.


The Notifications tab allows you to select the notifications from the ones that are predefined in your environment. For a selected notification, you specify whether you want to send it for a status transition to Alarm, OK, and/or Undetermined.

4.10 Defining Notifications

Notifications define how users are informed when a threshold value defined for an alarm is reached or exceeded. In the alarm definition, you can assign one or multiple notifications.

For a notification, you specify the following elements:

  • Name. A unique identifier of the notification. The name is offered for selection when defining an alarm.

  • Type. Email is the notification method supported by SUSE OpenStack Cloud Crowbar Monitoring. If you want to use WebHook or PagerDuty, contact your SUSE OpenStack Cloud Crowbar Monitoring support for further information.

  • Address. The email address to be notified when an alarm is triggered.

    Note

    Generic top-level domains such as business domain names are not supported in email addresses (for example, user@xyz.company).

To create, edit, and delete notifications, use Monitoring > Notifications.

4.11 Status of Services, Servers, and Log Data

An alarm definition for a service, server, or log data is evaluated over the interval specified in the alarm expression. The alarm definition is re-evaluated in each subsequent interval. The following alarm statuses are distinguished:

  • Alarm. The alarm expression has evaluated to true. An alarm has been triggered for the cloud resource.

  • OK. The alarm expression has evaluated to false. There is no need to trigger an alarm.

  • Undetermined. No metrics data has been received within the defined interval.

As soon as you have defined an alarm for a cloud resource, status information for it is displayed on the Overview page.

The color of the boxes in the three sections indicates the status:

  • A green box for a service or server indicates that it is up and running. A green box for a log path indicates that a defined threshold for errors or warnings, for example, has not yet been reached or exceeded. There are alarms defined for the services, servers, or log paths, but no alarms have been triggered.

  • A red box for a service, server, or log path indicates that there is a severe problem that needs to be checked. One or multiple alarms defined for a service, a server, or log data have been triggered.

  • A yellow box indicates a problem. One or multiple alarms have been triggered, but the severity of these alarms is low.

  • A gray box indicates that alarms have been defined, but no metrics data has been received.

The status information on the Overview page results from one or multiple alarms that have been defined for the corresponding resource. If multiple alarms are defined, the severity of the individual alarms controls the status color.
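How several alarms might combine into one box color can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's actual logic: the function name `overview_color` is hypothetical, severity is represented as a plain string, only `LOW` counts as low severity, and a triggered alarm is assumed to take precedence over missing metrics data.

```python
from typing import Iterable, Tuple


def overview_color(alarms: Iterable[Tuple[str, str]]) -> str:
    """Derive a box color from (status, severity) pairs of one resource.

    status is "ALARM", "OK", or "UNDETERMINED"; severity is a label such
    as "LOW" or "HIGH". The precedence used here is an assumption.
    """
    alarms = list(alarms)
    if not alarms:
        # A box is only shown once at least one alarm has been defined.
        raise ValueError("no alarms defined for this resource")

    triggered = [severity for status, severity in alarms if status == "ALARM"]
    if any(severity != "LOW" for severity in triggered):
        return "red"      # at least one triggered alarm above low severity
    if triggered:
        return "yellow"   # alarms triggered, but all of low severity
    if any(status == "UNDETERMINED" for status, _ in alarms):
        return "gray"     # alarms defined, metrics data missing
    return "green"        # alarms defined, none triggered


if __name__ == "__main__":
    print(overview_color([("OK", "HIGH"), ("ALARM", "LOW")]))
```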

You can click a resource on the Overview page to display details on the related alarms. The details include the status of each alarm and the expression that is evaluated. For each alarm, you can drill down on the alarm history. To narrow down the problem, the history presents detailed information on the status transitions.

4.12 Supported Metrics

The sections below describe the metrics supported by SUSE OpenStack Cloud Crowbar Monitoring:

  • Standard metrics for general monitoring of servers and networks.

  • Additional metrics for monitoring specific servers and services.

4.12.1 Standard Metrics

SUSE OpenStack Cloud Crowbar Monitoring supports the following standard metrics for monitoring servers and networks. These metrics usually do not require specific settings. The metrics are grouped by metrics types. Each metrics type references a set of related metrics.

cpu.yaml

Metrics on CPU usage, e.g. the percentage of time the CPU is idle when no I/O requests are in progress, or the percentage of time the CPU is used at system level or user level.

disk.yaml

Metrics on disk space, e.g. the percentage of disk space that is used on a device, or the total amount of disk space aggregated across all the disks on a particular node.

load.yaml

Metrics on the average system load over different periods (e.g. 1 minute, 5 minutes, or 15 minutes).

memory.yaml

Metrics on memory usage, e.g. the number of megabytes of total memory or free memory, or the percentage of free swap memory.

network.yaml

Metrics on the network, e.g. the number of network bytes received or sent per second, or the number of network errors on incoming or outgoing network traffic per second.

These metrics are configured automatically on all machines and nodes that have the monasca-agent role assigned. This applies not only to network.yaml but also to all metrics covered in this chapter.

4.12.2 Additional Metrics

In addition to the standard metrics, SUSE OpenStack Cloud Crowbar Monitoring automatically adds the following metrics to the monasca agent configuration on the OpenStack Controller.

http_check.yaml

HTTP endpoint checks perform up/down checks on HTTP endpoints. Based on a list of URLs, the agent sends an HTTP request and reports success or failure to the Monitoring Service.
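The idea of an up/down check can be sketched in a few lines of Python using only the standard library. This is not the agent's own code: the function names `endpoint_is_up` and `run_checks` are hypothetical, and treating every response with a status code below 400 as success is an assumption made for this sketch.

```python
import http.client
import urllib.error
import urllib.request
from typing import Dict, Iterable


def endpoint_is_up(url: str, timeout: float = 5.0) -> bool:
    """Send one HTTP GET request and report success (True) or failure (False)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # HTTP error status, refused connection, timeout, or malformed URL.
        return False


def run_checks(urls: Iterable[str]) -> Dict[str, bool]:
    """Check every URL in the list and map it to its up/down result."""
    return {url: endpoint_is_up(url) for url in urls}


# Example call with a placeholder URL (replace with a real API endpoint):
# run_checks(["http://controller.example.com:5000/v3"])
```

The result of each check is a simple success or failure value, which is the kind of information the agent reports to the Monitoring Service for each URL.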

The following barclamps will automatically create an HTTP check for the API services they deploy if the monasca barclamp is active:

  • barbican

  • cinder

  • glance

  • heat

  • keystone

  • magnum

  • manila

  • neutron

  • nova

  • sahara

  • swift

By default, the monitoring dashboard is configured to display the service status for the following services:

  • cinder

  • glance

  • keystone

  • neutron

  • nova

  • swift

The status visualization for additional services can be added manually.

postgres.yaml

Postgres checks gather various CRUD and system statistics for a database hosted by a PostgreSQL DBMS.

The following barclamps will automatically create Postgres checks for their service database if the monasca barclamp is active:

  • barbican

  • cinder

  • glance

  • heat

  • keystone

  • magnum

  • manila

  • neutron

  • nova

  • sahara