Applies to SUSE OpenStack Cloud 8

3 Third-Party Integrations

3.1 Splunk Integration

This documentation demonstrates the possible integration between the SUSE OpenStack Cloud 8 centralized logging solution and Splunk including the steps to set up and forward logs.

The SUSE OpenStack Cloud 8 logging solution provides a flexible and extensible framework to centralize the collection and processing of logs from all nodes in a cloud. The logs are shipped to a highly available, fault-tolerant cluster where they are transformed and stored for better searching and reporting. The SUSE OpenStack Cloud 8 logging solution uses the ELK stack (Elasticsearch, Logstash and Kibana) as a production-grade implementation and can support other storage and indexing technologies. If required, the Logstash pipeline can be configured to forward the logs to an alternative target.


3.1.1 What is Splunk?

Splunk is software for searching, monitoring, and analyzing machine-generated big data through a web-style interface. Splunk captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations. It is commercial software (unlike Elasticsearch); more details about Splunk can be found at https://www.splunk.com.

3.1.2 Configuring Splunk to receive log messages from SUSE OpenStack Cloud 8

This documentation assumes that you already have Splunk set up and running. For help with installing and setting up Splunk, refer to the Splunk Tutorial.

There are different ways in which a log message (an "event" in Splunk terminology) can be sent to Splunk. The following steps set up a TCP port on which Splunk listens for messages.

  1. On the Splunk web UI, click on the Settings menu in the upper right-hand corner.

  2. In the Data section of the Settings menu, click Data Inputs.

  3. Choose the TCP option.

  4. Click the New button to add an input.

  5. In the Port field, enter the port number you want to use.

    Note

    If you are on a less secure network and want to restrict connections to this port, use the Only accept connection from field to restrict the traffic to a specific IP address.

  6. Click the Next button.

  7. Specify the Source Type by clicking on the Select button and choosing linux_messages_syslog from the list.

  8. Click the Review button.

  9. Review the configuration and click the Submit button.

  10. A success message will be displayed.
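Before changing the Logstash configuration, you may want to confirm that the new TCP input accepts connections. The following Python 3 sketch is not part of SUSE OpenStack Cloud or Splunk; the function name send_test_event is hypothetical. It opens a TCP connection to the Splunk listener and sends one newline-terminated test message.

```python
import socket


def send_test_event(host, port, message, timeout=5.0):
    """Send one newline-terminated text line to a TCP listener.

    Returns the number of bytes sent. Raises OSError (for example,
    ConnectionRefusedError or socket.timeout) if the listener is unreachable.
    """
    payload = (message.rstrip("\n") + "\n").encode("utf-8")
    # create_connection resolves the host name and applies the timeout
    # to both the connection attempt and the send.
    with socket.create_connection((host, int(port)), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```

For example, calling send_test_event("SPLUNK_LISTENER_IP", TCP_PORT_NUMBER, "test event") and then searching for source="tcp:TCP_PORT_NUMBER" in Splunk should show the message if the input is working.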

3.1.3 Forwarding log messages from SUSE OpenStack Cloud 8 Centralized Logging to Splunk

When you have Splunk set up and configured to receive log messages, you can configure SUSE OpenStack Cloud 8 to forward the logs to Splunk.

  1. Log in to the Cloud Lifecycle Manager.

  2. Check the status of the logging service:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts logging-status.yml

    If everything is up and running, continue to the next step.

  3. Edit the Logstash configuration file at the following location:

    ~/openstack/ardana/ansible/roles/logging-server/templates/logstash.conf.j2

    The Logstash outputs are defined in a section at the bottom of the file. Add the details of your Splunk environment to this section.

    The following example shows where to add the Splunk output (the tcp block):

    # Logstash outputs
    #------------------------------------------------------------------------------
    output {
      # Configure Elasticsearch output
      # http://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
      elasticsearch {
        index => "%{[@metadata][es_index]}"
        hosts => ["{{ elasticsearch_http_host }}:{{ elasticsearch_http_port }}"]
        flush_size => {{ logstash_flush_size }}
        idle_flush_time => 5
        workers => {{ logstash_threads }}
      }
      # Forward logs to Splunk on the TCP port that matches the one specified in the Splunk web UI.
      tcp {
        mode => "client"
        host => "<Enter Splunk listener IP address>"
        port => TCP_PORT_NUMBER
      }
    }

    Note

    If you are not planning on using the Kibana UI to view your centralized logs, there is no need to forward your logs to Elasticsearch. In this situation, comment out the lines in the Logstash outputs pertaining to Elasticsearch. Alternatively, you can keep both outputs and forward your centralized logs to multiple locations.

  4. Commit your changes to git:

    ardana > cd ~/openstack/ardana/ansible
    ardana > git add -A
    ardana > git commit -m "Logstash configuration change for Splunk integration"

  5. Run the configuration processor:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost config-processor-run.yml

  6. Update your deployment directory:

    ardana > cd ~/openstack/ardana/ansible
    ardana > ansible-playbook -i hosts/localhost ready-deployment.yml

  7. Complete this change by reconfiguring the logging environment:

    ardana > cd ~/scratch/ansible/next/ardana/ansible
    ardana > ansible-playbook -i hosts/verb_hosts logging-configure.yml

  8. In your Splunk UI, confirm that the logs have begun to forward.

3.1.4 Searching for log messages from the Splunk dashboard

To verify that your integration works and to search the log messages that have been forwarded, navigate back to your Splunk dashboard. In the search field, use this string:

source="tcp:TCP_PORT_NUMBER"

Find information on using the Splunk search tool at http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/WelcometotheSearchTutorial.

3.2 Nagios Integration

SUSE OpenStack Cloud operators who use Nagios or Icinga-based monitoring systems may wish to integrate them with the built-in monitoring infrastructure of SUSE OpenStack Cloud. Integrating with existing monitoring processes and procedures reduces support overhead and avoids duplication. This document describes the different approaches that can be taken to create a well-integrated monitoring dashboard using both technologies.

Note

This document refers to Nagios, but the proposals work equally well with Icinga, Icinga2, or other Nagios-compatible monitoring systems.

3.2.1 SUSE OpenStack Cloud monitoring and reporting

SUSE OpenStack Cloud comes with a monitoring engine (Monasca) and a separate management dashboard (Operations Console). Monasca is extremely scalable, designed to cope with the constant change in monitoring sources and services found in a cloud environment. Monitoring agents running on hosts (physical and virtual) submit data to the Monasca message bus via a RESTful API. Threshold and notification engines then trigger alarms when predefined thresholds are passed. Notification methods are flexible and extensible; typical examples include sending e-mails or creating alarms in PagerDuty.

While extensible, Monasca is largely focused on monitoring cloud infrastructures rather than traditional environments such as server hardware, network links, switches, etc. For more details about the monitoring service, see Section 12.1, “Monitoring”.

The Operations Console (Ops Console) provides cloud administrators with a clear web interface to view alarm status, manage alarm workflow, and configure alarms and thresholds. For more details about the Ops Console, see Book “User Guide”, Chapter 1 “Using the Operations Console”, Section 1.1 “Operations Console Overview”.

3.2.2 Nagios monitoring and reporting

Nagios is an industry-leading open source monitoring service with extensive plugins and agents. Nagios checks either run directly on the monitoring server or run on a remote host through an agent that submits the results back to the monitoring server. While Nagios has proven extremely flexible and scalable, it requires significant explicit configuration. Using Nagios to monitor guest virtual machines is more challenging because virtual machines can be ephemeral: new virtual machines are created and destroyed regularly. Configuration automation (Chef, Puppet, Ansible, etc.) can create a more dynamic Nagios setup, but it still requires the Nagios service to be restarted every time a new host is added.

A key benefit of Nagios-style monitoring is that it allows SUSE OpenStack Cloud to be monitored externally, from a user or service perspective. For example, checks can be created to monitor the availability of all API endpoints from external locations, or even to create and destroy instances to ensure the entire system is working as expected.

3.2.3 Adding Monasca

Many private cloud operators already have existing monitoring solutions such as Nagios and Icinga. We recommend that you extend your existing solutions into Monasca or forward Monasca alerts to your existing solution to maximize coverage and reduce risk.

3.2.4 Integration Approaches

Integration between Nagios and Monasca can occur at two levels: at the individual check level or at the management interface level. Both options are discussed in the following sections.

Running Nagios-style checks in the Monasca agents

The Monasca agent is installed on all SUSE OpenStack Cloud servers and can execute Nagios-style plugins as well as its own plugin scripts. For this configuration, check plugins need to be installed on the required server and then added to the Monasca configuration under /etc/monasca/agent/conf.d. Take care with plugins that take a long time (more than 10 seconds) to run: they can cause the Monasca agent to fail to run its own checks in the allotted time, which stops all client monitoring. Issues have been seen with hardware monitoring plugins that can take more than 30 seconds, and with any plugins that rely on name resolution when DNS services are not available. Details on the required Monasca configuration can be found at https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#nagios-wrapper.
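To illustrate the Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus a one-line status message), the following hypothetical Python 3 plugin checks free disk space. It makes a single filesystem call, so it returns quickly and stays well inside the agent's check cycle. It is a sketch only: the function name and thresholds are assumptions, and registering a plugin with the Monasca agent is covered by the nagios-wrapper documentation linked above.

```python
import shutil

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk_free(path, warn_pct, crit_pct):
    """Return (exit_code, status_line) for the free space at path.

    WARNING if free space is below warn_pct percent, CRITICAL if below
    crit_pct percent, UNKNOWN if the filesystem cannot be read.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, "DISK UNKNOWN - %s" % exc
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - %s reports zero size" % path
    free_pct = 100.0 * usage.free / usage.total
    if free_pct < crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif free_pct < warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, "DISK %s - %.1f%% free on %s" % (label, free_pct, path)
```

To run it as a standalone plugin, print the returned status line and call sys.exit() with the returned code.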

Use Case:

  • Local host checking. As an operator I want to run a local monitoring check on my host to check physical hardware. Check status and alert management will be based around the Operations Console, not Nagios.

Limitation

  • As mentioned earlier, care should be taken to ensure checks do not introduce load or delays in the Monasca agent check cycle. Additionally, depending on the operating system the node is running, plugins or dependencies may not be available.

Using Nagios as a central dashboard

It is possible to create a Nagios-style plugin that will query the Monasca API endpoint for an alarm status to create Nagios alerts and alarms based on Monasca alarms and filters. Monasca alarms appear in Nagios using two approaches, one listing checks by service and the other listing checks by physical host.

In the top section of the Nagios-style plugin, services can be created under a dummy host, monasca_endpoint. Each service retrieves all alarms based on defined dimensions. For example, the ardana-compute check returns all alarms with the compute (Nova) dimension.

In the bottom section, the physical servers making up the SUSE OpenStack Cloud cluster can be defined and checks can be run against them. For example, one check could monitor the server hardware from the Nagios server using a third-party plugin, and another could retrieve all Monasca alarms related to that host.

To build this configuration, a custom Nagios plugin was created with the following options (see the example plugin at https://github.com/openstack/python-monascaclient/tree/stable/pike/examples):

check_monasca -c CREDENTIALS -d DIMENSION -v VALUE

Examples:

To check alarms on test-ccp-comp001-mgmt you would use:

check_monasca -c service.osrc -d hostname -v test-ccp-comp001-mgmt

To check all Network related alarms, you would use:

check_monasca -c service.osrc -d service -v networking
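The plugin itself is not reproduced in this document. As a hedged sketch of its command-line interface only, the following Python code parses the -c, -d and -v options shown above and reads simple export KEY=VALUE lines from a credentials file such as service.osrc. The function names, and the assumption that the credentials file contains only plain export lines, are hypothetical; querying the Monasca API with these values is not shown.

```python
import argparse


def parse_check_monasca_args(argv):
    """Parse the -c, -d and -v options of check_monasca.

    Returns the credentials file name and a one-entry dimension filter,
    for example ("service.osrc", {"hostname": "test-ccp-comp001-mgmt"}).
    """
    parser = argparse.ArgumentParser(prog="check_monasca")
    parser.add_argument("-c", dest="credentials", required=True,
                        help="file with OpenStack credentials, e.g. service.osrc")
    parser.add_argument("-d", dest="dimension", required=True,
                        help="alarm dimension name, e.g. hostname or service")
    parser.add_argument("-v", dest="value", required=True,
                        help="value the dimension must have")
    args = parser.parse_args(argv)
    return args.credentials, {args.dimension: args.value}


def parse_osrc(text):
    """Collect KEY=VALUE pairs from 'export KEY=VALUE' lines (assumed format)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue  # skip comments, blank lines and other shell statements
        key, sep, value = line[len("export "):].partition("=")
        if sep:
            env[key.strip()] = value.strip().strip("'\"")
    return env
```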

Use Cases:

  • Multiple clouds: integrating SUSE OpenStack Cloud monitoring with existing monitoring capabilities, or viewing Monasca alerts in Nagios so that Monasca alarms are fully integrated with Nagios alarms and workflow.

  • In a predominantly Nagios or Icinga-based monitoring environment, Monasca alarm status can be integrated into existing processes and workflows. This approach works best for checks associated with physical servers running the SUSE OpenStack Cloud services.

  • With multiple SUSE OpenStack Cloud clusters, all of their alarms can be consolidated into a single view; the current version of the Operations Console supports a single cluster only.

Limitations

  • Nagios has a more traditional configuration model that requires checks to belong to predefined services and hosts. This is not well suited to highly dynamic cloud environments, where the lifespan of virtual instances can be very short. One possible solution is Icinga2, which has an API for adding host and service definitions dynamically; the check plugin could be extended to create alarm definitions dynamically as they occur.

    The key disadvantage is that multiple alarms can appear as a single service. For example, suppose there are 3 warnings against one service. If the operator acknowledges this alarm and subsequently a 4th warning alarm occurs, it would not generate an alert and could get missed.

    Care has to be taken that alarms are not missed. If the defined checks only look for alarms in the ALARM state, they will not report alarms in the UNDETERMINED state, which might indicate other issues.

Using Operations Console as central dashboard

Nagios has the ability to run custom scripts in response to events. It is therefore possible to write a plugin to update Monasca whenever a Nagios alert occurs. The Operations Console could then be used as a central reporting dashboard for both Monasca and Nagios alarms. The external Nagios alarms can have their own check dimension and could be displayed as a separate group in the Operations Console.
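As a hedged sketch of such an event handler, the following Python function turns the Nagios macros $HOSTNAME$, $SERVICEDESC$ and $SERVICESTATEID$ (0 to 3) into a metric in the JSON shape accepted by the Monasca API (POST /v2.0/metrics: name, dimensions, timestamp in milliseconds, value). The metric name nagios.service_state and the nagios_service and source dimensions are hypothetical; obtaining a Keystone token and sending the request are not shown.

```python
import time

NAGIOS_STATES = (0, 1, 2, 3)  # OK, WARNING, CRITICAL, UNKNOWN


def build_monasca_metric(hostname, service_desc, state_id, now=None):
    """Build a Monasca metric dict from Nagios event-handler macros.

    hostname, service_desc and state_id correspond to $HOSTNAME$,
    $SERVICEDESC$ and $SERVICESTATEID$. The metric name and the
    nagios_service/source dimensions are hypothetical choices.
    """
    state = int(state_id)
    if state not in NAGIOS_STATES:
        raise ValueError("Nagios state id must be 0-3, got %r" % (state_id,))
    seconds = time.time() if now is None else now
    return {
        "name": "nagios.service_state",
        "dimensions": {
            "hostname": hostname,
            "nagios_service": service_desc,
            "source": "nagios",  # keeps these metrics apart from Monasca's own
        },
        "timestamp": int(seconds * 1000),  # milliseconds since the epoch
        "value": float(state),
    }
```

The dict can be serialized with json.dumps() as the request body. An alarm definition on this hypothetical metric could then raise Monasca alarms from Nagios states, grouped by the source dimension. Monasca restricts the characters allowed in dimension values, so Nagios service descriptions may need sanitizing first.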

Use Cases

  • Using the Operations Console as the central monitoring tool.

Limitations

  • Alarms cannot be acknowledged from the Operations Console, so Nagios could send repetitive notifications unless it is configured to take this into account.

SUSE OpenStack Cloud-specific Nagios Plugins

Several OpenStack plugin packages exist (see https://launchpad.net/ubuntu/+source/nagios-plugins-openstack) that are useful to run from external sources to ensure the overall system is working as expected. Monasca requires some OpenStack components to be working in order to work at all. For example, if Keystone were unavailable, Monasca could not authenticate client or console requests. An external service check could highlight this.

3.2.5 Common integration issues

Alarm status differences

Monasca and Nagios treat alarms and status in different ways and for the two systems to talk there needs to be a mapping between them. The following table details the alarm parameters available for each:

System    Status        Severity  Details
Nagios    OK                      Plugin returned OK with given thresholds
          WARNING                 Plugin returned WARNING based on thresholds
          CRITICAL                Plugin returned CRITICAL alarm
          UNKNOWN                 Plugin failed
Monasca   OK                      No alarm triggered
          ALARM         LOW       Alarm state, LOW impact
          ALARM         MEDIUM    Alarm state, MEDIUM impact
          ALARM         HIGH      Alarm state, HIGH impact
          UNDETERMINED            No metrics received

In the plugin described here, the mapping was created with this flow:

Monasca OK -> Nagios OK
Monasca ALARM ( LOW or MEDIUM ) -> Nagios WARNING
Monasca ALARM ( HIGH ) -> Nagios CRITICAL
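The mapping above can be expressed as a small Python function, sketched here for a plugin like check_monasca. It takes simplified alarm records (dicts with state and severity keys; extracting these from a Monasca API response is assumed, not shown) and returns the worst Nagios exit code plus a status line. Reporting UNDETERMINED as UNKNOWN, and ranking WARNING above UNKNOWN when both occur, are assumptions added to address the missed-alarm limitation described earlier; they are not part of the documented mapping.

```python
# Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3
LABELS = {NAGIOS_OK: "OK", NAGIOS_WARNING: "WARNING",
          NAGIOS_CRITICAL: "CRITICAL", NAGIOS_UNKNOWN: "UNKNOWN"}
# Ranking used to pick the worst result (an assumption of this sketch):
# CRITICAL > WARNING > UNKNOWN > OK.
PRECEDENCE = {NAGIOS_OK: 0, NAGIOS_UNKNOWN: 1, NAGIOS_WARNING: 2, NAGIOS_CRITICAL: 3}


def alarm_to_nagios(alarm):
    """Map one simplified Monasca alarm record to a Nagios exit code."""
    state = alarm.get("state")
    if state == "ALARM":
        # LOW or MEDIUM -> WARNING; HIGH (or any other severity) -> CRITICAL.
        if alarm.get("severity") in ("LOW", "MEDIUM"):
            return NAGIOS_WARNING
        return NAGIOS_CRITICAL
    if state == "UNDETERMINED":
        # Not in the documented mapping: no metrics received -> UNKNOWN.
        return NAGIOS_UNKNOWN
    return NAGIOS_OK


def nagios_status(alarms):
    """Return (exit_code, status_line) summarizing a list of alarm records."""
    worst = NAGIOS_OK
    in_alarm = undetermined = 0
    for alarm in alarms:
        if alarm.get("state") == "ALARM":
            in_alarm += 1
        elif alarm.get("state") == "UNDETERMINED":
            undetermined += 1
        worst = max(worst, alarm_to_nagios(alarm), key=PRECEDENCE.get)
    return worst, "MONASCA %s - %d in ALARM, %d UNDETERMINED" % (
        LABELS[worst], in_alarm, undetermined)
```

A plugin would print the status line and exit with the returned code.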

Alarm workflow differences

In both systems, alarms can be acknowledged in the dashboards to indicate that they are being worked on (or ignored). Not all of the scenarios above provide the same level of workflow integration.

3.3 Operations Bridge Integration

The SUSE OpenStack Cloud 8 monitoring solution (Monasca) can easily be integrated with your existing monitoring tools. Integrating SUSE OpenStack Cloud 8 Monasca with Operations Bridge using the Operations Bridge Connector simplifies monitoring and managing events and topology information.

The integration provides the following functionality:

  • Forwarding of SUSE OpenStack Cloud Monasca alerts and topology to Operations Bridge for event correlation

  • Customization of forwarded events and topology

For more information about this connector please see https://software.microfocus.com/en-us/products/operations-bridge-suite/overview.

3.4 Monitoring Third-Party Components With Monasca

3.4.1 Monasca Monitoring Integration Overview

Monasca, the SUSE OpenStack Cloud 8 monitoring service, collects information about your cloud's systems and allows you to create alarm definitions based on these measurements. The monasca-agent component collects metrics and forwards them to the monasca-api for further processing, such as metric storage and alarm thresholding.

With a small amount of configuration, you can use the detection and check plugins that are provided with your cloud to monitor integrated third-party components. In addition, you can write custom plugins and integrate them with the existing monitoring service.

Find instructions for customizing existing plugins to monitor third-party components in Section 3.4.4, “Configuring Check Plugins”.

Find instructions for installing and configuring new custom plugins in Section 3.4.3, “Writing Custom Plugins”.

You can also use existing alarm definitions, as well as create new alarm definitions that relate to a custom plugin or metric. Instructions for defining new alarm definitions are in the Section 3.4.6, “Configuring Alarm Definitions”.

You can use the Operations Console and Monasca CLI to list all of the alarms, alarm-definitions, and metrics that exist on your cloud.

3.4.2 Monasca Agent

The Monasca agent (monasca-agent) collects information about your cloud using the installed plugins. The plugins are written in Python, and determine the monitoring metrics for your system, as well as the interval for collection. The default collection interval is 30 seconds, and we strongly recommend not changing this default value.

The following two types of custom plugins can be added to your cloud.

  • Detection Plugin. Determines whether the monasca-agent has the ability to monitor the specified component or service on a host. If successful, this type of plugin configures an associated check plugin by creating a YAML configuration file.

  • Check Plugin. Specifies the metrics to be monitored, using the configuration file created by the detection plugin.

Monasca-agent is installed on every server in your cloud, and provides plugins that monitor the following.

  • System metrics relating to CPU, memory, disks, host availability, etc.

  • Process health metrics (process, http_check)

  • SUSE OpenStack Cloud 8-specific component metrics, such as Apache, RabbitMQ, Kafka, Cassandra, etc.

Monasca is pre-configured with default check plugins and associated detection plugins. The default plugins can be reconfigured to monitor third-party components, and often only require small adjustments to adapt them to this purpose. Find a list of the default plugins here: https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#detection-plugins

Often, a single check plugin will be used to monitor multiple services. For example, many services use the http_check.py detection plugin to detect the up/down status of a service endpoint. Often the process.py check plugin, which provides process monitoring metrics, is used as a basis for a custom process detection plugin.

More information about the Monasca agent can be found in the following locations

3.4.3 Writing Custom Plugins

When the pre-built Monasca plugins do not meet your monitoring needs, you can write custom plugins to monitor your cloud. After you have written a plugin, you must install and configure it.

When your needs dictate a very specific custom monitoring check, you must provide both a detection and check plugin.

The steps involved in configuring a custom plugin include running a detection plugin and passing any necessary parameters to it so that the resulting check configuration file is created with all necessary data.

When using an existing check plugin to monitor a third-party component, a custom detection plugin is needed only if there is not an associated default detection plugin.

Check plugin configuration files

Each plugin needs a corresponding YAML configuration file with the same stem name as the plugin check file. For example, the plugin file http_check.py (in /usr/lib/python2.7/site-packages/monasca_agent/collector/checks_d/) should have a corresponding configuration file, http_check.yaml (in /etc/monasca/agent/conf.d/http_check.yaml). The stem name http_check must be the same for both files.

Permissions for the YAML configuration file must be read+write for the monasca-agent user (which must also own the file) and read for the monasca group. No other users may access the file. The following example shows correct permissions settings for the file http_check.yaml.

ardana > ls -alt /etc/monasca/agent/conf.d/http_check.yaml
-rw-r----- 1 monasca-agent monasca 10590 Jul 26 05:44 http_check.yaml

A check plugin YAML configuration file has the following structure.

init_config:
    key1: value1
    key2: value2

instances:
    - name: john_smith
      username: john_smith
      password: 123456
    - name: jane_smith
      username: jane_smith
      password: 789012

In the above file structure, the init_config section allows you to specify any number of global key:value pairs. Each pair will be available on every run of the check that relates to the YAML configuration file.

The instances section allows you to list the instances that the related check will be run on. The check will be run once on each instance listed in the instances section. Ensure that each instance listed in the instances section has a unique name.
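The following Python sketch models this behavior on the parsed form of the example file (for instance, the result of PyYAML's yaml.safe_load). It is a simplified model, not the agent's code: validate_instances and run_once_per_instance are hypothetical helpers that reject missing or duplicate instance names and pass the shared init_config values to every run. An init_config of None (as written by the http_check detection plugin shown later in this section) is treated as empty.

```python
EXAMPLE_CONFIG = {
    # Parsed form of the example YAML file above.
    "init_config": {"key1": "value1", "key2": "value2"},
    "instances": [
        {"name": "john_smith", "username": "john_smith", "password": 123456},
        {"name": "jane_smith", "username": "jane_smith", "password": 789012},
    ],
}


def validate_instances(config):
    """Return the instance names, raising ValueError if any is missing or repeated."""
    instances = config.get("instances") or []
    if not instances:
        raise ValueError("the configuration needs a non-empty 'instances' list")
    names = [instance.get("name") for instance in instances]
    if None in names:
        raise ValueError("every instance needs a 'name'")
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError("duplicate instance names: %s" % ", ".join(duplicates))
    return names


def run_once_per_instance(config, check):
    """Call check(init_config, instance) once per instance; return results by name."""
    names = validate_instances(config)
    # Treat a missing or None init_config as an empty set of global settings.
    init_config = config.get("init_config") or {}
    return {name: check(init_config, instance)
            for name, instance in zip(names, config["instances"])}
```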

Custom detection plugins

Detection plugins should be written to perform checks that ensure that a component can be monitored on a host. Any arguments needed by the associated check plugin are passed into the detection plugin at setup (configuration) time. The detection plugin will write to the associated check configuration file.

When a detection plugin is successfully run in the configuration step, it will write to the check configuration YAML file. The configuration file for the check is written to the following directory.

/etc/monasca/agent/conf.d/

Writing process detection plugin using the ServicePlugin class

The monasca-agent provides a ServicePlugin class that makes process detection monitoring easy.

Process check

The process check plugin generates metrics based on the process status for specified process names. It generates process.pid_count metrics for the specified dimensions, and a set of detailed process metrics for the specified dimensions by default.

The ServicePlugin class allows you to specify a list of process name(s) to detect, and uses psutil to see if the process exists on the host. It then appends the process name(s) to the process.yaml configuration file, if they do not already exist.

The following is an example of a detection plugin based on ServicePlugin that configures the process check.

import logging

import monasca_setup.detection

log = logging.getLogger(__name__)


class MonascaTransformDetect(monasca_setup.detection.ServicePlugin):
    """Detect Monasca Transform daemons and setup configuration to monitor them."""

    def __init__(self, template_dir, overwrite=False, args=None):
        log.info("      Watching the monasca transform processes.")
        service_params = {
            'args': {},
            'template_dir': template_dir,
            'overwrite': overwrite,
            'service_name': 'monasca-transform',
            # Process names to look for on the host
            'process_names': ['monasca-transform', 'pyspark',
                              'transform/lib/driver']
        }
        super(MonascaTransformDetect, self).__init__(service_params)

Writing a Custom Detection Plugin using Plugin or ArgsPlugin classes

A custom detection plugin class should derive from either the Plugin or ArgsPlugin classes provided in the /usr/lib/python2.7/site-packages/monasca_setup/detection directory.

If the plugin parses command line arguments, the ArgsPlugin class is useful. The ArgsPlugin class derives from the Plugin class. The ArgsPlugin class has a method to check for required arguments, and a method to return the instance that will be used for writing to the configuration file with the dimensions from the command line parsed and included.

If the ArgsPlugin methods do not seem to apply, then derive directly from the Plugin class.

When deriving from these classes, the following methods should be implemented.

  • _detect - set self.available=True when the component or service to be monitored exists on the host.

  • build_config - write the instance information to the configuration and return the configuration.

  • dependencies_installed (default implementation is in ArgsPlugin, but not Plugin) - return True when the Python libraries the plugin depends on are installed.

The following is an example custom detection plugin.

import ast
import logging

import monasca_setup.agent_config
import monasca_setup.detection

log = logging.getLogger(__name__)


class HttpCheck(monasca_setup.detection.ArgsPlugin):
    """Setup an http_check according to the passed in args.
       Despite being a detection plugin this plugin does no detection and will be a noop without arguments.
       Expects space separated arguments, the required argument is url. Optional parameters include:
       disable_ssl_validation and match_pattern.
    """

    def _detect(self):
        """Run detection, set self.available True if the service is detected.
        """
        self.available = self._check_required_args(['url'])

    def build_config(self):
        """Build the config as a Plugins object and return.
        """
        config = monasca_setup.agent_config.Plugins()
        # No support for setting headers at this time
        instance = self._build_instance(['url', 'timeout', 'username', 'password',
                                         'match_pattern', 'disable_ssl_validation',
                                         'name', 'use_keystone', 'collect_response_time'])

        # Normalize any boolean parameters
        for param in ['use_keystone', 'collect_response_time']:
            if param in self.args:
                instance[param] = ast.literal_eval(self.args[param].capitalize())
        # Set some defaults
        if 'collect_response_time' not in instance:
            instance['collect_response_time'] = True
        if 'name' not in instance:
            instance['name'] = self.args['url']

        config['http_check'] = {'init_config': None, 'instances': [instance]}

        return config

Installing a detection plugin in the OpenStack version delivered with SUSE OpenStack Cloud

Install a plugin by copying it to the plugin directory (/usr/lib/python2.7/site-packages/monasca_agent/collector/checks_d/).

The plugin should have file permissions of read+write for the root user (the user that should also own the file) and read for the root group and all other users.

The following is an example of correct file permissions for the http_check.py file.

-rw-r--r-- 1 root root 1769 Sep 19 20:14 http_check.py

Detection plugins should be placed in the following directory.

/usr/lib/monasca/agent/custom_detect.d/

The detection plugin directory name should be accessed using the monasca_agent_detection_plugin_dir Ansible variable. This variable is defined in the roles/monasca-agent/vars/main.yml file.

monasca_agent_detection_plugin_dir: /usr/lib/monasca/agent/custom_detect.d/

Example: Add Ansible monasca_configure task to install the plugin. (The monasca_configure task can be added to any service playbook.) In this example, it is added to ~/openstack/ardana/ansible/roles/_CEI-CMN/tasks/monasca_configure.yml.

---
- name: _CEI-CMN | monasca_configure |
    Copy Ceilometer Custom plugin
  become: yes
  copy:
    src: ardanaceilometer_mon_plugin.py
    dest: "{{ monasca_agent_detection_plugin_dir }}"
    owner: root
    group: root
    mode: "0644"

Custom check plugins

Custom check plugins generate metrics. Scalability should be taken into consideration on systems that will have hundreds of servers, as a large number of metrics can affect performance by impacting disk performance, RAM and CPU usage.

You may want to tune your configuration parameters so that less-important metrics are not monitored as frequently. When check plugins are configured (when they have an associated YAML configuration file) the agent will attempt to run them.

Checks should be able to run within the 30-second metric collection window. If your check runs a command, provide a timeout to prevent the check from running longer than this window. You can use monasca_agent.common.util.timeout_command to set a timeout in your custom check plugin Python code.
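Inside a check plugin, use monasca_agent.common.util.timeout_command as described above. To try out a command's run time outside the agent, the following Python 3 sketch returns a similar (stdout, stderr, return_code) result using only the standard library. It is an illustration, not the agent's implementation: run_with_timeout is a hypothetical name, returning -1 for a timed-out command is an arbitrary choice, and subprocess.run does not exist in Python 2.7, which the agent paths in this chapter point to.

```python
import subprocess


def run_with_timeout(command, timeout, command_input=None):
    """Run command and return (stdout, stderr, return_code).

    If the command runs longer than timeout seconds, subprocess.run kills it
    and this function returns return code -1 (a choice made for this sketch).
    """
    try:
        result = subprocess.run(command, input=command_input,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "", "timed out after %s seconds" % timeout, -1
    return result.stdout, result.stderr, result.returncode
```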

Find a description of how to write custom check plugins at https://github.com/openstack/monasca-agent/blob/master/docs/Customizations.md#creating-a-custom-check-plugin

Custom checks derive from the AgentCheck class located in the monasca_agent/collector/checks/check.py file. A check method is required.

Metrics should contain dimensions that make each item that you are monitoring unique (such as service, component, hostname). The hostname dimension is defined by default within the AgentCheck class, so every metric has this dimension.

A custom check will do the following.

  • Read the configuration instance passed into the check method.

  • Set dimensions that will be included in the metric.

  • Create the metric with gauge, rate, or counter types.

Metric Types:

  • gauge: Instantaneous reading of a particular value (for example, mem.free_mb).

  • rate: Measurement over a time period. The following equation can be used to define rate.

    rate=delta_v/float(delta_t)
  • counter: A count of events, changed with increment and decrement methods (for example, zookeeper.timeouts).
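As a worked example of the rate formula, the following Python function computes the per-second rate between two samples of a cumulative value. The function name is illustrative; in a check plugin, the rate metric type listed above is intended to handle this for you, so the function only makes the arithmetic concrete.

```python
def rate(prev_value, prev_time, cur_value, cur_time):
    """Per-second rate between two samples: delta_v / float(delta_t)."""
    delta_v = cur_value - prev_value
    delta_t = cur_time - prev_time
    if delta_t <= 0:
        raise ValueError("the second sample must be later than the first")
    return delta_v / float(delta_t)


# A cumulative byte counter read 30 seconds apart:
print(rate(1000, 0, 4000, 30))  # 100.0
```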

The following is an example component check named SimpleCassandraExample.

import monasca_agent.collector.checks as checks
from monasca_agent.common.util import timeout_command

CASSANDRA_VERSION_QUERY = "SELECT version();"


class SimpleCassandraExample(checks.AgentCheck):

    def __init__(self, name, init_config, agent_config):
        super(SimpleCassandraExample, self).__init__(name, init_config, agent_config)

    @staticmethod
    def _get_config(instance):
        # Read the per-instance settings from the check's YAML configuration
        user = instance.get('user')
        password = instance.get('password')
        service = instance.get('service')
        # The 5-second default is an assumption; keep it well under 30 seconds
        timeout = int(instance.get('timeout', 5))

        return user, password, service, timeout

    def check(self, instance):
        user, password, service, timeout = self._get_config(instance)

        # AgentCheck adds the hostname dimension; add component and service
        dimensions = self._set_dimensions({'component': 'cassandra', 'service': service}, instance)

        results, connection_status = self._query_database(user, password, timeout, CASSANDRA_VERSION_QUERY)

        # Report 1 for a failed connection and 0 for a successful one
        if connection_status != 0:
            self.gauge('cassandra.connection_status', 1, dimensions=dimensions)
        else:
            self.gauge('cassandra.connection_status', 0, dimensions=dimensions)

    def _query_database(self, user, password, timeout, query):
        # Run the command-line client with a timeout so the check finishes
        # within the collection window. The client path and flags are
        # illustrative; substitute the client for the component you monitor.
        stdout, stderr, return_code = timeout_command(["/opt/cassandra/bin/vsql", "-U", user, "-w", password, "-A", "-R",
                                                       "|", "-t", "-F", ",", "-x"], timeout, command_input=query)
        if return_code == 0:
            # remove trailing newline
            stdout = stdout.rstrip()
            return stdout, 0
        else:
            self.log.error("Error querying cassandra with return code of {0} and error {1}".format(return_code, stderr))
            return stderr, 1

Installing check plugin

The check plugin must have the same file permissions as the detection plugin. The file should be owned by the root user, with read and write permission for root and read-only permission for the root group and all other users (mode 0644).

Check plugins should be placed in the following directory.

/usr/lib/monasca/agent/custom_checks.d/

The check plugin directory should be referenced through the monasca_agent_check_plugin_dir Ansible variable. This variable is defined in the roles/monasca-agent/vars/main.yml file.

monasca_agent_check_plugin_dir: /usr/lib/monasca/agent/custom_checks.d/
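The following minimal sketch installs a check plugin with the required permissions (octal mode 0644, owned by root:root). The helper install_check_plugin is hypothetical. In a deployment you would normally copy the file with Ansible using the monasca_agent_check_plugin_dir variable. Changing ownership to root only works when the script runs as root, so set_root_owner can be turned off for local testing.

```python
import os
import shutil
import stat

# Default location taken from the monasca_agent_check_plugin_dir variable
CHECK_PLUGIN_DIR = "/usr/lib/monasca/agent/custom_checks.d/"

# Read+write for the owner, read-only for group and others (octal 0644)
PLUGIN_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH


def install_check_plugin(src_path, dest_dir=CHECK_PLUGIN_DIR, set_root_owner=True):
    """Copy a check plugin into dest_dir with 0644 permissions.

    Hypothetical helper, not part of SUSE OpenStack Cloud 8 tooling.
    """
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, dest_path)
    os.chmod(dest_path, PLUGIN_MODE)
    if set_root_owner:
        # uid 0 / gid 0 = root user and root group; requires root privileges
        os.chown(dest_path, 0, 0)
    return dest_path
```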

3.4.4 Configuring Check Plugins

When unit-testing, you can configure a plugin manually with the monasca-setup script that is installed with the monasca-agent.

For a detailed explanation of configuring plugins, see https://github.com/openstack/monasca-agent/blob/master/docs/Agent.md#configuring

SSH to a node that has both the monasca-agent and the component you want to monitor installed.

The following is an example command that configures a plugin that has no parameters (uses the detection plugin class name).

root # /usr/bin/monasca-setup -d ARDANACeilometer

The following is an example command that configures the apache plugin and includes related parameters.

root # /usr/bin/monasca-setup -d apache -a 'url=http://192.168.245.3:9095/server-status?auto'

If the configuration changes, monasca-setup restarts the monasca-agent on the host so that the new configuration is loaded.

After the plugin is configured, you can verify that the configuration file contains your changes (see the Verify that your check plugin is configured section).

Use the monasca CLI to see if your metric exists (see the Verify that metrics exist section).
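If you script plugin configuration while unit-testing, the following sketch builds the same monasca-setup invocation as the examples above. The helper names are hypothetical. Passing the arguments as a list rather than a shell string means characters such as ? in the apache URL need no shell quoting.

```python
import subprocess

MONASCA_SETUP = "/usr/bin/monasca-setup"


def build_setup_command(detection_plugin, args=None):
    """Build the monasca-setup argument list for one detection plugin.

    Hypothetical helper: -d selects the detection plugin and -a passes
    its arguments as one space-separated string, as in the examples above.
    """
    cmd = [MONASCA_SETUP, "-d", detection_plugin]
    if args:
        cmd += ["-a", args]
    return cmd


def run_setup(detection_plugin, args=None):
    # Run as root on a node with monasca-agent and the monitored component installed
    return subprocess.run(build_setup_command(detection_plugin, args), check=True)
```

For example, build_setup_command("apache", "url=http://192.168.245.3:9095/server-status?auto") produces the argument list for the apache command shown earlier.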

Using Ansible modules to configure plugins in SUSE OpenStack Cloud 8

The monasca_agent_plugin module is installed as part of the monasca-agent role.

The following Ansible example configures the process.py plugin for the Ceilometer detection plugin. This example passes in only the name of the detection class.

- name: _CEI-CMN | monasca_configure |
    Run Monasca agent Cloud Lifecycle Manager specific ceilometer detection plugin
  become: yes
  monasca_agent_plugin:
    name: "ARDANACeilometer"

If a password or other sensitive data are passed to the detection plugin, the no_log option should be set to True. If the no_log option is not set to True, the data passed to the plugin will be logged to syslog.

The following Ansible example configures the Cassandra plugin and passes in related arguments.

- name: Run Monasca Agent detection plugin for Cassandra
  monasca_agent_plugin:
    name: "Cassandra"
    args: "directory_names={{ FND_CDB.vars.cassandra_data_dir }},{{ FND_CDB.vars.cassandra_commit_log_dir }} process_username={{ FND_CDB.vars.cassandra_user }}"
  when: database_type == 'cassandra'

The following Ansible example configures the Keystone endpoint using the http_check.py detection plugin. The name parameter is httpcheck, which is the class name of the http_check.py detection plugin.

- name: keystone-monitor | local_monitor |
    Setup active check on keystone internal endpoint locally
  become: yes
  monasca_agent_plugin:
    name: "httpcheck"
    args: "use_keystone=False \
           url=http://{{ keystone_internal_listen_ip }}:{{
               keystone_internal_port }}/v3 \
           dimensions=service:identity-service,\
                       component:keystone-api,\
                       api_endpoint:internal,\
                       monitored_host_type:instance"
  tags:
    - keystone
    - keystone_monitor

Verify that your check plugin is configured

All check configuration files are located in the following directory. Listing this directory shows which check plugins are configured to run.

/etc/monasca/agent/conf.d/

When the monasca-agent starts up, all of the check plugins that have a matching configuration file in the /etc/monasca/agent/conf.d/ directory will be loaded.
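The following sketch gives a quick way to see which checks will be loaded by listing the configuration files in that directory. It assumes one CHECK_NAME.yaml file per check plugin, and the function name is hypothetical.

```python
import os

CONF_DIR = "/etc/monasca/agent/conf.d/"


def configured_checks(conf_dir=CONF_DIR):
    """Return the sorted names of checks that have a configuration file.

    Hypothetical helper; assumes each check is configured by a
    <check_name>.yaml file in conf_dir.
    """
    names = []
    for entry in os.listdir(conf_dir):
        base, ext = os.path.splitext(entry)
        if ext == ".yaml":
            names.append(base)
    return sorted(names)
```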

Errors that occur while running a check plugin are written to the following log file.

/var/log/monasca/agent/collector.log

You can change the monasca-agent log level by modifying the log_level option in the /etc/monasca/agent/agent.yaml configuration file and then restarting the monasca-agent with the following command.

root # service openstack-monasca-agent restart

You can debug a check plugin by running monasca-collector with the check option. The following is an example of the monasca-collector command.

tux > sudo /usr/bin/monasca-collector check CHECK_NAME

Verify that metrics exist

Begin by logging in to your deployer or controller node.

Run the following set of commands, including the monasca metric-list command. If the metric exists, it will be displayed in the output.

ardana > source ~/service.osrc
ardana > monasca metric-list --name METRIC_NAME

3.4.5 Metric Performance Considerations

Collecting metrics on your virtual machines can greatly affect performance. SUSE OpenStack Cloud 8 supports 200 compute nodes with up to 40 VMs each. If your environment is running the maximum number of VMs (200 nodes with 40 VMs each, or 8000 VMs), adding a single metric for every VM is equivalent to adding 8000 metrics.
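You can check the scaling arithmetic with a short calculation. This sketch multiplies the supported maximums given in the text, and the function name is hypothetical. It counts metrics, meaning unique name and dimension combinations, not the individual data points sent at every collection interval.

```python
def added_metrics(compute_nodes, vms_per_node, metrics_per_vm=1):
    """Number of additional metrics created by adding per-VM metrics."""
    return compute_nodes * vms_per_node * metrics_per_vm


# Supported maximum: 200 compute nodes with 40 VMs each
print(added_metrics(200, 40))     # → 8000
print(added_metrics(200, 40, 5))  # → 40000
```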

Because of the potential impact that new metrics have on system performance, consider adding only metrics that are useful for alarm definitions, capacity planning, or debugging process failures.

3.4.6 Configuring Alarm Definitions

The monasca-api-spec, found at https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md, explains alarm definitions and alarms. For more information on alarm definition expressions, see https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definition-expressions.

When an alarm definition is created, the monasca-threshold engine generates an alarm for each unique combination of match_by dimension values found in the matching metrics. This allows a single alarm definition to handle newly added hosts dynamically.

There are default alarm definitions configured for all "process check" (process.py check) and "HTTP Status" (http_check.py check) metrics in the monasca-default-alarms role. The monasca-default-alarms role is installed as part of the Monasca deployment phase of your cloud's deployment. You do not need to create alarm definitions for these existing checks.

Third parties should create an alarm definition when they wish to alarm on a custom plugin metric. The alarm definition should only be defined once. Setting a notification method for the alarm definition is recommended but not required.

The following Ansible modules used for alarm definitions are installed as part of the monasca-alarm-definition role. This takes place during the Monasca setup phase of your cloud's deployment.

  • monasca_alarm_definition

  • monasca_notification_method

The following examples, found in the ~/openstack/ardana/ansible/roles/monasca-default-alarms directory, illustrate how Monasca sets up the default alarm definitions.

Monasca Notification Methods

For details about creating a notification method, see the monasca-api-spec: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#create-notification-method

The following are supported notification types.

  • EMAIL

  • WEBHOOK

  • PAGERDUTY

The keystone_admin_tenant project is used so that the alarms will show up on the Operations Console UI.

The following file snippet shows variables from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/defaults/main.yml file.

---
notification_address: root@localhost
notification_name: 'Default Email'
notification_type: EMAIL

monasca_keystone_url: "{{ KEY_API.advertises.vips.private[0].url }}/v3"
monasca_api_url: "{{ MON_AGN.consumes_MON_API.vips.private[0].url }}/v2.0"
monasca_keystone_user: "{{ MON_API.consumes_KEY_API.vars.keystone_monasca_user }}"
monasca_keystone_password: "{{ MON_API.consumes_KEY_API.vars.keystone_monasca_password | quote }}"
monasca_keystone_project: "{{ KEY_API.vars.keystone_admin_tenant }}"

monasca_client_retries: 3
monasca_client_retry_delay: 2

You can specify a single default notification method in the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file. You can also add or modify the notification type and related details using the Operations Console UI or Monasca CLI.

The following is a code snippet from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file.

---
- name: monasca-default-alarms | main | Setup default notification method
  monasca_notification_method:
    name: "{{ notification_name }}"
    type: "{{ notification_type }}"
    address: "{{ notification_address }}"
    keystone_url: "{{ monasca_keystone_url }}"
    keystone_user: "{{ monasca_keystone_user }}"
    keystone_password: "{{ monasca_keystone_password }}"
    keystone_project: "{{ monasca_keystone_project }}"
    monasca_api_url: "{{ monasca_api_url }}"
  no_log: True
  tags:
    - system_alarms
    - monasca_alarms
    - openstack_alarms
  register: default_notification_result
  until: not default_notification_result | failed
  retries: "{{ monasca_client_retries }}"
  delay: "{{ monasca_client_retry_delay }}"
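The monasca_notification_method module makes the API call for you. The following sketch shows roughly what the underlying create-notification-method request in the monasca-api-spec looks like, building a POST to the notification-methods resource without sending it. The helper name is hypothetical. The sketch assumes api_url ends in /v2.0, as the monasca_api_url variable does. Obtaining the Keystone token is out of scope.

```python
import json
import urllib.request

# Notification types listed in this section
SUPPORTED_TYPES = {"EMAIL", "WEBHOOK", "PAGERDUTY"}


def build_notification_request(api_url, token, name, ntype, address):
    """Build (but do not send) a create-notification-method request.

    Hypothetical helper; api_url is assumed to end in /v2.0.
    """
    if ntype not in SUPPORTED_TYPES:
        raise ValueError("unsupported notification type: %s" % ntype)
    body = json.dumps({"name": name, "type": ntype, "address": address})
    return urllib.request.Request(
        api_url.rstrip("/") + "/notification-methods",
        data=body.encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(req) would then create the notification method, provided the token is valid for the keystone_admin_tenant project.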

Monasca Alarm Definition

In the alarm definition "expression" field, you can specify the metric name and threshold. The "match_by" field is used to create a new alarm for every unique combination of the match_by metric dimensions.

For more details on alarm definitions, see the Monasca API documentation: https://github.com/stackforge/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definitions-and-alarms

The following is a code snippet from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file.

- name: monasca-default-alarms | main | Create Alarm Definitions
  monasca_alarm_definition:
    name: "{{ item.name }}"
    description: "{{ item.description | default('') }}"
    expression: "{{ item.expression }}"
    keystone_token: "{{ default_notification_result.keystone_token }}"
    match_by: "{{ item.match_by | default(['hostname']) }}"
    monasca_api_url: "{{ default_notification_result.monasca_api_url }}"
    severity: "{{ item.severity | default('LOW') }}"
    alarm_actions:
      - "{{ default_notification_result.notification_method_id }}"
    ok_actions:
      - "{{ default_notification_result.notification_method_id }}"
    undetermined_actions:
      - "{{ default_notification_result.notification_method_id }}"
  register: monasca_system_alarms_result
  until: not monasca_system_alarms_result | failed
  retries: "{{ monasca_client_retries }}"
  delay: "{{ monasca_client_retry_delay }}"
  with_flattened:
    - monasca_alarm_definitions_system
    - monasca_alarm_definitions_monasca
    - monasca_alarm_definitions_openstack
    - monasca_alarm_definitions_misc_services
  when: monasca_create_definitions
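The monasca_alarm_definition module sends each alarm definition to the Monasca API. The following minimal sketch assembles an equivalent JSON body to illustrate which fields the task above fills in. The helper name is hypothetical. The field names follow the alarm definition resource in the monasca-api-spec. The sketch mirrors the task's defaults: match_by of hostname, severity LOW, and the same notification method for ALARM, OK and UNDETERMINED transitions.

```python
import json

VALID_SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def alarm_definition_body(name, expression, match_by=None, severity="LOW",
                          description="", notification_ids=()):
    """Return a JSON body for a create-alarm-definition request.

    Hypothetical helper mirroring the Ansible task's defaults.
    """
    if severity not in VALID_SEVERITIES:
        raise ValueError("invalid severity: %s" % severity)
    actions = list(notification_ids)
    return json.dumps({
        "name": name,
        "description": description,
        "expression": expression,
        # One alarm per unique combination of these dimension values
        "match_by": list(match_by) if match_by else ["hostname"],
        "severity": severity,
        "alarm_actions": actions,
        "ok_actions": actions,
        "undetermined_actions": actions,
    })
```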

In the following example ~/openstack/ardana/ansible/roles/monasca-default-alarms/vars/main.yml Ansible variables file, the alarm definition named Process Check sets the match_by variable with the following parameters.

  • process_name

  • hostname

monasca_alarm_definitions_system:
  - name: "Host Status"
    description: "Alarms when the specified host is down or not reachable"
    severity: "HIGH"
    expression: "host_alive_status > 0"
    match_by:
      - "target_host"
      - "hostname"
  - name: "HTTP Status"
    description: >
      "Alarms when the specified HTTP endpoint is down or not reachable"
    severity: "HIGH"
    expression: "http_status > 0"
    match_by:
      - "service"
      - "component"
      - "hostname"
      - "url"
  - name: "CPU Usage"
    description: "Alarms when CPU usage is high"
    expression: "avg(cpu.idle_perc) < 10 times 3"
  - name: "High CPU IOWait"
    description: "Alarms when CPU IOWait is high, possible slow disk issue"
    expression: "avg(cpu.wait_perc) > 40 times 3"
    match_by:
      - "hostname"
  - name: "Disk Inode Usage"
    description: "Alarms when disk inode usage is high"
    expression: "disk.inode_used_perc > 90"
    match_by:
      - "hostname"
      - "device"
    severity: "HIGH"
  - name: "Disk Usage"
    description: "Alarms when disk usage is high"
    expression: "disk.space_used_perc > 90"
    match_by:
      - "hostname"
      - "device"
    severity: "HIGH"
  - name: "Memory Usage"
    description: "Alarms when memory usage is high"
    severity: "HIGH"
    expression: "avg(mem.usable_perc) < 10 times 3"
  - name: "Network Errors"
    description: >
      "Alarms when either incoming or outgoing network errors are high"
    severity: "MEDIUM"
    expression: "net.in_errors_sec > 5 or net.out_errors_sec > 5"
  - name: "Process Check"
    description: "Alarms when the specified process is not running"
    severity: "HIGH"
    expression: "process.pid_count < 1"
    match_by:
      - "process_name"
      - "hostname"
  - name: "Crash Dump Count"
    description: "Alarms when a crash directory is found"
    severity: "MEDIUM"
    expression: "crash.dump_count > 0"
    match_by:
      - "hostname"

For the Process Check definition, the preceding configuration creates an alarm for each unique combination of the following metric name and dimensions.

process.pid_count + process_name + hostname
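The following sketch is a simplified model of how match_by splits one alarm definition into several alarms. It groups metrics by their match_by dimension values, so metrics that differ only in other dimensions share an alarm. The function name and the process and host names in the example are hypothetical.

```python
def alarm_keys(metrics, metric_name, match_by):
    """Return the distinct match_by value combinations for one metric name.

    metrics is a list of (name, dimensions) tuples; each combination in
    the returned set corresponds to one alarm.
    """
    keys = set()
    for name, dims in metrics:
        if name == metric_name:
            keys.add(tuple(dims.get(d) for d in match_by))
    return keys


metrics = [
    ("process.pid_count", {"process_name": "nova-api", "hostname": "ctl1"}),
    ("process.pid_count", {"process_name": "nova-api", "hostname": "ctl2"}),
    ("process.pid_count", {"process_name": "keystone", "hostname": "ctl1"}),
    # Extra dimension outside match_by: same alarm as the previous metric
    ("process.pid_count", {"process_name": "keystone", "hostname": "ctl1", "service": "identity"}),
    ("cpu.idle_perc", {"hostname": "ctl1"}),
]
print(len(alarm_keys(metrics, "process.pid_count", ["process_name", "hostname"])))  # → 3
```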

Check that the alarms exist

Begin by using the following commands, including monasca alarm-definition-list, to check that the alarm definition exists.

ardana > source ~/service.osrc
ardana > monasca alarm-definition-list --name ALARM_DEFINITION_NAME

Then use either of the following commands to check that the alarm has been generated. A status of "OK" indicates a healthy alarm.

ardana > monasca alarm-list --metric-name METRIC_NAME

Or

ardana > monasca alarm-list --alarm-definition-id ID_FROM_ALARM-DEFINITION-LIST
Note

To see CLI options, use the monasca help command.

Alarm state upgrade considerations

If the name of a monitoring metric changes or is no longer being sent, existing alarms will show the alarm state as UNDETERMINED. You can update an alarm definition as long as you do not change the metric name or dimension name values in the expression or match_by fields. If you find that you need to alter either of these values, you must delete the old alarm definitions and create new definitions with the updated values.

If a metric is never sent but has a related alarm definition, no alarms will exist for that definition. If you find that a metric is never sent, remove the related alarm definition.

To remove an alarm definition, use the monasca_alarm_definition Ansible module, which supports the state "absent".

The following file snippet shows an example of how to remove an alarm definition by setting the state to absent.

- name: monasca-pre-upgrade | Remove alarm definitions
  monasca_alarm_definition:
    name: "{{ item.name }}"
    state: "absent"
    keystone_url: "{{ monasca_keystone_url }}"
    keystone_user: "{{ monasca_keystone_user }}"
    keystone_password: "{{ monasca_keystone_password }}"
    keystone_project: "{{ monasca_keystone_project }}"
    monasca_api_url: "{{ monasca_api_url }}"
  with_items:
    - { name: "Kafka Consumer Lag" }

An alarm is in the OK state when the monasca threshold engine has seen at least one metric associated with the alarm definition and that metric has not exceeded the alarm definition threshold.
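The following sketch is a greatly simplified model of the alarm states described in this section, not the threshold engine's actual logic. It assumes each list entry is the already-aggregated value for one evaluation period. It also assumes that "times N", as in "avg(cpu.idle_perc) < 10 times 3", means the threshold must be breached in N consecutive periods. The function name is hypothetical.

```python
import operator


def alarm_state(period_values, op, threshold, periods=1):
    """Simplified state for an expression like 'metric < threshold times N'.

    period_values: one aggregated value per evaluation period, oldest first;
    empty when no metric has been received for this alarm.
    """
    if not period_values:
        # No metric seen (never sent or no longer sent)
        return "UNDETERMINED"
    recent = period_values[-periods:]
    if len(recent) == periods and all(op(v, threshold) for v in recent):
        return "ALARM"
    # At least one metric seen and the threshold not exceeded
    return "OK"


print(alarm_state([], operator.lt, 1))                        # → UNDETERMINED
print(alarm_state([2, 0], operator.lt, 1))                    # → ALARM
print(alarm_state([12, 5, 9], operator.lt, 10, periods=3))    # → OK
```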

3.4.7 OpenStack Integration of Custom Plugins into Monasca-Agent (if applicable)

Monasca-agent is an OpenStack open-source project. Monasca can also monitor non-OpenStack services. Third parties should install custom plugins into their SUSE OpenStack Cloud 8 system using the steps outlined in Section 3.4.3, “Writing Custom Plugins”. If the OpenStack community determines that the custom plugins are of general benefit, they may be added to openstack/monasca-agent so that they are installed with the monasca-agent. During the review process for openstack/monasca-agent, there is no guarantee that code will be approved or merged by a deadline. Open-source contributors are expected to help with code reviews in order to get their code accepted. Once the changes are approved and integrated into openstack/monasca-agent, and that version of the monasca-agent is integrated with SUSE OpenStack Cloud 8, the third party can remove the custom plugin installation steps, because the plugins are then installed in the default monasca-agent venv.

The open-source repository for the monasca-agent is at https://github.com/openstack/monasca-agent
