4 Third-Party Integrations #
4.1 Splunk Integration #
This documentation demonstrates a possible integration between the SUSE OpenStack Cloud 9 centralized logging solution and Splunk, including the steps to set up and forward logs.
The SUSE OpenStack Cloud 9 logging solution provides a flexible and extensible framework to centralize the collection and processing of logs from all of the nodes in a cloud. The logs are shipped to a highly available and fault tolerant cluster where they are transformed and stored for better searching and reporting. The SUSE OpenStack Cloud 9 logging solution uses the ELK stack (Elasticsearch, Logstash and Kibana) as a production grade implementation and can support other storage and indexing technologies. The Logstash pipeline can be configured to forward the logs to an alternative target if you wish.
4.1.1 What is Splunk? #
Splunk is software for searching, monitoring, and analyzing machine-generated big data, via a web-style interface. Splunk captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations. It is commercial software (unlike Elasticsearch) and more details about Splunk can be found at https://www.splunk.com.
4.1.2 Configuring Splunk to receive log messages from SUSE OpenStack Cloud 9 #
This documentation assumes that you already have Splunk set up and running. For help with installing and setting up Splunk, refer to Splunk Tutorial.
There are different ways in which a log message (or "event" in Splunk's terminology) can be sent to Splunk. These steps will set up a TCP port where Splunk will listen for messages.
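Once the TCP input is listening, you can verify it from any host that can reach the Splunk server by sending a single test event. The following sketch uses only Python's standard socket library; the hostname, port, and message in the usage comment are placeholders for your own Splunk listener values.

```python
import socket


def send_test_event(host, port, message):
    """Send one newline-terminated log line to a TCP listener.

    Splunk's TCP input treats each newline-delimited line as one event.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((message + "\n").encode("utf-8"))


# Placeholder address and port -- substitute your Splunk listener's values:
# send_test_event("splunk.example.com", 5514, "test event from SUSE OpenStack Cloud")
```

If the event does not appear in a Splunk search afterwards, check firewall rules between the cloud nodes and the Splunk server before revisiting the input configuration.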
On the Splunk web UI, click the Settings menu in the upper right-hand corner.

In the Data section of the Settings menu, click Data inputs, then choose the TCP option.

Click the New button to add an input.

In the Port field, enter the port number you want to use.

Note: If you are on a less secure network and want to restrict connections to this port, use the Only accept connection from field to restrict the traffic to a specific IP address.

Click the Next button.

Specify the Source Type by clicking the Select button and choosing linux_messages_syslog from the list.

Click the Review button.

Review the configuration and click the Submit button. A success message will be displayed.
4.1.3 Forwarding log messages from SUSE OpenStack Cloud 9 Centralized Logging to Splunk #
When you have Splunk set up and configured to receive log messages, you can configure SUSE OpenStack Cloud 9 to forward the logs to Splunk.
Log in to the Cloud Lifecycle Manager.
Check the status of the logging service:
ardana > cd ~/scratch/ansible/next/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts logging-status.yml

If everything is up and running, continue to the next step.
Edit the logstash config file at the location below:
~/openstack/ardana/ansible/roles/logging-server/templates/logstash.conf.j2
At the bottom of the file is a section for the Logstash outputs. Add the details of your Splunk environment there.
The following example shows the placement of the Splunk output block:
# Logstash outputs
#------------------------------------------------------------------------------
output {
  # Configure Elasticsearch output
  # http://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
  elasticsearch {
    index => %{[@metadata][es_index]}
    hosts => ["{{ elasticsearch_http_host }}:{{ elasticsearch_http_port }}"]
    flush_size => {{ logstash_flush_size }}
    idle_flush_time => 5
    workers => {{ logstash_threads }}
  }

  # Forward logs to Splunk on the TCP port that matches the one specified
  # in the Splunk Web UI.
  tcp {
    mode => "client"
    host => "<Enter Splunk listener IP address>"
    port => TCP_PORT_NUMBER
  }
}
Note: If you are not planning on using the Splunk UI to parse your centralized logs, there is no need to forward your logs to Elasticsearch. In this situation, comment out the lines in the Logstash outputs pertaining to Elasticsearch. You can, however, continue to forward your centralized logs to multiple locations.
Commit your changes to git:
ardana > cd ~/openstack/ardana/ansible
ardana > git add -A
ardana > git commit -m "Logstash configuration change for Splunk integration"

Run the configuration processor:

ardana > cd ~/openstack/ardana/ansible
ardana > ansible-playbook -i hosts/localhost config-processor-run.yml

Update your deployment directory:

ardana > cd ~/openstack/ardana/ansible
ardana > ansible-playbook -i hosts/localhost ready-deployment.yml

Complete this change with a reconfigure of the logging environment:

ardana > cd ~/scratch/ansible/next/ardana/ansible
ardana > ansible-playbook -i hosts/verb_hosts kronos-reconfigure.yml

In your Splunk UI, confirm that the logs have begun to forward.
4.1.4 Searching for log messages from the Splunk dashboard #
To verify that your integration worked and to search the log messages that have been forwarded, navigate back to your Splunk dashboard. In the search field, use this string:
source="tcp:TCP_PORT_NUMBER"
Find information on using the Splunk search tool at http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/WelcometotheSearchTutorial.
4.2 Operations Bridge Integration #
The SUSE OpenStack Cloud 9 monitoring solution (monasca) can easily be integrated with your existing monitoring tools. Integrating SUSE OpenStack Cloud 9 monasca with Operations Bridge using the Operations Bridge Connector simplifies monitoring and managing events and topology information.
The integration provides the following functionality:
Forwarding of SUSE OpenStack Cloud monasca alerts and topology to Operations Bridge for event correlation
Customization of forwarded events and topology
For more information about this connector, see https://software.microfocus.com/en-us/products/operations-bridge-suite/overview.
4.3 Monitoring Third-Party Components With Monasca #
4.3.1 monasca Monitoring Integration Overview #
monasca, the SUSE OpenStack Cloud 9 monitoring service, collects information about your cloud's systems, and allows you to create alarm definitions based on these measurements. monasca-agent is the component that collects metrics from the nodes in your cloud and forwards them to the monasca-api for further processing, such as metric storage and alarm thresholding.
With a small amount of configuration, you can use the detection and check plugins that are provided with your cloud to monitor integrated third-party components. In addition, you can write custom plugins and integrate them with the existing monitoring service.
Find instructions for customizing existing plugins to monitor third-party components in the Section 4.3.4, “Configuring Check Plugins”.
Find instructions for installing and configuring new custom plugins in the Section 4.3.3, “Writing Custom Plugins”.
You can also use existing alarm definitions, as well as create new alarm definitions that relate to a custom plugin or metric. Instructions for defining new alarm definitions are in the Section 4.3.6, “Configuring Alarm Definitions”.
You can use the Operations Console and monasca CLI to list all of the alarms, alarm-definitions, and metrics that exist on your cloud.
4.3.2 monasca Agent #
The monasca agent (monasca-agent) collects information about your cloud using the installed plugins. The plugins are written in Python, and determine the monitoring metrics for your system, as well as the interval for collection. The default collection interval is 30 seconds, and we strongly recommend not changing this default value.
The following two types of custom plugins can be added to your cloud.
Detection Plugin. Determines whether the monasca-agent has the ability to monitor the specified component or service on a host. If successful, this type of plugin configures an associated check plugin by creating a YAML configuration file.
Check Plugin. Specifies the metrics to be monitored, using the configuration file created by the detection plugin.
monasca-agent is installed on every server in your cloud, and provides plugins that monitor the following.
System metrics relating to CPU, memory, disks, host availability, etc.
Process health metrics (process, http_check)
SUSE OpenStack Cloud 9-specific component metrics, such as apache, rabbitmq, kafka, cassandra, etc.
monasca is pre-configured with default check plugins and associated detection plugins. The default plugins can be reconfigured to monitor third-party components, and often only require small adjustments to adapt them to this purpose. Find a list of the default plugins here: https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#detection-plugins
Often, a single check plugin will be used to monitor multiple services. For example, many services use the http_check.py detection plugin to detect the up/down status of a service endpoint. Often the process.py check plugin, which provides process monitoring metrics, is used as a basis for a custom process detection plugin.
More information about the monasca agent can be found in the following locations:
monasca agent overview: https://github.com/openstack/monasca-agent/blob/master/docs/Agent.md
Information on existing plugins: https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md
Information on plugin customizations: https://github.com/openstack/monasca-agent/blob/master/docs/Customizations.md
4.3.3 Writing Custom Plugins #
When the pre-built monasca plugins do not meet your monitoring needs, you can write custom plugins to monitor your cloud. After you have written a plugin, you must install and configure it.
When your needs dictate a very specific custom monitoring check, you must provide both a detection and check plugin.
The steps involved in configuring a custom plugin include running a detection plugin and passing any necessary parameters to it, so that the resulting check configuration file is created with all necessary data.
When using an existing check plugin to monitor a third-party component, a custom detection plugin is needed only if there is not an associated default detection plugin.
Check plugin configuration files
Each plugin needs a corresponding YAML configuration file with the same stem name as the plugin check file. For example, the plugin file http_check.py (in /usr/lib/python2.7/site-packages/monasca_agent/collector/checks_d/) should have a corresponding configuration file, http_check.yaml (in /etc/monasca/agent/conf.d/http_check.yaml). The stem name http_check must be the same for both files.
Permissions for the YAML configuration file must be read+write for the monasca-agent user (which must also own the file) and read for the monasca group; access must be restricted to that user and group. The following example shows correct permission settings for the file http_check.yaml.

ardana > ls -alt /etc/monasca/agent/conf.d/http_check.yaml
-rw-r----- 1 monasca-agent monasca 10590 Jul 26 05:44 http_check.yaml
A check plugin YAML configuration file has the following structure.
init_config:
  key1: value1
  key2: value2

instances:
  - name: john_smith
    username: john_smith
    password: 123456
  - name: jane_smith
    username: jane_smith
    password: 789012
In the above file structure, the init_config section allows you to specify any number of global key:value pairs. Each pair will be available on every run of the check that relates to the YAML configuration file.

The instances section allows you to list the instances that the related check will be run on. The check will be run once on each instance listed in the instances section. Ensure that each instance listed in the instances section has a unique name.
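Because the check runs once per listed instance, a duplicate name silently shadows another entry. The requirement above can be checked with a short sketch; the function name and the dict form of the parsed YAML are illustrative, not part of monasca.

```python
def validate_instances(config):
    """Check that every instance in a check-plugin config has a unique name.

    `config` mirrors the parsed YAML: a dict with `init_config` and
    `instances` keys.
    """
    names = [inst.get("name") for inst in config.get("instances", [])]
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError("duplicate instance names: %s" % sorted(duplicates))
    return names


config = {
    "init_config": {"key1": "value1"},
    "instances": [
        {"name": "john_smith", "username": "john_smith"},
        {"name": "jane_smith", "username": "jane_smith"},
    ],
}
# validate_instances(config) returns ["john_smith", "jane_smith"]
```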
Custom detection plugins
Detection plugins should be written to perform checks that ensure that a component can be monitored on a host. Any arguments needed by the associated check plugin are passed into the detection plugin at setup (configuration) time. The detection plugin will write to the associated check configuration file.
When a detection plugin is successfully run in the configuration step, it will write to the check configuration YAML file. The configuration file for the check is written to the following directory.
/etc/monasca/agent/conf.d/
Writing a process detection plugin using the ServicePlugin class

The monasca-agent provides a ServicePlugin class that makes process detection monitoring easy.

Process check

The process check plugin generates metrics based on the process status for specified process names. It generates process.pid_count metrics for the specified dimensions, and a set of detailed process metrics for the specified dimensions by default.

The ServicePlugin class allows you to specify a list of process name(s) to detect, and uses psutil to see if the process exists on the host. It then appends the process.yml configuration file with the process name(s), if they do not already exist.

The following is an example of a process.py check ServicePlugin.
import logging

import monasca_setup.detection

log = logging.getLogger(__name__)


class monascaTransformDetect(monasca_setup.detection.ServicePlugin):
    """Detect monasca Transform daemons and setup configuration to monitor them."""

    def __init__(self, template_dir, overwrite=False, args=None):
        log.info(" Watching the monasca transform processes.")
        service_params = {
            'args': {},
            'template_dir': template_dir,
            'overwrite': overwrite,
            'service_name': 'monasca-transform',
            'process_names': ['monasca-transform', 'pyspark',
                              'transform/lib/driver']
        }
        super(monascaTransformDetect, self).__init__(service_params)
Writing a Custom Detection Plugin using Plugin or ArgsPlugin classes
A custom detection plugin class should derive from either the Plugin or ArgsPlugin classes provided in the /usr/lib/python2.7/site-packages/monasca_setup/detection directory.

If the plugin parses command line arguments, the ArgsPlugin class is useful. The ArgsPlugin class derives from the Plugin class. It has a method to check for required arguments, and a method to return the instance that will be used for writing to the configuration file, with the dimensions from the command line parsed and included.

If the ArgsPlugin methods do not seem to apply, then derive directly from the Plugin class.
When deriving from these classes, the following methods should be implemented.
_detect - set self.available=True when conditions are met that the thing to monitor exists on a host.
build_config - writes the instance information to the configuration and returns the configuration.
dependencies_installed (default implementation is in ArgsPlugin, but not Plugin) - return true when python dependent libraries are installed.
The following is an example custom detection plugin.
import ast
import logging

import monasca_setup.agent_config
import monasca_setup.detection

log = logging.getLogger(__name__)


class HttpCheck(monasca_setup.detection.ArgsPlugin):
    """Setup an http_check according to the passed in args.

    Despite being a detection plugin this plugin does no detection and
    will be a noop without arguments. Expects space separated arguments,
    the required argument is url. Optional parameters include:
    disable_ssl_validation and match_pattern.
    """

    def _detect(self):
        """Run detection, set self.available True if the service is detected."""
        self.available = self._check_required_args(['url'])

    def build_config(self):
        """Build the config as a Plugins object and return."""
        config = monasca_setup.agent_config.Plugins()
        # No support for setting headers at this time
        instance = self._build_instance(['url', 'timeout', 'username',
                                         'password', 'match_pattern',
                                         'disable_ssl_validation', 'name',
                                         'use_keystone',
                                         'collect_response_time'])

        # Normalize any boolean parameters
        for param in ['use_keystone', 'collect_response_time']:
            if param in self.args:
                instance[param] = ast.literal_eval(self.args[param].capitalize())
        # Set some defaults
        if 'collect_response_time' not in instance:
            instance['collect_response_time'] = True
        if 'name' not in instance:
            instance['name'] = self.args['url']

        config['http_check'] = {'init_config': None,
                                'instances': [instance]}

        return config
Installing a detection plugin in the OpenStack version delivered with SUSE OpenStack Cloud
Install a plugin by copying it to the plugin directory (/usr/lib/python2.7/site-packages/monasca_agent/collector/checks_d/).
The plugin should have file permissions of read+write for the root user (the user that should also own the file) and read for the root group and all other users.
The following is an example of correct file permissions for the http_check.py file.
-rw-r--r-- 1 root root 1769 Sep 19 20:14 http_check.py
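The permission requirements above can be verified programmatically. The following sketch uses only Python's standard library; the helper name and the path in the usage comment are illustrative, not part of monasca.

```python
import os
import stat


def has_mode(path, expected_mode):
    """Return True if the file's permission bits match expected_mode (octal)."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected_mode


# Example: read+write for the owner, read for group and others (0644), as
# shown in the ls output above. Path is a placeholder for your plugin file.
# has_mode("/usr/lib/python2.7/site-packages/monasca_agent/collector/checks_d/http_check.py", 0o644)
```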
Detection plugins should be placed in the following directory.
/usr/lib/monasca/agent/custom_detect.d/
The detection plugin directory name should be accessed using the monasca_agent_detection_plugin_dir Ansible variable. This variable is defined in the roles/monasca-agent/vars/main.yml file.
monasca_agent_detection_plugin_dir: /usr/lib/monasca/agent/custom_detect.d/
Example: Add an Ansible monasca_configure task to install the plugin. (The monasca_configure task can be added to any service playbook.) In this example, it is added to ~/openstack/ardana/ansible/roles/_CEI-CMN/tasks/monasca_configure.yml.
---
- name: _CEI-CMN | monasca_configure | Copy ceilometer Custom plugin
  become: yes
  copy:
    src: ardanaceilometer_mon_plugin.py
    dest: "{{ monasca_agent_detection_plugin_dir }}"
    owner: root
    group: root
    mode: 0440
Custom check plugins
Custom check plugins generate metrics. Scalability should be taken into consideration on systems that will have hundreds of servers, as a large number of metrics can affect performance by impacting disk performance, RAM and CPU usage.
You may want to tune your configuration parameters so that less-important metrics are not monitored as frequently. When check plugins are configured (when they have an associated YAML configuration file) the agent will attempt to run them.
Checks should be able to run within the 30-second metric collection window. If your check runs a command, you should provide a timeout to prevent the check from running longer than the default 30-second window. You can use monasca_agent.common.util.timeout_command to set a timeout in your custom check plugin Python code.
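If you want to prototype the same timeout behavior outside the agent, Python's standard subprocess module provides an equivalent. This is a standalone sketch in the spirit of timeout_command, not the agent's implementation; the function name is illustrative.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run a command; return (stdout, stderr, returncode), or None on timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_seconds)
        return proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired:
        # The command exceeded its deadline; the check should treat this
        # as a failure rather than block the collection cycle.
        return None


# A check that shells out should finish well inside the 30-second window:
result = run_with_timeout([sys.executable, "-c", "print('ok')"],
                          timeout_seconds=10)
```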
Find a description of how to write custom check plugins at https://github.com/openstack/monasca-agent/blob/master/docs/Customizations.md#creating-a-custom-check-plugin
Custom checks derive from the AgentCheck class located in the monasca_agent/collector/checks/check.py file. A check method is required.
Metrics should contain dimensions that make each item that you are monitoring unique (such as service, component, hostname). The hostname dimension is defined by default within the AgentCheck class, so every metric has this dimension.
A custom check will do the following.
Read the configuration instance passed into the check method.
Set dimensions that will be included in the metric.
Create the metric with gauge, rate, or counter types.
Metric Types:

gauge: Instantaneous reading of a particular value (for example, mem.free_mb).

rate: Measurement over a time period, defined as rate = delta_v / float(delta_t).

counter: The number of events, with increment and decrement methods (for example, zookeeper.timeouts).
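The rate equation above can be sketched directly; the function name and sample values are illustrative.

```python
def compute_rate(prev_value, prev_time, value, time):
    """rate = delta_v / float(delta_t), as used for rate-type metrics."""
    delta_v = value - prev_value
    delta_t = time - prev_time
    return delta_v / float(delta_t)


# 500 events sampled at t=0s, 650 events at t=30s gives 5.0 events/second.
rate = compute_rate(500, 0, 650, 30)
```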
The following is an example component check named SimpleCassandraExample.
import monasca_agent.collector.checks as checks
from monasca_agent.common.util import timeout_command

CASSANDRA_VERSION_QUERY = "SELECT version();"


class SimpleCassandraExample(checks.AgentCheck):
    def __init__(self, name, init_config, agent_config):
        super(SimpleCassandraExample, self).__init__(name, init_config,
                                                     agent_config)

    @staticmethod
    def _get_config(instance):
        user = instance.get('user')
        password = instance.get('password')
        service = instance.get('service')
        timeout = int(instance.get('timeout'))
        return user, password, service, timeout

    def check(self, instance):
        user, password, service, timeout = self._get_config(instance)
        dimensions = self._set_dimensions({'component': 'cassandra',
                                           'service': service}, instance)
        results, connection_status = self._query_database(
            user, password, timeout, CASSANDRA_VERSION_QUERY)
        if connection_status != 0:
            self.gauge('cassandra.connection_status', 1, dimensions=dimensions)
        else:
            # successful connection status
            self.gauge('cassandra.connection_status', 0, dimensions=dimensions)

    def _query_database(self, user, password, timeout, query):
        stdout, stderr, return_code = timeout_command(
            ["/opt/cassandra/bin/vsql", "-U", user, "-w", password, "-A",
             "-R", "|", "-t", "-F", ",", "-x"],
            timeout, command_input=query)
        if return_code == 0:
            # remove trailing newline
            stdout = stdout.rstrip()
            return stdout, 0
        else:
            self.log.error("Error querying cassandra with return code of "
                           "{0} and error {1}".format(return_code, stderr))
            return stderr, 1
Installing a check plugin
The check plugin needs to have the same file permissions as the detection plugin. File permissions must be read+write for the root user (the user that should own the file), and read for the root group and all other users.
Check plugins should be placed in the following directory.
/usr/lib/monasca/agent/custom_checks.d/
The check plugin directory should be accessed using the monasca_agent_check_plugin_dir Ansible variable. This variable is defined in the roles/monasca-agent/vars/main.yml file.
monasca_agent_check_plugin_dir: /usr/lib/monasca/agent/custom_checks.d/
4.3.4 Configuring Check Plugins #
Manually configure a plugin (for example, when unit-testing) using the monasca-setup script installed with the monasca-agent.
Find a good explanation of configuring plugins here: https://github.com/openstack/monasca-agent/blob/master/docs/Agent.md#configuring
SSH to a node that has both the monasca-agent installed as well as the component you wish to monitor.
The following is an example command that configures a plugin that has no parameters (uses the detection plugin class name).
root #
/usr/bin/monasca-setup -d ARDANACeilometer
The following is an example command that configures the apache plugin and includes related parameters.
root #
/usr/bin/monasca-setup -d apache -a 'url=http://192.168.245.3:9095/server-status?auto'
If there is a change in the configuration, monasca-setup will restart the monasca-agent on the host so the configuration is loaded.
After the plugin is configured, you can verify that the configuration file has your changes (see the Verify that your check plugin is configured section below).
Use the monasca CLI to see if your metric exists (see the Verify that metrics exist section).
Using Ansible modules to configure plugins in SUSE OpenStack Cloud 9
The monasca_agent_plugin module is installed as part of the monasca-agent role.
The following Ansible example configures the process.py plugin for the ceilometer detection plugin; it passes in only the name of the detection class.
- name: _CEI-CMN | monasca_configure | Run monasca agent Cloud Lifecycle Manager specific ceilometer detection plugin
  become: yes
  monasca_agent_plugin:
    name: "ARDANACeilometer"
If a password or other sensitive data are passed to the detection plugin, the no_log option should be set to True. If the no_log option is not set to True, the data passed to the plugin will be logged to syslog.
The following Ansible example configures the Cassandra plugin and passes in related arguments.
- name: Run monasca Agent detection plugin for Cassandra
  monasca_agent_plugin:
    name: "Cassandra"
    args: "directory_names={{ FND_CDB.vars.cassandra_data_dir }},{{ FND_CDB.vars.cassandra_commit_log_dir }} process_username={{ FND_CDB.vars.cassandra_user }}"
  when: database_type == 'cassandra'
The following Ansible example configures the keystone endpoint using the http_check.py detection plugin. The class name of the http_check.py detection plugin, httpcheck, is used as the name.
- name: keystone-monitor | local_monitor | Setup active check on keystone internal endpoint locally
  become: yes
  monasca_agent_plugin:
    name: "httpcheck"
    args: "use_keystone=False \
           url=http://{{ keystone_internal_listen_ip }}:{{ keystone_internal_port }}/v3 \
           dimensions=service:identity-service,\
                      component:keystone-api,\
                      api_endpoint:internal,\
                      monitored_host_type:instance"
  tags:
    - keystone
    - keystone_monitor
Verify that your check plugin is configured
All check configuration files are located in the following directory. You can see the plugins that are running by looking at the plugin configuration directory.
/etc/monasca/agent/conf.d/
When the monasca-agent starts up, all of the check plugins that have a matching configuration file in the /etc/monasca/agent/conf.d/ directory will be loaded.
If there are errors running the check plugin they will be written to the following error log file.
/var/log/monasca/agent/collector.log
You can change the monasca-agent log level by modifying the log_level option in the /etc/monasca/agent/agent.yaml configuration file, and then restarting the monasca-agent, using the following command.
root #
service openstack-monasca-agent restart
You can debug a check plugin by running monasca-collector with the check option. The following is an example of the monasca-collector command.
tux >
sudo /usr/bin/monasca-collector check CHECK_NAME
Verify that metrics exist
Begin by logging in to your deployer or controller node.
Run the following set of commands, including monasca metric-list. If the metric exists, it will be displayed in the output.
ardana > source ~/service.osrc
ardana > monasca metric-list --name METRIC_NAME
4.3.5 Metric Performance Considerations #
Collecting metrics on your virtual machines can greatly affect performance. SUSE OpenStack Cloud 9 supports 200 compute nodes, with up to 40 VMs each. If your environment is managing the maximum number of VMs, adding a single metric for all VMs is the equivalent of adding 8,000 metrics.
Because of the potential impact that new metrics have on system performance, consider adding only new metrics that are useful for alarm-definition, capacity planning, or debugging process failure.
4.3.6 Configuring Alarm Definitions #
The monasca-api-spec (https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md) provides an explanation of alarm definitions and alarms. You can find more information on alarm definition expressions at the following page: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definition-expressions.
When an alarm definition is created, the monasca-threshold engine generates an alarm for each unique instance of the match_by metric dimensions found in the metric. This allows a single alarm definition to dynamically handle the addition of new hosts.
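The match_by behavior can be illustrated with a small sketch: one alarm per unique combination of match_by dimension values. This is illustrative only, not monasca code; the function name and sample metrics are assumptions.

```python
def unique_alarm_instances(metrics, match_by):
    """Return the distinct dimension combinations that would each get an alarm.

    Illustrates monasca-threshold's match_by behavior on a list of metric
    dicts shaped like {"name": ..., "dimensions": {...}}.
    """
    seen = set()
    for metric in metrics:
        key = tuple(metric["dimensions"].get(dim) for dim in match_by)
        seen.add(key)
    return sorted(seen)


metrics = [
    {"name": "process.pid_count",
     "dimensions": {"process_name": "monasca-api", "hostname": "host1"}},
    {"name": "process.pid_count",
     "dimensions": {"process_name": "monasca-api", "hostname": "host2"}},
    {"name": "process.pid_count",
     "dimensions": {"process_name": "monasca-api", "hostname": "host1"}},
]
# Two unique (process_name, hostname) pairs, so two alarms are created.
instances = unique_alarm_instances(metrics, ["process_name", "hostname"])
```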
There are default alarm definitions configured for all "process check" (process.py check) and "HTTP Status" (http_check.py check) metrics in the monasca-default-alarms role. The monasca-default-alarms role is installed as part of the monasca deployment phase of your cloud's deployment. You do not need to create alarm definitions for these existing checks.
Third parties should create an alarm definition when they wish to alarm on a custom plugin metric. The alarm definition should only be defined once. Setting a notification method for the alarm definition is recommended but not required.
The following Ansible modules used for alarm definitions are installed as part of the monasca-alarm-definition role. This process takes place during the monasca set up phase of your cloud's deployment.
monasca_alarm_definition
monasca_notification_method
The following examples, found in the ~/openstack/ardana/ansible/roles/monasca-default-alarms directory, illustrate how monasca sets up the default alarm definitions.
monasca Notification Methods
The monasca-api-spec, found at the following link, provides details about creating a notification: https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#create-notification-method
The following are supported notification types.
EMAIL
WEBHOOK
PAGERDUTY
The keystone_admin_tenant project is used so that the alarms will show up on the Operations Console UI.
The following file snippet shows variables from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/defaults/main.yml file.
---
notification_address: root@localhost
notification_name: 'Default Email'
notification_type: EMAIL
monasca_keystone_url: "{{ KEY_API.advertises.vips.private[0].url }}/v3"
monasca_api_url: "{{ MON_AGN.consumes_MON_API.vips.private[0].url }}/v2.0"
monasca_keystone_user: "{{ MON_API.consumes_KEY_API.vars.keystone_monasca_user }}"
monasca_keystone_password: "{{ MON_API.consumes_KEY_API.vars.keystone_monasca_password | quote }}"
monasca_keystone_project: "{{ KEY_API.vars.keystone_admin_tenant }}"
monasca_client_retries: 3
monasca_client_retry_delay: 2
You can specify a single default notification method in the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file. You can also add or modify the notification type and related details using the Operations Console UI or monasca CLI.
The following is a code snippet from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file.
---
- name: monasca-default-alarms | main | Setup default notification method
  monasca_notification_method:
    name: "{{ notification_name }}"
    type: "{{ notification_type }}"
    address: "{{ notification_address }}"
    keystone_url: "{{ monasca_keystone_url }}"
    keystone_user: "{{ monasca_keystone_user }}"
    keystone_password: "{{ monasca_keystone_password }}"
    keystone_project: "{{ monasca_keystone_project }}"
    monasca_api_url: "{{ monasca_api_url }}"
  no_log: True
  tags:
    - system_alarms
    - monasca_alarms
    - openstack_alarms
  register: default_notification_result
  until: not default_notification_result | failed
  retries: "{{ monasca_client_retries }}"
  delay: "{{ monasca_client_retry_delay }}"
monasca Alarm Definition
In the alarm definition "expression" field, you can specify the metric name and threshold. The "match_by" field is used to create a new alarm for every unique combination of the match_by metric dimensions.
Find more details on alarm definitions in the monasca API documentation: https://github.com/stackforge/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definitions-and-alarms
The following is a code snippet from the ~/openstack/ardana/ansible/roles/monasca-default-alarms/tasks/main.yml file.
- name: monasca-default-alarms | main | Create Alarm Definitions
  monasca_alarm_definition:
    name: "{{ item.name }}"
    description: "{{ item.description | default('') }}"
    expression: "{{ item.expression }}"
    keystone_token: "{{ default_notification_result.keystone_token }}"
    match_by: "{{ item.match_by | default(['hostname']) }}"
    monasca_api_url: "{{ default_notification_result.monasca_api_url }}"
    severity: "{{ item.severity | default('LOW') }}"
    alarm_actions:
      - "{{ default_notification_result.notification_method_id }}"
    ok_actions:
      - "{{ default_notification_result.notification_method_id }}"
    undetermined_actions:
      - "{{ default_notification_result.notification_method_id }}"
  register: monasca_system_alarms_result
  until: not monasca_system_alarms_result | failed
  retries: "{{ monasca_client_retries }}"
  delay: "{{ monasca_client_retry_delay }}"
  with_flattened:
    - monasca_alarm_definitions_system
    - monasca_alarm_definitions_monasca
    - monasca_alarm_definitions_openstack
    - monasca_alarm_definitions_misc_services
  when: monasca_create_definitions
In the following example ~/openstack/ardana/ansible/roles/monasca-default-alarms/vars/main.yml Ansible variables file, the alarm definition named Process Check sets the match_by variable with the following parameters.

process_name

hostname
monasca_alarm_definitions_system:
  - name: "Host Status"
    description: "Alarms when the specified host is down or not reachable"
    severity: "HIGH"
    expression: "host_alive_status > 0"
    match_by:
      - "target_host"
      - "hostname"
  - name: "HTTP Status"
    description: >
      "Alarms when the specified HTTP endpoint is down or not reachable"
    severity: "HIGH"
    expression: "http_status > 0"
    match_by:
      - "service"
      - "component"
      - "hostname"
      - "url"
  - name: "CPU Usage"
    description: "Alarms when CPU usage is high"
    expression: "avg(cpu.idle_perc) < 10 times 3"
  - name: "High CPU IOWait"
    description: "Alarms when CPU IOWait is high, possible slow disk issue"
    expression: "avg(cpu.wait_perc) > 40 times 3"
    match_by:
      - "hostname"
  - name: "Disk Inode Usage"
    description: "Alarms when disk inode usage is high"
    expression: "disk.inode_used_perc > 90"
    match_by:
      - "hostname"
      - "device"
    severity: "HIGH"
  - name: "Disk Usage"
    description: "Alarms when disk usage is high"
    expression: "disk.space_used_perc > 90"
    match_by:
      - "hostname"
      - "device"
    severity: "HIGH"
  - name: "Memory Usage"
    description: "Alarms when memory usage is high"
    severity: "HIGH"
    expression: "avg(mem.usable_perc) < 10 times 3"
  - name: "Network Errors"
    description: >
      "Alarms when either incoming or outgoing network errors are high"
    severity: "MEDIUM"
    expression: "net.in_errors_sec > 5 or net.out_errors_sec > 5"
  - name: "Process Check"
    description: "Alarms when the specified process is not running"
    severity: "HIGH"
    expression: "process.pid_count < 1"
    match_by:
      - "process_name"
      - "hostname"
  - name: "Crash Dump Count"
    description: "Alarms when a crash directory is found"
    severity: "MEDIUM"
    expression: "crash.dump_count > 0"
    match_by:
      - "hostname"
The preceding configuration would result in the creation of an alarm for each unique metric that matched the following criteria:
process.pid_count + process_name + hostname
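The grouping behavior above can be illustrated with a short sketch. This is hypothetical illustration code, not part of monasca: it simply shows how each unique combination of the metric name plus the match_by dimension values yields one alarm.

```python
# Conceptual sketch (hypothetical, not monasca source code): group metrics
# by metric name plus the match_by dimension values to see how many alarms
# the "Process Check" definition would create.

def alarm_keys(metrics, match_by):
    """Return the set of unique (metric name, match_by dimensions) keys."""
    keys = set()
    for m in metrics:
        dims = tuple((d, m["dimensions"].get(d)) for d in match_by)
        keys.add((m["name"],) + dims)
    return keys

metrics = [
    {"name": "process.pid_count",
     "dimensions": {"process_name": "monasca-api", "hostname": "node1"}},
    {"name": "process.pid_count",
     "dimensions": {"process_name": "monasca-api", "hostname": "node2"}},
    {"name": "process.pid_count",
     "dimensions": {"process_name": "keystone", "hostname": "node1"}},
]

# Three unique (process_name, hostname) combinations -> three separate alarms.
print(len(alarm_keys(metrics, ["process_name", "hostname"])))  # prints 3
```

With match_by set to hostname only, the first and third metrics would collapse into a single alarm, so only two alarms would be created.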
Check that the alarms exist
Begin by using the following commands, including monasca alarm-definition-list, to check that the alarm definition exists.
ardana > source ~/service.osrc
ardana > monasca alarm-definition-list --name ALARM_DEFINITION_NAME
Then use either of the following commands to check that the alarm has been generated. A status of "OK" indicates a healthy alarm.
ardana > monasca alarm-list --metric-name METRIC_NAME
Or
ardana > monasca alarm-list --alarm-definition-id ID_FROM_ALARM-DEFINITION-LIST
To see CLI options, use the monasca help command.
Alarm state upgrade considerations
If the name of a monitoring metric changes or is no longer being sent, existing alarms will show the alarm state as UNDETERMINED. You can update an alarm definition as long as you do not change the metric name or dimension name values in the expression or match_by fields. If you find that you need to alter either of these values, you must delete the old alarm definitions and create new definitions with the updated values.
If a metric is never sent, but has a related alarm definition, then no alarms would exist. If you find that metrics are never sent, then you should remove the related alarm definitions.
When removing an alarm definition, the Ansible module monasca_alarm_definition supports the state absent.
The following file snippet shows an example of how to remove an alarm definition by setting the state to absent.
- name: monasca-pre-upgrade | Remove alarm definitions
  monasca_alarm_definition:
    name: "{{ item.name }}"
    state: "absent"
    keystone_url: "{{ monasca_keystone_url }}"
    keystone_user: "{{ monasca_keystone_user }}"
    keystone_password: "{{ monasca_keystone_password }}"
    keystone_project: "{{ monasca_keystone_project }}"
    monasca_api_url: "{{ monasca_api_url }}"
  with_items:
    - { name: "Kafka Consumer Lag" }
An alarm exists in the OK state when the monasca threshold engine has seen at least one metric associated with the alarm definition and has not exceeded the alarm definition threshold.
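The state behavior described above can be sketched as follows. This is an illustrative simplification written for this documentation, not the actual monasca threshold engine code.

```python
# Illustrative simplification (hypothetical code, not the real monasca
# threshold engine) of how an alarm's state is determined.

def alarm_state(metric_seen, threshold_exceeded):
    """Return the alarm state for one evaluation of an alarm definition."""
    if not metric_seen:
        # No metric associated with the alarm definition has been seen
        # (for example, the metric name changed or was never sent), so
        # the alarm cannot be evaluated.
        return "UNDETERMINED"
    return "ALARM" if threshold_exceeded else "OK"

print(alarm_state(metric_seen=True, threshold_exceeded=False))   # prints OK
print(alarm_state(metric_seen=False, threshold_exceeded=False))  # prints UNDETERMINED
```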
4.3.7 OpenStack Integration of Custom Plugins into monasca-agent (if applicable) #
monasca-agent is an OpenStack open-source project. monasca can also monitor non-OpenStack services. Third parties should install custom plugins into their SUSE OpenStack Cloud 9 system using the steps outlined in Section 4.3.3, “Writing Custom Plugins”. If the OpenStack community determines that the custom plugins are of general benefit, the plugins may be added to the openstack/monasca-agent repository so that they are installed with the monasca-agent. During the review process for openstack/monasca-agent there are no guarantees that code will be approved or merged by a deadline. Open-source contributors are expected to help with code reviews in order to get their code accepted. Once changes are approved and integrated into the openstack/monasca-agent, and that version of the monasca-agent is integrated with SUSE OpenStack Cloud 9, the third party can remove the custom plugin installation steps, since the plugins would then be installed in the default monasca-agent venv.
Find the open-source repository for the monasca-agent here: https://github.com/openstack/monasca-agent