Applies to SUSE OpenStack Cloud 8

10 Telemetry

Even in the cloud industry, providers must use a multi-step process for billing. The required steps to bill for usage in a cloud environment are metering, rating, and billing. Because the provider's requirements may be far too specific for a shared solution, rating and billing solutions cannot be designed in a common module that satisfies all. Providing users with measurements on cloud services is required to meet the measured service definition of cloud computing.

The Telemetry service was originally designed to support billing systems for OpenStack cloud resources. This project only covers the metering portion of the required processing for billing. This service collects information about the system and stores it in the form of samples in order to provide data about anything that can be billed.

In addition to system measurements, the Telemetry service also captures event notifications triggered when various actions are executed in the OpenStack system. This data is captured as Events and stored alongside metering data.

The list of meters is continuously growing, which makes it possible to use the data collected by Telemetry for purposes other than billing. For example, the autoscaling feature in the Orchestration service can set alarms through Telemetry and is then notified when those alarms fire.

The sections in this document contain information about the architecture and usage of Telemetry. The first section contains a brief summary about the system architecture used in a typical OpenStack deployment. The second section describes the data collection mechanisms. You can also read about alarming to understand how alarm definitions can be posted to Telemetry and what actions can happen if an alarm is raised. The last section contains a troubleshooting guide, which mentions error situations and possible solutions to the problems.

You can retrieve the collected samples in three different ways: with the REST API, with the command-line interface, or with the Metering tab on an OpenStack dashboard.

10.1 System architecture

The Telemetry service uses an agent-based architecture. Several modules combine their responsibilities to collect data, store samples in a database, or provide an API service for handling incoming requests.

The Telemetry service is built from the following agents and services:

ceilometer-api

Presents aggregated metering data to consumers (such as billing engines and analytics tools).

ceilometer-polling

Polls for different kinds of meter data by using the polling plug-ins (pollsters) registered in different namespaces. It provides a single polling interface across different namespaces.

ceilometer-agent-central

Polls the public RESTful APIs of other OpenStack services such as Compute service and Image service, in order to keep tabs on resource existence, by using the polling plug-ins (pollsters) registered in the central polling namespace.

ceilometer-agent-compute

Polls the local hypervisor or libvirt daemon to acquire performance data for the local instances and emits the data as AMQP messages, by using the polling plug-ins (pollsters) registered in the compute polling namespace.

ceilometer-agent-ipmi

Polls the local node with IPMI support, in order to acquire IPMI sensor data and Intel Node Manager data, by using the polling plug-ins (pollsters) registered in the IPMI polling namespace.

ceilometer-agent-notification

Consumes AMQP messages from other OpenStack services.

ceilometer-collector

Consumes AMQP notifications from the agents, then dispatches the data to the appropriate data store.

ceilometer-alarm-evaluator

Determines when alarms fire due to the associated statistic trend crossing a threshold over a sliding time window.

ceilometer-alarm-notifier

Initiates alarm actions, for example calling out to a webhook with a description of the alarm state transition.

Note
  1. The ceilometer-polling service is available since the Kilo release. It is intended to replace ceilometer-agent-central, ceilometer-agent-compute, and ceilometer-agent-ipmi.

  2. The ceilometer-alarm-evaluator and ceilometer-alarm-notifier services were removed in the Mitaka release.

Except for the ceilometer-agent-compute and the ceilometer-agent-ipmi services, all the other services are placed on one or more controller nodes.

The Telemetry architecture depends heavily on the AMQP service, both for consuming notifications coming from OpenStack services and for internal communication.

10.1.1 Supported databases

The other key external component of Telemetry is the database, where events, samples, alarm definitions, and alarms are stored.

Note

Multiple database back ends can be configured in order to store events, samples, and alarms separately. We recommend Gnocchi for time-series storage.

The list of supported database back ends:

  • MongoDB

  • SQL-based back ends (for example MySQL or PostgreSQL)

  • HBase

  • DB2 NoSQL

10.1.2 Supported hypervisors

The Telemetry service collects information about the virtual machines, which requires close connection to the hypervisor that runs on the compute hosts.

The following is a list of supported hypervisors:

  • The hypervisors supported by Libvirt (such as KVM and QEMU)

  • Hyper-V

  • XEN

  • VMware vSphere

10.1.3 Supported networking services

Telemetry is able to retrieve information from OpenStack Networking and external networking services:

  • OpenStack Networking:

    • Basic network meters

    • Firewall-as-a-Service (FWaaS) meters

    • Load-Balancer-as-a-Service (LBaaS) meters

    • VPN-as-a-Service (VPNaaS) meters

  • SDN controller meters:

    • OpenDaylight

    • OpenContrail

10.1.4 Users, roles, and projects

This service of OpenStack uses OpenStack Identity for authenticating and authorizing users. The required configuration options are listed in the Telemetry section in the OpenStack Configuration Reference.

The system uses two roles: admin and non-admin. The authorization happens before processing each API request. The amount of returned data depends on the role of the requestor.

The creation of alarm definitions also highly depends on the role of the user who initiated the action. Further details about alarm handling can be found in Section 10.5, “Alarms” in this guide.

10.2 Data collection

The main responsibility of Telemetry in OpenStack is to collect information about the system that can be used by billing systems or interpreted by analytic tooling. Telemetry in OpenStack originally focused on the counters used for billing, and the recorded range is continuously growing wider.

Collected data can be stored in the form of samples or events in the supported databases, which are listed in Section 10.1.1, “Supported databases”.

Samples can come from various sources, depending on the needs and configuration of Telemetry. Collecting this data therefore requires multiple mechanisms.

The available data collection mechanisms are:

Notifications

Processing notifications from other OpenStack services, by consuming messages from the configured message queue system.

Polling

Retrieving information directly from the hypervisor or from the host machine using SNMP, or by using the APIs of other OpenStack services.

RESTful API

Pushing samples via the RESTful API of Telemetry.

10.2.1 Notifications

All OpenStack services send notifications about the executed operations or system state. Several notifications carry information that can be metered, such as the CPU time of a VM instance created by the OpenStack Compute service.

The notification agent is responsible for consuming notifications from the message bus and transforming them into events and measurement samples.

Since the Liberty release, the notification agent is responsible for all data processing such as transformations and publishing. After processing, the data is sent via AMQP to the collector service or any external service. These external services persist the data in configured databases.

The different OpenStack services emit several notifications about the various types of events that happen in the system during normal operation. Not all these notifications are consumed by the Telemetry service, as the intention is only to capture the billable events and notifications that can be used for monitoring or profiling purposes. The notification agent filters by the event type. Each notification message contains the event type. The following list shows the event types, by OpenStack service, that Telemetry transforms into samples.

OpenStack Compute

Event types: scheduler.run_instance.scheduled, scheduler.select_destinations, compute.instance.*

For a more detailed list of Compute notifications, see the System Usage Data wiki page.

Bare metal service

Event types: hardware.ipmi.*

OpenStack Image

Event types: image.update, image.upload, image.delete, image.send

The required configuration for the Image service can be found in the Configure the Image service for Telemetry section in the Installation Tutorials and Guides.

OpenStack Networking

Event types: floatingip.create.end, floatingip.update.*, floatingip.exists, network.create.end, network.update.*, network.exists, port.create.end, port.update.*, port.exists, router.create.end, router.update.*, router.exists, subnet.create.end, subnet.update.*, subnet.exists, l3.meter

Orchestration service

Event types: orchestration.stack.create.end, orchestration.stack.update.end, orchestration.stack.delete.end, orchestration.stack.resume.end, orchestration.stack.suspend.end

OpenStack Block Storage

Event types: volume.exists, volume.create.*, volume.delete.*, volume.update.*, volume.resize.*, volume.attach.*, volume.detach.*, snapshot.exists, snapshot.create.*, snapshot.delete.*, snapshot.update.*, volume.backup.create.*, volume.backup.delete.*, volume.backup.restore.*

The required configuration for the Block Storage service can be found in the Add the Block Storage service agent for Telemetry section in the Installation Tutorials and Guides.

Note

Some services require additional configuration to emit the notifications, for example using the correct control exchange on the message queue. These configuration needs are noted in the above list for each OpenStack service that requires them.

Specific notifications from the Compute service are important for administrators and users. Configuring nova_notifications in the nova.conf file allows administrators to respond to events rapidly. For more information on configuring notifications for the compute service, see Telemetry services in the Installation Tutorials and Guides.
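
For example, a minimal nova.conf sketch could look like the following. The option layout varies between releases, so treat the section and option names as assumptions to verify against your version:

[DEFAULT]
# Emit notifications when an instance changes VM or task state.
notify_on_state_change = vm_and_task_state
# Generate periodic compute.instance.exists audit notifications.
instance_usage_audit = True
instance_usage_audit_period = hour

[oslo_messaging_notifications]
# Publish notifications to the message bus consumed by the notification agent.
driver = messagingv2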

Note

When the store_events option is set to True in ceilometer.conf, the notification agent needs database access in order to work properly; prior to the Kilo release, this was always the case.

10.2.1.1 Compute agent

This agent is responsible for collecting resource usage data of VM instances on individual compute nodes within an OpenStack deployment. This mechanism requires a closer interaction with the hypervisor, therefore a separate agent type fulfills the collection of the related meters, which is placed on the host machines to retrieve this information locally.

A Compute agent instance has to be installed on each and every compute node. Installation instructions can be found in the Install the Compute agent for Telemetry section in the Installation Tutorials and Guides.

Just like the central agent, this component also does not need a direct database connection. The samples are sent via AMQP to the notification agent.

The list of supported hypervisors can be found in Section 10.1.2, “Supported hypervisors”. The Compute agent uses the API of the hypervisor installed on the compute hosts. Therefore, the supported meters may be different in case of each virtualization back end, as each inspection tool provides a different set of meters.

The list of collected meters can be found in Section 10.6.1, “OpenStack Compute”. The support column provides the information about which meter is available for each hypervisor supported by the Telemetry service.

Note

Telemetry supports Libvirt, which hides the hypervisor under it.

10.2.1.2 Middleware for the OpenStack Object Storage service

A subset of Object Store statistics requires additional middleware to be installed behind the proxy of Object Store. This additional component emits notifications containing data-flow-oriented meters, namely the storage.objects.(incoming|outgoing).bytes values. These meters are listed in Section 10.6.7, “OpenStack Object Storage”, marked with notification as their origin.

The instructions on how to install this middleware can be found in Configure the Object Storage service for Telemetry section in the Installation Tutorials and Guides.

10.2.1.3 Telemetry middleware

Telemetry provides HTTP request and API endpoint counting capability in OpenStack. This is achieved by storing a sample for each event marked as audit.http.request, audit.http.response, http.request or http.response.

It is recommended that these notifications be consumed as events rather than samples to better index the appropriate values and avoid massive load on the Metering database. If preferred, Telemetry can consume these events as samples if the services are configured to emit http.* notifications.

10.2.2 Polling

The Telemetry service is intended to store a complex picture of the infrastructure. This goal requires more information than what is provided by the events and notifications published by each service. Some information, like the resource usage of VM instances, is not emitted directly.

Therefore Telemetry uses another method to gather this data: polling the infrastructure, including the APIs of the different OpenStack services and other assets, like hypervisors. The latter case requires closer interaction with the compute hosts. To solve this issue, Telemetry uses an agent-based architecture to fulfill the data collection requirements.

There are three types of agents supporting the polling mechanism, the compute agent, the central agent, and the IPMI agent. Under the hood, all the types of polling agents are the same ceilometer-polling agent, except that they load different polling plug-ins (pollsters) from different namespaces to gather data. The following subsections give further information regarding the architectural and configuration details of these components.

Running ceilometer-agent-compute is exactly the same as:

$ ceilometer-polling --polling-namespaces compute

Running ceilometer-agent-central is exactly the same as:

$ ceilometer-polling --polling-namespaces central

Running ceilometer-agent-ipmi is exactly the same as:

$ ceilometer-polling --polling-namespaces ipmi

In addition to loading all the polling plug-ins registered in the specified namespaces, the ceilometer-polling agent can also specify the polling plug-ins to be loaded by using the pollster-list option:

$ ceilometer-polling --polling-namespaces central \
        --pollster-list image image.size storage.*
Note

HA deployment is NOT supported if the pollster-list option is used.

Note

The ceilometer-polling service is available since the Kilo release.

10.2.2.1 Central agent

This agent is responsible for polling public REST APIs to retrieve additional information on OpenStack resources not already surfaced via notifications, and also for polling hardware resources over SNMP.

The following services can be polled with this agent:

  • OpenStack Networking

  • OpenStack Object Storage

  • OpenStack Block Storage

  • Hardware resources via SNMP

  • Energy consumption meters via the Kwapi framework

To install and configure this service, see the Add the Telemetry service section in the Installation Tutorials and Guides.

The central agent does not need a direct database connection. The samples collected by this agent are sent via AMQP to the notification agent to be processed.

Note

Prior to the Liberty release, data from the polling agents was processed locally and published accordingly rather than by the notification agent.

10.2.2.2 IPMI agent

This agent is responsible for collecting IPMI sensor data and Intel Node Manager data on individual compute nodes within an OpenStack deployment. This agent requires an IPMI capable node with the ipmitool utility installed, which is commonly used for IPMI control on various Linux distributions.

An IPMI agent instance can be installed on each and every compute node with IPMI support, except when the node is managed by the Bare metal service and the conductor.send_sensor_data option is set to true in the Bare metal service. There is no harm in installing this agent on a compute node without IPMI or Intel Node Manager support, as the agent checks for the hardware and, if none is available, returns empty data. However, for performance reasons, it is suggested that you install the IPMI agent only on IPMI-capable nodes.

Just like the central agent, this component also does not need direct database access. The samples are sent via AMQP to the notification agent.

The list of collected meters can be found in Section 10.6.2, “Bare metal service”.

Note

Do not deploy both the IPMI agent and the Bare metal service on one compute node. If conductor.send_sensor_data is set, this misconfiguration causes duplicated IPMI sensor samples.

10.2.3 Support for HA deployment

Both the polling agents and notification agents can run in an HA deployment, which means that multiple instances of these services can run in parallel with workload partitioning among these running instances.

The Tooz library provides the coordination within the groups of service instances. It provides an API above several back ends that can be used for building distributed applications.

Tooz supports various drivers including the following back end solutions:

  • Zookeeper. Recommended solution by the Tooz project.

  • Redis. Recommended solution by the Tooz project.

  • Memcached. Recommended for testing.

You must configure a supported Tooz driver for the HA deployment of the Telemetry services.

For information about the required configuration options that have to be set in the ceilometer.conf configuration file for both the central and Compute agents, see the Coordination section in the OpenStack Configuration Reference.

10.2.3.1 Notification agent HA deployment

In the Kilo release, workload partitioning support was added to the notification agent. This is particularly useful as the pipeline processing is now handled exclusively by the notification agent, which may result in a larger load.

To enable workload partitioning by the notification agent, the backend_url option must be set in the ceilometer.conf configuration file. Additionally, workload_partitioning should be enabled; see the Notification section in the OpenStack Configuration Reference. A configuration sketch follows the note below.

Note

In Liberty, the notification agent creates multiple queues to divide the workload across all active agents. The number of queues can be controlled by the pipeline_processing_queues option in the ceilometer.conf configuration file. A larger value will result in better distribution of tasks but will also require more memory and longer startup time. It is recommended to have a value approximately three times the number of active notification agents. At a minimum, the value should be equal to the number of active agents.
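
As an illustrative sketch, the relevant ceilometer.conf settings could look like the following. The Redis URL and the queue count are example values, not defaults:

[coordination]
# Tooz coordination back end shared by all notification agents.
backend_url = redis://controller:6379

[notification]
workload_partitioning = True
# Roughly three times the number of active notification agents.
pipeline_processing_queues = 30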

10.2.3.2 Polling agent HA deployment

Note

Without the backend_url option being set, only one instance of both the central and Compute agent service is able to run and function correctly.

The availability check of the instances is provided by heartbeat messages. When the connection with an instance is lost, the workload will be reassigned among the remaining instances in the next polling cycle.

Note

Memcached uses a timeout value, which should always be set to a value that is higher than the heartbeat value set for Telemetry.

For backward compatibility and for supporting existing deployments, the central agent configuration also supports using different configuration files for groups of service instances of this type that are running in parallel. To enable this configuration, set a value for the partitioning_group_prefix option described in the polling section in the OpenStack Configuration Reference.

Warning

For each sub-group of the central agent pool with the same partitioning_group_prefix a disjoint subset of meters must be polled, otherwise samples may be missing or duplicated. The list of meters to poll can be set in the /etc/ceilometer/pipeline.yaml configuration file. For more information about pipelines see Section 10.3, “Data collection, processing, and pipelines”.

To enable the Compute agent to run multiple instances simultaneously with workload partitioning, the workload_partitioning option has to be set to True under the Compute section in the ceilometer.conf configuration file.
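
Assuming the section name matches the description above, a minimal ceilometer.conf sketch is:

[compute]
# Let multiple Compute agent instances share the polling workload.
workload_partitioning = True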

10.2.4 Send samples to Telemetry

While most parts of the data collection in the Telemetry service are automated, Telemetry provides the possibility to submit samples via the REST API to allow users to send custom samples into this service.

This option makes it possible to send any kind of samples without writing extra code or making configuration changes.

The samples that can be sent to Telemetry are not limited to the actual existing meters. There is a possibility to provide data for any new, customer-defined counter by filling out all the required fields of the POST request.

If the sample corresponds to an existing meter, then fields like the meter type and meter name should match the existing meter accordingly.

The required fields for sending a sample using the command-line client are:

  • ID of the corresponding resource. (--resource-id)

  • Name of meter. (--meter-name)

  • Type of meter. (--meter-type)

    Predefined meter types:

    • Gauge

    • Delta

    • Cumulative

  • Unit of meter. (--meter-unit)

  • Volume of sample. (--sample-volume)

To send samples to Telemetry using the command-line client, the following command should be invoked:

$ ceilometer sample-create -r 37128ad6-daaa-4d22-9509-b7e1c6b08697 \
  -m memory.usage --meter-type gauge --meter-unit MB --sample-volume 48
+-------------------+--------------------------------------------+
| Property          | Value                                      |
+-------------------+--------------------------------------------+
| message_id        | 6118820c-2137-11e4-a429-08002715c7fb       |
| name              | memory.usage                               |
| project_id        | e34eaa91d52a4402b4cb8bc9bbd308c1           |
| resource_id       | 37128ad6-daaa-4d22-9509-b7e1c6b08697       |
| resource_metadata | {}                                         |
| source            | e34eaa91d52a4402b4cb8bc9bbd308c1:openstack |
| timestamp         | 2014-08-11T09:10:46.358926                 |
| type              | gauge                                      |
| unit              | MB                                         |
| user_id           | 679b0499e7a34ccb9d90b64208401f8e           |
| volume            | 48.0                                       |
+-------------------+--------------------------------------------+

10.2.4.1 Meter definitions

The Telemetry service collects a subset of the meters by filtering notifications emitted by other OpenStack services. Starting with the Liberty release, you can find the meter definitions in a separate configuration file, called ceilometer/meter/data/meter.yaml. This enables operators and administrators to add new meters to the Telemetry project by updating the meter.yaml file, without any need for additional code changes.

Note

The meter.yaml file should be modified with care. Unless intended, do not remove any existing meter definitions from the file. Also, the collected meters can differ in some cases from what is referenced in the documentation.

A standard meter definition looks like:

---
metric:
  - name: 'meter name'
    event_type: 'event name'
    type: 'type of meter eg: gauge, cumulative or delta'
    unit: 'name of unit eg: MB'
    volume: 'path to a measurable value eg: $.payload.size'
    resource_id: 'path to resource id eg: $.payload.id'
    project_id: 'path to project id eg: $.payload.owner'

The definition above shows a simple meter definition with some fields, from which name, event_type, type, unit, and volume are required. If there is a match on the event type, samples are generated for the meter.

If you take a look at the meter.yaml file, it contains the sample definitions for all the meters that Telemetry is collecting from notifications. The value of each field is specified by using JSON path in order to find the right value from the notification message. In order to be able to specify the right field you need to be aware of the format of the consumed notification. The values that need to be searched in the notification message are set with a JSON path starting with $. For instance, if you need the size information from the payload you can define it like $.payload.size.

A notification message may contain multiple meters. You can use * in the meter definition to capture all the meters and generate samples accordingly. You can use wildcards as shown in the following example:

---
metric:
  - name: $.payload.measurements.[*].metric.[*].name
    event_type: 'event_name.*'
    type: 'delta'
    unit: $.payload.measurements.[*].metric.[*].unit
    volume: $.payload.measurements.[*].result
    resource_id: $.payload.target
    user_id: $.payload.initiator.id
    project_id: $.payload.initiator.project_id

In the above example, the name field is a JSON path matching a list of meter names defined in the notification message.

You can even use complex operations on JSON paths. In the following example, the volume and resource_id fields perform arithmetic and string concatenation, respectively:

---
metric:
- name: 'compute.node.cpu.idle.percent'
  event_type: 'compute.metrics.update'
  type: 'gauge'
  unit: 'percent'
  volume: $.payload.metrics[?(@.name='cpu.idle.percent')].value * 100
  resource_id: $.payload.host + "_" + $.payload.nodename

You can use the timedelta plug-in to evaluate the difference in seconds between two datetime fields from one notification.

---
metric:
- name: 'compute.instance.booting.time'
  event_type: 'compute.instance.create.end'
  type: 'gauge'
  unit: 'sec'
  volume:
    fields: [$.payload.created_at, $.payload.launched_at]
    plugin: 'timedelta'
  project_id: $.payload.tenant_id
  resource_id: $.payload.instance_id

You will find some existence meters in the meter.yaml. These meters have a volume of 1 and are at the bottom of the yaml file, with a note suggesting that they will be removed in the Mitaka release.

For example, the meter definition for existence meters is as follows:

---
metric:
  - name: 'meter name'
    type: 'delta'
    unit: 'volume'
    volume: 1
    event_type:
        - 'event type'
    resource_id: $.payload.volume_id
    user_id: $.payload.user_id
    project_id: $.payload.tenant_id

These meters are not loaded by default. To load them, set the disable_non_metric_meters option to False in the ceilometer.conf file.
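
Assuming the option disables these meters by default, a sketch of the change in ceilometer.conf is:

[DEFAULT]
# Load the non-metric (existence) meters defined in meter.yaml as well.
disable_non_metric_meters = False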

10.2.5 Block Storage audit script setup to get notifications

If you want to collect OpenStack Block Storage notifications on demand, you can use the cinder-volume-usage-audit script from OpenStack Block Storage. This script becomes available when you install OpenStack Block Storage, so you can use it without any specific settings, and you do not need to authenticate to access the data. To use it, run the command in the following format:

$ cinder-volume-usage-audit \
  --start_time='YYYY-MM-DD HH:MM:SS' --end_time='YYYY-MM-DD HH:MM:SS' --send_actions

This script outputs which volumes or snapshots were created, deleted, or existed in a given period of time, along with some information about these volumes or snapshots. Information about the existence and size of volumes and snapshots is stored in the Telemetry service. This data is also stored as an event, which is the recommended usage as it provides better indexing of data.

Using this script via cron, you can get notifications periodically, for example, every 5 minutes:

*/5 * * * * /path/to/cinder-volume-usage-audit --send_actions

10.2.6 Storing samples

The Telemetry service has a separate service that is responsible for persisting the data that comes from the pollsters or is received as notifications. The data can be stored in a file or a database back end, for which the list of supported databases can be found in Section 10.1.1, “Supported databases”. The data can also be sent to an external data store by using an HTTP dispatcher.

The ceilometer-collector service receives the data as messages from the message bus of the configured AMQP service. It sends these datapoints without any modification to the configured target. The service has to run on a host machine from which it has access to the configured dispatcher.

Note

Multiple dispatchers can be configured for Telemetry at one time.

Multiple ceilometer-collector processes can be run at a time. It is also possible to start multiple worker threads per collector process; to do so, modify the collector_workers configuration option in the Collector section of the ceilometer.conf configuration file.

10.2.6.1 Database dispatcher

When the database dispatcher is configured as the data store, you have the option to set a time_to_live option (ttl) for samples. By default, the time to live value for samples is set to -1, which means that they are kept in the database forever.

The time to live value is specified in seconds. Each sample has a time stamp, and the ttl value indicates that a sample will be deleted from the database when the number of seconds has elapsed since that sample reading was stamped. For example, if the time to live is set to 600, all samples older than 600 seconds will be purged from the database.

Certain databases support native TTL expiration. Where this is not possible, you can use the ceilometer-expirer command-line script instead. You can run it in a cron job, which helps to keep your database in a consistent state.
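
A sketch of both pieces follows; the 600-second TTL, the binary path, and the hourly schedule are illustrative values, and the option name should be checked against your release:

[database]
# Samples older than 600 seconds become eligible for deletion (-1 keeps them forever).
time_to_live = 600

An accompanying crontab entry could run the expirer once an hour:

0 * * * * /usr/bin/ceilometer-expirer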

The level of support differs depending on the configured back end:

MongoDB

TTL value support: Yes. MongoDB has native TTL support for deleting samples that are older than the configured ttl value.

SQL-based back ends

TTL value support: Yes. ceilometer-expirer has to be used for deleting samples and their related data from the database.

HBase

TTL value support: No. Telemetry's HBase support includes neither native TTL nor ceilometer-expirer support.

DB2 NoSQL

TTL value support: No. DB2 NoSQL has neither native TTL nor ceilometer-expirer support.

10.2.6.2 HTTP dispatcher

The Telemetry service supports sending samples to an external HTTP target. The samples are sent without any modification. To set this option as the collector's target, the dispatcher has to be changed to http in the ceilometer.conf configuration file. For the list of options that you need to set, see the dispatcher_http section in the OpenStack Configuration Reference.
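
A sketch of such a configuration, with an illustrative target URL, could look like:

[DEFAULT]
dispatcher = http

[dispatcher_http]
# Samples are POSTed to this endpoint without modification.
target = http://metering.example.com/v1/samples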

10.2.6.3 File dispatcher

You can store samples in a file by setting the dispatcher option in the ceilometer.conf file. For the list of configuration options, see the dispatcher_file section in the OpenStack Configuration Reference.
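
For example, a sketch with an illustrative log path and rotation settings:

[DEFAULT]
dispatcher = file

[dispatcher_file]
file_path = /var/log/ceilometer/samples.log
# Rotate the sample file at 10 MB, keeping 5 backups.
max_bytes = 10000000
backup_count = 5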

10.3 Data collection, processing, and pipelines

The mechanism by which data is collected and processed is called a pipeline. Pipelines, at the configuration level, describe a coupling between sources of data and the corresponding sinks for transformation and publication of data.

A source is a producer of data: samples or events. In effect, it is a set of pollsters or notification handlers emitting datapoints for a set of matching meters and event types.

Each source configuration encapsulates name matching, polling interval determination, optional resource enumeration or discovery, and mapping to one or more sinks for publication.

Data gathered can be used for different purposes, which can impact how frequently it needs to be published. Typically, a meter published for billing purposes needs to be updated every 30 minutes while the same meter may be needed for performance tuning every minute.

Warning

Rapid polling cadences should be avoided, as they result in a huge amount of data in a short time frame, which may negatively affect the performance of both Telemetry and the underlying database back end. We strongly recommend you do not use small granularity values like 10 seconds.

A sink, on the other hand, is a consumer of data, providing logic for the transformation and publication of data emitted from related sources.

In effect, a sink describes a chain of handlers. The chain starts with zero or more transformers and ends with one or more publishers. The first transformer in the chain is passed data from the corresponding source, takes some action such as deriving rate of change, performing unit conversion, or aggregating, before passing the modified data to the next step that is described in Section 10.4.3, “Publishers”.

10.3.1 Pipeline configuration

The pipeline configuration is, by default, stored in separate configuration files called pipeline.yaml and event_pipeline.yaml, next to the ceilometer.conf file. The meter pipeline and event pipeline configuration files can be set by the pipeline_cfg_file and event_pipeline_cfg_file options, respectively; these are listed in the Description of configuration options for api table in the OpenStack Configuration Reference. Multiple pipelines can be defined in one pipeline configuration file.

The meter pipeline definition looks like:

---
sources:
  - name: 'source name'
    interval: 'how often should the samples be injected into the pipeline'
    meters:
      - 'meter filter'
    resources:
      - 'list of resource URLs'
    sinks:
      - 'sink name'
sinks:
  - name: 'sink name'
    transformers: 'definition of transformers'
    publishers:
      - 'list of publishers'

The interval parameter in the sources section should be defined in seconds. It determines the polling cadence of sample injection into the pipeline, where samples are produced under the direct control of an agent.

There are several ways to define the list of meters for a pipeline source. The list of valid meters can be found in Section 10.6, “Measurements”. There is a possibility to define all the meters, or just included or excluded meters, with which a source should operate:

  • To include all meters, use the * wildcard symbol. It is highly advisable to select only the meters that you intend to use, to avoid flooding the metering database with unused data.

  • To define the list of meters, use either of the following:

    • To define the list of included meters, use the meter_name syntax.

    • To define the list of excluded meters, use the !meter_name syntax.

    • For meters, which have variants identified by a complex name field, use the wildcard symbol to select all, for example, for instance:m1.tiny, use instance:\*.

Note

The OpenStack Telemetry service does not have any duplication check between pipelines, and if you add a meter to multiple pipelines then it is assumed the duplication is intentional and may be stored multiple times according to the specified sinks.

The above definition methods can be used in the following combinations:

  • Use only the wildcard symbol.

  • Use the list of included meters.

  • Use the list of excluded meters.

  • Use wildcard symbol with the list of excluded meters.

Note

At least one of the above variations should be included in the meters section. Included and excluded meters cannot co-exist in the same pipeline. Wildcard and included meters cannot co-exist in the same pipeline definition section.
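
For example, a sketch of a pipeline that combines the wildcard with a list of excluded meters (the names are illustrative):

---
sources:
  - name: meter_source
    interval: 600
    meters:
      - "*"
      - "!disk.*"
    sinks:
      - meter_sink
sinks:
  - name: meter_sink
    transformers:
    publishers:
      - notifier://

This source injects all samples except the disk meters into the pipeline every 10 minutes, and the sink publishes them unchanged on the message bus.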

The optional resources section of a pipeline source allows a static list of resource URLs to be configured for polling.

The transformers section of a pipeline sink provides the possibility to add a list of transformer definitions. The available transformers are:

  • Accumulator (reference name for configuration: accumulator)

  • Aggregator (reference name for configuration: aggregator)

  • Arithmetic (reference name for configuration: arithmetic)

  • Rate of change (reference name for configuration: rate_of_change)

  • Unit conversion (reference name for configuration: unit_conversion)

  • Delta (reference name for configuration: delta)

The publishers section contains the list of publishers, where the samples data should be sent after the possible transformations.

Similarly, the event pipeline definition looks like:

---
sources:
  - name: 'source name'
    events:
      - 'event filter'
    sinks:
      - 'sink name'
sinks:
  - name: 'sink name'
    publishers:
      - 'list of publishers'

The event filter uses the same filtering logic as the meter pipeline.
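
For example, a sketch of an event pipeline that forwards only Compute instance events (the names are illustrative):

---
sources:
  - name: instance_event_source
    events:
      - "compute.instance.*"
    sinks:
      - instance_event_sink
sinks:
  - name: instance_event_sink
    publishers:
      - notifier://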

10.3.1.1 Transformers

The definition of transformers can contain the following fields:

name

Name of the transformer.

parameters

Parameters of the transformer.

The parameters section can contain transformer-specific fields. In the case of the rate of change transformer, for example, it includes source and target fields with different subfields, depending on the implementation of the transformer.

In the case of the transformer that creates the cpu_util meter, the definition looks like:

transformers:
    - name: "rate_of_change"
      parameters:
          target:
              name: "cpu_util"
              unit: "%"
              type: "gauge"
              scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"

The rate of change transformer generates the cpu_util meter from the sample values of the cpu counter, which represents cumulative CPU time in nanoseconds. The transformer definition above defines a scale factor (for nanoseconds and multiple CPUs), which is applied before the transformation derives a sequence of gauge samples with unit %, from sequential values of the cpu meter.

The definition for the disk I/O rate, which is also generated by the rate of change transformer:

transformers:
    - name: "rate_of_change"
      parameters:
          source:
              map_from:
                  name: "disk\\.(read|write)\\.(bytes|requests)"
                  unit: "(B|request)"
          target:
              map_to:
                  name: "disk.\\1.\\2.rate"
                  unit: "\\1/s"
              type: "gauge"

10.3.1.2 Unit conversion transformer

This transformer applies a unit conversion. It takes the volume of the meter and multiplies it by the given scale expression. It also supports map_from and map_to, like the rate of change transformer.

Sample configuration:

transformers:
    - name: "unit_conversion"
      parameters:
          target:
              name: "disk.kilobytes"
              unit: "KB"
              scale: "volume * 1.0 / 1024.0"

With map_from and map_to:

transformers:
    - name: "unit_conversion"
      parameters:
          source:
              map_from:
                  name: "disk\\.(read|write)\\.bytes"
          target:
              map_to:
                  name: "disk.\\1.kilobytes"
              scale: "volume * 1.0 / 1024.0"
              unit: "KB"

10.3.1.3 Aggregator transformer

A transformer that sums up the incoming samples until enough samples have come in or a timeout has been reached.

The timeout can be specified with the retention_time option. To flush the aggregation after a set number of samples have been aggregated, specify the size parameter.

The volume of the created sample is the sum of the volumes of samples that came into the transformer. Samples can be aggregated by the attributes project_id, user_id and resource_metadata. To aggregate by the chosen attributes, specify them in the configuration and set which value of the attribute to take for the new sample (first to take the first sample's attribute, last to take the last sample's attribute, and drop to discard the attribute).

To aggregate 60s worth of samples by resource_metadata and keep the resource_metadata of the latest received sample:

transformers:
    - name: "aggregator"
      parameters:
          retention_time: 60
          resource_metadata: last

To aggregate each 15 samples by user_id and resource_metadata and keep the user_id of the first received sample and drop the resource_metadata:

transformers:
    - name: "aggregator"
      parameters:
          size: 15
          user_id: first
          resource_metadata: drop

10.3.1.4 Accumulator transformer

This transformer simply caches the samples until enough samples have arrived and then flushes them all down the pipeline at once:

transformers:
    - name: "accumulator"
      parameters:
          size: 15

10.3.1.5 Multi meter arithmetic transformer

This transformer enables us to perform arithmetic calculations over one or more meters and/or their metadata, for example:

memory_util = 100 * memory.usage / memory

A new sample is created with the properties described in the target section of the transformer's configuration. The sample's volume is the result of the provided expression. The calculation is performed on samples from the same resource.

Note

The calculation is limited to meters with the same interval.

Example configuration:

transformers:
    - name: "arithmetic"
      parameters:
        target:
          name: "memory_util"
          unit: "%"
          type: "gauge"
          expr: "100 * $(memory.usage) / $(memory)"

To demonstrate the use of metadata, the following implementation of a novel meter shows average CPU time per core:

transformers:
    - name: "arithmetic"
      parameters:
        target:
          name: "avg_cpu_per_core"
          unit: "ns"
          type: "cumulative"
          expr: "$(cpu) / ($(cpu).resource_metadata.cpu_number or 1)"
Note

Expression evaluation gracefully handles NaNs and exceptions. In such a case it does not create a new sample but only logs a warning.

10.3.1.6 Delta transformer

This transformer calculates the change between two sample datapoints of a resource. It can be configured to capture only the positive growth deltas.

Example configuration:

transformers:
    - name: "delta"
      parameters:
        target:
            name: "cpu.delta"
        growth_only: True

10.4 Data retrieval

The Telemetry service offers several mechanisms from which the persisted data can be accessed. As described in Section 10.1, “System architecture” and in Section 10.2, “Data collection”, the collected information can be stored in one or more database back ends, which are hidden by the Telemetry RESTful API.

Note

It is highly recommended not to access the database directly to read or modify any data in it. The API layer hides all the changes in the actual database schema and provides a standard interface to expose the samples, alarms, and so forth.

10.4.1 Telemetry v2 API

The Telemetry service provides a RESTful API, from which the collected samples and all the related information can be retrieved, like the list of meters, alarm definitions and so forth.

The Telemetry API URL can be retrieved from the service catalog provided by OpenStack Identity, which is populated during the installation process. The API access needs a valid token and proper permission to retrieve data, as described in Section 10.1.4, “Users, roles, and projects”.

Further information about the available API endpoints can be found in the Telemetry API Reference.

10.4.1.1 Query

The API provides some additional functionalities, like querying the collected data set. For the samples and alarms API endpoints, both simple and complex query styles are available, whereas for the other endpoints only simple queries are supported.

After validating the query parameters, the processing is done on the database side in the case of most database back ends in order to achieve better performance.

Simple query

Many of the API endpoints accept a query filter argument, which should be a list of data structures that consist of the following items:

  • field

  • op

  • value

  • type

Regardless of the endpoint on which the filter is applied, it will always target the fields of the Sample type.

Several fields of the API endpoints accept shorter names than the ones defined in the reference. The API will do the transformation internally and return the output with the fields that are listed in the API reference. The fields are the following:

  • project_id: project

  • resource_id: resource

  • user_id: user

When a filter argument contains multiple constraints of the above form, a logical AND relation between them is implied.
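
For example, the following sketch combines two constraints, which are then ANDed together; the resource ID reuses the one from the examples later in this chapter, and the timestamp is illustrative:

$ ceilometer sample-list --meter cpu \
  --query 'resource=bb52e52b-1e42-4751-b3ac-45c52d83ba07;timestamp>2014-08-31T10:00:00'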

Complex query

The filter expressions of the complex query feature operate on the fields of Sample, Alarm and AlarmChange types. The following comparison operators are supported:

  • =

  • !=

  • <

  • <=

  • >

  • >=

The following logical operators can be used:

  • and

  • or

  • not

Note

The not operator has different behavior in MongoDB and in the SQLAlchemy-based database engines. If the not operator is applied to a non-existent metadata field, the result depends on the database engine. In the case of MongoDB, it will return every sample, as the not operator is evaluated as true for every sample where the given field does not exist. The SQL-based database engine, on the other hand, will return an empty result because of the underlying join operation.

Complex query supports specifying a list of orderby expressions. This means that the result of the query can be ordered based on the field names provided in this list. When multiple keys are defined for the ordering, these will be applied sequentially in the order of the specification. The second expression will be applied on the groups for which the values of the first expression are the same. The ordering can be ascending or descending.

The number of returned items can be bounded using the limit option.

The filter, orderby and limit fields are optional.

Note

As opposed to the simple query, complex query is available via a separate API endpoint. For more information see the Telemetry v2 Web API Reference.

10.4.1.2 Statistics

The sample data can be used in various ways for several purposes, like billing or profiling. In external systems the data is often used in the form of aggregated statistics. The Telemetry API provides several built-in functions to make some basic calculations available without any additional coding.

Telemetry supports the following statistics and aggregation functions:

avg

Average of the sample volumes over each period.

cardinality

Count of distinct values in each period identified by a key specified as the parameter of this aggregate function. The supported parameter values are:

  • project_id

  • resource_id

  • user_id

Note

The aggregate.param option is required.

count

Number of samples in each period.

max

Maximum of the sample volumes in each period.

min

Minimum of the sample volumes in each period.

stddev

Standard deviation of the sample volumes in each period.

sum

Sum of the sample volumes over each period.

The simple query and the statistics functionality can be used together in a single API request.
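
For instance, a sketch that computes statistics over 600-second periods for a single resource (the period and resource ID are illustrative):

$ ceilometer statistics --meter cpu_util \
  --query 'resource=bb52e52b-1e42-4751-b3ac-45c52d83ba07' --period 600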

10.4.2 Telemetry command-line client and SDK

The Telemetry service provides a command-line client that gives access to the collected data as well as to the alarm definition and retrieval options. The client uses the Telemetry RESTful API in order to execute the requested operations.

To be able to use the ceilometer command, the python-ceilometerclient package needs to be installed and configured properly. For details about the installation process, see the Telemetry chapter in the Installation Tutorials and Guides.

Note

The Telemetry service captures the user-visible resource usage data. Therefore the database will not contain any data without the existence of these resources, like VM images in the OpenStack Image service.

Similarly to other OpenStack command-line clients, the ceilometer client uses OpenStack Identity for authentication. The proper credentials and --auth_url parameter have to be defined via command line parameters or environment variables.

This section provides some examples without the aim of completeness. These commands can be used for instance for validating an installation of Telemetry.

To retrieve the list of collected meters, the following command should be used:

$ ceilometer meter-list
+------------------------+------------+------+------------------------------------------+----------------------------------+----------------------------------+
| Name                   | Type       | Unit | Resource ID                              | User ID                          | Project ID                       |
+------------------------+------------+------+------------------------------------------+----------------------------------+----------------------------------+
| cpu                    | cumulative | ns   | bb52e52b-1e42-4751-b3ac-45c52d83ba07     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu                    | cumulative | ns   | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_util               | gauge      | %    | bb52e52b-1e42-4751-b3ac-45c52d83ba07     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_util               | gauge      | %    | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | bb52e52b-1e42-4751-b3ac-45c52d83ba07-hdd | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | bb52e52b-1e42-4751-b3ac-45c52d83ba07-vda | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b-hdd | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b-vda | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| ...                                                                                                                                                         |
+------------------------+------------+------+------------------------------------------+----------------------------------+----------------------------------+

The ceilometer command was run with admin rights, which means that all the data is accessible in the database. For more information about access rights, see Section 10.1.4, “Users, roles, and projects”. As can be seen in the above example, there are two VM instances in the system, as there are VM-instance-related meters at the top of the result list. The existence of these meters does not indicate that these instances were running at the time of the request. The result contains the currently collected meters per resource, in ascending order based on the name of the meter.

Samples are collected for each meter that is present in the list of meters, except in the case of instances that are not running or have been deleted from the OpenStack Compute database. If an instance no longer exists and a time_to_live value is set in the ceilometer.conf configuration file, then a group of samples is deleted in each expiration cycle. When the last sample is deleted for a meter, the database can be cleaned up by running ceilometer-expirer, and the meter will no longer be present in the list above. For more information about the expiration procedure, see Section 10.2.6, “Storing samples”.

The Telemetry API supports simple query on the meter endpoint. The query functionality has the following syntax:

--query <field1><operator1><value1>;...;<field_n><operator_n><value_n>

The following command needs to be invoked to request the meters of one VM instance:

$ ceilometer meter-list --query resource=bb52e52b-1e42-4751-b3ac-45c52d83ba07
+-------------------------+------------+-----------+--------------------------------------+----------------------------------+----------------------------------+
| Name                    | Type       | Unit      | Resource ID                          | User ID                          | Project ID                       |
+-------------------------+------------+-----------+--------------------------------------+----------------------------------+----------------------------------+
| cpu                     | cumulative | ns        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_util                | gauge      | %         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_l3_cache            | gauge      | B         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.ephemeral.size     | gauge      | GB        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.bytes         | cumulative | B         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.bytes.rate    | gauge      | B/s       | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.requests      | cumulative | request   | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.requests.rate | gauge      | request/s | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.root.size          | gauge      | GB        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.bytes        | cumulative | B         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.bytes.rate   | gauge      | B/s       | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.requests     | cumulative | request   | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.requests.rate| gauge      | request/s | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| instance                | gauge      | instance  | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| instance:m1.tiny        | gauge      | instance  | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| memory                  | gauge      | MB        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| vcpus                   | gauge      | vcpu      | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
+-------------------------+------------+-----------+--------------------------------------+----------------------------------+----------------------------------+

As described above, you can retrieve the whole set of samples stored for a meter, or filter the result set by using one of the available query types. The request for all the samples of the cpu meter, without any additional filtering, looks like the following:

$ ceilometer sample-list --meter cpu
+--------------------------------------+-------+------------+------------+------+---------------------+
| Resource ID                          | Meter | Type       | Volume     | Unit | Timestamp           |
+--------------------------------------+-------+------------+------------+------+---------------------+
| c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b | cpu   | cumulative | 5.4863e+11 | ns   | 2014-08-31T11:17:03 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu   | cumulative | 5.7848e+11 | ns   | 2014-08-31T11:17:03 |
| c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b | cpu   | cumulative | 5.4811e+11 | ns   | 2014-08-31T11:07:05 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu   | cumulative | 5.7797e+11 | ns   | 2014-08-31T11:07:05 |
| c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b | cpu   | cumulative | 5.3589e+11 | ns   | 2014-08-31T10:27:19 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu   | cumulative | 5.6397e+11 | ns   | 2014-08-31T10:27:19 |
| ...                                                                                                 |
+--------------------------------------+-------+------------+------------+------+---------------------+

The result set of the request contains the samples for both instances ordered by the timestamp field in the default descending order.

The simple query makes it possible to retrieve only a subset of the collected samples. The following command can be executed to request the cpu samples of only one of the VM instances:

$ ceilometer sample-list --meter cpu \
  --query resource=bb52e52b-1e42-4751-b3ac-45c52d83ba07
+--------------------------------------+------+------------+------------+------+---------------------+
| Resource ID                          | Name | Type       | Volume     | Unit | Timestamp           |
+--------------------------------------+------+------------+------------+------+---------------------+
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.7906e+11 | ns   | 2014-08-31T11:27:08 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.7848e+11 | ns   | 2014-08-31T11:17:03 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.7797e+11 | ns   | 2014-08-31T11:07:05 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.6397e+11 | ns   | 2014-08-31T10:27:19 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.6207e+11 | ns   | 2014-08-31T10:17:03 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.3831e+11 | ns   | 2014-08-31T08:41:57 |
| ...                                                                                                |
+--------------------------------------+------+------------+------------+------+---------------------+

As the output above shows, the result set contains samples for only one of the two instances.

The ceilometer query-samples command is used to execute rich queries. It accepts the following parameters:

--filter

Contains the filter expression for the query in the form of: {complex_op: [{simple_op: {field_name: value}}]}.

--orderby

Contains the list of orderby expressions in the form of: [{field_name: direction}, {field_name: direction}].

--limit

Specifies the maximum number of samples to return.

For more information about complex queries see Section 10.4.1.1, “Query”.

Because the complex query functionality supports compound operators, it is possible to retrieve a filtered subset of samples for a given VM instance. To request the first six samples for the cpu and disk.read.bytes meters, invoke the following command:

$ ceilometer query-samples --filter '{"and": \
  [{"=":{"resource":"bb52e52b-1e42-4751-b3ac-45c52d83ba07"}},{"or":[{"=":{"counter_name":"cpu"}}, \
  {"=":{"counter_name":"disk.read.bytes"}}]}]}' --orderby '[{"timestamp":"asc"}]' --limit 6
+--------------------------------------+-----------------+------------+------------+------+---------------------+
| Resource ID                          | Meter           | Type       | Volume     | Unit | Timestamp           |
+--------------------------------------+-----------------+------------+------------+------+---------------------+
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | disk.read.bytes | cumulative | 385334.0   | B    | 2014-08-30T13:00:46 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu             | cumulative | 1.2132e+11 | ns   | 2014-08-30T13:00:47 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu             | cumulative | 1.4295e+11 | ns   | 2014-08-30T13:10:51 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | disk.read.bytes | cumulative | 601438.0   | B    | 2014-08-30T13:10:51 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | disk.read.bytes | cumulative | 601438.0   | B    | 2014-08-30T13:20:33 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu             | cumulative | 1.4795e+11 | ns   | 2014-08-30T13:20:34 |
+--------------------------------------+-----------------+------------+------------+------+---------------------+

Ceilometer also captures data as events, which represent the state of a resource. Refer to /telemetry-events for more information regarding events.

To retrieve a list of recent events that occurred in the system, the following command can be executed:

$ ceilometer event-list
+--------------------------------------+---------------+----------------------------+-----------------------------------------------------------------+
| Message ID                           | Event Type    | Generated                  | Traits                                                          |
+--------------------------------------+---------------+----------------------------+-----------------------------------------------------------------+
| dfdb87b6-92c6-4d40-b9b5-ba308f304c13 | image.create  | 2015-09-24T22:17:39.498888 | +---------+--------+-----------------+                          |
|                                      |               |                            | |   name  |  type  |      value      |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
|                                      |               |                            | | service | string | image.localhost |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
| 84054bc6-2ae6-4b93-b5e7-06964f151cef | image.prepare | 2015-09-24T22:17:39.594192 | +---------+--------+-----------------+                          |
|                                      |               |                            | |   name  |  type  |      value      |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
|                                      |               |                            | | service | string | image.localhost |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
| 2ec99c2c-08ee-4079-bf80-27d4a073ded6 | image.update  | 2015-09-24T22:17:39.578336 | +-------------+--------+--------------------------------------+ |
|                                      |               |                            | |     name    |  type  |                value                 | |
|                                      |               |                            | +-------------+--------+--------------------------------------+ |
|                                      |               |                            | |  created_at | string |         2015-09-24T22:17:39Z         | |
|                                      |               |                            | |     name    | string |    cirros-0.3.4-x86_64-uec-kernel    | |
|                                      |               |                            | |  project_id | string |   56ffddea5b4f423496444ea36c31be23   | |
|                                      |               |                            | | resource_id | string | 86eb8273-edd7-4483-a07c-002ff1c5657d | |
|                                      |               |                            | |   service   | string |           image.localhost            | |
|                                      |               |                            | |    status   | string |                saving                | |
|                                      |               |                            | |   user_id   | string |   56ffddea5b4f423496444ea36c31be23   | |
|                                      |               |                            | +-------------+--------+--------------------------------------+ |
+--------------------------------------+---------------+----------------------------+-----------------------------------------------------------------+
Note

In Liberty, the data returned corresponds to the role and user making the request. Non-admin users see only events that are scoped to them. Admin users see all events related to the project they administer as well as all unscoped events.

Similar to querying meters, additional filter parameters can be given to retrieve specific events:

$ ceilometer event-list -q 'event_type=compute.instance.exists; \
  instance_type=m1.tiny'
+--------------------------------------+-------------------------+----------------------------+----------------------------------------------------------------------------------+
| Message ID                           | Event Type              | Generated                  | Traits                                                                           |
+--------------------------------------+-------------------------+----------------------------+----------------------------------------------------------------------------------+
| 134a2ab3-6051-496c-b82f-10a3c367439a | compute.instance.exists | 2015-09-25T03:00:02.152041 | +------------------------+----------+------------------------------------------+ |
|                                      |                         |                            | |          name          |   type   |                  value                   | |
|                                      |                         |                            | +------------------------+----------+------------------------------------------+ |
|                                      |                         |                            | | audit_period_beginning | datetime |           2015-09-25T02:00:00            | |
|                                      |                         |                            | |  audit_period_ending   | datetime |           2015-09-25T03:00:00            | |
|                                      |                         |                            | |        disk_gb         | integer  |                    1                     | |
|                                      |                         |                            | |      ephemeral_gb      | integer  |                    0                     | |
|                                      |                         |                            | |          host          |  string  |          localhost.localdomain           | |
|                                      |                         |                            | |      instance_id       |  string  |   2115f189-c7f1-4228-97bc-d742600839f2   | |
|                                      |                         |                            | |     instance_type      |  string  |                 m1.tiny                  | |
|                                      |                         |                            | |    instance_type_id    | integer  |                    2                     | |
|                                      |                         |                            | |      launched_at       | datetime |           2015-09-24T22:24:56            | |
|                                      |                         |                            | |       memory_mb        | integer  |                   512                    | |
|                                      |                         |                            | |       project_id       |  string  |     56ffddea5b4f423496444ea36c31be23     | |
|                                      |                         |                            | |       request_id       |  string  | req-c6292b21-bf98-4a1d-b40c-cebba4d09a67 | |
|                                      |                         |                            | |        root_gb         | integer  |                    1                     | |
|                                      |                         |                            | |        service         |  string  |                 compute                  | |
|                                      |                         |                            | |         state          |  string  |                  active                  | |
|                                      |                         |                            | |       tenant_id        |  string  |     56ffddea5b4f423496444ea36c31be23     | |
|                                      |                         |                            | |        user_id         |  string  |     0b3d725756f94923b9d0c4db864d06a9     | |
|                                      |                         |                            | |         vcpus          | integer  |                    1                     | |
|                                      |                         |                            | +------------------------+----------+------------------------------------------+ |
+--------------------------------------+-------------------------+----------------------------+----------------------------------------------------------------------------------+
Note

As of the Liberty release, the number of items returned is restricted to the value defined by default_api_return_limit in the ceilometer.conf configuration file. Alternatively, the value can be set per query by passing the limit option in the request.
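For example, the following ceilometer.conf fragment raises the limit. The value is illustrative, and the option is assumed here to live in the api section; verify the section placement against the configuration reference for your release:

[api]
default_api_return_limit = 1000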

10.4.2.1 Telemetry Python bindings

The command-line client library also provides Python bindings, so that the Telemetry API can be used directly from Python programs.

The first step in setting up the client is to create a client instance with the proper credentials:

>>> import ceilometerclient.client
>>> cclient = ceilometerclient.client.get_client(VERSION, username=USERNAME, password=PASSWORD, tenant_name=PROJECT_NAME, auth_url=AUTH_URL)

The VERSION parameter can be 1 or 2, specifying the API version to be used.

The method calls look like the following:

>>> cclient.meters.list()
 [<Meter ...>, ...]

>>> cclient.samples.list()
 [<Sample ...>, ...]
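The managers accept the same filtering options as the command line. As a sketch (assuming the v2 API and reusing the resource ID from the earlier examples), a filtered sample query looks like the following:

>>> cclient.samples.list(meter_name='cpu',
...                      q=[{'field': 'resource_id',
...                          'op': 'eq',
...                          'value': 'bb52e52b-1e42-4751-b3ac-45c52d83ba07'}])
 [<Sample ...>, ...]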

For further details about the python-ceilometerclient package, see the Python bindings to the OpenStack Ceilometer API reference.

10.4.3 Publishers

The Telemetry service provides several transport methods to forward the data collected to the ceilometer-collector service or to an external system. The consumers of this data differ widely: for monitoring systems, data loss is acceptable, whereas billing systems require reliable data transportation. Telemetry provides methods to fulfill the requirements of both kinds of systems, as described below.

The publisher component makes it possible to persist the data into storage through the message bus or to send it to one or more external consumers. One chain can contain multiple publishers.

To solve this problem, multiple publishers can be configured for each datapoint within the Telemetry service, allowing the same meter or event to be published multiple times to multiple destinations, each potentially using a different transport.

Publishers can be specified in the publishers section for each pipeline (for further details about pipelines see Section 10.3, “Data collection, processing, and pipelines”) that is defined in the pipeline.yaml file.

The following publisher types are supported:

direct

It can be specified in the form of direct://?dispatcher=http. The dispatcher's options include database, file, http, and gnocchi. For more details on dispatchers, see Section 10.2.6, “Storing samples”. This publisher emits data to the configured dispatcher directly; the default configuration (in the form direct://) uses the database dispatcher. In the Mitaka release, this method can only emit data to the database dispatcher, and the form is direct://.

notifier

It can be specified in the form of notifier://?option1=value1&option2=value2. It emits data over AMQP using oslo.messaging. This is the recommended method of publishing.

rpc

It can be specified in the form of rpc://?option1=value1&option2=value2. It emits metering data over lossy AMQP. This method is synchronous and may experience performance issues. This publisher is deprecated in Liberty in favor of the notifier publisher.

udp

It can be specified in the form of udp://<host>:<port>/. It emits metering data over UDP.

file

It can be specified in the form of file://path?option1=value1&option2=value2. This publisher records metering data into a file.

Note

If a file name and location is not specified, this publisher does not log any meters; instead it logs a warning message in the configured log file for Telemetry.

kafka

It can be specified in the form of kafka://kafka_broker_ip:kafka_broker_port?topic=kafka_topic&option1=value1.

This publisher sends metering data to a kafka broker.

Note

If the topic parameter is missing, this publisher sends metering data under the topic name ceilometer. When the port number is not specified, this publisher uses 9092 as the broker's port.

The following options are available for the rpc and notifier publishers. The policy option can also be used by the kafka publisher:

per_meter_topic

The value of this option is 1. It is used for publishing the samples on an additional metering_topic.sample_name topic queue besides the default metering_topic queue.

policy

It is used for configuring the behavior when the publisher fails to send the samples. The possible predefined values are the following:

default

Wait and block until the samples have been sent.

drop

Drop the samples that failed to be sent.

queue

Create an in-memory queue and retry sending the queued samples during the next publishing period (the queue length can be configured with max_queue_length, where 1024 is the default value).

The following option is additionally available for the notifier publisher:

topic

The topic name of the queue to publish to. Setting this option overrides the default topic defined by the metering_topic and event_topic options. It can be used to support multiple consumers. Support for this feature was added in Kilo.

The following options are available for the file publisher:

max_bytes

When this option is greater than zero, it causes a rollover. When the file size is about to be exceeded, the file is closed and a new file is silently opened for output. If its value is zero, rollover never occurs.

backup_count

If this value is non-zero, an extension is appended to the file name of the old log, as '.1', '.2', and so forth until the specified value is reached. The file that is written to and contains the newest data is always the one specified without any extension.
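As a sketch of how these options combine (the file path is illustrative), a rotating file publisher that rolls over at about 10 MB and keeps five backups could be declared as:

publishers:
    - file:///var/log/ceilometer/meters.log?max_bytes=10485760&backup_count=5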

The default publisher is notifier, without any additional options specified. A sample publishers section in the /etc/ceilometer/pipeline.yaml looks like the following:

publishers:
    - udp://10.0.0.2:1234
    - rpc://?per_meter_topic=1   # deprecated in Liberty
    - notifier://?policy=drop&max_queue_length=512&topic=custom_target
    - direct://?dispatcher=http

10.5 Alarms

Alarms provide user-oriented Monitoring-as-a-Service for resources running on OpenStack. This type of monitoring enables you to automatically scale a group of instances in or out through the Orchestration service, but you can also use alarms for general-purpose awareness of your cloud resources' health.

These alarms follow a tri-state model:

ok

The rule governing the alarm has been evaluated as False.

alarm

The rule governing the alarm has been evaluated as True.

insufficient data

There are not enough datapoints available in the evaluation periods to meaningfully determine the alarm state.

10.5.1 Alarm definitions

The definition of an alarm provides the rules that govern when a state transition should occur, and the actions to be taken upon that transition. The nature of these rules depends on the alarm type.

10.5.1.1 Threshold rule alarms

For conventional threshold-oriented alarms, state transitions are governed by:

  • A static threshold value with a comparison operator such as greater than or less than.

  • A statistic selection to aggregate the data.

  • A sliding time window to indicate how far back into the recent past you want to look.

10.5.1.2 Combination rule alarms

The Telemetry service also supports the concept of a meta-alarm, which aggregates over the current state of a set of underlying basic alarms combined via a logical operator (AND or OR).

10.5.2 Alarm dimensioning

A key associated concept is the notion of dimensioning, which defines the set of matching meters that feed into an alarm evaluation. Recall that meters are per-resource-instance, so in the simplest case an alarm might be defined over a particular meter applied to all resources visible to a particular user. More useful, however, is the option to explicitly select which specific resources you are interested in alarming on.

At one extreme you might have narrowly dimensioned alarms where this selection would have only a single target (identified by resource ID). At the other extreme, you could have widely dimensioned alarms where this selection identifies many resources over which the statistic is aggregated, for example all instances booted from a particular image or all instances with matching user metadata (the latter is how the Orchestration service identifies autoscaling groups).
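For illustration, a widely dimensioned alarm over such a group could select its members through matching user metadata. The metadata key and STACK_ID below are hypothetical:

$ ceilometer alarm-threshold-create --name group_cpu_hi \
  --meter-name cpu_util --threshold 80.0 \
  --comparison-operator gt --statistic avg \
  --period 600 --evaluation-periods 2 \
  --alarm-action 'log://' \
  --query metadata.user_metadata.stack=STACK_ID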

10.5.3 Alarm evaluation

Alarms are evaluated by the alarm-evaluator service on a periodic basis, defaulting to once every minute.

10.5.3.1 Alarm actions

Any state transition of an individual alarm (to ok, alarm, or insufficient data) may have one or more actions associated with it. These actions effectively send a signal to a consumer that the state transition has occurred, and provide some additional context. This context includes the new and previous states, with some reason data describing the disposition with respect to the threshold, the number of datapoints involved, and the most recent of these. State transitions are detected by the alarm-evaluator, whereas the alarm-notifier effects the actual notification action.

Webhooks

These are the de facto notification type used by Telemetry alarming and simply involve an HTTP POST request being sent to an endpoint, with a request body containing a description of the state transition encoded as a JSON fragment.

Log actions

These are a lightweight alternative to webhooks, whereby the state transition is simply logged by the alarm-notifier, and are intended primarily for testing purposes.

10.5.3.2 Workload partitioning

The alarm evaluation process uses the same mechanism for workload partitioning as the central and compute agents. The Tooz library provides the coordination within the groups of service instances. For further information about this approach, see Section 10.2.3, “Support for HA deployment”.

To use this workload partitioning solution, set the evaluation_service option to default. For more information, see the alarm section in the OpenStack Configuration Reference.
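A minimal sketch of the corresponding ceilometer.conf fragment, assuming a Tooz coordination backend is already running at the (illustrative) URL below; verify the option groups against the configuration reference for your release:

[alarm]
evaluation_service = default

[coordination]
backend_url = redis://controller:6379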

10.5.4 Using alarms

10.5.4.1 Alarm creation

An example of creating a threshold-oriented alarm, based on an upper bound on the CPU utilization for a particular instance:

$ ceilometer alarm-threshold-create --name cpu_hi \
  --description 'instance running hot' \
  --meter-name cpu_util --threshold 70.0 \
  --comparison-operator gt --statistic avg \
  --period 600 --evaluation-periods 3 \
  --alarm-action 'log://' \
  --query resource_id=INSTANCE_ID

This creates an alarm that will fire when the average CPU utilization for an individual instance exceeds 70% for three consecutive 10-minute periods. The notification in this case is simply a log message, though it could alternatively be a webhook URL.

Note

Alarm names must be unique for the alarms associated with an individual project. An administrator can limit the maximum number of resulting actions for the three different states, and the ability for a normal user to create log:// and test:// notifiers is disabled. This prevents unintentional consumption of disk and memory resources by the Telemetry service.

The sliding time window over which the alarm is evaluated is 30 minutes in this example. This window is not clamped to wall-clock time boundaries; rather it is anchored on the current time for each evaluation cycle, and continually creeps forward as each evaluation cycle rolls around (by default, this occurs every minute).

The period length is set to 600s in this case to reflect the out-of-the-box default cadence for collection of the associated meter. This period matching illustrates an important general principle to keep in mind for alarms:

Note

The alarm period should be a whole number multiple (1 or more) of the interval configured in the pipeline corresponding to the target meter.

Otherwise the alarm tends to flit in and out of the insufficient data state due to the mismatch between the actual frequency of datapoints in the metering store and the statistics queries used to compare against the alarm threshold. If a shorter alarm period is needed, adjust the corresponding interval in the pipeline.yaml file.

Other notable alarm attributes that may be set on creation, or via a subsequent update, include:

state

The initial alarm state (defaults to insufficient data).

description

A free-text description of the alarm (defaults to a synopsis of the alarm rule).

enabled

True if evaluation and actioning is to be enabled for this alarm (defaults to True).

repeat-actions

True if actions should be repeatedly notified while the alarm remains in the target state (defaults to False).

ok-action

An action to invoke when the alarm state transitions to ok.

insufficient-data-action

An action to invoke when the alarm state transitions to insufficient data.

time-constraint

Used to restrict evaluation of the alarm to certain times of the day or days of the week (expressed as a cron expression with an optional timezone); see the sketch below.
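For example, a sketch of an alarm whose evaluation is restricted to weekday business hours. The constraint values are illustrative, and the exact --time-constraint syntax should be checked against your client version:

$ ceilometer alarm-threshold-create --name cpu_hi_workhours \
  --meter-name cpu_util --threshold 70.0 \
  --comparison-operator gt --statistic avg \
  --period 600 --evaluation-periods 3 \
  --alarm-action 'log://' \
  --query resource_id=INSTANCE_ID \
  --time-constraint 'name=workhours;start="0 9 * * 1-5";duration=28800;timezone=Europe/Berlin'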

An example of creating a combination alarm, based on the combined state of two underlying alarms:

$ ceilometer alarm-combination-create --name meta \
  --alarm_ids ALARM_ID1 \
  --alarm_ids ALARM_ID2 \
  --operator or \
  --alarm-action 'http://example.org/notify'

This creates an alarm that will fire when either one of the two underlying alarms transitions into the alarm state. The notification in this case is a webhook call. Any number of underlying alarms can be combined in this way, using either and or or.

10.5.4.2 Alarm retrieval

You can display all your alarms via (some attributes are omitted for brevity):

$ ceilometer alarm-list
+----------+--------+-------------------+---------------------------------+
| Alarm ID | Name   | State             | Alarm condition                 |
+----------+--------+-------------------+---------------------------------+
| ALARM_ID | cpu_hi | insufficient data | cpu_util > 70.0 during 3 x 600s |
+----------+--------+-------------------+---------------------------------+

In this case, the state is reported as insufficient data which could indicate that:

  • meters have not yet been gathered about this instance over the evaluation window into the recent past (for example a brand-new instance)

  • or, that the identified instance is not visible to the user/project owning the alarm

  • or, simply that an alarm evaluation cycle hasn't kicked off since the alarm was created (by default, alarms are evaluated once per minute).

Note

The visibility of alarms depends on the role and project associated with the user issuing the query:

  • admin users see all alarms, regardless of the owner

  • non-admin users see only the alarms associated with their project (as per the normal project segregation in OpenStack)

10.5.4.3 Alarm update

Once the state of the alarm has settled down, we might decide that we set that bar too low at 70%, in which case the threshold (or almost any other alarm attribute) can be updated as follows:

$ ceilometer alarm-update --threshold 75 ALARM_ID

The change will take effect from the next evaluation cycle, which by default occurs every minute.

Most alarm attributes can be changed in this way, but there is also a convenient short-cut for getting and setting the alarm state:

$ ceilometer alarm-state-get ALARM_ID
$ ceilometer alarm-state-set --state ok -a ALARM_ID

Over time the state of the alarm may change often, especially if the threshold is chosen to be close to the trending value of the statistic. You can follow the history of an alarm over its lifecycle via the audit API:

$ ceilometer alarm-history ALARM_ID
+------------------+-----------+---------------------------------------+
| Type             | Timestamp | Detail                                |
+------------------+-----------+---------------------------------------+
| creation         | time0     | name: cpu_hi                          |
|                  |           | description: instance running hot     |
|                  |           | type: threshold                       |
|                  |           | rule: cpu_util > 70.0 during 3 x 600s |
| state transition | time1     | state: ok                             |
| rule change      | time2     | rule: cpu_util > 75.0 during 3 x 600s |
+------------------+-----------+---------------------------------------+

10.5.4.4 Alarm deletion

An alarm that is no longer required can be disabled so that it is no longer actively evaluated:

$ ceilometer alarm-update --enabled False -a ALARM_ID

or even deleted permanently (an irreversible step):

$ ceilometer alarm-delete ALARM_ID
Note

By default, alarm history is retained for deleted alarms.

10.6 Measurements

The Telemetry service collects meters within an OpenStack deployment. This section provides a brief summary of the meter format and origin, and also contains the list of available meters.

Telemetry collects meters by polling the infrastructure elements and also by consuming the notifications emitted by other OpenStack services. For more information about the polling mechanism and notifications, see Section 10.2, “Data collection”. Several meters are collected both by polling and by consuming notifications. The origin of each meter is listed in the tables below.

Note

You may need to configure Telemetry or other OpenStack services in order to be able to collect all the samples you need. For further information about configuration requirements see the Telemetry chapter in the Installation Tutorials and Guides. Also check the Telemetry manual installation description.

Telemetry uses the following meter types:

Type | Description
---- | -----------
Cumulative | Increasing over time (instance hours)
Delta | Changing over time (bandwidth)
Gauge | Discrete items (floating IPs, image uploads) and fluctuating values (disk I/O)

Telemetry provides the possibility to store metadata for samples. This metadata can be extended for OpenStack Compute and OpenStack Object Storage.

To attach additional metadata information to OpenStack Compute samples, you have two options. The first is to specify the metadata when you boot up a new instance. The additional information is stored with the sample in the form of resource_metadata.user_metadata.*. The new field must be defined by using the metering. prefix. The modified boot command looks like the following:

$ openstack server create --property metering.custom_metadata=a_value my_vm

The other option is to set the reserved_metadata_keys option to the list of metadata keys that you would like to be included in resource_metadata of the instance-related samples that are collected for OpenStack Compute. This option is included in the DEFAULT section of the ceilometer.conf configuration file.
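A sketch of the corresponding ceilometer.conf fragment; the key names are placeholders for your own metadata keys:

[DEFAULT]
reserved_metadata_keys = custom_tag,department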

You can also specify headers whose values will be stored along with the sample data of OpenStack Object Storage. The additional information is also stored under resource_metadata. The format of the new field is resource_metadata.http_header_$name, where $name is the name of the header with - replaced by _.

For specifying the new header, you need to set the metadata_headers option under the [filter:ceilometer] section in proxy-server.conf under the swift folder. You can use this additional data, for instance, to distinguish external and internal users.
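A sketch of the corresponding proxy-server.conf fragment; the header name is illustrative:

[filter:ceilometer]
metadata_headers = X-Account-Partner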

Measurements are grouped by the services that are polled by Telemetry or that emit notifications consumed by this service.

Note

The Telemetry service supports storing notifications as events. This functionality was added later, therefore the list of meters still contains existence type and other event-related items. The proper way of using Telemetry is to configure it to use the event store and turn off the collection of the event-related meters. For further information about events, see the Events section in the Telemetry documentation. For further information about how to turn meters on and off, see Section 10.3.1, “Pipeline configuration”. Note also that currently no migration is available to move the already existing event type samples to the event store.

10.6.1 OpenStack Compute

The following meters are collected for OpenStack Compute:

Meters added in the Icehouse release or earlier

Name | Type | Unit | Resource | Origin | Support | Note
---- | ---- | ---- | -------- | ------ | ------- | ----
instance | Gauge | instance | instance ID | Notification, Pollster | Libvirt, vSphere | Existence of instance
instance:<type> | Gauge | instance | instance ID | Notification, Pollster | Libvirt, vSphere | Existence of instance <type> (OpenStack types)
memory | Gauge | MB | instance ID | Notification | Libvirt | Volume of RAM allocated to the instance
memory.usage | Gauge | MB | instance ID | Pollster | vSphere | Volume of RAM used by the instance from the amount of its allocated memory
cpu | Cumulative | ns | instance ID | Pollster | Libvirt | CPU time used
cpu_util | Gauge | % | instance ID | Pollster | vSphere | Average CPU utilization
vcpus | Gauge | vcpu | instance ID | Notification | Libvirt | Number of virtual CPUs allocated to the instance
disk.read.requests | Cumulative | request | instance ID | Pollster | Libvirt | Number of read requests
disk.read.requests.rate | Gauge | request/s | instance ID | Pollster | Libvirt, vSphere | Average rate of read requests
disk.write.requests | Cumulative | request | instance ID | Pollster | Libvirt | Number of write requests
disk.write.requests.rate | Gauge | request/s | instance ID | Pollster | Libvirt, vSphere | Average rate of write requests
disk.read.bytes | Cumulative | B | instance ID | Pollster | Libvirt | Volume of reads
disk.read.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, vSphere | Average rate of reads
disk.write.bytes | Cumulative | B | instance ID | Pollster | Libvirt | Volume of writes
disk.write.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, vSphere | Average rate of writes
disk.root.size | Gauge | GB | instance ID | Notification | Libvirt | Size of root disk
disk.ephemeral.size | Gauge | GB | instance ID | Notification | Libvirt | Size of ephemeral disk
network.incoming.bytes | Cumulative | B | interface ID | Pollster | Libvirt | Number of incoming bytes
network.incoming.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, vSphere | Average rate of incoming bytes
network.outgoing.bytes | Cumulative | B | interface ID | Pollster | Libvirt | Number of outgoing bytes
network.outgoing.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, vSphere | Average rate of outgoing bytes
network.incoming.packets | Cumulative | packet | interface ID | Pollster | Libvirt | Number of incoming packets
network.incoming.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, vSphere | Average rate of incoming packets
network.outgoing.packets | Cumulative | packet | interface ID | Pollster | Libvirt | Number of outgoing packets
network.outgoing.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, vSphere | Average rate of outgoing packets

Meters added or hypervisor support changed in the Juno release

Name | Type | Unit | Resource | Origin | Support | Note
---- | ---- | ---- | -------- | ------ | ------- | ----
instance | Gauge | instance | instance ID | Notification, Pollster | Libvirt, vSphere, XenAPI | Existence of instance
instance:<type> | Gauge | instance | instance ID | Notification, Pollster | Libvirt, vSphere, XenAPI | Existence of instance <type> (OpenStack types)
memory.usage | Gauge | MB | instance ID | Pollster | vSphere, XenAPI | Volume of RAM used by the instance from the amount of its allocated memory
cpu_util | Gauge | % | instance ID | Pollster | vSphere, XenAPI | Average CPU utilization
disk.read.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, vSphere, XenAPI | Average rate of reads
disk.write.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, vSphere, XenAPI | Average rate of writes
disk.device.read.requests | Cumulative | request | disk ID | Pollster | Libvirt | Number of read requests
disk.device.read.requests.rate | Gauge | request/s | disk ID | Pollster | Libvirt, vSphere | Average rate of read requests
disk.device.write.requests | Cumulative | request | disk ID | Pollster | Libvirt | Number of write requests
disk.device.write.requests.rate | Gauge | request/s | disk ID | Pollster | Libvirt, vSphere | Average rate of write requests
disk.device.read.bytes | Cumulative | B | disk ID | Pollster | Libvirt | Volume of reads
disk.device.read.bytes.rate | Gauge | B/s | disk ID | Pollster | Libvirt, vSphere | Average rate of reads
disk.device.write.bytes | Cumulative | B | disk ID | Pollster | Libvirt | Volume of writes
disk.device.write.bytes.rate | Gauge | B/s | disk ID | Pollster | Libvirt, vSphere | Average rate of writes
network.incoming.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, vSphere, XenAPI | Average rate of incoming bytes
network.outgoing.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, vSphere, XenAPI | Average rate of outgoing bytes
network.incoming.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, vSphere, XenAPI | Average rate of incoming packets
network.outgoing.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, vSphere, XenAPI | Average rate of outgoing packets

Meters added or hypervisor support changed in the Kilo release

Name | Type | Unit | Resource | Origin | Support | Note
---- | ---- | ---- | -------- | ------ | ------- | ----
memory.usage | Gauge | MB | instance ID | Pollster | Libvirt, vSphere, XenAPI | Volume of RAM used by the instance from the amount of its allocated memory
memory.resident | Gauge | MB | instance ID | Pollster | Libvirt | Volume of RAM used by the instance on the physical machine
disk.capacity | Gauge | B | instance ID | Pollster | Libvirt | The amount of disk that the instance can see
disk.allocation | Gauge | B | instance ID | Pollster | Libvirt | The amount of disk occupied by the instance on the host machine
disk.usage | Gauge | B | instance ID | Pollster | Libvirt | The physical size in bytes of the image container on the host
disk.device.capacity | Gauge | B | disk ID | Pollster | Libvirt | The amount of disk per device that the instance can see
disk.device.allocation | Gauge | B | disk ID | Pollster | Libvirt | The amount of disk per device occupied by the instance on the host machine
disk.device.usage | Gauge | B | disk ID | Pollster | Libvirt | The physical size in bytes of the image container on the host per device

Meters deprecated in the Kilo release

Name | Type | Unit | Resource | Origin | Support | Note
---- | ---- | ---- | -------- | ------ | ------- | ----
instance:<type> | Gauge | instance | instance ID | Notification, Pollster | Libvirt, vSphere, XenAPI | Existence of instance <type> (OpenStack types)

Meters added in the Liberty release

Name | Type | Unit | Resource | Origin | Support | Note
---- | ---- | ---- | -------- | ------ | ------- | ----
cpu.delta | Delta | ns | instance ID | Pollster | Libvirt | CPU time used since previous datapoint

Meters added in the Newton release

Name | Type | Unit | Resource | Origin | Support | Note
---- | ---- | ---- | -------- | ------ | ------- | ----
cpu_l3_cache | Gauge | B | instance ID | Pollster | Libvirt | L3 cache used by the instance
memory.bandwidth.total | Gauge | B/s | instance ID | Pollster | Libvirt | Total system bandwidth from one level of cache
memory.bandwidth.local | Gauge | B/s | instance ID | Pollster | Libvirt | Bandwidth of memory traffic for a memory controller
perf.cpu.cycles | Gauge | cycle | instance ID | Pollster | Libvirt | The number of CPU cycles one instruction needs
perf.instructions | Gauge | instruction | instance ID | Pollster | Libvirt | The count of instructions
perf.cache.references | Gauge | count | instance ID | Pollster | Libvirt | The count of cache hits
perf.cache.misses | Gauge | count | instance ID | Pollster | Libvirt | The count of cache misses

Note

  • The instance:<type> meter can be replaced by using extra parameters in both the samples and statistics queries. The equivalent queries look like the following:

statistics:

  ceilometer statistics -m instance -g resource_metadata.instance_type

samples:

  ceilometer sample-list -m instance -q metadata.instance_type=<value>

The Telemetry service supports creating new meters by using transformers. For more details about transformers, see Section 10.3.1.1, “Transformers”. Among the meters gathered from libvirt, a few are generated from other meters. The following meters are created by using the rate_of_change transformer from the meters in the table above (a pipeline sketch follows the list):

  • cpu_util

  • disk.read.requests.rate

  • disk.write.requests.rate

  • disk.read.bytes.rate

  • disk.write.bytes.rate

  • disk.device.read.requests.rate

  • disk.device.write.requests.rate

  • disk.device.read.bytes.rate

  • disk.device.write.bytes.rate

  • network.incoming.bytes.rate

  • network.outgoing.bytes.rate

  • network.incoming.packets.rate

  • network.outgoing.packets.rate
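As a sketch of how such a derived meter is declared, the following pipeline.yaml fragment mirrors the cpu_util definition commonly shipped in the default pipeline; treat the interval and the exact scale expression as illustrative for your release:

sources:
    - name: cpu_source
      interval: 600
      meters:
          - "cpu"
      sinks:
          - cpu_sink
sinks:
    - name: cpu_sink
      transformers:
          - name: "rate_of_change"
            parameters:
                target:
                    name: "cpu_util"
                    unit: "%"
                    type: "gauge"
                    scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
      publishers:
          - notifier://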

Note

To enable libvirt memory.usage support, you need libvirt version 1.1.1+ and QEMU version 1.5+, and you also need a suitable balloon driver in the image. This is particularly relevant for Windows guests; most modern Linux distributions already have it built in. Telemetry is not able to fetch memory.usage samples without the image balloon driver.

OpenStack Compute is capable of collecting CPU-related meters from the compute host machines. To use this feature, set the compute_monitors option to ComputeDriverCPUMonitor in the nova.conf configuration file. For further information, see the Compute configuration section in the Compute chapter of the OpenStack Configuration Reference.
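A sketch of the corresponding nova.conf fragment:

[DEFAULT]
compute_monitors = ComputeDriverCPUMonitor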

The following host machine-related meters are collected for OpenStack Compute:

Meters added in the Icehouse release or earlier

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
compute.node.cpu.frequency | Gauge | MHz | host ID | Notification | CPU frequency
compute.node.cpu.kernel.time | Cumulative | ns | host ID | Notification | CPU kernel time
compute.node.cpu.idle.time | Cumulative | ns | host ID | Notification | CPU idle time
compute.node.cpu.user.time | Cumulative | ns | host ID | Notification | CPU user mode time
compute.node.cpu.iowait.time | Cumulative | ns | host ID | Notification | CPU I/O wait time
compute.node.cpu.kernel.percent | Gauge | % | host ID | Notification | CPU kernel percentage
compute.node.cpu.idle.percent | Gauge | % | host ID | Notification | CPU idle percentage
compute.node.cpu.user.percent | Gauge | % | host ID | Notification | CPU user mode percentage
compute.node.cpu.iowait.percent | Gauge | % | host ID | Notification | CPU I/O wait percentage
compute.node.cpu.percent | Gauge | % | host ID | Notification | CPU utilization

10.6.2 Bare metal service

Telemetry captures notifications that are emitted by the Bare metal service. The sources of the notifications are IPMI sensors that collect data from the host machine.

Note

The sensor data is not available in the Bare metal service by default. To enable the meters and configure this module to emit notifications about the measured values, see the Installation Guide for the Bare metal service.

The following meters are recorded for the Bare metal service:

Meters added in the Juno release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
hardware.ipmi.fan | Gauge | RPM | fan sensor | Notification | Fan rounds per minute (RPM)
hardware.ipmi.temperature | Gauge | C | temperature sensor | Notification | Temperature reading from sensor
hardware.ipmi.current | Gauge | W | current sensor | Notification | Current reading from sensor
hardware.ipmi.voltage | Gauge | V | voltage sensor | Notification | Voltage reading from sensor

10.6.3 IPMI based meters

Another way of gathering IPMI-based data is to use IPMI sensors independently of the Bare metal service's components. The same meters as in Section 10.6.2, “Bare metal service” can be fetched, except that the origin is Pollster instead of Notification.

You need to deploy the ceilometer-agent-ipmi on each IPMI-capable node in order to poll local sensor data. For further information about the IPMI agent see Section 10.2.2.2, “IPMI agent”.

Warning

To avoid duplication of metering data and unnecessary load on the IPMI interface, do not deploy the IPMI agent on nodes that are managed by the Bare metal service and keep the conductor.send_sensor_data option set to False in the ironic.conf configuration file.
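A sketch of the corresponding ironic.conf fragment:

[conductor]
send_sensor_data = false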

Besides generic IPMI sensor data, the following Intel Node Manager meters are recorded from capable platforms:

Meters added in the Juno release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
hardware.ipmi.node.power | Gauge | W | host ID | Pollster | Current power of the system
hardware.ipmi.node.temperature | Gauge | C | host ID | Pollster | Current temperature of the system

Meters added in the Kilo release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
hardware.ipmi.node.inlet_temperature | Gauge | C | host ID | Pollster | Inlet temperature of the system
hardware.ipmi.node.outlet_temperature | Gauge | C | host ID | Pollster | Outlet temperature of the system
hardware.ipmi.node.airflow | Gauge | CFM | host ID | Pollster | Volumetric airflow of the system, expressed as 1/10th of CFM
hardware.ipmi.node.cups | Gauge | CUPS | host ID | Pollster | CUPS (Compute Usage Per Second) index data of the system
hardware.ipmi.node.cpu_util | Gauge | % | host ID | Pollster | CPU CUPS utilization of the system
hardware.ipmi.node.mem_util | Gauge | % | host ID | Pollster | Memory CUPS utilization of the system
hardware.ipmi.node.io_util | Gauge | % | host ID | Pollster | IO CUPS utilization of the system

Meters renamed in the Kilo release

Original Name | New Name
------------- | --------
hardware.ipmi.node.temperature | hardware.ipmi.node.inlet_temperature

10.6.4 SNMP based meters

Telemetry supports gathering SNMP-based generic host meters. In order to be able to collect this data, you need to run snmpd on each target host.
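For example, you can quickly verify that snmpd answers on a target host with a short walk of the system subtree; the public community string is the common read-only default and may differ in your deployment:

$ snmpwalk -v2c -c public TARGET_HOST system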

The following meters are available about the host machines by using SNMP:

Meters added in the Kilo release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
hardware.cpu.load.1min | Gauge | process | host ID | Pollster | CPU load in the past 1 minute
hardware.cpu.load.5min | Gauge | process | host ID | Pollster | CPU load in the past 5 minutes
hardware.cpu.load.15min | Gauge | process | host ID | Pollster | CPU load in the past 15 minutes
hardware.disk.size.total | Gauge | KB | disk ID | Pollster | Total disk size
hardware.disk.size.used | Gauge | KB | disk ID | Pollster | Used disk size
hardware.memory.total | Gauge | KB | host ID | Pollster | Total physical memory size
hardware.memory.used | Gauge | KB | host ID | Pollster | Used physical memory size
hardware.memory.buffer | Gauge | KB | host ID | Pollster | Physical memory buffer size
hardware.memory.cached | Gauge | KB | host ID | Pollster | Cached physical memory size
hardware.memory.swap.total | Gauge | KB | host ID | Pollster | Total swap space size
hardware.memory.swap.avail | Gauge | KB | host ID | Pollster | Available swap space size
hardware.network.incoming.bytes | Cumulative | B | interface ID | Pollster | Bytes received by network interface
hardware.network.outgoing.bytes | Cumulative | B | interface ID | Pollster | Bytes sent by network interface
hardware.network.outgoing.errors | Cumulative | packet | interface ID | Pollster | Sending errors of network interface
hardware.network.ip.incoming.datagrams | Cumulative | datagrams | host ID | Pollster | Number of received datagrams
hardware.network.ip.outgoing.datagrams | Cumulative | datagrams | host ID | Pollster | Number of sent datagrams
hardware.system_stats.io.incoming.blocks | Cumulative | blocks | host ID | Pollster | Aggregated number of blocks received to block device
hardware.system_stats.io.outgoing.blocks | Cumulative | blocks | host ID | Pollster | Aggregated number of blocks sent to block device
hardware.system_stats.cpu.idle | Gauge | % | host ID | Pollster | CPU idle percentage

Meters added in the Mitaka release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
hardware.cpu.util | Gauge | % | host ID | Pollster | CPU usage percentage

10.6.5 OpenStack Image service

The following meters are collected for OpenStack Image service:

Meters added in the Icehouse release or earlier

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
image | Gauge | image | image ID | Notification, Pollster | Existence of the image
image.size | Gauge | B | image ID | Notification, Pollster | Size of the uploaded image
image.update | Delta | image | image ID | Notification | Number of updates on the image
image.upload | Delta | image | image ID | Notification | Number of uploads on the image
image.delete | Delta | image | image ID | Notification | Number of deletes on the image
image.download | Delta | B | image ID | Notification | Image is downloaded
image.serve | Delta | B | image ID | Notification | Image is served out

10.6.6 OpenStack Block Storage

The following meters are collected for OpenStack Block Storage:

Meters added in the Icehouse release or earlier

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
volume | Gauge | volume | volume ID | Notification | Existence of the volume
volume.size | Gauge | GB | volume ID | Notification | Size of the volume

Meters added in the Juno release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
snapshot | Gauge | snapshot | snapshot ID | Notification | Existence of the snapshot
snapshot.size | Gauge | GB | snapshot ID | Notification | Size of the snapshot

Meters added in the Kilo release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
volume.create.(start|end) | Delta | volume | volume ID | Notification | Creation of the volume
volume.delete.(start|end) | Delta | volume | volume ID | Notification | Deletion of the volume
volume.update.(start|end) | Delta | volume | volume ID | Notification | Update the name or description of the volume
volume.resize.(start|end) | Delta | volume | volume ID | Notification | Update the size of the volume
volume.attach.(start|end) | Delta | volume | volume ID | Notification | Attaching the volume to an instance
volume.detach.(start|end) | Delta | volume | volume ID | Notification | Detaching the volume from an instance
snapshot.create.(start|end) | Delta | snapshot | snapshot ID | Notification | Creation of the snapshot
snapshot.delete.(start|end) | Delta | snapshot | snapshot ID | Notification | Deletion of the snapshot
volume.backup.create.(start|end) | Delta | volume | backup ID | Notification | Creation of the volume backup
volume.backup.delete.(start|end) | Delta | volume | backup ID | Notification | Deletion of the volume backup
volume.backup.restore.(start|end) | Delta | volume | backup ID | Notification | Restoration of the volume backup

10.6.7 OpenStack Object Storage

The following meters are collected for OpenStack Object Storage:

Meters added in the Icehouse release or earlier

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
storage.objects | Gauge | object | storage ID | Pollster | Number of objects
storage.objects.size | Gauge | B | storage ID | Pollster | Total size of stored objects
storage.objects.containers | Gauge | container | storage ID | Pollster | Number of containers
storage.objects.incoming.bytes | Delta | B | storage ID | Notification | Number of incoming bytes
storage.objects.outgoing.bytes | Delta | B | storage ID | Notification | Number of outgoing bytes
storage.api.request | Delta | request | storage ID | Notification | Number of API requests against OpenStack Object Storage
storage.containers.objects | Gauge | object | storage ID/container | Pollster | Number of objects in container
storage.containers.objects.size | Gauge | B | storage ID/container | Pollster | Total size of stored objects in container

Meters deprecated in the Kilo release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
storage.objects.incoming.bytes | Delta | B | storage ID | Notification | Number of incoming bytes
storage.objects.outgoing.bytes | Delta | B | storage ID | Notification | Number of outgoing bytes
storage.api.request | Delta | request | storage ID | Notification | Number of API requests against OpenStack Object Storage

10.6.8 Ceph Object Storage

In order to gather meters from Ceph, you have to install and configure the Ceph Object Gateway (radosgw) as described in the Installation Manual. You have to enable usage logging in order to get the related meters from Ceph. You also need an admin user with users, buckets, metadata, and usage caps configured.

In order to access Ceph from Telemetry, you need to specify a service group for radosgw in the ceilometer.conf configuration file, along with the access_key and secret_key of the admin user mentioned above.
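A sketch of the corresponding ceilometer.conf fragment, assuming the service_types and rgw_admin_credentials option groups used by the radosgw pollsters; the key values are placeholders:

[service_types]
radosgw = object-store

[rgw_admin_credentials]
access_key = ACCESS_KEY
secret_key = SECRET_KEY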

The following meters are collected for Ceph Object Storage:

Meters added in the Kilo release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
radosgw.objects | Gauge | object | storage ID | Pollster | Number of objects
radosgw.objects.size | Gauge | B | storage ID | Pollster | Total size of stored objects
radosgw.objects.containers | Gauge | container | storage ID | Pollster | Number of containers
radosgw.api.request | Gauge | request | storage ID | Pollster | Number of API requests against Ceph Object Gateway (radosgw)
radosgw.containers.objects | Gauge | object | storage ID/container | Pollster | Number of objects in container
radosgw.containers.objects.size | Gauge | B | storage ID/container | Pollster | Total size of stored objects in container

Note

The usage-related information may not be updated right after an upload or download, because the Ceph Object Gateway needs time to update the usage properties. For instance, the default configuration needs approximately 30 minutes to generate the usage logs.

10.6.9 OpenStack Identity

The following meters are collected for OpenStack Identity:

Meters added in the Juno release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
identity.authenticate.success | Delta | user | user ID | Notification | User successfully authenticated
identity.authenticate.pending | Delta | user | user ID | Notification | User pending authentication
identity.authenticate.failure | Delta | user | user ID | Notification | User failed to authenticate
identity.user.created | Delta | user | user ID | Notification | User is created
identity.user.deleted | Delta | user | user ID | Notification | User is deleted
identity.user.updated | Delta | user | user ID | Notification | User is updated
identity.group.created | Delta | group | group ID | Notification | Group is created
identity.group.deleted | Delta | group | group ID | Notification | Group is deleted
identity.group.updated | Delta | group | group ID | Notification | Group is updated
identity.role.created | Delta | role | role ID | Notification | Role is created
identity.role.deleted | Delta | role | role ID | Notification | Role is deleted
identity.role.updated | Delta | role | role ID | Notification | Role is updated
identity.project.created | Delta | project | project ID | Notification | Project is created
identity.project.deleted | Delta | project | project ID | Notification | Project is deleted
identity.project.updated | Delta | project | project ID | Notification | Project is updated
identity.trust.created | Delta | trust | trust ID | Notification | Trust is created
identity.trust.deleted | Delta | trust | trust ID | Notification | Trust is deleted

Meters added in the Kilo release

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
identity.role_assignment.created | Delta | role_assignment | role ID | Notification | Role is added to an actor on a target
identity.role_assignment.deleted | Delta | role_assignment | role ID | Notification | Role is removed from an actor on a target

All OpenStack Identity meters listed above were deprecated in the Liberty release.

10.6.10 OpenStack Networking

The following meters are collected for OpenStack Networking:

Meters added in the Icehouse release or earlier

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
network | Gauge | network | network ID | Notification | Existence of network
network.create | Delta | network | network ID | Notification | Creation requests for this network
network.update | Delta | network | network ID | Notification | Update requests for this network
subnet | Gauge | subnet | subnet ID | Notification | Existence of subnet
subnet.create | Delta | subnet | subnet ID | Notification | Creation requests for this subnet
subnet.update | Delta | subnet | subnet ID | Notification | Update requests for this subnet
port | Gauge | port | port ID | Notification | Existence of port
port.create | Delta | port | port ID | Notification | Creation requests for this port
port.update | Delta | port | port ID | Notification | Update requests for this port
router | Gauge | router | router ID | Notification | Existence of router
router.create | Delta | router | router ID | Notification | Creation requests for this router
router.update | Delta | router | router ID | Notification | Update requests for this router
ip.floating | Gauge | ip | ip ID | Notification, Pollster | Existence of IP
ip.floating.create | Delta | ip | ip ID | Notification | Creation requests for this IP
ip.floating.update | Delta | ip | ip ID | Notification | Update requests for this IP
bandwidth | Delta | B | label ID | Notification | Bytes through this l3 metering label

10.6.11 SDN controllers

The following meters are collected for SDN:

Meters added in the Icehouse release or earlier

Name | Type | Unit | Resource | Origin | Note
---- | ---- | ---- | -------- | ------ | ----
switch | Gauge | switch | switch ID | Pollster | Existence of switch
switch.port | Gauge | port | switch ID | Pollster | Existence of port
switch.port.receive.packets | Cumulative | packet | switch ID | Pollster | Packets received on port
switch.port.transmit.packets | Cumulative | packet | switch ID | Pollster | Packets transmitted on port
switch.port.receive.bytes | Cumulative | B | switch ID | Pollster | Bytes received on port
switch.port.transmit.bytes | Cumulative | B | switch ID | Pollster | Bytes transmitted on port
switch.port.receive.drops | Cumulative | packet | switch ID | Pollster | Drops received on port
switch.port.transmit.drops | Cumulative | packet | switch ID | Pollster | Drops transmitted on port
switch.port.receive.errors | Cumulative | packet | switch ID | Pollster | Errors received on port
switch.port.transmit.errors | Cumulative | packet | switch ID | Pollster | Errors transmitted on port
switch.port.receive.frame_error | Cumulative | packet | switch ID | Pollster | Frame alignment errors received on port
switch.port.receive.overrun_error | Cumulative | packet | switch ID | Pollster | Overrun errors received on port
switch.port.receive.crc_error | Cumulative | packet | switch ID | Pollster | CRC errors received on port
switch.port.collision.count | Cumulative | count | switch ID | Pollster | Collisions on port
switch.table | Gauge | table | switch ID | Pollster | Duration of table
switch.table.active.entries | Gauge | entry | switch ID | Pollster | Active entries in table
switch.table.lookup.packets | Gauge | packet | switch ID | Pollster | Lookup packets for table
switch.table.matched.packets | Gauge | packet | switch ID | Pollster | Packet matches for table
switch.flow | Gauge | flow | switch ID | Pollster | Duration of flow
switch.flow.duration.seconds | Gauge | s | switch ID | Pollster | Duration of flow in seconds
switch.flow.duration.nanoseconds | Gauge | ns | switch ID | Pollster | Duration of flow in nanoseconds
switch.flow.packets | Cumulative | packet | switch ID | Pollster | Packets received
switch.flow.bytes | Cumulative | B | switch ID | Pollster | Bytes received

These meters are available for OpenFlow-based switches. To enable these meters, each driver needs to be properly configured.

10.6.12 Load-Balancer-as-a-Service (LBaaS v1)

The following meters are collected for LBaaS v1:

Name | Type | Unit | Resource | Origin | Note

Meters added in the Juno release:

network.services.lb.pool | Gauge | pool | pool ID | Notification, Pollster | Existence of a LB pool
network.services.lb.vip | Gauge | vip | vip ID | Notification, Pollster | Existence of a LB VIP
network.services.lb.member | Gauge | member | member ID | Notification, Pollster | Existence of a LB member
network.services.lb.health_monitor | Gauge | health_monitor | monitor ID | Notification, Pollster | Existence of a LB health probe
network.services.lb.total.connections | Cumulative | connection | pool ID | Pollster | Total connections on a LB
network.services.lb.active.connections | Gauge | connection | pool ID | Pollster | Active connections on a LB
network.services.lb.incoming.bytes | Gauge | B | pool ID | Pollster | Number of incoming bytes
network.services.lb.outgoing.bytes | Gauge | B | pool ID | Pollster | Number of outgoing bytes

Meters added in the Kilo release:

network.services.lb.pool.create | Delta | pool | pool ID | Notification | LB pool was created
network.services.lb.pool.update | Delta | pool | pool ID | Notification | LB pool was updated
network.services.lb.vip.create | Delta | vip | vip ID | Notification | LB VIP was created
network.services.lb.vip.update | Delta | vip | vip ID | Notification | LB VIP was updated
network.services.lb.member.create | Delta | member | member ID | Notification | LB member was created
network.services.lb.member.update | Delta | member | member ID | Notification | LB member was updated
network.services.lb.health_monitor.create | Delta | health_monitor | monitor ID | Notification | LB health probe was created
network.services.lb.health_monitor.update | Delta | health_monitor | monitor ID | Notification | LB health probe was updated

10.6.13 Load-Balancer-as-a-Service (LBaaS v2)

The following meters are collected for LBaaS v2. These meters were added in the Mitaka release:

Name | Type | Unit | Resource | Origin | Note

network.services.lb.pool | Gauge | pool | pool ID | Notification, Pollster | Existence of a LB pool
network.services.lb.listener | Gauge | listener | listener ID | Notification, Pollster | Existence of a LB listener
network.services.lb.member | Gauge | member | member ID | Notification, Pollster | Existence of a LB member
network.services.lb.health_monitor | Gauge | health_monitor | monitor ID | Notification, Pollster | Existence of a LB health probe
network.services.lb.loadbalancer | Gauge | loadbalancer | loadbalancer ID | Notification, Pollster | Existence of a LB loadbalancer
network.services.lb.total.connections | Cumulative | connection | pool ID | Pollster | Total connections on a LB
network.services.lb.active.connections | Gauge | connection | pool ID | Pollster | Active connections on a LB
network.services.lb.incoming.bytes | Gauge | B | pool ID | Pollster | Number of incoming bytes
network.services.lb.outgoing.bytes | Gauge | B | pool ID | Pollster | Number of outgoing bytes
network.services.lb.pool.create | Delta | pool | pool ID | Notification | LB pool was created
network.services.lb.pool.update | Delta | pool | pool ID | Notification | LB pool was updated
network.services.lb.listener.create | Delta | listener | listener ID | Notification | LB listener was created
network.services.lb.listener.update | Delta | listener | listener ID | Notification | LB listener was updated
network.services.lb.member.create | Delta | member | member ID | Notification | LB member was created
network.services.lb.member.update | Delta | member | member ID | Notification | LB member was updated
network.services.lb.healthmonitor.create | Delta | health_monitor | monitor ID | Notification | LB health probe was created
network.services.lb.healthmonitor.update | Delta | health_monitor | monitor ID | Notification | LB health probe was updated
network.services.lb.loadbalancer.create | Delta | loadbalancer | loadbalancer ID | Notification | LB loadbalancer was created
network.services.lb.loadbalancer.update | Delta | loadbalancer | loadbalancer ID | Notification | LB loadbalancer was updated

Note

The meters above are experimental and may generate a significant load against the Neutron APIs. Future enhancements will be implemented when Neutron supports the new APIs.

10.6.14 VPN-as-a-Service (VPNaaS)

The following meters are collected for VPNaaS:

Name | Type | Unit | Resource | Origin | Note

Meters added in the Juno release:

network.services.vpn | Gauge | vpnservice | vpn ID | Notification, Pollster | Existence of a VPN
network.services.vpn.connections | Gauge | ipsec_site_connection | connection ID | Notification, Pollster | Existence of an IPSec connection

Meters added in the Kilo release:

network.services.vpn.create | Delta | vpnservice | vpn ID | Notification | VPN was created
network.services.vpn.update | Delta | vpnservice | vpn ID | Notification | VPN was updated
network.services.vpn.connections.create | Delta | ipsec_site_connection | connection ID | Notification | IPSec connection was created
network.services.vpn.connections.update | Delta | ipsec_site_connection | connection ID | Notification | IPSec connection was updated
network.services.vpn.ipsecpolicy | Gauge | ipsecpolicy | ipsecpolicy ID | Notification, Pollster | Existence of an IPSec policy
network.services.vpn.ipsecpolicy.create | Delta | ipsecpolicy | ipsecpolicy ID | Notification | IPSec policy was created
network.services.vpn.ipsecpolicy.update | Delta | ipsecpolicy | ipsecpolicy ID | Notification | IPSec policy was updated
network.services.vpn.ikepolicy | Gauge | ikepolicy | ikepolicy ID | Notification, Pollster | Existence of an IKE policy
network.services.vpn.ikepolicy.create | Delta | ikepolicy | ikepolicy ID | Notification | IKE policy was created
network.services.vpn.ikepolicy.update | Delta | ikepolicy | ikepolicy ID | Notification | IKE policy was updated

10.6.15 Firewall-as-a-Service (FWaaS)

The following meters are collected for FWaaS:

Name | Type | Unit | Resource | Origin | Note

Meters added in the Juno release:

network.services.firewall | Gauge | firewall | firewall ID | Notification, Pollster | Existence of a firewall
network.services.firewall.policy | Gauge | firewall_policy | firewall ID | Notification, Pollster | Existence of a firewall policy

Meters added in the Kilo release:

network.services.firewall.create | Delta | firewall | firewall ID | Notification | Firewall was created
network.services.firewall.update | Delta | firewall | firewall ID | Notification | Firewall was updated
network.services.firewall.policy.create | Delta | firewall_policy | policy ID | Notification | Firewall policy was created
network.services.firewall.policy.update | Delta | firewall_policy | policy ID | Notification | Firewall policy was updated
network.services.firewall.rule | Gauge | firewall_rule | rule ID | Notification | Existence of a firewall rule
network.services.firewall.rule.create | Delta | firewall_rule | rule ID | Notification | Firewall rule was created
network.services.firewall.rule.update | Delta | firewall_rule | rule ID | Notification | Firewall rule was updated

10.6.16 Orchestration service

The following meters are collected for the Orchestration service:

Name | Type | Unit | Resource | Origin | Note

Meters added in the Icehouse release or earlier:

stack.create | Delta | stack | stack ID | Notification | Stack was successfully created
stack.update | Delta | stack | stack ID | Notification | Stack was successfully updated
stack.delete | Delta | stack | stack ID | Notification | Stack was successfully deleted
stack.resume | Delta | stack | stack ID | Notification | Stack was successfully resumed
stack.suspend | Delta | stack | stack ID | Notification | Stack was successfully suspended

All of these meters were deprecated in the Liberty release.

10.6.17 Data processing service for OpenStack

The following meters are collected for the Data processing service for OpenStack:

Name | Type | Unit | Resource | Origin | Note

Meters added in the Juno release:

cluster.create | Delta | cluster | cluster ID | Notification | Cluster was successfully created
cluster.update | Delta | cluster | cluster ID | Notification | Cluster was successfully updated
cluster.delete | Delta | cluster | cluster ID | Notification | Cluster was successfully deleted

All of these meters were deprecated in the Liberty release.

10.6.18 Key Value Store module

The following meters are collected for the Key Value Store module:

Name | Type | Unit | Resource | Origin | Note

Meters added in the Kilo release:

magnetodb.table.create | Gauge | table | table ID | Notification | Table was successfully created
magnetodb.table.delete | Gauge | table | table ID | Notification | Table was successfully deleted
magnetodb.table.index.count | Gauge | index | table ID | Notification | Number of indices created in a table

Note

The Key Value Store meters are not supported in the Newton release and later.

10.6.19 Energy

The following energy-related meters are available:

Name | Type | Unit | Resource | Origin | Note

Meters added in the Icehouse release or earlier:

energy | Cumulative | kWh | probe ID | Pollster | Amount of energy
power | Gauge | W | probe ID | Pollster | Power consumption

10.7 Events

In addition to meters, the Telemetry service collects events triggered within an OpenStack environment. This section provides a brief summary of the events format in the Telemetry service.

While a sample represents a single, numeric datapoint within a time-series, an event is a broader concept that represents the state of a resource at a point in time. The state may be described using various data types including non-numeric data such as an instance's flavor. In general, events represent any action made in the OpenStack system.

10.7.1 Event configuration

To enable the creation and storage of events in the Telemetry service, the store_events option needs to be set to True. For further configuration options, see the event section in the OpenStack Configuration Reference.

Note

It is advisable to set disable_non_metric_meters to True when enabling events in the Telemetry service. The Telemetry service historically represented events as metering data, which may create duplication of data if both events and non-metric meters are enabled.
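
For example, both options can be set together in the ceilometer.conf configuration file. The following is a minimal sketch; the options live in the [notification] section in recent releases, but verify the section name against the configuration reference for your version:

  [notification]
  # store events in the Telemetry data store
  store_events = True
  # avoid duplicating event data as non-metric meters
  disable_non_metric_meters = True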

10.7.2 Event structure

Events captured by the Telemetry service are represented by five key attributes:

event_type

A dotted string defining what event occurred such as "compute.instance.resize.start".

message_id

A UUID for the event.

generated

A timestamp of when the event occurred in the system.

traits

A flat mapping of key-value pairs which describe the event. The event's traits contain most of the details of the event. Traits are typed, and can be strings, integers, floats, or datetimes.

raw

Mainly for auditing purposes, the full event message can be stored (unindexed) for future evaluation.
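
Put together, a stored event might be rendered like this (a sketch; the field values are illustrative and the exact representation depends on the storage driver):

  {
      "event_type": "compute.instance.create.end",
      "message_id": "6b7d2e54-7f7a-4f0e-9bd6-2c3a1c0f6e11",
      "generated": "2015-06-01T12:00:00",
      "traits": [
          {"name": "tenant_id", "type": "string", "value": "7f13f2b17917463b9ee21aa92c4b36d6"},
          {"name": "instance_id", "type": "string", "value": "INSTANCE_ID_1"},
          {"name": "memory_mb", "type": "int", "value": 512}
      ],
      "raw": {}
  }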

10.7.3 Event indexing

The general philosophy of notifications in OpenStack is to emit any and all data someone might need, and let the consumer filter out what they are not interested in. In order to make processing simpler and more efficient, the notifications are stored and processed within Ceilometer as events. The notification payload, which can be an arbitrarily complex JSON data structure, is converted to a flat set of key-value pairs. This conversion is specified by a config file.

Note

The event format is meant for efficient processing and querying. Storage of complete notifications for auditing purposes can be enabled by configuring the store_raw option.

10.7.3.1 Event conversion

The conversion from notifications to events is driven by a configuration file defined by the definitions_cfg_file option in the ceilometer.conf configuration file.

This includes descriptions of how to map fields in the notification body to Traits, and optional plug-ins for doing any programmatic translations (splitting a string, forcing case).

The mapping of notifications to events is defined per event_type, which can be wildcarded. Traits are added to events if the corresponding fields in the notification exist and are non-null.

Note

The default definition file included with the Telemetry service contains a list of known notifications and useful traits. The mappings provided can be modified to include more or less data according to user requirements.

If the definitions file is not present, a warning is logged, but an empty set of definitions is assumed. By default, any notification that does not have a corresponding event definition in the definitions file is converted to an event with a minimal set of traits. This behavior can be changed by setting the drop_unmatched_notifications option in the ceilometer.conf file. If this option is set to True, any unmatched notifications are dropped.

The basic set of traits (all of TEXT type) that are added to all events, provided the notification has the relevant data, are: service (the notification's publisher), tenant_id, and request_id. These do not have to be specified in the event definition; they are added automatically, but their definitions can be overridden for a given event_type.
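
For example, the options discussed above might appear in ceilometer.conf as follows (a sketch; the values shown are the usual defaults, and store_raw accepts notification priorities such as info or error):

  [event]
  definitions_cfg_file = event_definitions.yaml
  drop_unmatched_notifications = False
  # store full notification payloads of this priority for auditing
  store_raw = info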

10.7.3.2 Event definitions format

The event definitions file is in YAML format. It consists of a list of event definitions, which are mappings. Order is significant: the list of definitions is scanned in reverse order to find a definition that matches the notification's event_type. That definition is used to generate the event. The reverse ordering is used because it is common to want a more general wildcarded definition (such as compute.instance.*) with a set of traits common to all of those events, followed by a few more specific event definitions that have all of the above traits, plus a few more.

Each event definition is a mapping with two keys:

event_type

This is a list (or a string, which is treated as a one-element list) of event_types that this definition handles. These can be wildcarded with Unix shell glob syntax. An exclusion listing (starting with a !) excludes any listed types from matching. If only exclusions are listed, the definition matches anything that does not match the exclusions.

traits

This is a mapping, the keys are the trait names, and the values are trait definitions.

Each trait definition is a mapping with the following keys:

fields

A path specification for the field(s) in the notification you wish to extract for this trait. Specifications can be written to match multiple possible fields. By default the value will be the first such field. The paths can be specified with a dot syntax (payload.host). Square bracket syntax (payload[host]) is also supported. In either case, if the key for the field you are looking for contains special characters, like ., it needs to be quoted (with double or single quotes): payload.image_meta.'org.openstack__1__architecture'. The syntax used for the field specification is a variant of JSONPath.

type

(Optional) The data type for this trait. Valid options are: text, int, float, and datetime. Defaults to text if not specified.

plugin

(Optional) Used to execute simple programmatic conversions on the value in a notification field.
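
A short definitions sketch following this format could look like the following. The trait names and field paths are illustrative, not a copy of the shipped defaults:

  - event_type: compute.instance.*
    traits:
      instance_id:
        fields: payload.instance_id
      memory_mb:
        type: int
        fields: payload.memory_mb
      deleted_at:
        type: datetime
        fields: payload.deleted_at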

10.7.3.3 Event delivery to external sinks

You can configure the Telemetry service to deliver events to external sinks. These sinks are configurable in the /etc/ceilometer/event_pipeline.yaml file.
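
The file uses the same sources/sinks structure as the sample pipeline. A minimal sketch that routes all events to the notifier publisher might look like this:

  ---
  sources:
      - name: event_source
        events:
            - "*"
        sinks:
            - event_sink
  sinks:
      - name: event_sink
        transformers:
        publishers:
            - notifier://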

10.8 Troubleshoot Telemetry

10.8.1 Logging in Telemetry

The Telemetry service has log settings similar to those of the other OpenStack services. Multiple options are available to change the target of logging, the format of the log entries, and the log levels.

The log settings can be changed in ceilometer.conf. The configuration options are listed in the logging configuration options table of the Telemetry section in the OpenStack Configuration Reference.

By default, stderr is used as the output target for log messages. This can be changed to either a log file or syslog. The debug and verbose options are also set to false in the default settings; the default log levels of the corresponding modules can be found in the table referenced above.

10.8.2 Recommended order of starting services

As described in Bug 1355809, the wrong ordering of service startup can result in data loss.

When the services are started for the first time, or after a message queue service restart, it takes time for the ceilometer-collector service to establish its connection and join or rejoin the configured exchanges. Therefore, if the ceilometer-agent-compute, ceilometer-agent-central, and ceilometer-agent-notification services are started before the ceilometer-collector service, the ceilometer-collector service may lose some messages while connecting to the message queue service.

This issue is more likely to occur when the polling interval is set to a relatively short period. To avoid this situation, start or restart the ceilometer-collector service after the message queue. All other Telemetry services should be started or restarted after it, and ceilometer-agent-compute should be last in the sequence, as this component emits metering messages in order to send the samples to the collector.
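
Expressed as a shell sketch with systemd, the recommended ordering would be the following. The unit names are illustrative and differ between distributions:

  # message queue first, so the exchanges are available
  $ systemctl restart rabbitmq-server
  # the collector next, so it can join the exchanges before samples arrive
  $ systemctl restart ceilometer-collector
  # then the remaining Telemetry services
  $ systemctl restart ceilometer-agent-notification
  $ systemctl restart ceilometer-agent-central
  # the compute agent last, as it emits the metering messages
  $ systemctl restart ceilometer-agent-compute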

10.8.3 Notification agent

The Icehouse release of OpenStack introduced a new service responsible for consuming notifications coming from other OpenStack services.

If the ceilometer-agent-notification service is not installed and started, samples originating from notifications will not be generated. If notification-based samples are missing, check the state of this service and the Telemetry log file first.

For the list of meters that are originated from notifications, see the Telemetry Measurements Reference.

10.8.4 Recommended auth_url to be used

When using the Telemetry command-line client, the credentials and the os_auth_url have to be set in order for the client to authenticate against OpenStack Identity. For further details about the credentials that have to be provided, see the Telemetry Python API.

The service catalog provided by OpenStack Identity contains the URLs that are available for authentication. The URLs have different ports, based on whether the type of the given URL is public, internal or admin.

OpenStack Identity is about to change its API version from v2 to v3. The adminURL endpoint (available on port 35357) supports only the v3 version, while the other two support both.

The Telemetry command-line client has not been adapted to the v3 version of the OpenStack Identity API. If the adminURL is used as the os_auth_url, the ceilometer command results in the following error message:

$ ceilometer meter-list
  Unable to determine the Keystone version to authenticate with \
  using the given auth_url: http://10.0.2.15:35357/v2.0

Therefore, when specifying the os_auth_url parameter on the command line or by using an environment variable, use the internalURL or publicURL.
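
For example, using an environment variable (the host is illustrative; port 5000 is the conventional public Identity port):

  $ export OS_AUTH_URL=http://10.0.2.15:5000/v2.0
  $ ceilometer meter-list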

For more details, see the bug report Bug 1351841.

10.9 Telemetry best practices

The following are some suggested best practices to follow when deploying and configuring the Telemetry service. The best practices are divided into data collection and storage.

10.9.1 Data collection

  1. The Telemetry service collects a continuously growing set of data. Not all the data will be relevant for an administrator to monitor.

    • Based on your needs, you can edit the pipeline.yaml configuration file to include a selected number of meters while disregarding the rest.

    • By default, the Telemetry service polls the service APIs every 10 minutes. You can change the polling interval on a per-meter basis by editing the pipeline.yaml configuration file (see the pipeline.yaml sketch after this list).

      Warning

      If the polling interval is too short, it will likely increase the volume of stored data and the stress on the service APIs.

    • Expand the configuration to have greater control over different meter intervals.

      Note

      For more information, see Section 10.3.1, “Pipeline configuration”.

  2. If you are using the Kilo version of Telemetry, you can delay or adjust polling requests by enabling the jitter support. This adds a random delay on how the polling agents send requests to the service APIs. To enable jitter, set shuffle_time_before_polling_task in the ceilometer.conf configuration file to an integer greater than 0.

  3. If you are using Juno or later releases, based on the number of resources that will be polled, you can add additional central and compute agents as necessary. The agents are designed to scale horizontally.

  4. If you are using Juno or later releases, use the notifier:// publisher rather than rpc:// as there is a certain level of overhead that comes with RPC.

    Note

    For more information on RPC overhead, see RPC overhead info.
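
A pipeline.yaml sketch that narrows collection to one meter and lengthens its interval could look like the following. The meter name, interval, and publisher are illustrative:

  ---
  sources:
      - name: cpu_source
        interval: 600
        meters:
            - "cpu"
        sinks:
            - cpu_sink
  sinks:
      - name: cpu_sink
        transformers:
        publishers:
            - notifier://

The jitter support described in item 2 is a single setting in ceilometer.conf, for example:

  [DEFAULT]
  # random delay (in seconds) before each polling task; 0 disables jitter
  shuffle_time_before_polling_task = 60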

10.9.2 Data storage

  1. We recommend that you avoid open-ended queries. To get better performance, use reasonable time ranges or other query constraints when retrieving measurements.

    For example, this open-ended query might return an unpredictable amount of data:

    $ ceilometer sample-list --meter cpu -q 'resource_id=INSTANCE_ID_1'

    This well-formed query, by contrast, returns a more reasonable amount of data and therefore performs better:

    $ ceilometer sample-list --meter cpu -q 'resource_id=INSTANCE_ID_1;timestamp > 2015-05-01T00:00:00;timestamp < 2015-06-01T00:00:00'
    Note

    As of the Liberty release, the number of items returned is restricted to the value defined by default_api_return_limit in the ceilometer.conf configuration file. Alternatively, the value can be set per query by passing the limit option in the request.

  2. You can install the API behind mod_wsgi, as it provides more settings to tweak, such as threads and processes in the case of WSGIDaemon.

    Note

    For more information on how to configure mod_wsgi, see the Telemetry Install Documentation.

  3. The collection service provided by the Telemetry project is not intended to be an archival service. Set a Time to Live (TTL) value to expire data and minimize the database size. If you would like to keep your data for a longer period, consider storing it in a data warehouse outside of Telemetry (see the configuration sketch after this list).

    Note

    For more information on how to set the TTL, see Section 10.2.6, “Storing samples”.

  4. We recommend that you do not use the SQLAlchemy back end prior to the Juno release, as it previously contained extraneous relationships to handle deprecated data models. This resulted in extremely poor query performance.

  5. We recommend that you do not run MongoDB on the same node as the controller. Keep it on a separate node optimized for fast storage for better performance. It is also advisable for the MongoDB node to have plenty of memory.

    Note

    For more information on how much memory you need, see MongoDB FAQ.

  6. Use replica sets in MongoDB. Replica sets provide high availability through automatic failover. If your primary node fails, MongoDB will elect a secondary node to replace the primary node, and your cluster will remain functional.

    For more information on replica sets, see the MongoDB replica sets docs.

  7. Use sharding in MongoDB. Sharding stores data records across multiple machines and is MongoDB's approach to meeting the demands of data growth.

    For more information on sharding, see the MongoDB sharding docs.
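
A sketch of the TTL setting in ceilometer.conf follows. The option name has varied across releases (for example, time_to_live in earlier releases versus metering_time_to_live later), so check the configuration reference for your version; the value below expires samples after 30 days:

  [database]
  # expire samples after 30 days (in seconds)
  metering_time_to_live = 2592000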
