SUSE® Observability

Introduction

SUSE® Observability, formerly known as StackState, provides observability for your Kubernetes clusters and their workloads.

The installation of SUSE® Observability, the SUSE® Observability UI extension and the SUSE® Observability Agents takes about 30 minutes in total.

Getting help

For support, file a support case in the SUSE Customer Center (SCC).

Prerequisites

License key

A license key for the SUSE® Observability server can be obtained from the Subscription tab of the SUSE Customer Center, where it is shown as the "SUSE® Observability" Registration Code. This license is valid until the end of your Rancher Prime subscription.

Requirements

To install SUSE® Observability, ensure that the cluster has enough CPU and memory capacity.

For information on requirements, see System requirements and sizing.

Storage

SUSE® Observability uses persistent volume claims for the services that need to store data. The default storage class for the cluster will be used for all services unless this is overridden by values specified on the command line or in a values.yaml file. All services come with a pre-configured volume size that should be good to get you started, but can be customized later using variables as required.

SUSE® Observability requires the underlying storage to be based on flash memory (SSD) or similar in performance.

For production environments, NFS is neither recommended nor supported for storage provisioning in SUSE® Observability due to the potential risk of data corruption.

The default storage requirements for the different installation profiles are:

Profile	trial	10 non-HA	20 non-HA	50 non-HA	100 non-HA	150 HA	250 HA	500 HA	4000 HA
Retention (days)	3	30	30	30	30	30	30	30	30
Storage requirement	154Gi	335Gi	375Gi	475Gi	525Gi	2595Gi	2695Gi	3595Gi	6995Gi

For more details on the defaults used, see the page Configure storage.
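For capacity planning, the table above can be captured as a small lookup. The following is a minimal sketch, not part of the product: the dictionary keys reuse the column labels from the table, and the function name has_enough_storage is a hypothetical helper introduced here for illustration.

```python
# Default storage requirement per installation profile, in Gi.
# Values are copied from the table above; the keys are the table's
# column labels, not necessarily the profile names used by the Helm chart.
DEFAULT_STORAGE_GI = {
    "trial": 154,
    "10 non-HA": 335,
    "20 non-HA": 375,
    "50 non-HA": 475,
    "100 non-HA": 525,
    "150 HA": 2595,
    "250 HA": 2695,
    "500 HA": 3595,
    "4000 HA": 6995,
}


def has_enough_storage(profile, available_gi):
    """Return True if available_gi covers the profile's default requirement.

    Hypothetical helper: raises KeyError for a profile not in the table.
    """
    return available_gi >= DEFAULT_STORAGE_GI[profile]
```

Calling has_enough_storage("150 HA", 3000) compares 3000 Gi against the 2595 Gi default for that profile. Custom volume sizes set in a values.yaml file are not taken into account.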

Helm

SUSE® Observability is installed through Helm. Helm version 3.13.1 or later is required.
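The minimum-version requirement can be checked with a short script. This is a sketch under the assumption that the version string has the form printed by helm version --short (for example v3.13.1 followed by an optional build suffix); the function names are hypothetical.

```python
import re

# Minimum Helm version stated in this section.
MIN_HELM_VERSION = (3, 13, 1)


def parse_helm_version(text):
    """Extract (major, minor, patch) from a string such as 'v3.13.1+g3547a4b'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no semantic version found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def helm_is_supported(text):
    """Return True if the version in text is at least MIN_HELM_VERSION."""
    # Tuples compare element by element, so (3, 14, 0) >= (3, 13, 1) holds.
    return parse_helm_version(text) >= MIN_HELM_VERSION
```

Comparing integer tuples rather than strings avoids the pitfall that "3.9.0" sorts after "3.13.1" lexicographically.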

The different components

SUSE® Observability Server

This is the on-premises server part of the installation. It contains a set of services to store observability data:

Topology (StackGraph)
Metrics (VictoriaMetrics)
Traces (ClickHouse)
Logs (ElasticSearch)

It also contains a set of services for the observability tasks, for example notifications, state management, and monitoring.

SUSE® Observability Agent

The lightweight SUSE® Observability agent is installed on your downstream worker nodes. It collects and reports metrics, events, traces and logs, and it provides real-time observability and insights, enabling proactive monitoring and troubleshooting of your IT environment.

The SUSE® Observability version of the Agent also uses eBPF as a lightweight way to monitor all your workloads and their communication. It also decodes the RED (Rate, Errors and Duration) signals for the most common protocols, such as TCP, HTTP, TLS, and Redis.

Rancher Prime - Observability UI extension

This is a UI extension to Rancher Manager that integrates the health signals observed by SUSE® Observability. It gives direct access to the health of any resource and a link to SUSE® Observability’s UI for further investigation.

Installing SUSE® Observability server

For information on installing the SUSE® Observability server, see Installing SUSE Observability.

Accessing SUSE® Observability

The SUSE® Observability Helm chart has support for creating an Ingress resource to make SUSE® Observability accessible outside of the cluster. Follow these instructions to set that up when you have an ingress controller in the cluster. Make sure that the resulting URL uses TLS with a valid, not self-signed, certificate.

If you prefer to use a load balancer instead of ingress, expose the suse-observability-router service. The URL for the load balancer needs to use a valid, not self-signed, TLS certificate.

Installing UI Extensions

For information on installing the SUSE® Observability UI extension, see Installing UI extensions.

Installing the SUSE® Observability Agent

For information on installing the SUSE® Observability agent, see the Quick start guide.

Required privileges

The deployment of the SUSE® Observability agent requires the following system privileges:

hostPID: true: This privilege is required to associate process identifiers (PIDs) with their corresponding control groups (cgroups). This association is essential for accurately mapping processes to their respective containers.
hostNetwork: true (Optional): By default, the node agent runs with hostNetwork: true to facilitate scraping of open metrics data from all configured pods on the node without requiring additional network policies. If disabled, appropriate network policies must be defined to ensure the agent can access the necessary endpoints.
securityContext.privileged: true: This elevated privilege is required for several critical functions. Primarily, it permits the agent to inject eBPF (extended Berkeley Packet Filter) programs into each network namespace for monitoring purposes. It is also necessary for reading the connection tracking (conntrack) tables across all network namespaces. This list is not exhaustive. Future development aims to replace this broad privilege with more granular Linux capabilities where feasible.

Furthermore, the agent requires container runtime sockets to be mounted within its pod. This configuration is essential as it facilitates direct communication with the container runtime daemons, which is a prerequisite for scraping metrics and metadata from all containers on the host system.
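The privileges listed above map onto standard Kubernetes pod spec fields. As an illustration, the sketch below inspects a pod spec that has already been loaded into a Python dictionary and reports which of the two mandatory settings are absent. The function name and the way the spec is obtained are assumptions, not part of the agent or its Helm chart. hostNetwork is not checked because it is optional.

```python
def missing_agent_privileges(pod_spec):
    """Return the mandatory privileges that pod_spec does not grant.

    pod_spec is a dict shaped like the 'spec' of a Kubernetes Pod.
    Hypothetical helper for illustration only.
    """
    missing = []

    # hostPID is a pod-level field.
    if pod_spec.get("hostPID") is not True:
        missing.append("hostPID")

    # privileged is set per container; at least one container must have it.
    containers = pod_spec.get("containers", [])
    privileged = any(
        (container.get("securityContext") or {}).get("privileged") is True
        for container in containers
    )
    if not privileged:
        missing.append("securityContext.privileged")

    return missing
```

An empty list means both mandatory settings are present. The check does not cover the container runtime socket mounts, which depend on the runtime in use.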

Rancher-Restricted PSA Template

The Rancher-restricted configuration (Pod Security Admission (PSA) Configuration Templates) is a highly restrictive setup that aligns with current best practices for securing pods.

When running Rancher on a Kubernetes cluster that enforces a restrictive security policy by default, there are two ways to install the SUSE Observability Helm chart:

Exempt the entire chart namespace, along with other required Rancher namespaces.
Disable the privileged Elasticsearch init container by setting elasticsearch.sysctlInitContainer.enabled to false. This requires you to manually increase the virtual memory settings (vm.max_map_count) on the nodes. See also Elasticsearch Required Permissions.
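If you choose the second option, the node setting can be verified before installing. The sketch below assumes a Linux node where the value is readable from /proc/sys/vm/max_map_count, and uses 262144 as the threshold because that is the minimum given in the Elasticsearch documentation. Whether the chart requires exactly this value is an assumption; the function names are hypothetical.

```python
# Minimum vm.max_map_count documented by Elasticsearch (assumed to apply here).
ES_MIN_MAX_MAP_COUNT = 262144


def max_map_count_ok(value, minimum=ES_MIN_MAX_MAP_COUNT):
    """Return True if value (int or numeric string) meets the minimum."""
    return int(value) >= minimum


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current setting from procfs on a Linux node."""
    with open(path) as handle:
        return int(handle.read().strip())
```

On a node, max_map_count_ok(read_max_map_count()) reports whether the current setting is high enough. The check has to be run on every node that can schedule the Elasticsearch pods.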

Since the SUSE Observability Agent must run in privileged mode, the recommended approach is to install it into a namespace that you plan to exempt from the restrictive policy.

Starting with version v2.3.8, all SUSE Observability Helm chart containers are configured with the following securityContext settings:

securityContext.capabilities.drop is ["ALL"]
securityContext.seccompProfile.type is "RuntimeDefault"
securityContext.runAsNonRoot is true
securityContext.allowPrivilegeEscalation is false
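To audit a rendered manifest against these four settings, a container's securityContext can be compared with the expected values. This is a minimal sketch that assumes the securityContext has been loaded into a Python dictionary. The function and constant names are hypothetical.

```python
# Expected values, taken from the list above. Keys are paths inside securityContext.
EXPECTED_SECURITY_CONTEXT = {
    ("capabilities", "drop"): ["ALL"],
    ("seccompProfile", "type"): "RuntimeDefault",
    ("runAsNonRoot",): True,
    ("allowPrivilegeEscalation",): False,
}


def _lookup(mapping, path):
    """Follow path through nested dicts; return None if any key is missing."""
    current = mapping
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def security_context_violations(security_context):
    """Return the dotted names of settings that differ from the expected values."""
    violations = []
    for path, expected in EXPECTED_SECURITY_CONTEXT.items():
        if _lookup(security_context, path) != expected:
            violations.append("securityContext." + ".".join(path))
    return violations
```

An empty result means the container matches all four settings. A missing key is reported as a violation, because the expected value is then not explicitly set.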

Single Sign On

To enable single sign-on with your own authentication provider, see here.

Frequently asked questions & Observations:

Is it mandatory to install a SUSE® Observability agent before proceeding with adding the UI extension?
- No, this is not mandatory. The UI extension can be installed independently.
Is it mandatory to install SUSE® Observability Server before we proceed with UI extensions?
- Yes, this is mandatory, since you need to provide a SUSE® Observability endpoint in the configuration.
Can we install SUSE® Observability on a local cluster or on a downstream cluster?
- Both options are possible.
To monitor the downstream clusters, should we install the SUSE® Observability agent from the app store or add a new instance from the SUSE® Observability UI?
- Both options are possible, depending on the user's preference.

Open Issues

When you uninstall and reinstall the UI extensions for Observability, the service token is not deleted and is reused upon reinstallation.
- This information should be deleted when the UI extensions are uninstalled.
After the extensions are installed, the SUSE® Observability UI opens in the same tab as the Rancher UI.
- You can use shift-click to open it in a new tab. This will become the default behaviour.
Be aware that upgrading from non-HA to HA, or downgrading from HA to non-HA, is not yet supported.

Troubleshooting

For any queries regarding the installation of the UI extension for Observability, see the Extension troubleshooting guide.