Enable Monitoring
As an administrator or cluster owner, you can configure Rancher to deploy Prometheus to monitor your Kubernetes cluster.
This page describes how to enable monitoring and alerting within a cluster using the new monitoring application.
You can enable monitoring with or without SSL.
Requirements
-
Allow traffic on port 9796 for each of your nodes. Prometheus scrapes metrics from these ports.
-
You may also need to allow traffic on port 10254 for each of your nodes, if PushProx is disabled (
ingressNginx.enabled
set tofalse
), or you’ve upgraded from a previous Rancher version that had v1 monitoring already installed.
-
-
Make sure that your cluster fulfills the resource requirements. The cluster should have at least 1950Mi memory available, 2700m CPU, and 50Gi storage. See Configuring Resource Limits and Requests for a breakdown of the resource limits and requests.
-
When you install monitoring on an RKE cluster that uses RancherOS or Flatcar Linux nodes, change the etcd node certificate directory to
/opt/rke/etc/kubernetes/ssl
. -
For clusters that have been provisioned with the RKE CLI and that have the address set to a hostname instead of an IP address, set
rkeEtcd.clients.useLocalhost
totrue
when you configure the Values during installation. For example:
rkeEtcd:
clients:
useLocalhost: true
If you want to set up Alertmanager, Grafana or Ingress, it has to be done with the settings on the Helm chart deployment. It’s problematic to create Ingress outside the deployment. |
Setting Resource Limits and Requests
The resource requests and limits can be configured when installing rancher-monitoring
. To configure Prometheus resources from the Rancher UI, click in the upper left corner.
For more information about the default limits, see this page.
Install the Monitoring Application
Enable Monitoring for use without SSL
-
Click ☰ > Cluster Management.
-
Go to the cluster that you created and click Explore.
-
Click Cluster Tools (bottom left corner).
-
Click Install by Monitoring.
-
Optional: Customize requests, limits and more for Alerting, Prometheus, and Grafana in the Values step. For help, refer to the configuration reference.
Result: The monitoring app is deployed in the cattle-monitoring-system
namespace.
Enable Monitoring for use with SSL
-
Follow the steps on this page to create a secret in order for SSL to be used for alerts.
-
The secret should be created in the
cattle-monitoring-system
namespace. If it doesn’t exist, create it first. -
Add the
ca
,cert
, andkey
files to the secret.
-
-
In the upper left corner, click ☰ > Cluster Management.
-
On the Clusters page, go to the cluster where you want to enable monitoring for use with SSL and click Explore.
-
Click
. -
Click Monitoring.
-
Click Install or Update, depending on whether you have already installed Monitoring.
-
Check the box for Customize Helm options before install and click Next.
-
Click Alerting.
-
In the Additional Secrets field, add the secrets created earlier.
Result: The monitoring app is deployed in the cattle-monitoring-system
namespace.
When creating a receiver, SSL-enabled receivers such as email or webhook will have a SSL section with fields for CA File Path, Cert File Path, and Key File Path. Fill in these fields with the paths to each of ca
, cert
, and key
. The path will be of the form /etc/alertmanager/secrets/name-of-file-in-secret
.
For example, if you created a secret with these key-value pairs:
ca.crt=`base64-content`
cert.pem=`base64-content`
key.pfx=`base64-content`
Then Cert File Path would be set to /etc/alertmanager/secrets/cert.pem
.
Rancher Performance Dashboard
When monitoring is installed on the upstream (local) cluster, you are given basic health metrics about the Rancher pods, such as CPU and memory data. To get advanced metrics for your local Rancher server, you must additionally enable the Rancher Performance Dashboard for Grafana.
This dashboard provides access to the following advanced metrics:
-
Handler Average Execution Times Over Last 5 Minutes
-
Rancher API Average Request Times Over Last 5 Minutes
-
Subscribe Average Request Times Over Last 5 Minutes
-
Lasso Controller Work Queue Depth (Top 20)
-
Number of Rancher Requests (Top 20)
-
Number of Failed Rancher API Requests (Top 20)
-
K8s Proxy Store Average Request Times Over Last 5 Minutes (Top 20)
-
K8s Proxy Client Average Request Times Over Last 5 Minutes (Top 20)
-
Cached Objects by GroupVersionKind (Top 20)
-
Lasso Handler Executions (Top 20)
-
Handler Executions Over Last 2 Minutes (Top 20)
-
Total Handler Executions with Error (Top 20)
-
Data Transmitted by Remote Dialer Sessions (Top 20)
-
Errors for Remote Dialer Sessions (Top 20)
-
Remote Dialer Connections Removed (Top 20)
-
Remote Dialer Connections Added by Client (Top 20)
Profiling data (such as advanced memory or CPU analysis) is not present as it is a very context-dependent technique that’s meant for debugging and not intended for normal observation. |
Enabling the Rancher Performance Dashboard
To enable the Rancher Performance Dashboard:
-
Helm
-
UI
Use the following options with the Helm CLI:
--set extraEnv\[0\].name="CATTLE_PROMETHEUS_METRICS" --set-string extraEnv\[0\].value=true
You can also include the following snippet in your Rancher Helm chart’s values.yaml file:
extraEnv:
- name: "CATTLE_PROMETHEUS_METRICS"
value: "true"
-
Click ☰ > Cluster Management.
-
Go to the row of the
local
cluster and click Explore. -
Click
. -
Use the dropdown menu at the top to filter for All Namespaces.
-
Under the
cattle-system
namespace, go to therancher
row and click ⋮ > Edit Config -
Under Environment Variables, click Add Variable.
-
For Type, select
Key/Value Pair
. -
For Variable Name, enter
CATTLE_PROMETHEUS_METRICS
. -
For Value, enter
true
. -
Click Save to apply the change.