Monitoring

Overview

For in-cluster monitoring, Kyma uses Prometheus as the open source monitoring and alerting toolkit that collects and stores metrics data. This data is consumed by several addons, including Grafana for analytics and monitoring, and Alertmanager for handling alerts.

Monitoring in Kyma is configured to collect all metrics relevant for observing the in-cluster Istio Service Mesh. For diagrams of the default setup and the monitoring flow including Istio, see Monitoring Architecture.

Learn how to enable Grafana visualization and how to enable mTLS for custom metrics.

Limitations

In the production profile, Prometheus stores up to 15 GB of data for a maximum period of 30 days. If the default size or time is exceeded, the oldest records are removed first. The evaluation profile has lower limits. For more information about profiles, see Install Kyma: Choose resource consumption.
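To verify which retention settings a running Prometheus instance actually uses, you can read its runtime flags from the Prometheus HTTP API. The following is a minimal sketch, assuming the in-cluster Prometheus is reachable at http://localhost:9090 (for example through kubectl port-forward); the address is an assumption for illustration, not part of the default setup.

```python
# Minimal sketch: read the active retention flags from the Prometheus HTTP API.
# Assumes Prometheus is reachable at http://localhost:9090 (e.g. via kubectl port-forward);
# adjust PROMETHEUS_URL for your cluster.
import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port-forwarded Prometheus

def retention_settings(base_url: str) -> dict:
    """Return the storage retention flags Prometheus is currently running with."""
    resp = requests.get(f"{base_url}/api/v1/status/flags", timeout=10)
    resp.raise_for_status()
    flags = resp.json()["data"]
    return {
        "retention_time": flags.get("storage.tsdb.retention.time"),
        "retention_size": flags.get("storage.tsdb.retention.size"),
    }

if __name__ == "__main__":
    print(retention_settings(PROMETHEUS_URL))
```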

The configured memory limits of the Prometheus and Prometheus-Istio instances define the number of time series samples that can be ingested.

The default resource configuration of the monitoring component in the production profile is sufficient to serve 800K time series in the Prometheus Pod, and 400K time series in the Prometheus-Istio Pod. The samples are deleted after 30 days or when reaching the storage limit of 15 GB.
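To check how close an instance is to these figures, you can compare the current number of ingested series with the stated capacity. The sketch below queries the prometheus_tsdb_head_series metric through the HTTP API; the URL and the 800K budget constant are assumptions used for illustration.

```python
# Minimal sketch: compare the current number of ingested series with an assumed budget.
# Assumes Prometheus is reachable at http://localhost:9090 (e.g. via kubectl port-forward).
import requests

PROMETHEUS_URL = "http://localhost:9090"   # assumption: port-forwarded Prometheus
SERIES_BUDGET = 800_000                    # production-profile figure for the Prometheus Pod

def head_series(base_url: str) -> float:
    """Query prometheus_tsdb_head_series, the number of series currently in the TSDB head."""
    resp = requests.get(
        f"{base_url}/api/v1/query",
        params={"query": "prometheus_tsdb_head_series"},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    # The metric is exported once per Prometheus instance; take the first sample value.
    return float(result[0]["value"][1]) if result else 0.0

if __name__ == "__main__":
    series = head_series(PROMETHEUS_URL)
    print(f"{series:,.0f} series ingested ({series / SERIES_BUDGET:.0%} of the assumed budget)")
```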

The amount of generated time series in a Kyma cluster depends on the following factors:

  • Number of Pods in the cluster
  • Number of Nodes in the cluster
  • Amount of exported (custom) metrics
  • Label cardinality of metrics
  • Number of buckets for histogram metrics
  • Frequency of Pod recreation
  • Topology of the Istio Service Mesh

You can see the number of ingested time series from the prometheus_tsdb_head_series metric, which Prometheus exports about itself. Furthermore, you can identify expensive metrics on the TSDB Status page.
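The information shown on the TSDB Status page is also available from the TSDB stats endpoint of the Prometheus HTTP API. The following sketch lists the metrics with the highest series counts; as before, the URL is an assumption for illustration.

```python
# Minimal sketch: list the metrics with the highest series counts, similar to the TSDB Status page.
# Assumes Prometheus is reachable at http://localhost:9090 (e.g. via kubectl port-forward).
import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port-forwarded Prometheus

def expensive_metrics(base_url: str, top: int = 10) -> list:
    """Return the metric names with the most time series, from the TSDB stats endpoint."""
    resp = requests.get(f"{base_url}/api/v1/status/tsdb", timeout=10)
    resp.raise_for_status()
    stats = resp.json()["data"]["seriesCountByMetricName"]
    return [(entry["name"], entry["value"]) for entry in stats[:top]]

if __name__ == "__main__":
    for name, count in expensive_metrics(PROMETHEUS_URL):
        print(f"{count:>8}  {name}")
```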