IaaS monitoring (experimental)
This component is marked as experimental, and it is not part of the reference SCS installation available at https://monitoring.scs.community.
IaaS monitoring currently integrates and can observe the following targets: OpenStack and Ceph.
Prerequisites
To test monitoring of the IaaS layer, we expect a running Kubernetes cluster that already contains the SCS monitoring platform.
Local environment use case - KinD/K3s cluster deployed locally
KinD
Install the SCS monitoring solution into the KinD Kubernetes cluster following the instructions provided in the quickstart guide.
K3s
Install the SCS monitoring solution into the K3s Kubernetes cluster following the instructions provided in the k3s guide.
OSISM use case - K3s cluster in OSISM deployment
OSISM utilizes the k3s distribution of Kubernetes as a management cluster for the OSISM IaaS platform. This management cluster is then used as a host for the SCS monitoring solution. Subsequently, the management cluster becomes an Observer cluster as it hosts the SCS monitoring solution. From that point, the Observer cluster observes itself (i.e., k3s cluster control plane components and nodes) and is used for observing the IaaS layer around the k3s cluster.
For an existing OSISM IaaS deployment (version 7.0.3 or later) on bare metal, in the testbed, or in Cloud in a Box, we expect a management k3s Kubernetes cluster with the SCS monitoring platform deployed. If your OSISM installation does not meet these requirements, apply the following plays:
osism apply kubernetes
osism apply kubernetes-monitoring
Deploy IaaS monitoring components
OpenStack
Prometheus metrics and alerts
The OpenStack exporter for Prometheus can be deployed using the SCS openstack-exporter-helm-chart.
The exporter ships with a set of Prometheus alerts and rules that are deployed together with it. Review the iaas/openstack-exporter-values.yaml file to validate the Helm configuration options. Ensure that valid OpenStack API credentials are set under the clouds_yaml_config section. This MUST be overridden!
helm upgrade --install prometheus-openstack-exporter oci://registry.scs.community/openstack-exporter/prometheus-openstack-exporter \
--version 0.4.5 \
-f iaas/openstack-exporter-values.yaml # --set "endpoint_type=public" --set "serviceMonitor.scrapeTimeout=1m"
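One way to supply the credentials without editing the tracked values file is to keep them in a separate override file and pass it as an additional -f argument. The sketch below is a minimal illustration, assuming that clouds_yaml_config accepts a standard clouds.yaml document as a block string. The function name, the file name, the cloud entry name default, and all credential values are hypothetical placeholders, so compare them with iaas/openstack-exporter-values.yaml before use.

```shell
# Write a private Helm override file that holds the OpenStack credentials.
# $1: path of the override file to create (hypothetical; keep it out of git).
write_credentials_override() {
  (
    # Owner-only permissions, because the file contains a password.
    umask 077
    cat > "$1" <<'EOF'
# Assumed layout: a clouds.yaml document embedded as a block string.
# All values are placeholders.
clouds_yaml_config: |
  clouds:
    default:
      region_name: RegionOne
      auth:
        auth_url: https://keystone.example.com:5000/v3
        username: monitoring
        password: change-me
        project_name: admin
        user_domain_name: Default
        project_domain_name: Default
EOF
  )
}

# Usage:
#   write_credentials_override openstack-credentials.yaml
#   helm upgrade --install ... -f iaas/openstack-exporter-values.yaml -f openstack-credentials.yaml
```

Helm merges values files from left to right, so the override file has to come after iaas/openstack-exporter-values.yaml for its credentials to take effect.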
Tip: If you want to test the basic functionality of the exporter against a public OpenStack API, set endpoint_type to public (--set "endpoint_type=public"). Note that setting endpoint_type to public results in incomplete functionality of the Grafana dashboard.
Tip: Requesting and collecting metrics from the OpenStack API can be time-consuming, especially if the API is not performing well. In such cases, you may observe timeouts on the Prometheus server when it tries to fetch OpenStack metrics. To mitigate this, consider increasing the scrape timeout to e.g. 1 minute (--set "serviceMonitor.scrapeTimeout=1m").
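When raising the timeout, keep in mind that Prometheus rejects a scrape configuration whose timeout exceeds its scrape interval. The helper below is a small sketch for sanity-checking a pair of values before passing them to Helm. The function names are illustrative, it only understands single-unit durations such as 30s, 1m or 2h, and the interval to compare against depends on your chart values.

```shell
# Convert a simple Prometheus-style duration (30s, 1m, 2h) to seconds.
to_seconds() {
  case "$1" in
    *s) _num="${1%s}"; _mult=1 ;;
    *m) _num="${1%m}"; _mult=60 ;;
    *h) _num="${1%h}"; _mult=3600 ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  # Reject anything that is not a plain non-negative integer (e.g. "500ms", "1m30s").
  case "$_num" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  echo $(( _num * _mult ))
}

# Succeeds when the scrape timeout ($1) does not exceed the scrape interval ($2).
scrape_settings_ok() {
  _timeout="$(to_seconds "$1")" || return 1
  _interval="$(to_seconds "$2")" || return 1
  [ "$_timeout" -le "$_interval" ]
}

# Usage:
#   scrape_settings_ok 1m 1m  && echo "ok"
#   scrape_settings_ok 1m 30s || echo "timeout is longer than the interval"
```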
Grafana dashboards
The Grafana dashboard designed to visualize metrics collected from an OpenStack cloud through the OpenStack exporter is publicly available at https://grafana.com/grafana/dashboards/21085. Its source code is located in the iaas/dashboards directory. Feel free to import it into Grafana via its source or ID.
For automatic integration into the SCS monitoring solution, proceed to the next step.
Update the SCS monitoring deployment
This step deploys the Grafana dashboards and instructs the monitoring stack to add the OpenStack exporter target into the Prometheus configuration:
helm upgrade dnation-kubernetes-monitoring-stack dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values -f iaas/values-observer-iaas.yaml
- Note: The --reset-then-reuse-values option requires Helm v3.14.0 or later. Alternatively, you can reuse the original values by applying -f values-observer.yaml; see the full command:
helm upgrade dnation-kubernetes-monitoring-stack dnationcloud/dnation-kubernetes-monitoring-stack -f values-observer.yaml -f iaas/values-observer-iaas.yaml
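The choice between the two variants can be automated. The following sketch compares a Helm version string against 3.14.0 using sort -V and prints the matching arguments. The function names are hypothetical, and pre-release suffixes such as -rc.1 are ignored for simplicity.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the values-related arguments for `helm upgrade`,
# given a Helm version string such as "v3.14.0" or "v3.13.3+gabc1234".
helm_values_args() {
  _ver="${1#v}"          # strip the leading "v"
  _ver="${_ver%%[+-]*}"  # drop build metadata or a pre-release suffix
  if version_ge "$_ver" "3.14.0"; then
    echo "--reset-then-reuse-values"
  else
    echo "-f values-observer.yaml"
  fi
}

# Usage (requires helm on PATH):
#   args="$(helm_values_args "$(helm version --template '{{.Version}}')")"
#   helm upgrade dnation-kubernetes-monitoring-stack \
#     dnationcloud/dnation-kubernetes-monitoring-stack $args -f iaas/values-observer-iaas.yaml
```

The unquoted $args in the usage line is intentional: the fallback consists of two words (-f and the file name) that must be passed as separate arguments.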
Access the OpenStack dashboard
At this point, you should be able to access the Grafana UI and the OpenStack dashboard.
Log in to the Grafana UI and find the OpenStack dashboard in the IaaS directory:
http://localhost:30000
or access the OpenStack dashboard directly:
http://localhost:30000/d/openstack-overview
- Use the following credentials:
  - username: admin
  - password: pass
Ceph
The SCS IaaS reference implementation (OSISM) currently supports the ceph-ansible method for deploying Ceph. Support for the rook operator deployment method will be available soon.
This guide covers Ceph cluster monitoring for both deployment methods. While both expose the same metrics via the same endpoint, there are some differences in Prometheus configuration and alerts.
Prometheus metrics and alerts
Ceph contains two built-in sources of metrics, a.k.a. exporters. The Ceph exporter (introduced in the Reef release of Ceph) is the main source of Ceph performance metrics. It runs as a dedicated daemon on every Ceph cluster host and exposes a metrics endpoint where all the performance counters exposed by the Ceph daemons running on that host are published in the form of Prometheus metrics.
The second source of metrics is the Prometheus manager module. It exposes metrics related to the whole cluster, basically metrics that are not produced by individual Ceph daemons.
Read the related Ceph docs. Since these exporters are integrated with Ceph, deploying a third-party Ceph exporter is unnecessary.
Prometheus alerts
Both Ceph deployment strategies use the ceph-mixins project as a source of alerts. The ceph-ansible and rook projects each maintain a rendered version of these alerts, but the rook repository contains some differences, primarily because rook does not use the cephadm tool as a backend. Therefore, run one of the following commands to create a custom observer rules values file for either the ceph-ansible or the ceph-rook deployment (the yq tool is required):
# ceph-ansible
curl -s https://raw.githubusercontent.com/ceph/ceph/main/monitoring/ceph-mixin/prometheus_alerts.yml | \
yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-ansible-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > iaas/values-observer-ceph-rules.yaml
# rook
curl -s https://raw.githubusercontent.com/rook/rook/master/deploy/charts/rook-ceph-cluster/prometheus/localrules.yaml | \
yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-rook-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > iaas/values-observer-ceph-rules.yaml
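Because curl -s does not fail on an HTTP error, a wrong URL can silently produce a values file without any rules. A quick sanity check along the following lines may help. It is only a sketch: the function name is hypothetical, and it assumes the upstream rule files keep their top-level groups key.

```shell
# Sanity-check a generated rules values file.
# $1: path to the file, e.g. iaas/values-observer-ceph-rules.yaml
check_rules_file() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  # Each of these strings should appear somewhere in a correctly generated file.
  for _needle in kube-prometheus-stack additionalPrometheusRulesMap groups prometheus_rule; do
    if ! grep -q "$_needle" "$1"; then
      echo "expected '$_needle' in $1" >&2
      return 1
    fi
  done
}

# Usage:
#   check_rules_file iaas/values-observer-ceph-rules.yaml && echo "rules file looks plausible"
```

This does not validate the rules themselves; it only catches the case where the download returned an error page or an empty body.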
Grafana dashboards
We have tested and can recommend two sources of Grafana dashboards that are suitable for both Ceph deployment strategies (ceph-ansible and rook):
- dashboards linked in rook docs
- ceph-mixins dashboards
- A built version of the ceph-mixins dashboards can be found e.g. here
We consider the dashboards created within the Rook project as a solid starting point for Ceph metrics visualization.
If you want to see more detailed dashboards, uncomment and use the ceph-mixin dashboards in the values-observer-ceph-rook.yaml or values-observer-ceph-ansible.yaml file. You can use both dashboard sources together.
Update the SCS monitoring deployment
This step deploys the Grafana dashboards and Prometheus rules, and instructs the monitoring stack to add the Ceph exporter targets to the Prometheus configuration.
Ensure that you add the monitoring targets' IPs and ports to values-observer-ceph-ansible.yaml for the ceph-ansible deployment.
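To collect the endpoints to enter, it can help to list them per host first. The sketch below prints host:port pairs for both metrics sources. It assumes the Ceph default ports (9283 for the Prometheus manager module, 9926 for the Ceph exporter), which your deployment may override, and the function name and IP addresses are hypothetical. It does not know the layout of values-observer-ceph-ansible.yaml, so the output is meant to be copied in by hand.

```shell
# Assumed Ceph defaults; override via the environment if your cluster differs.
MGR_PORT="${MGR_PORT:-9283}"            # Prometheus manager module
EXPORTER_PORT="${EXPORTER_PORT:-9926}"  # Ceph exporter daemon

# Print one "host:port" line per metrics endpoint.
# $1: space-separated hosts running a Ceph manager
# $2: space-separated hosts running the Ceph exporter
ceph_targets() {
  for _host in $1; do echo "$_host:$MGR_PORT"; done
  for _host in $2; do echo "$_host:$EXPORTER_PORT"; done
}

# Usage (placeholder addresses):
#   ceph_targets "192.0.2.10" "192.0.2.10 192.0.2.11 192.0.2.12"
```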
helm upgrade dnation-kubernetes-monitoring-stack dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values \
-f iaas/values-observer-ceph-rules.yaml \
-f iaas/values-observer-ceph-[rook|ansible].yaml # use values file for either the ceph-ansible or ceph-rook deployment
- Note: The --reset-then-reuse-values option requires Helm v3.14.0 or later. Alternatively, you can reuse the original values by applying -f values-observer.yaml; see the full command:
helm upgrade dnation-kubernetes-monitoring-stack dnationcloud/dnation-kubernetes-monitoring-stack -f values-observer.yaml -f iaas/values-observer-ceph-rules.yaml -f iaas/values-observer-ceph-[rook|ansible].yaml