Monitoring / Metrics with Prometheus
====================================

For deployment, we used a combination of configuration for the prometheus-operator
and the application-monitoring operator.

Beware: most of these deployment notes may become obsolete in a fairly short time.
The POC was done on OpenShift 3.11, which limited us to an older version of the
prometheus-operator, as well as to the no longer maintained application-monitoring
operator.

In OpenShift 4.x, which we plan to use in the near future, there is a supported
way integrated in the OpenShift deployment:

* https://docs.openshift.com/container-platform/4.7/monitoring/understanding-the-monitoring-stack.html
* https://docs.openshift.com/container-platform/4.7/monitoring/configuring-the-monitoring-stack.html#configuring-the-monitoring-stack
* https://docs.openshift.com/container-platform/4.7/monitoring/enabling-monitoring-for-user-defined-projects.html

The supported stack is more limited, especially with respect to adding
user-defined pod and service monitors, but even if we wanted to run additional
Prometheus instances, we should be able to skip the installation of the
necessary operators, as all of them should already be present.

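As the third link above describes, monitoring for user-defined projects in the
4.x stack is switched on via a config map. A minimal sketch, based on the linked
documentation rather than anything we run yet:

::

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      enableUserWorkload: true
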
Notes on operator deployment
----------------------------

The operator pattern is often used with Kubernetes and OpenShift for more
complex deployments. Instead of applying all of the configuration to deploy
your services yourself, you deploy a special, smaller service called an
operator, which has the necessary permissions to deploy and configure the
complex service. Once the operator is running, instead of configuring the
service itself with service-specific config maps, you create operator-specific
Kubernetes objects, so-called custom resources, defined through CRDs.

The deployment of the operator in question was done by applying the CRDs, the
roles and role bindings, and the operator setup. The definitions are as follows:

- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/prometheus-operator-crd
- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/rbac/prometheus-operator-crd
- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/rbac/prometheus-operator

Once the operator is running correctly, you just define a Prometheus custom
resource and the operator will create a Prometheus deployment for you.

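For illustration, a minimal Prometheus custom resource could look roughly like
this (a sketch: the name, namespace, and label selector are hypothetical; the
fields come from the Prometheus CRD shipped with prometheus-operator v0.38.x):

::

  apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    name: prometheus
    namespace: application-monitoring
  spec:
    replicas: 1
    serviceAccountName: prometheus
    # only scrape targets whose ServiceMonitor carries this label
    serviceMonitorSelector:
      matchLabels:
        monitoring-key: application
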
The POC lives in https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/openshift-apps/application-monitoring.yml

Notes on application monitoring operator deployment
---------------------------------------------------

The application-monitoring operator was created to solve the integration of
Prometheus, Alertmanager, and Grafana. After you configure it, it configures
the relevant operators responsible for these services.

The most interesting difference between configuring this shared operator and
configuring these operators individually is that it sets up some of the
integrations for you, and it integrates well with OpenShift's auth system
through the oauth proxy.

The biggest drawback is that the application-monitoring operator is an orphaned
project, but because it mostly configures other operators, it is relatively
simple to recreate the configuration for both Prometheus and Alertmanager and
deploy the prometheus and alertmanager operators without the help of the
application-monitoring operator.

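For reference, the whole stack is driven by a single custom resource. A minimal
example could look something like this (a sketch from memory of the integr8ly
operator; treat the apiVersion, kind, and field name as assumptions to verify
against the operator's own examples):

::

  apiVersion: applicationmonitoring.integreatly.org/v1alpha1
  kind: ApplicationMonitoring
  metadata:
    name: example-applicationmonitoring
  spec:
    # only namespaces labelled with this key get monitored
    labelSelector: monitoring-key
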
Notes on persistence
--------------------

Prometheus by default expects to have a writable /prometheus folder that can
serve as persistent storage.

For the persistent volume to work for this purpose, it **needs to have a
POSIX-compliant filesystem**, and the NFS we currently have configured is not
POSIX-compliant. This is discussed in the
`operational aspects <https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects>`_
section of the Prometheus documentation.

The easiest supported way to have a POSIX-compliant filesystem is to
`set up local storage <https://docs.openshift.com/container-platform/3.11/install_config/configuring_local.html>`_
in the cluster.

In 4.x versions of OpenShift, `there is a local-storage-operator <https://docs.openshift.com/container-platform/4.7/storage/persistent_storage/persistent-storage-local.html>`_ for this purpose.

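Either way, what comes out of it is a persistent volume pinned to one node. A
hand-written equivalent could look roughly like this (a sketch: the path,
hostname, and storage class name are placeholders):

::

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: prometheus-local
  spec:
    capacity:
      storage: 10Gi
    accessModes:
      - ReadWriteOnce
    persistentVolumeReclaimPolicy: Retain
    storageClassName: local
    local:
      path: /mnt/local-storage/prometheus
    # a local volume is bound to one specific node
    nodeAffinity:
      required:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - node01.example.com
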
This is the simplest way to have working persistence, but it prevents us from
having multiple instances across OpenShift nodes, as the pod is using the
underlying filesystem on the node.

To ask the operator to create a persisted Prometheus, you specify storage in
its configuration, e.g.:

::

  # retention belongs at the Prometheus spec level, not inside the PVC template
  retention: 24h
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local
        resources:
          requests:
            storage: 10Gi

By default, retention is set to 24 hours and can be overridden as in the
example above.

Notes on long term storage
--------------------------

Usually, Prometheus itself is set up to store its metrics for a shorter amount
of time, and it is expected that for long-term storage and analysis there is
some other storage solution, such as InfluxDB or TimescaleDB.

We are currently running a POC that synchronizes Prometheus with TimescaleDB
(running on PostgreSQL) through a middleware service called
`promscale <https://github.com/timescale/promscale>`_.

Promscale just needs access to an appropriate PostgreSQL database, and can be
configured through the PROMSCALE_DB_PASSWORD and PROMSCALE_DB_HOST environment
variables.

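In a deployment, that boils down to a container spec fragment along these lines
(a sketch: the image tag, service name, and secret name are placeholders):

::

  containers:
    - name: promscale
      image: timescale/promscale:0.6.0
      env:
        # hostname of the PostgreSQL/TimescaleDB service
        - name: PROMSCALE_DB_HOST
          value: "timescaledb"
        # the password comes from a secret rather than being stored in the spec
        - name: PROMSCALE_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: promscale-db-secret
              key: password
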
By default, it will ensure the database has the timescale extension installed
and configures its database automatically.

We set up Prometheus with a directive to use the promscale service as a remote
storage backend (see https://github.com/timescale/promscale):

::

  remote_write:
    - url: "http://promscale:9201/write"
  remote_read:
    - url: "http://promscale:9201/read"