#9751 Requirements for allowing the application-monitoring project to have access to all of the openshift pods
Closed: Fixed 3 years ago by kevin. Opened 3 years ago by asaleh.

Describe what you would like us to do:


In the ARC team, we are currently running a proof of concept (POC) for application monitoring of OpenShift applications.
We ran into a problem: the convenient part, declaring what you want to have monitored, works, but the Prometheus instance couldn't actually reach the endpoints in other namespaces.

We solved this in staging by

oc adm pod-network make-projects-global application-monitoring

which gives pods in this project network access to pods in all other namespaces.
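You can confirm the change afterwards by listing which projects now sit in the global network (netid 0). Below is a minimal Python sketch. It assumes a logged-in `oc` with rights to read the cluster-scoped NetNamespace objects that the multitenant plugin uses. The `netname` and `netid` fields are those of the OpenShift 3.x NetNamespace resource. The function names are hypothetical.

```python
import json
import subprocess


def global_projects(netnamespaces_json: str) -> list[str]:
    """Return the sorted names of projects whose NetNamespace has netid 0."""
    data = json.loads(netnamespaces_json)
    return sorted(
        item["netname"]
        for item in data.get("items", [])
        if item.get("netid") == 0
    )


def fetch_netnamespaces() -> str:
    """Run `oc get netnamespaces -o json` and return its raw output.

    Assumes `oc` is on PATH and logged in with cluster-admin-level access.
    """
    result = subprocess.run(
        ["oc", "get", "netnamespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

After the command above, `print(global_projects(fetch_netnamespaces()))` should list application-monitoring alongside the other global projects.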

Because we will want to use the same setup in production, we need agreement on whether:

a) this is enough, as it is only a single namespace with the clear purpose of monitoring other apps
b) we shouldn't give pods in the application-monitoring namespace greater access at all, even if this breaks service discovery
c) we should use it, but need to figure out a way to make access to other namespaces less indiscriminate, i.e. restrict access to specific pods based on labels
d) we should use it, but need to figure out how to put appropriate user-access restrictions on Prometheus itself

This isn't really either/or; think of it more as a scale: how far do we need to go with c) and/or d) so that we can go ahead with all of the features rather than throw them away?

Links:

Example prometheus graph:

https://prometheus-operated-application-monitoring.app.os.stg.fedoraproject.org/graph?g0.range_input=1h&g0.expr=sum(rate(pyramid_request_count%7Bservice%3D%22bodhi-web%22%7D%5B5m%5D))&g0.tab=0

Definition of application-monitoring:

https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/openshift-apps/application-monitoring.yml

Definition of the ServiceMonitor for bodhi-web:

https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/bodhi/files/servicemonitor.yml
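The example graph link above stores its PromQL expression percent-encoded in the `g0.expr` query parameter. The Python sketch below uses only the standard library. It decodes that expression and turns it into a URL for Prometheus's `/api/v1/query` instant-query endpoint. Nothing is fetched, and the staging host may no longer exist. The helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "https://prometheus-operated-application-monitoring.app.os.stg.fedoraproject.org"
GRAPH_URL = (
    BASE
    + "/graph?g0.range_input=1h"
    + "&g0.expr=sum(rate(pyramid_request_count%7Bservice%3D%22bodhi-web%22%7D%5B5m%5D))"
    + "&g0.tab=0"
)


def graph_expr(url: str) -> str:
    """Extract the percent-decoded PromQL expression of the first graph panel."""
    return parse_qs(urlparse(url).query)["g0.expr"][0]


def instant_query_url(base: str, expr: str) -> str:
    """Build a Prometheus HTTP API instant-query URL for a PromQL expression."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


print(graph_expr(GRAPH_URL))
print(instant_query_url(BASE, graph_expr(GRAPH_URL)))
```

Decoded, the expression is `sum(rate(pyramid_request_count{service="bodhi-web"}[5m]))`. This is the per-second request rate of bodhi-web over five-minute windows, summed across all matching series.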


We need to have a discussion about this.

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

3 years ago

Update.

To give the Prometheus pod access to only the pods it is supposed to monitor, we would need to use redhat/openshift-ovs-networkpolicy as the network plugin.
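Under that plugin, option c) from the original request would become one Kubernetes NetworkPolicy per monitored namespace. Each policy admits traffic on the metrics port only from Prometheus pods in application-monitoring. The Python sketch below builds such a manifest as a dict. The label selectors, the port and the policy name are hypothetical. The namespace label in particular would have to be added to application-monitoring by hand. Combining `namespaceSelector` and `podSelector` in one `from` entry (both must match) needs a reasonably recent Kubernetes, so check the cluster version. Also note that once any policy selects a pod, the pod accepts only traffic that some policy allows. A policy like this therefore has to sit alongside policies admitting the application's normal traffic.

```python
import json


def allow_prometheus_policy(
    target_namespace: str,
    monitoring_ns_labels: dict[str, str],
    prometheus_pod_labels: dict[str, str],
    metrics_port: int,
) -> dict:
    """Build a NetworkPolicy for target_namespace that admits TCP traffic on
    metrics_port only from Prometheus pods in the monitoring namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "allow-from-application-monitoring",  # hypothetical name
            "namespace": target_namespace,
        },
        "spec": {
            "podSelector": {},  # empty selector: applies to every pod here
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            # Same entry, so both selectors must match.
                            "namespaceSelector": {"matchLabels": monitoring_ns_labels},
                            "podSelector": {"matchLabels": prometheus_pod_labels},
                        }
                    ],
                    "ports": [{"protocol": "TCP", "port": metrics_port}],
                }
            ],
        },
    }


# Hypothetical labels and port, for illustration only.
policy = allow_prometheus_policy(
    "bodhi", {"name": "application-monitoring"}, {"app": "prometheus"}, 8080
)
print(json.dumps(policy, indent=2))
```

The printed JSON could be piped to `oc apply -f -` in the target project.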

We are currently using redhat/openshift-ovs-multitenant, which seems to offer only three configurations for a project:

a) the project has a unique non-zero netid: it is isolated from all other projects
b) the project shares a non-zero netid with other projects: projects with the same netid can connect to each other
c) the project has netid 0: it can connect to any service in the cluster, and only projects with netid 0 can connect to services in it
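The three configurations boil down to a single reachability rule. The Python sketch below encodes them exactly as listed here, not from the plugin's source. It shows what moving application-monitoring to netid 0 changes. Only the names default, application-monitoring and bodhi come from this ticket. `other-app` and all netid values are hypothetical.

```python
def can_connect(src_netid: int, dst_netid: int) -> bool:
    """Whether pods in a project with src_netid can reach services in a
    project with dst_netid, per configurations a) to c) above."""
    if src_netid == 0:
        # c) a netid 0 project can connect to any service in the cluster.
        return True
    # a) and b): a non-zero netid only reaches projects sharing that netid.
    # This also keeps non-zero projects out of netid 0 services, as c) states.
    return src_netid == dst_netid


def reachable(src: str, netids: dict[str, int]) -> list[str]:
    """Sorted list of other projects that pods in project `src` can reach."""
    return sorted(
        project
        for project, netid in netids.items()
        if project != src and can_connect(netids[src], netid)
    )


# Hypothetical netid assignments before and after make-projects-global.
before = {"default": 0, "application-monitoring": 11, "bodhi": 12, "other-app": 13}
after = {**before, "application-monitoring": 0}
print(reachable("application-monitoring", before))  # []
print(reachable("application-monitoring", after))   # ['bodhi', 'default', 'other-app']
```

Under these rules, netid 0 is all-or-nothing. Prometheus gains every project at once, which is why per-label restriction needs the networkpolicy plugin instead.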

I probably wouldn't experiment with changing the network plugin on our production OpenShift 3.x cluster, and would go with netid 0 for the application-monitoring project.

Currently we are already running cluster-monitoring in netid 0, alongside service-catalog and the default project.

(By the way, once the debuginfod service is rolled out, it will export /metrics on its public-facing url. An internal prometheus instance should be able to connect to it via that rather than intra-cluster shortcuts.)

So, as I noted elsewhere, we are likely going to spin up a new RHOS4 cluster soon, and we should make sure to configure it the way we need for this from the start. ;)

Yeah, doing netid 0 seems fine for now.

Do we want to keep this ticket open for anything, or just close it?

Let's close it out; feel free to reopen if there's anything further to address here.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

