#1578 Zuul CI namespace
Closed: Fixed 3 days ago by arrfab. Opened a month ago by mhuin.

The Cloud SIG would like to host the Zuul components (i.e. the application control plane itself; worker nodes would run on external cloud resources) using Software Factory's sf-operator (https://softwarefactory-project.io) in a namespace on CentOS Infra's OCP.

Please answer the following questions so that we understand your requirement.

  • How does your project relate to Fedora/CentOS?

Running CI workloads for the Cloud SIG. Software Factory itself already has tight integrations with Fedora and CentOS, running some CI jobs for the projects.

  • Describe your workflow, and if you need any special permissions (other than
    admin access to the namespace), please tell us and provide a reason for them.

The control plane would be set up in a user namespace. Ideally, though not strictly required, we would like the following permissions:

  1. Ability to manage operators in our namespace: the control plane and several of its dependencies are easier to maintain when deployed through operators.
  2. Ability to manage TLS certificate issuance, ideally via the cert-manager operator.
  3. Nested container support: this requires at least OpenShift 4.18, so it might come at a later time.
  • Do you need bare-metal/VM checkout capability? (we prefer your workflow
    containerized)

No, everything runs in containers. VM worker nodes will be managed elsewhere.

  • Resources required

For a deployment comparable to the one we are targeting here, we see the following usage in terms of resource quotas ("k8s units"):

  • cpu 25
  • memory 24G

We currently store CI logs on a persistent volume, which would require at least 1 TB, but we can work on supporting storage in an S3-API-compatible bucket.

Project_name: softwarefactory-project.io
Project_admins:
 - mhuin
 - TBA

Metadata Update from @arrfab:
- Issue assigned to arrfab

a month ago

Metadata Update from @arrfab:
- Issue tagged with: centos-ci-infra, high-gain, medium-trouble, namespace-request, need-more-info

a month ago

WRT operators: you'll have full admin rights in your namespace/project, but if you need cluster-wide operators to be installed, we can have a look and install what's required (if supported and not conflicting with any other ones).

TLS: we currently have automation for Let's Encrypt certs for *.apps.ocp.cloud.ci.centos.org, renewed automatically and applied to ingress traffic.
Nothing is stopping you (as you'll have admin rights) from also adding another TLS cert and route, but you'll have to do that yourself.
We've never looked at the cert-manager operator, but that can be investigated.

We're still on the OpenShift 4.15.x stable branch, as we discovered that some deployments will have to be updated: 4.16 drops DeploymentConfig and we have some tenants still using it, so that upgrade (and then the next ones) will be out of scope for this ticket (but communicated on the ci-users list).

For PVs, we're relying on AWS EFS, but 1 TB seems like quite a lot for logs compared to what we usually offer tenants, so let me look into that requirement.

> WRT operators: you'll have full admin rights in your namespace/project, but if you need cluster-wide operators to be installed, we can have a look and install what's required (if supported and not conflicting with any other ones).

Namespace-scoped operators should be more than enough. It would be nice to be allowed access to the Prometheus resources provided by the prometheus-operator (so that we could deploy a Prometheus instance in our namespace); would that be possible? Some cluster admins do not allow this, as it might interfere with the cluster's own metrics monitoring if it isn't done properly.
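
For illustration only (not part of the original request), a namespace-scoped Prometheus instance deployed through the prometheus-operator could look roughly like the sketch below; the instance name, service account and label selector are assumptions:

  # Hypothetical sketch: a small Prometheus instance scoped to the tenant namespace,
  # assuming the prometheus-operator CRDs are installed on the cluster.
  apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    name: sf-prometheus                # illustrative name
    namespace: cloud-softwarefactory   # the tenant namespace created later in this ticket
  spec:
    replicas: 1
    serviceAccountName: sf-prometheus  # a ServiceAccount the tenant would create and bind
    serviceMonitorSelector:            # scrape only ServiceMonitors labelled app=sf in this namespace
      matchLabels:
        app: sf
    resources:
      requests:
        memory: 400Mi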

> TLS: we currently have automation for Let's Encrypt certs for *.apps.ocp.cloud.ci.centos.org, renewed automatically and applied to ingress traffic.
> Nothing is stopping you (as you'll have admin rights) from also adding another TLS cert and route, but you'll have to do that yourself.
> We've never looked at the cert-manager operator, but that can be investigated.

As long as there is a way to manage certificates under our own domain, that is all that matters. Is this process documented somewhere?
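
For context, this is roughly what managing a certificate under our own domain could look like with the cert-manager operator; the hostname, secret and issuer names are purely illustrative and assume the cert-manager CRDs are available:

  # Hypothetical sketch of a cert-manager Certificate for a tenant-owned hostname.
  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: zuul-web-tls                 # illustrative name
    namespace: cloud-softwarefactory
  spec:
    secretName: zuul-web-tls           # TLS secret a route/ingress would reference
    dnsNames:
      - zuul.softwarefactory-project.io   # illustrative hostname under our own domain
    issuerRef:
      name: letsencrypt-http01         # hypothetical Issuer/ClusterIssuer
      kind: Issuer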

> We're still on the OpenShift 4.15.x stable branch, as we discovered that some deployments will have to be updated: 4.16 drops DeploymentConfig and we have some tenants still using it, so that upgrade (and then the next ones) will be out of scope for this ticket (but communicated on the ci-users list).

Nested container support is explained here (prerequisites and needed config): https://developers.redhat.com/articles/2024/12/02/enable-nested-containers-openshift-dev-spaces-user-namespaces#
Is this something that could be enabled on this cluster at some point in the future, or would there be strong opposition to it?

> For PVs, we're relying on AWS EFS, but 1 TB seems like quite a lot for logs compared to what we usually offer tenants, so let me look into that requirement.

We can work on alternatives for hosting these logs, so this is not a hard blocker.

Regarding the resource quotas I mentioned above, are these okay? What resource quota limits do you have in place, and is there a process for requesting quota increases?

Just fyi:

> We're still on the OpenShift 4.15.x stable branch, as we discovered that some deployments will have to be updated: 4.16 drops DeploymentConfig and we have some tenants still using it, so that upgrade (and then the next ones) will be out of scope for this ticket (but communicated on the ci-users list).

DeploymentConfig is not completely dropped in 4.16 or 4.17 (although it might be in 4.18?). It's just deprecated, so it warns you about it, but everything is still there and it works.

Would it be possible to allow our namespace to run privileged containers? That would be an alternative to enabling nested containers.

> Would it be possible to allow our namespace to run privileged containers? That would be an alternative to enabling nested containers.

Because it's a shared cluster for multiple CI tenants, we initially decided not to let any tenant run privileged containers, no.

I got most of my immediate questions answered except the one about quotas. I am waiting on my team to provide their IDs to be added as project admins. Once I add that list to this ticket, I guess we can proceed with the namespace creation if all is fine on your side.

As for ongoing support and feature requests, I assume this issue tracker is the right one to use?

@mhuin: currently we don't enforce any quotas, as long as each onboarded CI tenant on the cluster is a "good citizen" and we don't run into resource problems.
Should that change, we'd investigate either expanding resources or implementing quotas on specific projects/namespaces.
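
Should quotas ever need to be enforced, a per-namespace ResourceQuota built around the figures mentioned earlier (cpu 25, memory 24G) could look like this sketch; the object name and the choice to cap both requests and limits are assumptions, not an agreed policy:

  # Hypothetical sketch of a namespace ResourceQuota matching the requested figures.
  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: cloud-softwarefactory-quota  # illustrative name
    namespace: cloud-softwarefactory
  spec:
    hard:
      requests.cpu: "25"
      requests.memory: 24Gi
      limits.cpu: "25"
      limits.memory: 24Gi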

Here is the final list of project admins:

  • mhuin
  • fserucas
  • tonyb
  • fbo
  • vcherkas
  • calancha
  • danpawlik
  • tdecacqu
  • nhicher
  • apevec

FAS/ACO group created: https://accounts.centos.org/group/ocp-cico-cloud-softwarefactory/
That automatically created the cloud-softwarefactory ns/project on OCP.

Can you confirm that you have the correct access?
Also, let us know about the PVs we have to create; we'll then create them with claimRefs so that your PVCs will be able to bind to them accordingly.

@mhuin: do you still need some PVs for your Zuul deployment, or can I close the ticket (as the group and namespace in OCP exist now)?

Metadata Update from @arrfab:
- Issue priority set to: Waiting on Reporter (was: Needs Review)

17 days ago

> FAS/ACO group created: https://accounts.centos.org/group/ocp-cico-cloud-softwarefactory/
> That automatically created the cloud-softwarefactory ns/project on OCP.
>
> Can you confirm that you have the correct access?
> Also, let us know about the PVs we have to create; we'll then create them with claimRefs so that your PVCs will be able to bind to them accordingly.

I can see the project properly, thanks!

As for PVs, let's start with:

  • 4 PVs, 1GB
  • 2 PVs, 5GB
  • 1 PV, 30GB
  • 2 PVs, 300GB

Would that work?

Sure, but can you give us some metadata so that when we create the PVs it will be easier for you to request/consume them? (Basically, they'll be provided with a claimRef so that your PVC(s) will match the corresponding PVs.)

Assuming the PVC names and labels are enough, here is a CSV list (labels are |-separated):

# name; size; labels; accessModes
git-server-git-server-0;1Gi;app=sf|run=git-server;ReadWriteOnce
logserver-logserver-0;300Gi;app=sf|run=logserver;ReadWriteOnce
mariadb-logs-mariadb-0;1Gi;app=sf|run=mariadb;ReadWriteOnce
mariadb-mariadb-0;5Gi;app=sf|run=mariadb;ReadWriteOnce
nodepool-builder-nodepool-builder-0;300Gi;app=sf|run=nodepool-builder;ReadWriteOnce
zookeeper-data-zookeeper-0;5Gi;app=sf|run=zookeeper;ReadWriteOnce
zookeeper-logs-zookeeper-0;1Gi;app=sf|run=zookeeper;ReadWriteOnce
zuul-merger-zuul-merger-0;30Gi;app=sf|run=zuul-merger;ReadWriteOnce
zuul-scheduler-zuul-scheduler-0;1Gi;app=sf|run=zuul-scheduler;ReadWriteOnce

The following PVs were created and so are available from within your cloud-softwarefactory namespace:

pv-1gi-cloud-softwarefactory-git-server-4cddf1d8-2a35-5479-8973-35764e588922           1Gi        RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/git-server         7m31s
pv-1gi-cloud-softwarefactory-mariadb-logs-669c3f5b-21d9-53cb-bb0d-4614c0669e19         1Gi        RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/mariadb-logs       5m49s
pv-1gi-cloud-softwarefactory-zookeeper-logs-e0b075ef-1125-596b-8630-c223f37af3ee       1Gi        RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/zookeeper-logs     116s
pv-1gi-cloud-softwarefactory-zuul-scheduler-a341e0e4-27ff-53ea-8de8-9c0331195019       1Gi        RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/zuul-scheduler     27s
pv-300gi-cloud-softwarefactory-logserver-49cde16f-cad9-51ee-9df9-691d4aa45004          300Gi      RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/logserver          6m23s
pv-300gi-cloud-softwarefactory-nodepool-builder-4419083e-2b75-5730-8b52-4df8d8ee0a1b   300Gi      RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/nodepool-builder   4m25s
pv-30gi-cloud-softwarefactory-zuul-merger-6c929ffa-14e9-5022-a0ee-98fbf050e29c         30Gi       RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/zuul-merger        75s
pv-5gi-cloud-softwarefactory-mariadb-5e462c91-6321-5fd5-9430-fe2410db703b              5Gi        RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/mariadb            5m17s
pv-5gi-cloud-softwarefactory-zookeeper-data-32821d09-cf2a-5231-bb2b-cfb4998f2026       5Gi        RWO,ROX,RWX    Retain           Available   cloud-softwarefactory/zookeeper-data     3m41s

Worth knowing: each PV was configured to match a corresponding PVC that you'll have to create when deploying your app. In the case of the first PV, the corresponding PVC should have git-server as its name to match the PV, which is waiting to be bound via that claimRef:

oc get pv/pv-1gi-cloud-softwarefactory-git-server-4cddf1d8-2a35-5479-8973-35764e588922 -o yaml|grep -A 2 claimRef
  claimRef:
    name: git-server
    namespace: cloud-softwarefactory

And so on for the other PVs.
I'll let you have a look, but it should all be done from the CentOS Infra side.
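
For completeness, a minimal sketch of the PVC a deployment would create to bind the first PV above, assuming static binding to the claimRef shown; the empty storageClassName is an assumption that may need adjusting to the cluster's storage class:

  # Hypothetical sketch of the tenant-side PVC that would bind the pre-created PV.
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: git-server                   # must match the claimRef name set on the PV
    namespace: cloud-softwarefactory
  spec:
    accessModes:
      - ReadWriteOnce
    storageClassName: ""               # assumption: statically provisioned PV, no storage class
    resources:
      requests:
        storage: 1Gi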

PS: I'm in "travel mode" this week due to FOSDEM, so I'll be slower to react to comments on the issue tracker.

This is an issue for us, as our PVCs are created by StatefulSets, meaning they are defined as templates and follow a predetermined naming convention from Kubernetes: <volumeClaimTemplate name>-<StatefulSet name>-<replica ordinal>.

We know that, at least in the beginning, we'll never go over a replica count of 1. Could you edit the claimRefs to match the names listed above?
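
To illustrate where that naming convention comes from, here is a trimmed-down StatefulSet sketch; the image, mount path and size are placeholders, but with a volumeClaimTemplate named git-server and a StatefulSet named git-server, replica 0 requests a PVC called git-server-git-server-0:

  # Hypothetical sketch showing how volumeClaimTemplates drive PVC naming.
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: git-server
    namespace: cloud-softwarefactory
  spec:
    serviceName: git-server
    replicas: 1
    selector:
      matchLabels:
        run: git-server
    template:
      metadata:
        labels:
          app: sf
          run: git-server
      spec:
        containers:
          - name: git-server
            image: example.invalid/git-server:latest   # placeholder image
            volumeMounts:
              - name: git-server                       # refers to the claim template below
                mountPath: /var/lib/git                # placeholder path
    volumeClaimTemplates:
      - metadata:
          name: git-server             # -> PVC "git-server-git-server-0" for replica 0
          labels:
            app: sf
            run: git-server
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi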

Hey @mhuin,
I had 5 minutes before diving into another meeting today, so I quickly edited the PVs and updated the claimRefs; they should now match your deployment expectations:

oc get pv/pv-1gi-cloud-softwarefactory-git-server-4cddf1d8-2a35-5479-8973-35764e588922 -o yaml|grep -A 2 claimRef
  claimRef:
    name: git-server-git-server-0
    namespace: cloud-softwarefactory

Can you verify and post feedback?

No feedback, but we think (on the CentOS Infra side) that it should be OK, so closing this request (feel free to reopen in case a small change is needed).

Metadata Update from @arrfab:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 days ago
