The Cloud SIG would like to host Zuul components (i.e. the application control plane itself; worker nodes would run on external cloud resources) using Software Factory's sf-operator (https://softwarefactory-project.io) in a namespace on CentOS Infra's OCP.
Please answer the following questions so that we understand your requirements.
Running CI workloads for the Cloud SIG. Software Factory itself already has tight integrations with Fedora and CentOS and runs some CI jobs for those projects.
The control plane would be set up in a user namespace. Ideally, though not mandatorily, we would need the following permissions:
No, everything runs in containers. VM worker nodes will be managed somewhere else.
For a deployment comparable to the one we are targeting here, we see the following usage in terms of resource quotas ("k8s units"):
We currently store CI logs on a persistent volume, which would require at least 1TB. But we could work on supporting storage in an S3-API-compatible bucket.
Project_name: softwarefactory-project.io
Project_admins:
- mhuin
- TBA
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra, high-gain, medium-trouble, namespace-request, need-more-info
WRT operators: you'll have full admin rights in your namespace/project, but if you need cluster-wide operators to be installed, we can have a look and install what's required (if supported and not conflicting with any other ones).
TLS: we currently have automation for Let's Encrypt certs for *.apps.ocp.cloud.ci.centos.org, renewed automatically and updated for ingress traffic. Nothing is stopping you (as you'll have admin rights) from also adding another TLS cert and route, but you'll have to do that yourself. We've never looked at the cert-manager operator but that can be investigated though.
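For illustration only, a manually managed TLS route under a custom domain could look roughly like the sketch below; the hostname, service name, port, and certificate material are placeholders, not anything agreed in this ticket.

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: zuul-web                                   # hypothetical route name
  namespace: cloud-softwarefactory
spec:
  host: zuul.example.softwarefactory-project.io    # placeholder custom domain
  to:
    kind: Service
    name: zuul-web                                 # hypothetical service exposed by the deployment
  port:
    targetPort: 9000                               # placeholder target port
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
    certificate: |
      -----BEGIN CERTIFICATE-----
      ...                                          # certificate for the custom domain
      -----END CERTIFICATE-----
    key: |
      -----BEGIN PRIVATE KEY-----
      ...                                          # matching private key
      -----END PRIVATE KEY-----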
We're still on the OpenShift 4.15.x stable branch, as we discovered that some deployments will have to be updated: 4.16 drops DeploymentConfig and we have some tenants still using it, so that upgrade (and then the next ones) will be out of scope for this ticket (but will be communicated on the ci-users list).
For PVs, we're relying on AWS EFS, but 1TB seems quite a lot for logs compared to what we usually offer tenants, so let me see about that requirement.
Namespace-scoped operators should be more than enough. It would be nice to be allowed access to the Prometheus resources provided by the prometheus operator (so that we could deploy a Prometheus instance in our namespace); would that be possible? Some cluster admins do not allow this, as it might interfere with proper metrics monitoring on the cluster if it isn't done properly.
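To illustrate what such a namespace-scoped instance amounts to, here is a minimal sketch, assuming the prometheus-operator CRDs are usable from the namespace; the instance name, service account, and selector labels are made up for the example.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: sf-prometheus                  # hypothetical instance name
  namespace: cloud-softwarefactory
spec:
  replicas: 1
  serviceAccountName: sf-prometheus    # service account to be created alongside
  serviceMonitorSelector:              # pick up ServiceMonitors labelled app=sf
    matchLabels:
      app: sf
  # serviceMonitorNamespaceSelector left unset: only this namespace is scraped
  resources:
    requests:
      memory: 400Mi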
As long as there is a way to manage certificates under our own domain, this is all that matters. Is this process documented somewhere?
Nested containers support is explained here (prerequisites and needed config): https://developers.redhat.com/articles/2024/12/02/enable-nested-containers-openshift-dev-spaces-user-namespaces# Is this something that could be enabled on this cluster at some point in the future, or would there be strong opposition to it?
We can work on alternatives to host these logs so this is not a hard blocker.
Regarding the resource quotas I mentioned above, are these okay? What resource quota limits do you have in place? Is there a process in place to request quota increases?
Just FYI:
DeploymentConfig is not completely dropped in 4.16 or 4.17 (although it might be in 4.18?). It's just deprecated, so it warns you about it, but everything is still there and it works.
Would it be possible to allow our namespace to run privileged containers? That would be an alternative to enabling nested containers.
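For context, what is being asked for boils down to the following pod-level setting (a sketch only; on OpenShift the workload's service account would additionally need to be granted a privileged SCC, which is exactly the policy decision at stake here):

apiVersion: v1
kind: Pod
metadata:
  name: nested-builder                       # hypothetical pod name
  namespace: cloud-softwarefactory
spec:
  containers:
  - name: builder
    image: quay.io/example/builder:latest    # placeholder image
    securityContext:
      privileged: true                       # requires a privileged SCC on OpenShift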
Because it's a shared cluster for multiple CI tenants, we initially decided not to let any tenant run privileged containers, so no.
I got most of my immediate questions answered, except the one about quotas. I am waiting on my team to provide their IDs to be added as project admins. Once I add this list to the ticket, I guess we can proceed with the namespace creation if everything is fine on your side.
As for ongoing support and feature requests, I assume this issue tracker is the right one to use?
@mhuin: currently we don't enforce any quotas, as long as each onboarded CI tenant on that cluster is a "good citizen" and we don't have resource problems. Should that change, we'd investigate either expanding resources or implementing quotas on specific projects/namespaces.
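For reference, if quotas ever do get enforced, a per-namespace ResourceQuota would look roughly like the sketch below; all numbers are placeholders, not values discussed or agreed in this ticket.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cloud-softwarefactory-quota    # hypothetical quota name
  namespace: cloud-softwarefactory
spec:
  hard:
    requests.cpu: "8"                  # placeholder values only
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10"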
Here is the final list of project admins:
FAS/ACO group created: https://accounts.centos.org/group/ocp-cico-cloud-softwarefactory/ That automatically created the cloud-softwarefactory ns/project on OCP.
Can you confirm that you have the correct access? Also, let us know about the PVs we have to create; we'll provision them with claimRefs so that your PVCs can bind to them accordingly.
@mhuin: do you still need some PVs for your Zuul deployment? Or can I close the ticket (as the group and namespace in OCP exist now)?
Metadata Update from @arrfab: - Issue priority set to: Waiting on Reporter (was: Needs Review)
I can see the project properly, thanks!
As for PVs, let's start with:
Would that work?
Sure, but can you give us some metadata so that when we create the PVs, it will be easier for you to request/consume them? (Basically, they'll be provided with a claimRef so that your PVC[s] will match the corresponding PVs.)
Assuming the PVCs' names and labels are enough, here is a CSV list (labels are |-separated):
# name;size;labels;accessModes
git-server-git-server-0;1Gi;app=sf|run=git-server;ReadWriteOnce
logserver-logserver-0;300Gi;app=sf|run=logserver;ReadWriteOnce
mariadb-logs-mariadb-0;1Gi;app=sf|run=mariadb;ReadWriteOnce
mariadb-mariadb-0;5Gi;app=sf|run=mariadb;ReadWriteOnce
nodepool-builder-nodepool-builder-0;300Gi;app=sf|run=nodepool-builder;ReadWriteOnce
zookeeper-data-zookeeper-0;5Gi;app=sf|run=zookeeper;ReadWriteOnce
zookeeper-logs-zookeeper-0;1Gi;app=sf|run=zookeeper;ReadWriteOnce
zuul-merger-zuul-merger-0;30Gi;app=sf|run=zuul-merger;ReadWriteOnce
zuul-scheduler-zuul-scheduler-0;1Gi;app=sf|run=zuul-scheduler;ReadWriteOnce
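To illustrate how one row of that list could translate into a pre-provisioned PV on the infra side, here is a sketch only: the CSI/EFS stanza is an assumption based on the EFS backend mentioned earlier, and the PV name is shortened (the real ones carry a UUID suffix).

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-1gi-cloud-softwarefactory-git-server   # shortened; real name carries a UUID suffix
  labels:
    app: sf
    run: git-server
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    name: git-server-git-server-0      # PVC name from the list above
    namespace: cloud-softwarefactory
  csi:
    driver: efs.csi.aws.com            # assumed, since the cluster relies on AWS EFS for PVs
    volumeHandle: fs-xxxxxxxx          # placeholder EFS volume handle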
The following PVs were created and so are available from within your cloud-softwarefactory namespace:

NAME                                                                                   CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS     CLAIM                                   AGE
pv-1gi-cloud-softwarefactory-git-server-4cddf1d8-2a35-5479-8973-35764e588922           1Gi       RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/git-server        7m31s
pv-1gi-cloud-softwarefactory-mariadb-logs-669c3f5b-21d9-53cb-bb0d-4614c0669e19         1Gi       RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/mariadb-logs      5m49s
pv-1gi-cloud-softwarefactory-zookeeper-logs-e0b075ef-1125-596b-8630-c223f37af3ee       1Gi       RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/zookeeper-logs    116s
pv-1gi-cloud-softwarefactory-zuul-scheduler-a341e0e4-27ff-53ea-8de8-9c0331195019       1Gi       RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/zuul-scheduler    27s
pv-300gi-cloud-softwarefactory-logserver-49cde16f-cad9-51ee-9df9-691d4aa45004          300Gi     RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/logserver         6m23s
pv-300gi-cloud-softwarefactory-nodepool-builder-4419083e-2b75-5730-8b52-4df8d8ee0a1b   300Gi     RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/nodepool-builder  4m25s
pv-30gi-cloud-softwarefactory-zuul-merger-6c929ffa-14e9-5022-a0ee-98fbf050e29c         30Gi      RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/zuul-merger       75s
pv-5gi-cloud-softwarefactory-mariadb-5e462c91-6321-5fd5-9430-fe2410db703b              5Gi       RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/mariadb           5m17s
pv-5gi-cloud-softwarefactory-zookeeper-data-32821d09-cf2a-5231-bb2b-cfb4998f2026       5Gi       RWO,ROX,RWX   Retain          Available  cloud-softwarefactory/zookeeper-data    3m41s
So, worth knowing that each PV was configured to match a corresponding PVC that you'll have to create when deploying your app. In the case of the first PV, the corresponding PVC should be named git-server to match the PV, which is waiting to be bound through that claimRef:
oc get pv/pv-1gi-cloud-softwarefactory-git-server-4cddf1d8-2a35-5479-8973-35764e588922 -o yaml | grep -A 2 claimRef
  claimRef:
    name: git-server
    namespace: cloud-softwarefactory
And so on for the other PVs. I'll let you have a look, but it should all be done from the CentOS Infra side, that is.
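For completeness, a claim matching that first PV could look roughly like the sketch below on the tenant side; this is illustrative only, and if the PV carries a storageClassName, the claim would have to match it as well.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: git-server                     # must match the claimRef name shown above
  namespace: cloud-softwarefactory
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  volumeName: pv-1gi-cloud-softwarefactory-git-server-4cddf1d8-2a35-5479-8973-35764e588922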
PS: I'm in "travel mode" myself this week due to FOSDEM, so I'll be slower to react to comments on the issue tracker.
This is an issue for us, as our PVCs are created by StatefulSets, meaning they are defined as templates and follow a predetermined naming convention from Kubernetes: <VolumeClaimTemplate name>-<StatefulSet name>-<replica number>.
We know that, at least in the beginning, we'll never go above a replica count of 1. Could you edit the claimRefs to match the names listed above?
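To make the naming convention concrete, here is a trimmed sketch of one such StatefulSet; only the git-server name and sizes come from the list above, the image and mount path are made up for the example.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: git-server
  namespace: cloud-softwarefactory
spec:
  serviceName: git-server
  replicas: 1
  selector:
    matchLabels:
      run: git-server
  template:
    metadata:
      labels:
        app: sf
        run: git-server
    spec:
      containers:
      - name: git-server
        image: quay.io/example/git-server:latest   # placeholder image
        volumeMounts:
        - name: git-server                         # refers to the volumeClaimTemplate below
          mountPath: /var/lib/git                  # placeholder mount path
  volumeClaimTemplates:
  - metadata:
      name: git-server                 # -> PVC "git-server-git-server-0" for replica 0
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi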
Hey @mhuin. I had 5 minutes before diving into another meeting today, so I quickly edited the PVs and updated the claimRefs; they should now match your deployment expectations:
oc get pv/pv-1gi-cloud-softwarefactory-git-server-4cddf1d8-2a35-5479-8973-35764e588922 -o yaml | grep -A 2 claimRef
  claimRef:
    name: git-server-git-server-0
    namespace: cloud-softwarefactory
Can you just verify and post feedback?
No feedback, but we think (on the CentOS Infra side) that it should be OK, so closing this request (but feel free to reopen in case a small change is needed).
Metadata Update from @arrfab: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)