#574 Fix solr crash (packages-static)
Merged 2 years ago by mymindstorm. Opened 2 years ago by mymindstorm.
fedora-infra/ mymindstorm/ansible solr-fix  into  main

@@ -26,7 +26,6 @@
          - solr-precreate
          args:
          - packages
-         - /opt/solr/server/solr/configsets/packages
          ports:
          - containerPort: 8983
          resources: {}

no initial comment

@kevin I don't seem to have permissions to merge PRs.

Try again now? sorry about that.

You may have to logout and back on to see the new group...

rebased onto a61c121

2 years ago

Pull-Request has been merged by mymindstorm

2 years ago

Thanks, Solr is alive now. However, I am unable to get my image builds (fedora-packages-static-build) to complete. They worked a few times when the build config was initially created, and the config has not changed at all since they broke. The build is unable to reach pagure for some reason. Do you have any suggestions on how to troubleshoot that?

Cloning "https://pagure.io/fedora-packages-static.git" ...
WARNING: timed out waiting for git server, will wait 1m4s
WARNING: timed out waiting for git server, will wait 4m16s
error: fatal: unable to access 'https://pagure.io/fedora-packages-static.git/': Failed to connect to 2620:52:3:1:dead:beef:cafe:fed8: Network is unreachable

looks like it's trying to use ipv6... and our iad2 datacenter doesn't have ipv6 enabled. :)

I'm not sure why it is, but as a brute-force workaround you might try 'git clone -4 ...'?

I'm pretty sure that I can't change that because it's an openshift build config: https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/fedora-packages-static/templates/buildconfig.yml

It can't even connect over IPv4 with http/https:

Cloning "http://8.43.85.76/fedora-packages-static.git " ...
WARNING: timed out waiting for git server, will wait 1m4s
WARNING: timed out waiting for git server, will wait 4m16s
error: fatal: unable to access 'http://8.43.85.76/fedora-packages-static.git/':  Failed connect to 8.43.85.76:80; Connection timed out

Could you destroy the project again? I don't seem to be able to do so.

There is a way you can. ;)

Look at: playbooks/openshift-apps/coreos-ostree-importer.yml for an example:

# actions to delete the project from OpenShift
# to run: sudo rbac-playbook -l os_masters_stg[0] -t delete openshift-apps/coreos-ostree-importer.yml
  - role: openshift/object-delete
    app: coreos-ostree-importer
    objecttype: project
    objectname: coreos-ostree-importer
    tags: [ never, delete ]

Just add that to your playbook and call it with -t delete. :)
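
Adapted to this app, the entry might look roughly like the sketch below. It assumes the app and the project are both named fedora-packages-static and that the playbook lives at openshift-apps/fedora-packages-static.yml; neither is confirmed in this thread, so adjust to the real names.

```yaml
# Hypothetical adaptation of the coreos-ostree-importer example above.
# App, project, and playbook names are assumptions.
# to run: sudo rbac-playbook -l os_masters_stg[0] -t delete openshift-apps/fedora-packages-static.yml
  - role: openshift/object-delete
    app: fedora-packages-static
    objecttype: project
    objectname: fedora-packages-static
    tags: [ never, delete ]
```

The `never` tag is a special Ansible tag: a role tagged with it is skipped on normal runs and only executes when one of its tags (here `delete`) is requested explicitly, which is why the role is safe to leave in the playbook.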

That works, but it has ruined the PV allocations. :(
(fedora-packages-static-db-storage-stg, fedora-packages-static-storage-stg, and solr-storage-stg)

It seems that 'oc adm pod-network join-projects --to=solr fedora-packages-static' is what caused the networking issues. I'm going to try using an IP whitelist on the route instead. Is there a variable in ansible with iad2/openshift's IP block? I can't seem to find one.
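
For reference, a route-level whitelist on the default OpenShift HAProxy router is usually expressed with the `haproxy.router.openshift.io/ip_whitelist` annotation. The following is only a sketch: the route and service name (`solr`) are assumptions, the port is taken from the containerPort in the diff above, and the CIDR is a documentation placeholder, not the real iad2 block.

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: solr                      # assumed route name
  annotations:
    # Space-separated list of IPs and/or CIDRs allowed through the router.
    # 192.0.2.0/24 is a placeholder range, not the actual iad2 block.
    haproxy.router.openshift.io/ip_whitelist: "192.0.2.0/24"
spec:
  to:
    kind: Service
    name: solr                    # assumed service name
  port:
    targetPort: 8983              # Solr port from the diff above
```

Requests from addresses outside the listed ranges are dropped by the router, so the block has to cover whatever source address the packages-static pods present to it.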

> That works, but it has ruined the PV allocations. :(
> (fedora-packages-static-db-storage-stg, fedora-packages-static-storage-stg, and solr-storage-stg)

Yeah, anytime you drop a pvc, they go into 'Reclaim' and need an admin to delete/readd them. ;(
I've now done this and they are back on track. ;)

Don't know of any variable there. ;(

> Yeah, anytime you drop a pvc, they go into 'Reclaim' and need an admin to delete/readd them. ;(
> I've now done this and they are back on track. ;)

fedora-packages-static-storage-stg has not been reclaimed.

I've moved solr into the same project as packages-static to avoid all of this overcomplicated networking config the cluster has. So, solr-storage-stg has moved and needs to be reclaimed also. Additionally, it seems that because I re-created the project, the uid openshift was using for all the PV files changed. I've added securityContext: supplementalGroups to try to avoid this, but if you could destroy all of the incorrect files currently in solr-storage-stg, that would be appreciated.
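
A minimal sketch of what the supplementalGroups setting could look like in the pod template follows. The GID and the volume name are hypothetical, and the claim name is assumed to match the PV allocation mentioned above; this is a fragment, not the actual deployment config.

```yaml
# Fragment of a DeploymentConfig pod template, not a complete object.
spec:
  template:
    spec:
      securityContext:
        # Extra group IDs added to every container process in the pod.
        # 1000 is a hypothetical GID; it must match the group that owns
        # the files on the volume.
        supplementalGroups: [1000]
      volumes:
      - name: solr-data                 # hypothetical volume name
        persistentVolumeClaim:
          claimName: solr-storage-stg   # assumed to match the PV allocation
```

Because access then comes from a fixed group rather than the per-project UID range, files on the volume should stay readable even if the project is re-created and OpenShift assigns a different UID, as long as they are group-accessible.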

fedora-packages-static-storage-stg is still showing as pending.

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-packages-static-storage{{ '-stg' if env == 'staging' else '' }}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: ""

Oops. I must not have recreated that one the other day. Done.

So, it is running successfully in openshift! The health check is green and I can curl the service from the container itself. The only issue is that https://packages.stg.fedoraproject.org does not connect properly. The container is listening on port 8080. Do this route and service look correct?

  - role: openshift/route
    app: fedora-packages-static
    routename: fedora-packages-static
    host: "packages{{ env_suffix }}.fedoraproject.org"
    serviceport: 8080-tcp
    servicename: fedora-packages-static

kind: Service
metadata:
  labels:
    app: fedora-packages-static
  name: fedora-packages-static
spec:
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: fedora-packages-static
    deploymentconfig: fedora-packages-static

That looks fine, but our openshift is internal... so you need to tell proxies to proxy that in.

Take a look at playbooks/includes/prox*. I think you need a website and a reverseproxy here. Also, we will want to get an SSL cert for it.

I can work up a PR on it sometime, or you can take a stab?
