fedora-infra / ansible

#574 Fix solr crash (packages-static)

Merged 2 years ago by mymindstorm. Opened 2 years ago by mymindstorm.

fedora-infra/ mymindstorm/ansible solr-fix into main

Brendan Early • 2 years ago

a61c121

roles/openshift-apps/solr/templates/deploymentconfig.yml

file modified

-1

		`@@ -26,7 +26,6 @@`
		`- solr-precreate`
		`args:`
		`- packages`
		`- - /opt/solr/server/solr/configsets/packages`
		`ports:`
		`- containerPort: 8983`
		`resources: {}`

no initial comment

zuul commented 2 years ago

Build succeeded.

fi-ansible--ansible-review-diff : SUCCESS in 2m 12s

mymindstorm commented 2 years ago

@kevin I don't seem to have permissions to merge PRs.

kevin commented 2 years ago

Try again now? sorry about that.

You may have to log out and back in to see the new group...

rebased onto a61c121

2 years ago

Pull-Request has been merged by mymindstorm

2 years ago

zuul commented 2 years ago

Build succeeded.

fi-ansible--ansible-review-diff : SUCCESS in 2m 46s

mymindstorm commented 2 years ago

Thanks, Solr is alive now. I am unable to get my image builds (fedora-packages-static-build) to complete. They worked a few times when the build config was initially created, and it has not changed at all since the builds broke. The build is unable to reach pagure for some reason. Do you have any suggestions on how to troubleshoot that?

Edited 2 years ago by mymindstorm

kevin commented 2 years ago

Cloning "https://pagure.io/fedora-packages-static.git" ...
WARNING: timed out waiting for git server, will wait 1m4s
WARNING: timed out waiting for git server, will wait 4m16s
error: fatal: unable to access 'https://pagure.io/fedora-packages-static.git/': Failed to connect to 2620:52:3:1:dead:beef:cafe:fed8: Network is unreachable

looks like it's trying to use ipv6... and our iad2 datacenter doesn't have ipv6 enabled. :)

I'm not sure why it is, but as a brute-force workaround you might try 'git clone -4 ...'?

mymindstorm commented 2 years ago

I'm pretty sure that I can't change that because it's an openshift build config: https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/fedora-packages-static/templates/buildconfig.yml

It can't even connect over ipv4 with http/https:

Cloning "http://8.43.85.76/fedora-packages-static.git " ...
WARNING: timed out waiting for git server, will wait 1m4s
WARNING: timed out waiting for git server, will wait 4m16s
error: fatal: unable to access 'http://8.43.85.76/fedora-packages-static.git/':  Failed connect to 8.43.85.76:80; Connection timed out

Could you destroy the project again? I don't seem to be able to do so.

kevin commented 2 years ago

There is a way you can. ;)

Look at: playbooks/openshift-apps/coreos-ostree-importer.yml for an example:

# actions to delete the project from OpenShift
#
# to run: sudo rbac-playbook -l os_masters_stg[0] -t delete openshift-apps/coreos-ostree-importer.yml
#
  - role: openshift/object-delete
    app: coreos-ostree-importer
    objecttype: project
    objectname: coreos-ostree-importer
    tags: [ never, delete ]

Just add that to your playbook and call it with -t delete. :)
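As a rough sketch, the same pattern adapted to this app might look like the following. The playbook path `openshift-apps/fedora-packages-static.yml` and the project name `fedora-packages-static` are assumptions based on names used elsewhere in this thread, not something confirmed here.

```yaml
# Hypothetical entry for the fedora-packages-static playbook.
# Assumed invocation, mirroring the coreos-ostree-importer example:
#   sudo rbac-playbook -l os_masters_stg[0] -t delete openshift-apps/fedora-packages-static.yml
roles:
  - role: openshift/object-delete
    app: fedora-packages-static
    objecttype: project
    objectname: fedora-packages-static   # assumed project name
    # 'never' keeps this role out of normal runs; it only fires with -t delete
    tags: [ never, delete ]
```

Because of the `never` tag, a regular playbook run skips this role entirely, so leaving it in the playbook should be safe.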

mymindstorm commented 2 years ago

That works, but it has ruined the PV allocations. :(
(fedora-packages-static-db-storage-stg, fedora-packages-static-storage-stg, and solr-storage-stg)

It seems that oc adm pod-network join-projects --to=solr fedora-packages-static is what caused the networking issues. I'm going to try using an ip whitelist on the route instead. Is there a variable in ansible with iad2/openshift's ip block? I can't seem to find one.
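For illustration, an IP whitelist on a route is usually expressed with the router's `haproxy.router.openshift.io/ip_whitelist` annotation. The sketch below is a hand-written Route, not output of the `openshift/route` role; the route and service names are hypothetical, and the CIDR is a documentation placeholder rather than the real iad2 block.

```yaml
# Sketch of a Route restricted by source IP; names and the CIDR are placeholders.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: solr            # hypothetical route name
  annotations:
    # Space-separated list of IPs/CIDRs allowed through the router.
    # 192.0.2.0/24 is a documentation range, NOT the real iad2 block.
    haproxy.router.openshift.io/ip_whitelist: "192.0.2.0/24"
spec:
  to:
    kind: Service
    name: solr          # hypothetical service name
  port:
    targetPort: 8983    # Solr containerPort from the deployment config
```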

Edited 2 years ago by mymindstorm

kevin commented 2 years ago

> That works, but it has ruined the PV allocations. :(
> (fedora-packages-static-db-storage-stg, fedora-packages-static-storage-stg, and solr-storage-stg)

Yeah, anytime you drop a pvc, they go into 'Reclaim' and need an admin to delete/readd them. ;(
I've now done this and they are back on track. ;)

Don't know of any variable there. ;(

mymindstorm commented 2 years ago

> Yeah, anytime you drop a pvc, they go into 'Reclaim' and need an admin to delete/readd them. ;(
> I've now done this and they are back on track. ;)

fedora-packages-static-storage-stg has not been reclaimed.

I've moved solr into the same project as packages-static to avoid all of this overcomplicated networking config the cluster has. So, solr-storage-stg has moved and needs to be reclaimed also. Additionally, it seems that because I re-created the project, the uid openshift was using for all the PV files changed. I've added securityContext: supplementalGroups to try to avoid this, but if you could destroy all of the incorrect files currently in solr-storage-stg, that would be appreciated.
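A minimal sketch of what the `securityContext: supplementalGroups` setting could look like in the pod template of a deployment config. The GID shown is a placeholder; the actual value used in this repository is not stated in the thread.

```yaml
# Fragment of a DeploymentConfig pod template; the GID is a placeholder.
spec:
  template:
    spec:
      securityContext:
        # Extra group IDs added to the container processes in the pod, so
        # group-owned files on the volume stay accessible even if the
        # project's assigned UID changes.
        supplementalGroups:
          - 1000          # hypothetical GID; must match group ownership on the PV
```

This only helps if the files on the volume are group-readable and group-writable for that GID, which is why stale files written under the old UID may still need to be cleaned out.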

kevin commented 2 years ago

Done.

mymindstorm commented 2 years ago

fedora-packages-static-storage-stg is still showing as pending.

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-packages-static-storage{{ '-stg' if env == 'staging' else '' }}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: ""

kevin commented 2 years ago

Oops. I must not have recreated that one the other day. Done.

mymindstorm commented 2 years ago

So, it is running successfully in openshift! The health check is green and I can curl the service from the container itself. The only issue is that https://packages.stg.fedoraproject.org does not connect properly. The container is listening on port 8080; does this route look correct?

  - role: openshift/route
    app: fedora-packages-static
    routename: fedora-packages-static
    host: "packages{{ env_suffix }}.fedoraproject.org"
    serviceport: 8080-tcp
    servicename: fedora-packages-static

kind: Service
metadata:
  labels:
    app: fedora-packages-static
  name: fedora-packages-static
spec:
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: fedora-packages-static
    deploymentconfig: fedora-packages-static

kevin commented 2 years ago

That looks fine, but our openshift is internal... so you need to tell proxies to proxy that in.

Take a look at playbooks/includes/prox*; I think you need a website and a reverseproxy here. Also, we will want to get an SSL cert for it.

I can work up a PR on it sometime, or you can take a stab?
