Describe what you need us to do: I was trying to check which images we have on the candidate registry using https://candidate-registry.fedoraproject.org/v2/_catalog but it returns
When do you need this? (YYYY/MM/DD) ASAP
When is this no longer needed or useful? (YYYY/MM/DD)
If we cannot complete your request, what is the impact? All OSBS builds will fail
Looking at the inventory in ansible, I'm seeing:
docker-candidate-registry01.stg.phx2.fedoraproject.org
docker-candidate-registry01.phx2.fedoraproject.org
docker-candidate-registry01.stg.phx2.fedoraproject.org
The staging instance is reachable at https://candidate-registry.stg.fedoraproject.org/v2/_catalog and I could access it via ssh. The prod instance seems unreachable via ssh.
This might be related to @codeblock's work to deploy the new registry in production. I think the hosts were renamed to oci-candidate-registry01.phx2.fedoraproject.org (judging by the commit history of the ansible repo)
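For anyone wanting to repeat the reachability check on both instances, here is a minimal Python sketch. It assumes the endpoints answer with the standard registry v2 catalog JSON (`{"repositories": [...]}`); the helper names `parse_catalog` and `check_registry` and the 10 second timeout are made up for illustration and are not part of any existing tooling.

```python
import json
import urllib.error
import urllib.request

# The two catalog URLs mentioned in this ticket.
REGISTRIES = {
    "prod": "https://candidate-registry.fedoraproject.org/v2/_catalog",
    "stg": "https://candidate-registry.stg.fedoraproject.org/v2/_catalog",
}


def parse_catalog(body):
    """Return the repository names from a /v2/_catalog JSON response body."""
    return list(json.loads(body).get("repositories", []))


def check_registry(url, timeout=10):
    """Return (ok, detail): the repository list on success, the error text on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, parse_catalog(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # URLError/OSError cover DNS, TLS, timeout and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        return False, str(exc)


def main():
    """Probe every registry and print one status line per instance."""
    for name, url in REGISTRIES.items():
        ok, detail = check_registry(url)
        print(name, "OK" if ok else "FAILED", detail)
```

Calling `main()` prints one line per instance, so a prod-only outage shows up as a single FAILED line next to a working stg.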
https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=171c5c1054d81c0af92ca9b6d1ac804e85cf5353
Hm, there is a playbooks/groups/releng-compose.yml playbook that seems to do something related to the candidate-registry, but it contains hosts: releng-compose:releng-stg, and neither of these groups includes this host... :(
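To illustrate why the playbook never touches the registry host: in an Ansible host pattern, `:` is a union of groups, so a host is only targeted if it belongs to at least one of them. A rough sketch follows; the group memberships are hypothetical placeholders (the real inventory contents are not shown in this ticket), and `hosts_for_pattern` is an illustrative helper that handles plain unions only.

```python
def hosts_for_pattern(pattern, inventory):
    """Expand a union-only Ansible host pattern ("a:b") against a group map."""
    selected = set()
    for part in pattern.split(":"):
        if part.startswith(("&", "!")):
            # Intersections (":&") and exclusions (":!") are out of scope here.
            raise ValueError("only plain unions are handled in this sketch")
        # A name that is not a known group is treated as a bare host name.
        selected.update(inventory.get(part, [part]))
    return selected


# HYPOTHETICAL group contents, for illustration only.
INVENTORY = {
    "releng-compose": ["compose-host.example.org"],
    "releng-stg": ["compose-stg-host.example.org"],
}

host = "docker-candidate-registry01.phx2.fedoraproject.org"
targeted = host in hosts_for_pattern("releng-compose:releng-stg", INVENTORY)
print(targeted)  # False: the host is in neither group
```

Against the real inventory, `ansible-playbook --list-hosts` on the playbook gives the same answer without running any task.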
I believe this should fix it --> https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=1a0009d700ae29188bbf1152a282857305424710
So here is what I tried:
I ran the playbooks/groups/oci-registry.yml playbook a few times and hit a few issues with it:
TASK [gluster/consolidated : Configure Gluster volume.] **************************************************************
Wednesday 22 August 2018 09:39:02 +0000 (0:00:00.220) 0:08:29.999 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: None
fatal: [docker-registry01.stg.phx2.fedoraproject.org]: FAILED! => {"changed": false, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume add-brick registry docker-registry01.stg.phx2.fedoraproject.org:/srv/glusterfs/ docker-registry02.stg.phx2.fedoraproject.org:/srv/glusterfs/ force) command (rc=1): volume add-brick: failed: Brick: docker-registry01.stg.phx2.fedoraproject.org:/srv/glusterfs not available. Brick may be containing or be contained by an existing brick.\n"}
...
TASK [gluster/consolidated : Configure Gluster volume.] **************************************************************
Wednesday 22 August 2018 09:39:10 +0000 (0:00:00.179) 0:08:38.442 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: None
fatal: [oci-registry02.phx2.fedoraproject.org]: FAILED! => {"changed": false, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume add-brick registry oci-registry01.phx2.fedoraproject.org:/srv/glusterfs/ oci-registry02.phx2.fedoraproject.org:/srv/glusterfs/ force) command (rc=1): volume add-brick: failed: Brick: oci-registry02.phx2.fedoraproject.org:/srv/glusterfs not available. Brick may be containing or be contained by an existing brick.\n"}
It seems the ssh fingerprint kept on changing:
TASK [basessh : make sure there is no old ssh host key for the host still around] ....
changed: [oci-registry02.phx2.fedoraproject.org -> localhost] => (item=/root/.ssh/known_hosts)
(^ Happened at every run)
The authenticity of host 'oci-registry02.phx2.fedoraproject.org (xxxxx)' can't be established.
RSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no)?
Host became unreachable:
fatal: [oci-registry02.phx2.fedoraproject.org]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"oci-registry02.phx2.fedoraproject.org\". Make sure this host can be reached over ssh", "unreachable": true}
End of the run:
oci-registry01.phx2.fedoraproject.org : ok=1 changed=0 unreachable=1 failed=0
oci-registry02.phx2.fedoraproject.org : ok=123 changed=3 unreachable=1 failed=0
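When the run output is long, the recap lines can be scanned mechanically for unreachable hosts. A small Python sketch, assuming the recap has the `host : key=value ...` shape shown above; `parse_recap` and `RECAP_RE` are illustrative names, not an Ansible API.

```python
import re

# One recap entry: a host name, a colon, then whitespace-separated key=N counters.
RECAP_RE = re.compile(r"(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)")


def parse_recap(text):
    """Map each host in an Ansible play recap to its integer counters."""
    results = {}
    for match in RECAP_RE.finditer(text):
        pairs = (item.split("=") for item in match.group("counters").split())
        results[match.group("host")] = {key: int(value) for key, value in pairs}
    return results


# The recap from the run above.
RECAP = """
oci-registry01.phx2.fedoraproject.org : ok=1 changed=0 unreachable=1 failed=0
oci-registry02.phx2.fedoraproject.org : ok=123 changed=3 unreachable=1 failed=0
"""

recap = parse_recap(RECAP)
unreachable = sorted(host for host, c in recap.items() if c.get("unreachable", 0) > 0)
print(unreachable)
```

Here both hosts end up in the list, even though oci-registry02 completed 123 tasks before dropping off.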
To push out @cverna's patch, I ran the playbook:
playbooks/groups/proxies.yml -t haproxy
Which finished fine.
Except that now both of these URLs are unreachable :(
- https://candidate-registry.stg.fedoraproject.org/v2/_catalog
- https://candidate-registry.fedoraproject.org/v2/_catalog
I'm considering either reverting @cverna's patch to see if that fixes the stg host or just waiting for someone more qualified than me to help sort this out.
Sorry I couldn't help further, I hope I didn't make too much of a mess :(
@kevin fixed this.
There was one issue with the openvpn certificate: https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=d84e1df and another in the haproxy configuration, since stg has not had the rename that prod did: https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=450230a
Thanks @kevin :)
Metadata Update from @pingou: - Issue close_status updated to: Fixed
Thanks @kevin and @pingou :dragon: