#7411 aarch64 osbs cluster production
Closed: Fixed 2 years ago by kevin. Opened 2 years ago by cverna.

  • Describe what you need us to do:

I finally have OSBS building aarch64 images in stg, so we can move forward and get some machines to deploy a cluster in production.

Link to the original RFR https://pagure.io/fedora-infrastructure/issue/7184

Port 8443 needs to be open with the following hosts:

  • When do you need this? (YYYY/MM/DD)
    When time allows
  • When is this no longer needed or useful? (YYYY/MM/DD)

  • If we cannot complete your request, what is the impact?

Metadata Update from @smooge:
- Issue assigned to smooge

2 years ago

Can you clarify how many nodes we need here? Can we just do one like we did for staging?

It does work with one VM acting as both master and node (this is what we have in stg), but I am not sure how this will scale. If possible, I think 2 VMs would be better in prod: 1 master and 1 node. If we can't get that, let's go with only 1 VM and see how that performs.

Just so everyone at home understands the problem: every OSBS server we bring up reduces the number of regular aarch64 builders we can have on the 23 systems we can use in the moonshot. Those are the second 'slowest' part of the build system, so we like to have a lot of them to spread the load around. So we can either have 3 OSBS nodes or 3 builders.

osbs-aarch64-master has been built. Please see how this works.

And it won't work, because of something in the plays:

TASK [osbs-namespace : query osbs namespace] ********************************************************************************************************************************
Friday 07 December 2018  22:34:17 +0000 (0:00:00.093)       0:25:02.744 *******
fatal: [osbs-aarch64-master01.arm.fedoraproject.org]: FAILED! => {"msg": "The conditional check 'namespace_result.rc != 0 and ('not found' not in namespace_result.stderr)' failed. The error was: error while evaluating conditional (namespace_result.rc != 0 and ('not found' not in namespace_result.stderr)): Unable to look up a name or access an attribute in template string ({% if namespace_result.rc != 0 and ('not found' not in namespace_result.stderr) %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable"}

PLAY RECAP ******************************************************************************************************************************************************************
osbs-aarch64-master01.arm.fedoraproject.org : ok=163  changed=121  unreachable=0    failed=1   
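
The "argument of type 'StrictUndefined' is not iterable" part of the error means the `in` test ran against a `namespace_result.stderr` that was not defined. As a rough illustration only, here is the same condition in Python with defaults for the missing keys. The function name and the choice of defaults are assumptions for this sketch, not what the osbs-namespace role actually does. In the playbook itself, the analogous guard would be Jinja2's `default` filter, e.g. `namespace_result.stderr | default('')`.

```python
def namespace_query_failed(result: dict) -> bool:
    """Mirror of the playbook's conditional, tolerant of missing keys.

    Returns True when the query exited non-zero for a reason other than
    the namespace simply not existing yet.
    """
    rc = result.get("rc", 0)            # assumption: a missing rc means the command did not run
    stderr = result.get("stderr", "")   # assumption: a missing stderr is treated as empty
    return rc != 0 and "not found" not in stderr


# A result without rc/stderr (for example from a skipped task) no longer raises.
print(namespace_query_failed({"skipped": True}))                           # False
print(namespace_query_failed({"rc": 1, "stderr": "namespace not found"}))  # False
print(namespace_query_failed({"rc": 1, "stderr": "connection reset"}))     # True
```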

Thanks @smooge, I am taking it from here.

The Origin container images are not yet available in our registry for aarch64, so they need to be manually pulled on the box for the first deployment.

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

So it seems that osbs-control01.phx2.fedoraproject.org cannot ssh to osbs-aarch64-master01.arm.fedoraproject.org. I checked that osbs-aarch64-master01.arm.fedoraproject.org has the correct ssh key in .ssh/authorized_keys. Is this related to the firewall?

Full error from the playbook:

TASK [Gathering Facts] ********************************************************************************************************************************************************************************************
Wednesday 12 December 2018  14:10:02 +0000 (0:00:00.081)       0:00:01.139 **** 
fatal: [osbs-aarch64-master01.arm.fedoraproject.org]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"osbs-aarch64-master01.arm.fedoraproject.org\". Make sure this host can be reached over ssh", "unreachable": true}

Metadata Update from @cverna:
- Issue status updated to: Open (was: Closed)

2 years ago

Metadata Update from @bowlofeggs:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: request-for-resources

2 years ago

So I tried to manually ssh from osbs-control01.phx2.fp.o to the aarch64 master and got Connection refused:

ssh -vv osbs-aarch64-master01.arm.fedoraproject.org
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug2: resolving "osbs-aarch64-master01.arm.fedoraproject.org" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to osbs-aarch64-master01.arm.fedoraproject.org [] port 22.
debug1: connect to address port 22: Connection refused
ssh: connect to host osbs-aarch64-master01.arm.fedoraproject.org port 22: Connection refused

That makes me think that the firewall does not allow ssh between these 2 boxes, since I can successfully ssh to osbs-aarch64-master01.arm.fedoraproject.org through bastion.
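
For anyone wanting to repeat this check without ssh, a minimal Python sketch of a TCP probe follows. It separates an active refusal (no listener, or a firewall that rejects) from a silent drop, which shows up as a timeout instead. The `probe_tcp` helper is hypothetical and written for this ticket; it is not part of any Fedora infrastructure tooling.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or a rejecting firewall in between) actively refused the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, typical of a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks and similar.
        return f"error: {exc}"
```

Run from osbs-control01, calling `probe_tcp("osbs-aarch64-master01.arm.fedoraproject.org", 22)` and the same for port 8443 would show whether each port is reachable from that side.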

The network firewall is indeed putting a block here. I will put in a ticket to have this opened.

This should be all done now. @cverna, can you please test?

Reopen if you hit some issue.


Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
