#10145 Koji fails to build container due to DNS resolve fail
Closed: Fixed 2 years ago by ogutierrez. Opened 2 years ago by ogutierrez.

Describe what you would like us to do:

Check the container building process.

Original bug: https://bugzilla.redhat.com/show_bug.cgi?id=1988886

The following are two failed builds.

https://koji.fedoraproject.org/koji/taskinfo?taskID=73021255
https://koji.fedoraproject.org/koji/taskinfo?taskID=73255639


When do you need this to be done by?

As soon as you can.


I am not sure how to track this down. Might be a firewall issue on osbs?

Perhaps @mobrien knows more from setting up the aarch64 one?

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

2 years ago

I can reproduce this issue by building the container directly on the node.

However if I change

FROM registry.fedoraproject.org/fedora:35

to

FROM registry.fedoraproject.org/fedora:34

in the Dockerfile the build works correctly.
So the issue appears to be with the base image. However, the build works locally with the f35 base image, so this needs a bit more investigation.

So, since this is just rawhide... I think this might be https://bugzilla.redhat.com/show_bug.cgi?id=1985499

basically rawhide glibc is using clone3 and rhel7 docker can't deal with it.
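
For context, the failure mode can be modeled in a few lines. This is a minimal sketch, not glibc's actual code: it assumes (per the linked bug) that newer glibc tries clone3 first and falls back to clone only when the call reports ENOSYS, while the seccomp filter in the older docker rejects the unknown syscall with EPERM. The function name is hypothetical.

```python
import errno


def glibc_clone3_outcome(clone3_errno):
    """Model how newer glibc reacts to the result of a clone3 attempt.

    clone3_errno is 0 on success, otherwise the errno the call returned.
    """
    if clone3_errno == 0:
        return "clone3 succeeded"
    if clone3_errno == errno.ENOSYS:
        # "Syscall not implemented": glibc retries with the older clone.
        return "fall back to clone"
    # Any other error (for example EPERM from a seccomp filter) is treated
    # as a real failure, so no fallback happens.
    return "fail with " + errno.errorcode[clone3_errno]


print(glibc_clone3_outcome(errno.ENOSYS))
print(glibc_clone3_outcome(errno.EPERM))
```

Under these assumptions, a filter that answered ENOSYS for clone3 would let the fallback kick in, which is why the fix has to come from the docker side.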

@fweimer is there going to be a rhel7 docker update? Or is there some way we can get rhel7 docker to be ok with this?

Which docker binary are we talking about? Has it even been shipped by Red Hat?

# rpm -qi docker-1.13.1-208.git7d71120.el7_9.x86_64
Name        : docker
Epoch       : 2
Version     : 1.13.1
Release     : 208.git7d71120.el7_9
Architecture: x86_64
Install Date: Tue 06 Jul 2021 19:24:36 UTC
Group       : Unspecified
Size        : 66722196
License     : ASL 2.0
Signature   : RSA/SHA256, Mon 07 Jun 2021 09:13:20 UTC, Key ID 199e2f91fd431d51
Source RPM  : docker-1.13.1-208.git7d71120.el7_9.src.rpm
Build Date  : Fri 04 Jun 2021 10:20:48 UTC
Build Host  : x86-040.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/docker/docker
Summary     : Automates deployment of containerized applications
Description :
Docker is an open-source engine that automates the deployment of any
application as a lightweight, portable, self-sufficient container that will
run virtually anywhere.

Docker containers can encapsulate any payload, and will run consistently on
and between virtually any server. The same container that a developer builds
and tests on a laptop will run at scale, in production*, on VMs, bare-metal
servers, OpenStack clusters, public instances, or combinations of the above.

The packages come from Red Hat's rhel-7-openshift-3.11-rpms repository. [Edited: hit control-return by accident.] Does this help track down the package set? I can look on our mirror box for more info.

Okay, so it's an internal OCP package. I'm not sure you should be using that for anything else, but then the builder apparently depends on the API of that particular version of docker.

I think we have an internal build that disables seccomp filters, including for docker build.

Are there any plans to move to RHEL8/podman?

Hi there, this problem is still around. We need to build an image for Fedora 36 and update the F35 images but we are unable to build them because of this.

Are there any plans on fixing this?

So, I guess there's not going to be a rhel7 fix. ;(

I think the most likely way forward is to do a docker build ourselves in epel7-infra with the changes needed and update to that.

@fweimer can you (privately via email?) give me a pointer to that internal build? Would it be ok for us to take that and build it in public koji?

Some other general things:

No, we cannot upgrade to rhel8. OSBS is Openshift 3.x only and that does not support rhel8 that I am aware of.

So, does this affect aarch64 also? Our aarch64 OSBS cluster is actually running fedora 33 with moby-engine-19.03.13-1.ce.git4484c46.fc33.aarch64. Can someone seeing this confirm a build on aarch64?

I sent a freeze break to update docker on the osbs nodes...

> So, does this affect aarch64 also? Our aarch64 OSBS cluster is actually
> running fedora 33, and moby-engine-19.03.13-1.ce.git4484c46.fc33.aarch64
> can someone seeing this confirm a build on aarch64?

I tried it out just now, and the aarch64 build did complete successfully. It only failed for x86_64.

> I sent a freeze break to update docker on the osbs nodes...

Thanks. Is it now a matter of waiting for the freeze break request to go through and land? Can I somehow track it?

ok. Can someone try a build now?

The freeze break was acked and I just (oddly) downgraded docker to the version with the fix. I am assuming it's ok for us to use it as long as we don't distribute it.

Looks like the build did go through! Thanks a lot, Kevin.

Thank you all very much for fixing this. As the main problem is fixed I will close the ticket. Feel free to reopen it if you think there is anything more to be done.

Metadata Update from @ogutierrez:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Today I tried to rebuild the images again and they started to throw the exact same error.

@kevin Could you take a look and check nothing has changed since yesterday?

Metadata Update from @ogutierrez:
- Issue status updated to: Open (was: Closed)

2 years ago

Nothing should have changed. The same packages are there.

Ah ha. One of the nodes wasn't fixed right.

Please try again.
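
One way to catch a missed node like this is to compare the docker package installed on each node against the expected one. A minimal sketch, assuming the per-node package strings have already been collected (for example with rpm -q docker on each node); the node names and version strings below are hypothetical placeholders, not the real OSBS cluster values.

```python
def find_mismatched_nodes(installed, expected):
    """Return the sorted names of nodes whose package differs from expected.

    installed maps a node name to the package string reported on that node.
    """
    return sorted(
        node for node, version in installed.items() if version != expected
    )


# Hypothetical inventory: one node still carries the unfixed package.
installed = {
    "osbs-node01.example": "docker-fixed-build",
    "osbs-node02.example": "docker-unfixed-build",
    "osbs-node03.example": "docker-fixed-build",
}

print(find_mismatched_nodes(installed, "docker-fixed-build"))
```

An empty list would mean every node reports the expected package.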

Thanks for checking, Kevin. Seems this is working again :)

Closing!

Metadata Update from @ogutierrez:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

