#510 Openshift Origin 3.9 fails with error message "Console install failed" on F29 Atomic Host nightly
Closed: Fixed 6 years ago Opened 6 years ago by sinnykumari.

I tried to set up a basic 3-node OpenShift cluster using the playbook playbooks/deploy_cluster.yml from openshift-ansible/openshift-ansible-3.9.41-1 on the latest Fedora 29 Atomic Host nightly AMIs (Fedora-AtomicHost-28-20180902.0.x86_64).

It fails during the task [openshift_web_console : Verify that the console is running] with the error message "Console install failed."
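For reference, this was a plain openshift-ansible run. A minimal sketch of the invocation and inventory is below; the hostnames and the inventory variables are illustrative only, not copied from my actual inventory:

$ cat hosts.ini
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_user=fedora
ansible_become=true
openshift_deployment_type=origin

[masters]
node1

[etcd]
node1

[nodes]
node1
node2
node3

$ ansible-playbook -i hosts.ini openshift-ansible/playbooks/deploy_cluster.yml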

After sshing into the master node, running oc get nodes shows:

$ oc get nodes
NAME         STATUS     ROLES     AGE       VERSION
node1   NotReady   master    2h        v1.9.1+a0ce1bc657
node2   NotReady   compute   2h        v1.9.1+a0ce1bc657
node3    NotReady   compute   2h        v1.9.1+a0ce1bc657

Running kubectl on the master node gives the following failure message:

$ kubectl describe node node1
...
Ready            False   Wed, 05 Sep 2018 12:11:39 +0000   Wed, 05 Sep 2018 10:08:02 +0000   KubeletNotReady              Failed to start ContainerManager Delegation not available for unit type
...
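A rough way to pull the same information (sketch only; the origin-node unit name is an assumption about what openshift-ansible names the node service for origin installs, so adjust it if yours differs):

$ rpm -q runc systemd docker
$ sudo journalctl -b -u origin-node --no-pager | grep -i delegat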

The following package versions are available on the F29 AH nightly:
runc-1.0.0-50.dev.git20aff4f.fc29.x86_64
systemd-239-3.fc29.x86_64
docker-1.13.1-62.git9cb56fd.fc29.x86_64

Note: The same set-up works fine with the latest F28 Atomic Host.


Note: we had the same problem in F28, and the systemd change was reverted in order to allow the upstreams (runc/kube) to catch up.

There are three questions here:
- Is runc-1.0.0-50.dev.git20aff4f.fc29 new enough?
- Is origin 3.9 new enough?
- Is origin 3.10 new enough?

cc @mpatel @gscrivano

runc is new enough. I am not sure about the origin version. You would need to go check whether they vendor a libcontainer that includes the runc patch.

OK, so I just looked. origin v3.10.0 and origin v3.11.0-alpha.0 are currently using openshift-runc@da78c1f, which does not include the fix from https://github.com/opencontainers/runc/pull/1776.
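If anyone wants to repeat that check, one way is to verify whether the vendored commit contains the merge commit of opencontainers/runc#1776. This is only a sketch: I'm assuming the fork lives at github.com/openshift/runc, and <FIX_COMMIT> is a placeholder for the PR's merge commit hash, which I haven't pasted here:

$ git clone https://github.com/openshift/runc
$ cd runc
$ git merge-base --is-ancestor <FIX_COMMIT> da78c1f && echo "fix included" || echo "fix missing"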

@sjenning - is there anything we can do to get that fix pulled in?

v3.11 is in code freeze right now, so we'd need to wait for a 3.11 z-stream. Is this to enable something that is different on Fedora vs. RHEL?

It turns out the release-3.11 branch has a newer commit which includes the fix, so we should be good for 3.11 once it is released. I'll confirm this with a recent build of origin 3.11.

@jcajka FYI ^^ maybe we need to jump straight to 3.11 in f29 origin rpm.

FYI for now you can try oc cluster up with: oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'

@dustymabe awesome, I have been pulling my hair out over this for the past few months. I'm not sure how masochistic I want to be: stay on 3.10 with backports, or stop caring and rebase to 3.11 or even beyond (I haven't even done the container images, as upstream has renamed a bunch of them; I'm finishing 3.6/3.9 for f27/f28 respectively now with the old names).

@dustymabe maybe the best question is: what will we be missing if we stay with 3.10 (patched) in F29?

Oh, and the other issue is whether upstream will ever backport the fix and rebuild the images for 3.10... (I don't plan to switch to Fedora-built images yet).

Pondering it more, I feel that calling 3.10 FUBAR and moving to 3.11 will probably be the best course of action. Any comments?

FYI for now you can try oc cluster up with: oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'

Ran it on the F29 AH beta 1.5 compose and it gives the error panic: assignment to entry in nil map. Does it need anything additional?

Complete logs are available below:

[fedora@ip-10-0-0-162 ~]$ sudo oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'
Getting a Docker client ...
Checking if image quay.io/openshift/origin-control-plane:v3.11 is available ...
Pulling image quay.io/openshift/origin-control-plane:v3.11
Pulled 1/5 layers, 21% complete
Pulled 2/5 layers, 42% complete
Pulled 3/5 layers, 72% complete
Pulled 4/5 layers, 84% complete
Pulled 4/5 layers, 94% complete
Pulled 5/5 layers, 100% complete
Extracting
Image pull complete
Pulling image quay.io/openshift/origin-cli:v3.11
Image pull complete
Pulling image quay.io/openshift/origin-node:v3.11
Pulled 5/6 layers, 85% complete
Pulled 6/6 layers, 100% complete
Extracting
Image pull complete
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image quay.io/openshift/origin-control-plane:v3.11 is available ...
Starting OpenShift using quay.io/openshift/origin-control-plane:v3.11 ...
I0924 10:41:54.773567    1487 config.go:42] Running "create-master-config"
panic: assignment to entry in nil map

goroutine 1 [running]:
github.com/openshift/origin/pkg/oc/clusterup/coreinstall/kubeapiserver.KubeAPIServerStartConfig.MakeMasterConfig(0xc000b6c570, 0x2c, 0xc000765e60, 0x6, 0x6, 0x2f32360, 0xc000446480, 0xc000e179b0, 0x2a, 0x0, ...)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/coreinstall/kubeapiserver/config.go:91 +0xba6
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).makeMasterConfig(0xc000229900, 0xc0005de4e0, 0x1, 0x0, 0x2eca6a0)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/run_self_hosted.go:365 +0x4dd
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).BuildConfig(0xc000229900, 0xc000b6c4e0, 0x2c, 0xd)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/run_self_hosted.go:278 +0xe9b
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).StartSelfHosted(0xc000229900, 0x2eca660, 0xc00000e018, 0x20, 0xc000c9db30)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/run_self_hosted.go:127 +0x43
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).Start(0xc000229900, 0x2eca660, 0xc00000e018, 0x0, 0x4)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/up.go:473 +0x128
github.com/openshift/origin/pkg/oc/clusterup.NewCmdUp.func1(0xc000229b80, 0xc000254710, 0x0, 0x1)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/up.go:112 +0xe7
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc000229b80, 0xc0002546f0, 0x1, 0x1, 0xc000229b80, 0xc0002546f0)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:757 +0x2cc
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0006a1900, 0x17717f1, 0xc00029a200, 0xc0006a1900)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:843 +0x2fd
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(0xc0006a1900, 0x2, 0xc0006a1900)
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:791 +0x2b
main.main()
        /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/cmd/oc/oc.go:42 +0x2e5
[fedora@ip-10-0-0-162 ~]$ echo $?
2

@jcajka
Pondering it more, I feel that calling 3.10 FUBAR and moving to 3.11 will probably be the best course of action. Any comments?

:thumbsup:

@sinnykumari
Ran it on the F29 AH beta 1.5 compose and it gives the error panic: assignment to entry in nil map. Does it need anything additional?

Other than setting up the insecure registry (which it looks like you've already done), I don't think I had to do anything else. The command I gave you is operating against the release branch for 3.11 (i.e. a moving target), so it could have changed since I ran it. I'll try today and see if I get the same results.

@sinnykumari I'm getting the same with 3.10 binaries and 3.11 images. This is not a configuration supported by upstream, so I would guess it is caused by binaries that are too old compared to the images. I'm currently rebasing (finding what has been changed or split off) the rawhide origin to 3.11; please stand by.

@sinnykumari I'm getting the same with 3.10 binaries and 3.11 images. This is not a configuration supported by upstream, so I would guess it is caused by binaries that are too old compared to the images. I'm currently rebasing (finding what has been changed or split off) the rawhide origin to 3.11; please stand by.

+1

@dustymabe @sinnykumari 3.11.alpha1 is building in Koji now: https://koji.fedoraproject.org/koji/taskinfo?taskID=29901404. I would much appreciate it if you could test it on F28 and F29/rawhide. I have been doing some preliminary tests and it is not looking good for me.

@jcajka - 3.11.alpha1 isn't new enough :( but the active 3.11 branch is new enough (i.e. we have to wait for the next 3.11 tag to get created).

Please read my first comment and the follow-up comment carefully; they explain this.

@dustymabe I'm building origin from the top of the release branch; the other components are following the latest tagged releases (or something... there is no uniformity across the openshift repositories...).

@dustymabe So I have re-tested in a clean environment with the upstream images and everything seems to work. I would like to have you confirm it before I move forward with F29.

I'm not sure what process should be followed for altering the change proposal. I guess I will reopen the FESCo ticket.

@dustymabe So I have re-tested in a clean environment with the upstream images and everything seems to work. I would like to have you confirm it before I move forward with F29.

How did you test? When I did it before, I just ran oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'

How did you test? When I did it before, I just ran oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'

Note that when I did this, I also pulled the quay.io/openshift/origin-node:v3.11 container first, copied the oc binary out of it into /usr/local/bin on the host, and used that to run oc cluster up, to make sure I got the same version.
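Roughly, that looked like the following; this is from memory, and the path of the oc binary inside the origin-node image is an assumption, so double-check it:

$ sudo docker pull quay.io/openshift/origin-node:v3.11
$ sudo docker create --name oc-extract quay.io/openshift/origin-node:v3.11
$ sudo docker cp oc-extract:/usr/bin/oc /usr/local/bin/oc
$ sudo docker rm oc-extract
$ oc version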

I tested in the following way on an F29 AH machine (ostree version 29.20180926.n.0) and it seems to work fine.
I installed origin-clients from the F30 build which @jcajka made:

$ sudo rpm-ostree install https://kojipkgs.fedoraproject.org//packages/origin/3.11.0/0.alpha1.0.fc30/x86_64/origin-clients-3.11.0-0.alpha1.0.fc30.x86_64.rpm

Updated the insecure registries in /etc/containers/registries.conf to include 172.30.0.0/16 and then ran sudo oc cluster up.
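For completeness, the insecure-registry entry I mean is the one in /etc/containers/registries.conf; mine looked roughly like this (only the relevant section shown, assuming the older v1-style format shipped on F29 AH):

[registries.insecure]
registries = ['172.30.0.0/16']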

@dustymabe The origin source tarball used in the origin-clients-3.11.0-0.alpha1.0.fc30 package is from commit 777c966dcc84fba1d32ec928831869ad56fe3e38, which contains opencontainers/runc with the fix. The same runc version is included in the origin release-3.11 branch.

@dustymabe The origin source tarball used in the origin-clients-3.11.0-0.alpha1.0.fc30 package is from commit 777c966dcc84fba1d32ec928831869ad56fe3e38, which contains opencontainers/runc with the fix. The same runc version is included in the origin release-3.11 branch.

Cool, didn't know that. Thanks @sinnykumari. @jcajka, can you trust Sinny's testing? I don't know if I'll have time to run through this soon.

@dustymabe I would say so. Thanks to @sinnykumari for testing and describing how to test before I managed to reply.

@jcajka @dustymabe The F29 origin-3.11.0-0.alpha1.0.fc29 build works as expected. Added karma for it.

origin-3.11.0-0.alpha1.0.fc29 has been pushed to F29.
OpenShift cluster set-up works as expected with openshift-ansible-3.11.21-1-2-g8d402991 for 3.11 on the latest Fedora 29 Atomic Host (ostree version: 29.20181008.n.0).

Thanks all!

Metadata Update from @sinnykumari:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

6 years ago

