I tried to set up a basic 3-node OpenShift cluster using the playbook playbooks/deploy_cluster.yml from openshift-ansible/openshift-ansible-3.9.41-1 on the latest Fedora 29 Atomic Host nightly AMIs (Fedora-AtomicHost-28-20180902.0.x86_64).
It fails during the task [openshift_web_console : Verify that the console is running] with the error message "Console install failed."
After SSHing into the master node, running oc get nodes shows:
$ oc get nodes
NAME    STATUS     ROLES     AGE   VERSION
node1   NotReady   master    2h    v1.9.1+a0ce1bc657
node2   NotReady   compute   2h    v1.9.1+a0ce1bc657
node3   NotReady   compute   2h    v1.9.1+a0ce1bc657
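For anyone re-checking this on a larger cluster, here is a minimal sketch that picks the not-ready nodes out of the oc get nodes output. The function name not_ready_nodes is made up for illustration, and it assumes the default column layout shown above (NAME STATUS ROLES AGE VERSION) with one header row.

```shell
# Print the names of nodes whose STATUS does not start with "Ready".
# Reads `oc get nodes` output on stdin.
# Assumption: default columns, single header line; a status such as
# "Ready,SchedulingDisabled" is treated as ready.
not_ready_nodes() {
    awk 'NR > 1 && $2 !~ /^Ready(,|$)/ { print $1 }'
}
```

It would be used as: oc get nodes | not_ready_nodes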
Running kubectl on the master node gives the following failure message:
$ kubectl describe node node1
...
Ready False Wed, 05 Sep 2018 12:11:39 +0000 Wed, 05 Sep 2018 10:08:02 +0000 KubeletNotReady Failed to start ContainerManager Delegation not available for unit type
...
The following package versions are available on the F29 AH nightly:
runc-1.0.0-50.dev.git20aff4f.fc29.x86_64
systemd-239-3.fc29.x86_64
docker-1.13.1-62.git9cb56fd.fc29.x86_64
Note: The same setup works fine with the latest F28 Atomic Host.
Note we had the same problem in f28 and the systemd change was reverted in order to allow the upstreams (runc/kube) to catch up.
There are three questions here:
- Is runc-1.0.0-50.dev.git20aff4f.fc29 new enough?
- Is origin 3.9 new enough?
- Is origin 3.10 new enough?
cc @mpatel @gscrivano
runc is new enough. I am not sure about the origin version. You would need to check whether they vendor libcontainer with the runc patch.
ok so I just looked. origin v3.10.0 and origin v3.11.0-alpha.0 are currently using openshift-runc@da78c1f which does not include the fix mentioned in https://github.com/opencontainers/runc/pull/1776
@sjenning - is there anything we can do to get that fix pulled in?
v3.11 is in code freeze right now so we'd need to wait for a 3.11 z-stream. Is this to enable something that is different on Fedora vs RHEL?
it turns out the release-3.11 branch has a newer commit which includes the fix so we should be good for 3.11 once it is released. I'll confirm this with a recent build of origin 3.11.
@jcajka FYI ^^ maybe we need to jump straight to 3.11 in f29 origin rpm.
FYI for now you can try oc cluster up with: oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'
@dustymabe awesome, I have been pulling my hair out over it for the past months. I'm not sure how masochistic I want to be: just stay on 3.10 with backports, or stop caring and rebase to 3.11 or even beyond (I haven't even done the container images, as upstream has renamed a bunch of them; I'm finishing 3.6/3.9 for f27/f28 respectively now with the old names).
@dustymabe maybe the best question is what you/we will be missing if we stay with 3.10 (patched) in f29?
Oh, and the other issue is whether upstream will ever backport the fix and rebuild the images for 3.10... (I don't plan to switch to Fedora-built images yet).
Pondering it more, I feel that calling 3.10 fubar and moving to 3.11 will probably be the best course of action. Any comments?
Ran it on the F29 AH beta 1.5 compose and it fails with the error panic: assignment to entry in nil map. Does it need any additional setup?
Complete logs are available below:
[fedora@ip-10-0-0-162 ~]$ sudo oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'
Getting a Docker client ...
Checking if image quay.io/openshift/origin-control-plane:v3.11 is available ...
Pulling image quay.io/openshift/origin-control-plane:v3.11
Pulled 1/5 layers, 21% complete
Pulled 2/5 layers, 42% complete
Pulled 3/5 layers, 72% complete
Pulled 4/5 layers, 84% complete
Pulled 4/5 layers, 94% complete
Pulled 5/5 layers, 100% complete
Extracting
Image pull complete
Pulling image quay.io/openshift/origin-cli:v3.11
Image pull complete
Pulling image quay.io/openshift/origin-node:v3.11
Pulled 5/6 layers, 85% complete
Pulled 6/6 layers, 100% complete
Extracting
Image pull complete
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image quay.io/openshift/origin-control-plane:v3.11 is available ...
Starting OpenShift using quay.io/openshift/origin-control-plane:v3.11 ...
I0924 10:41:54.773567 1487 config.go:42] Running "create-master-config"
panic: assignment to entry in nil map
goroutine 1 [running]:
github.com/openshift/origin/pkg/oc/clusterup/coreinstall/kubeapiserver.KubeAPIServerStartConfig.MakeMasterConfig(0xc000b6c570, 0x2c, 0xc000765e60, 0x6, 0x6, 0x2f32360, 0xc000446480, 0xc000e179b0, 0x2a, 0x0, ...)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/coreinstall/kubeapiserver/config.go:91 +0xba6
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).makeMasterConfig(0xc000229900, 0xc0005de4e0, 0x1, 0x0, 0x2eca6a0)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/run_self_hosted.go:365 +0x4dd
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).BuildConfig(0xc000229900, 0xc000b6c4e0, 0x2c, 0xd)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/run_self_hosted.go:278 +0xe9b
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).StartSelfHosted(0xc000229900, 0x2eca660, 0xc00000e018, 0x20, 0xc000c9db30)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/run_self_hosted.go:127 +0x43
github.com/openshift/origin/pkg/oc/clusterup.(*ClusterUpConfig).Start(0xc000229900, 0x2eca660, 0xc00000e018, 0x0, 0x4)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/up.go:473 +0x128
github.com/openshift/origin/pkg/oc/clusterup.NewCmdUp.func1(0xc000229b80, 0xc000254710, 0x0, 0x1)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/pkg/oc/clusterup/up.go:112 +0xe7
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc000229b80, 0xc0002546f0, 0x1, 0x1, 0xc000229b80, 0xc0002546f0)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:757 +0x2cc
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0006a1900, 0x17717f1, 0xc00029a200, 0xc0006a1900)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:843 +0x2fd
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(0xc0006a1900, 0x2, 0xc0006a1900)
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:791 +0x2b
main.main()
    /builddir/build/BUILD/origin-dd10d172758d4d02f6d2e24869234fac6c7841a7/_output/local/go/src/github.com/openshift/origin/cmd/oc/oc.go:42 +0x2e5
[fedora@ip-10-0-0-162 ~]$ echo $?
2
@jcajka Pondering it more I feel that calling 3.10 fubar and moving to the 3.11 will be probably best course of actions. Any comments?
:thumbsup:
@sinnykumari Ran it on F29 AH beta 1.5 compose and it gives error panic: assignment to entry in nil map. Does it need some additional stuff to do?
other than setting up the insecure registry (which it looks like you've already done) I don't think I had to do anything else. The command I gave you is operating against the release branch for 3.11 (i.e. moving target) so it could have changed since I ran it. I'll try today and see if I get the same results
@sinnykumari I'm getting the same with 3.10 binaries and 3.11 images. As this is not a configuration supported by upstream, I would guess it is caused by binaries that are too old compared to the images. I'm currently rebasing rawhide origin to 3.11 (finding what has been changed and split off), please stand by.
+1
@dustymabe @sinnykumari 3.11.alpha1 is building in koji now: https://koji.fedoraproject.org/koji/taskinfo?taskID=29901404. I would much appreciate it if you could test it on f28 and f29/rawhide. I have been doing some preliminary tests and it is not looking good for me.
@jcajka - 3.11.alpha1 isn't new enough :( but the active 3.11 branch is new enough (i.e. we have to wait for the next 3.11 tag to get created).
Please read my first comment and the follow-up comment carefully; they explain this.
@dustymabe I'm building origin from the top of the release branch; the other components follow the latest tagged releases (or something... there is no uniformity across the openshift repositories...).
@dustymabe So I have re-tested in a clean environment with the upstream images and everything seems to work. I would like to have it confirmed by you before I move forward with f29.
I'm not sure about what process should be followed for altering the change proposal. I guess I will reopen the FESCO ticket.
how did you test? when I did it before I just ran oc cluster up --image='quay.io/openshift/origin-${component}:v3.11'
Note that when I did this I also pulled the quay.io/openshift/origin-node:v3.11 container first, copied the oc binary out of it into /usr/local/bin on the host, and used that binary for oc cluster up to make sure I got the same version.
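One way the copy-out step could look, as a rough sketch only. The function name extract_oc and the DOCKER override variable are made up for illustration, and the path /usr/bin/oc inside the image is an assumption that is not confirmed anywhere in this ticket, so check it before relying on it.

```shell
# Copy the oc binary out of a container image onto the host.
# ASSUMPTION: the binary lives at /usr/bin/oc inside the image.
# DOCKER is a hypothetical override so another client (or a stub) can be used.
extract_oc() {
    image=${1:-quay.io/openshift/origin-node:v3.11}
    dest=${2:-/usr/local/bin/oc}
    docker_cmd=${DOCKER:-docker}

    "$docker_cmd" pull "$image" || return 1
    # Create a stopped container so the file can be copied without running it.
    cid=$("$docker_cmd" create "$image") || return 1
    "$docker_cmd" cp "$cid:/usr/bin/oc" "$dest"
    rc=$?
    # Always clean up the temporary container, then report the copy result.
    "$docker_cmd" rm "$cid" >/dev/null
    return "$rc"
}
```

Writing to /usr/local/bin normally needs root, so this would be called from a root shell. Afterwards, oc version should report the same version as the images.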
I tested as follows on an F29 AH machine (ostree version - 29.20180926.n.0) and it seems to work fine. I installed origin-clients from F30, which @jcajka built.
$ sudo rpm-ostree install https://kojipkgs.fedoraproject.org//packages/origin/3.11.0/0.alpha1.0.fc30/x86_64/origin-clients-3.11.0-0.alpha1.0.fc30.x86_64.rpm
Then I updated the insecure registries in /etc/containers/registries.conf to include 172.30.0.0/16 and ran sudo oc cluster up.
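A quick sanity check that the registry edit is in place before starting the cluster could look like the sketch below. The function name has_insecure_cidr is made up for illustration, and it is only a plain text match on non-comment lines: it does not parse the file or confirm that the entry sits in the insecure registries section.

```shell
# Succeed if the registries config mentions the CIDR on a non-comment line.
# This is a text match only, not a real parse of the config format.
has_insecure_cidr() {
    file=${1:-/etc/containers/registries.conf}
    cidr=${2:-172.30.0.0/16}
    grep -v '^[[:space:]]*#' "$file" | grep -qF -- "$cidr"
}
```

For example: has_insecure_cidr && sudo oc cluster up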
@dustymabe the origin source tarball used in the origin-clients-3.11.0-0.alpha1.0.fc30 package is from commit 777c966dcc84fba1d32ec928831869ad56fe3e38, which contains opencontainer-runc with the fix. The same opencontainer-runc version is included in the origin release-3.11 branch.
cool, didn't know that. Thanks @sinnykumari. @jcajka can you trust sinny's testing? I don't know if I'll have time to run through this soon.
@dustymabe I would say so. Thanks to @sinnykumari for testing and for explaining how to test before I managed to reply.
@jcajka @dustymabe The F29 origin-3.11.0-0.alpha1.0.fc29 build works as expected. Added karma for it.
origin-3.11.0-0.alpha1.0.fc29 has been pushed to f29. OpenShift cluster setup works as expected with openshift-ansible-3.11.21-1-2-g8d402991 for 3.11 on the latest Fedora 29 Atomic Host (ostree version: 29.20181008.n.0).
Thanks all!
Metadata Update from @sinnykumari: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
thanks @sinnykumari @jcajka !