I have been trying to get a local install of OpenShift Origin 3.9 going on a fresh Fedora Atomic 28 server install, but the openshift-ansible installer fails because none of the nodes ever reach the Ready state.
# oc get nodes
NAME       STATUS     ROLES     AGE       VERSION
atomic28   NotReady   master    12h       v1.9.1+a0ce1bc657
Doing oc describe nodes shows this error:
# oc describe nodes
*snip*
  Warning  KubeletSetupFailed  20m  kubelet, apex.example.net  Failed to start ContainerManager Delegation not available for unit type
*snip*
That led me to this BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1558425, which suggests the issue is due to the recent change in systemd 238 that removes can_delegate from slices.
Is there an easy way to downgrade just systemd on Atomic to see whether that fixes the issue?
Grab an older version of the systemd RPMs, then try rpm-ostree override replace <systemd1.rpm> <systemd2.rpm> ...
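The mechanics of that on an Atomic Host would look roughly like this; the Koji URLs follow the same pattern as the systemd-container link in the next comment, and the exact set of systemd subpackages you need to replace depends on what is in your deployment:

# download the matching set of older systemd RPMs (subpackage list and NVR are examples)
for pkg in systemd systemd-libs systemd-pam systemd-udev systemd-container; do
    curl -LO https://kojipkgs.fedoraproject.org/packages/systemd/237/6.git84c8da5.fc28/x86_64/${pkg}-237-6.git84c8da5.fc28.x86_64.rpm
done

# replace the packages in the current deployment and reboot into it
rpm-ostree override replace ./systemd*-237-6.git84c8da5.fc28.x86_64.rpm
systemctl reboot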
I've also hit this issue, but I haven't found an older version of systemd that I could override replace with and then successfully boot the host. For example, I used this version of systemd-237:
https://kojipkgs.fedoraproject.org//packages/systemd/237/6.git84c8da5.fc28/x86_64/systemd-container-237-6.git84c8da5.fc28.x86_64.rpm
...and it appears that systemd-tmpfiles-setup fails to start and that sends the whole system into a tizzy.
# journalctl -b -u systemd-tmpfiles-setup --no-pager
-- Logs begin at Tue 2018-04-10 20:26:09 UTC, end at Tue 2018-04-10 20:30:50 UTC. --
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/rpcbind.conf:2] Unknown user 'rpc'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:11] Unknown group 'utmp'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:19] Unknown user 'systemd-network'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:20] Unknown user 'systemd-network'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:21] Unknown user 'systemd-network'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:25] Unknown group 'systemd-journal'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:26] Unknown group 'systemd-journal'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:32] Unknown group 'systemd-journal'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:33] Unknown group 'systemd-journal'.
Apr 10 20:26:09 localhost systemd-tmpfiles[200]: [/usr/lib/tmpfiles.d/systemd.conf:34] Unknown group 'systemd-journal'.
Apr 10 20:26:09 localhost systemd[1]: Started Create Volatile Files and Directories.
Apr 10 20:26:11 localhost systemd[1]: Stopped Create Volatile Files and Directories.
Apr 10 20:26:12 atomichost-by-dustymabe systemd[1]: Starting Create Volatile Files and Directories...
Apr 10 20:26:12 atomichost-by-dustymabe systemd-tmpfiles[710]: "/home" already exists and is not a directory.
Apr 10 20:26:12 atomichost-by-dustymabe systemd-tmpfiles[710]: "/srv" already exists and is not a directory.
Apr 10 20:26:12 atomichost-by-dustymabe systemd-tmpfiles[710]: "/tmp" already exists and is not a directory.
Apr 10 20:26:12 atomichost-by-dustymabe systemd-tmpfiles[710]: Unable to fix SELinux security context of /tmp/.X11-unix: Read-only file system
Apr 10 20:26:12 atomichost-by-dustymabe systemd-tmpfiles[710]: Unable to fix SELinux security context of /tmp/.ICE-unix: Read-only file system
Apr 10 20:26:12 atomichost-by-dustymabe systemd-tmpfiles[710]: Unable to fix SELinux security context of /tmp/.font-unix: Read-only file system
Apr 10 20:26:12 atomichost-by-dustymabe systemd[1]: systemd-tmpfiles-setup.service: Main process exited, code=exited, status=1/FAILURE
Apr 10 20:26:12 atomichost-by-dustymabe systemd[1]: systemd-tmpfiles-setup.service: Failed with result 'exit-code'.
Apr 10 20:26:12 atomichost-by-dustymabe systemd[1]: Failed to start Create Volatile Files and Directories.
Following the BZ to the various upstream issues, it looks like this might be fixed with a change to runc.
See the following PR - https://github.com/opencontainers/runc/pull/1776
Specifically this comment - https://github.com/opencontainers/runc/pull/1776#issuecomment-380206972
I hit this w/ kube, and did this workaround to get past it for now:
# cp /usr/lib/systemd/system/docker.service /etc/systemd/system/
# sed -i 's/cgroupdriver=systemd/cgroupdriver=cgroupfs/' /etc/systemd/system/docker.service
# sed -i 's/cgroup-driver=systemd/cgroup-driver=cgroupfs/' /etc/systemd/system/kubelet.service.d/kubeadm.conf
# systemctl daemon-reload
# systemctl restart docker
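If you go that route, a quick sanity check that both daemons actually picked up the new driver might look like this (these commands are just verification on top of the workaround above):

# docker should now report the cgroupfs driver
docker info 2>/dev/null | grep -i 'cgroup driver'

# the kubelet drop-in should carry the new flag; restart kubelet so it takes effect
grep cgroup-driver /etc/systemd/system/kubelet.service.d/kubeadm.conf
systemctl restart kubelet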
Metadata Update from @miabbott: - Issue tagged with: bug, meeting
Metadata Update from @walters: - Issue assigned to walters
I don't know the reasoning for the systemd vs cgroupfs driver issue, but here's some discussion from when CoreOS switched from systemd to cgroupfs: https://github.com/coreos/bugs/issues/1435 -- is the dependency on systemd here important to us?
> is the dependency on systemd here important to us?
Mmm. I don't think we should change architecture in response to a bug, at least not immediately. I personally like the idea of integrating with systemd but it's a very complex topic.
Anyway, that patch doesn't apply to the runc vendored into docker, but it works for me if I bind mount a patched runc over it and restart docker. So we could update the runc vendored in our docker and try that?
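For reference, that bind-mount test would look something like the following; the path of the runc binary shipped with the docker package is an assumption here, so check where it actually lives on your host first:

# find the runc binary that the docker package (or one of its subpackages) ships
rpm -ql docker | grep runc

# bind mount a patched runc over it (this works even though /usr is read-only on
# Atomic Host), then restart docker; both paths below are examples
mount --bind /var/srv/runc-patched /usr/bin/docker-runc-current
systemctl restart docker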
> So we could update the runc vendored in our docker and try that?
Yes please. I've asked @lsm5 to do so, unless someone else is able to.
A scratch build of the docker package to work around this issue:
f28: https://koji.fedoraproject.org/koji/taskinfo?taskID=26334563
Please test it out and report if things work for you.
Actually, let's test the bodhi update instead, with docker-1.13.1-52.git89b0e65.fc28.
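If you want to try it on an Atomic Host before it lands in a compose, one possible approach is to fetch the RPMs from Koji and override the deployed packages (the build NVR is the one above; the koji CLI can run on any machine or in a container):

# fetch the build to test
koji download-build --arch=x86_64 docker-1.13.1-52.git89b0e65.fc28

# replace the docker subpackages that are part of the current deployment
# (trim the list as needed), then reboot
rpm-ostree override replace ./docker-*.rpm
systemctl reboot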
It looks like this needs to get solved in docker and its vendored-in version of runc, as well as in kubernetes.
@runcom pointed at this PR that looks like it would be part of the kube fix: https://github.com/kubernetes/kubernetes/pull/61926
Should we consider applying the patches from Jason's earlier comment directly to the ostree for F28? It seems like the kube changes are going to take a while to land.
Agreed, @dustymabe. I think applying the workaround would make sense.
Let's consider it, but also realize that we don't have any testing/exposure to using the cgroupfs driver in the Fedora ecosystem (as far as I know).
There are plenty of places upstream where it is used (and preferred!) over the systemd driver, so it's not a huge risk, in my opinion. But we should be cognizant of the possibility of problems.
Also, if we include the workaround in F28 and we get the fixes we want, what is the process/impact of reverting back to the systemd driver?
There's also the option of reverting the systemd commit.
Can you explore that option? I honestly don't know how big of a change this was or how much of a big deal it would be to revert it now. It would certainly help us support kube/openshift in the short term if we reverted and had this land at a later time.
This seems to be all based on a misunderstanding: systemd never allowed (*) other entities to muck around with parts of the cgroup hierarchy that it manages, including slice units. The difference with systemd 238 is that it is slightly clearer about this; e.g. setting Delegate= used to be silently ignored and now results in an error. See https://github.com/systemd/systemd/issues/8645 for another discussion of this.
In particular, .slice units are created "on demand": when another nested .service/.slice/whatever unit is requested, the parent .slice is created, and it is destroyed when systemd thinks it is not needed any more.
(*) "allow" needs a clarification: there is no enforcement of this, because a user space process cannot prevent another privileged process from changing the cgroup hierarchy. So "allow"/"disallow" here is at the level of "please don't do this" or "you get to keep the pieces".
This could be discussed as an option, as a hack to get things working temporarily, but it's not a long-term solution.
Yes! I think we all agree we don't want to "revert" the change for the long term. We just want to revert for now, until other upstream projects (kube/runc/openshift) have had a chance to get in fixes that work with the new behavior.
So the options I see are:
@zbyszek WDYT?
Can somebody explain what runc does with the cgroup hierarchy of a slice on which it has set Delegate=yes?
@zbyszek there is an upstream fix for it: https://github.com/opencontainers/runc/pull/1776
It looks like runc previously wasn't distinguishing between a scope and a slice: it used a transient scope to test for Delegate= support, but applied the result to slices as well. The fix is that it now tests for Delegate= support separately.
Can you please check the patch though? From what I understand a slice has never supported Delegate=, so probably the additional check must be dropped and just not try Delegate= at all when a slice is used.
I saw the patch, and I see the check that it does. But it doesn't answer why. In particular, I'd like to understand whether runc actually starts .slice units with Delegate=yes, whether it tries to touch the cgroup hierarchy that systemd sets up for this unit, and whether it tries to create a sub-hierarchy underneath that unit.
Yes, runc makes changes to slices/scopes created by systemd as systemd doesn't support all the knobs that runc needs.
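To give a rough picture of what that means in practice (this is only an illustration of direct cgroup writes underneath a systemd-created unit, not a literal trace of what runc does; the paths and values are examples):

# a container scope created through the systemd driver lives somewhere like this
ls /sys/fs/cgroup/systemd/system.slice/docker-abc123.scope

# for knobs systemd doesn't expose, the runtime writes controller files directly,
# e.g. a pids limit in the corresponding controller hierarchy
echo 512 > /sys/fs/cgroup/pids/system.slice/docker-abc123.scope/pids.max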
OK, I talked with @mpatel and @zbyszek in #atomic earlier. We are going to try for option 4. I opened BZ1568594 for this and proposed it as a freeze exception (FE).
The fix has been pushed to stable. Tomorrow's run of F28 should have it, so please wait 24 hours and then test!
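Once the new compose is available, picking it up and checking the result is roughly this (output will obviously vary per host):

# pull the latest Fedora 28 Atomic Host tree and reboot into it
rpm-ostree upgrade --reboot

# after reboot, confirm the fixed docker build made it into the deployment
rpm -q docker

# and check that the node goes Ready again
oc get nodes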
Metadata Update from @dustymabe:
- Issue untagged with: meeting
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
Metadata Update from @dustymabe: - Issue tagged with: F28, host