#463 Latest kernel (4.16.4) causes longer startup time for cloud images
Closed: Fixed 4 years ago. Opened 4 years ago by jlebon.

I'd like to draw attention to an issue we just discovered in the latest Fedora 27 Cloud pungi compose [1]. I've verified that the latest F27 Atomic pungi compose has the same issue. FAH 28 RC 1.1 ships with kernel 4.16.3, so it isn't affected yet, but the latest two-week F28 compose is.

Essentially, services that require entropy at startup will block the boot process until enough entropy has been gathered. In cloud environments, where VMs may have limited sources of entropy, this can noticeably delay the point at which SSH comes up. (In my quick test on version 28.20180429.1, boot blocked for almost 4 minutes.) This makes for a degraded user experience on Atomic Host (AH).
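To see whether a machine is in the state described above, a service can ask the kernel whether `getrandom()` would block right now. The following is a minimal sketch, assuming a Linux host and Python 3.6+ (which provides `os.getrandom` and `os.GRND_NONBLOCK`); the helper names `crng_ready`, `entropy_avail` and `wait_for_crng` are hypothetical and were not used to produce the 4-minute measurement in the report.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Kernel's current entropy estimate, in bits (Linux-specific procfs file).
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def crng_ready() -> bool:
    """Return True if getrandom() would return without blocking right now.

    With GRND_NONBLOCK, getrandom() fails with EAGAIN (raised in Python as
    BlockingIOError) while the kernel CRNG is still uninitialized, which is
    the state in which a blocking caller would stall during boot.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


def entropy_avail() -> Optional[int]:
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        return int(ENTROPY_AVAIL.read_text().strip())
    except (OSError, ValueError):
        return None


def wait_for_crng(timeout: float = 300.0, interval: float = 1.0) -> Optional[float]:
    """Poll until the CRNG is ready.

    Returns the number of seconds spent waiting, or None if `timeout`
    seconds pass first. Unlike a plain blocking getrandom() call, this
    never hangs indefinitely.
    """
    start = time.monotonic()
    while not crng_ready():
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"entropy_avail: {entropy_avail()} bits")
    waited = wait_for_crng()
    if waited is None:
        print("CRNG still not initialized after timeout")
    else:
        print(f"CRNG ready after waiting {waited:.1f}s")
```

Polling with `GRND_NONBLOCK` is used here instead of a plain blocking call so that the script can report how long the wait took and give up after a timeout. The entropy estimate is printed for information only; readiness of the CRNG is what a blocking `getrandom()` caller actually waits on.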

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1572944


On Tue, May 01, 2018 at 01:18:36PM +0000, Jonathan Lebon wrote:

Do we have a suggested approach? External entropy pools? Haveged?

--
Matthew Miller mattdm@mattdm.org http://mattdm.org/
Fedora Project Leader mattdm@fedoraproject.org http://fedoraproject.org/

Thanks @jlebon - this is definitely something we're going to need to monitor for the next two-week release!

Metadata Update from @dustymabe:
- Issue tagged with: F28, bug

Metadata Update from @dustymabe:
- Issue assigned to jcline

My prediction is that this gets reverted upstream. What everyone running VMs needs to do now is verify that they're passing through virtio-rng or an equivalent. We should probably push for libvirt, for example, to do this by default, particularly if the host supports RDRAND.

There's going to be a long tail of this type of thing, particularly in public clouds outside the top tier.

If it doesn't get reverted in the upstream kernel directly, maybe we carry a revert for a while.
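The suggestion above is to verify that virtio-rng is actually being passed through. That check can be sketched from inside a guest. This minimal sketch assumes a Linux guest that exposes the kernel's hw_random sysfs interface under `/sys/devices/virtual/misc/hw_random`. The helper names are hypothetical. Seeing the device is only a prerequisite: whether its output reaches the kernel's entropy pool depends on the kernel version and on whether a daemon such as rngd is running. On the host side, libvirt configures the device with an `<rng model='virtio'>` element in the domain XML.

```python
from pathlib import Path
from typing import List, Optional

# sysfs directory populated by the kernel's hw_random core (Linux-specific).
HWRNG_DIR = Path("/sys/devices/virtual/misc/hw_random")


def _read_attr(name: str) -> Optional[str]:
    """Read one hw_random sysfs attribute, or None if it does not exist."""
    try:
        return (HWRNG_DIR / name).read_text().strip()
    except OSError:
        return None


def available_hwrngs() -> List[str]:
    """Names of all registered hardware RNGs, e.g. ['virtio_rng.0']."""
    text = _read_attr("rng_available")
    return text.split() if text else []


def current_hwrng() -> Optional[str]:
    """Name of the hardware RNG currently selected by the kernel, if any."""
    return _read_attr("rng_current")


def has_virtio_rng() -> bool:
    """True if a virtio-rng device is registered in this guest."""
    return any(name.startswith("virtio_rng") for name in available_hwrngs())


if __name__ == "__main__":
    print("available:", available_hwrngs() or "none")
    print("current:", current_hwrng())
    if not has_virtio_rng():
        print("no virtio-rng device found; check the VM/hypervisor config")
```

If the hw_random interface is missing entirely (for example, because the module is not loaded), every helper degrades to "nothing found" rather than raising an error.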

Metadata Update from @dustymabe:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
