#1 Please consider adding a QA test case for shutdown
Opened 5 years ago by coremodule. Modified 5 years ago

Alan and Pat (tablepc) have done some work on getting new test cases added to the matrix. They are proposing to add several tests that verify that shutting a system down works. I propose we consider the benefits to the community of adding these test cases to the matrices. I plan to discuss this further at the next QA meeting on Monday, January 14. See the attached document for Pat's summary of the test cases.

PotentialNewShutdownTest.odt


Hello,
I am very surprised that we do not have such test cases already. Normally, I would switch off and reboot the computer as part of the gdm test case or a similar one. However, it is true that those do not explicitly require using console commands, so I am up for writing these.

I have already seen the proposal, and it makes sense to me to divide this into three smaller test cases (one test case per command), because that will be easier to automate.
Also, having a non-root user on the system should be a known prerequisite for any test case that requires issuing commands on a non-root console.

I can write those tests, so whenever you like, coremodule, assign the ticket to me.

Hi, Alan here. I was invited to the next meeting, but I checked again and I see there is probably no meeting today due to lack of content :-).

To make sure it does not get lost here: my reason for this request is to add a test for "halt" specifically. Thanks for hearing my request. I don't have a strong opinion about what you do to test reboot and poweroff/shutdown. I just (wrongly) thought there was a current test case for shutdown that I could base mine on.

If you can test "halt", the screen stays on, which means you have a chance to see if there are errors in the last screenful. I suspect a lot of people did not notice the times when we had error messages during shutdown, because the screen turns off very quickly.


(I think most people do not really notice during the next boot either. 1) Even if shutdown failed to cleanly unmount ext4, "recovering journal" happens quite quickly. 2) Very few people use dmraid, so most people aren't affected by a full raid resync when shutdown failed to disassemble DM devices.

Although these shutdown bugs were easy to overlook, I argue they have bad effects on less common setups.)


I have two new technical points, about analysing boot logs to check whether shutdown did a clean unmount of the filesystems, especially the root filesystem.

I tried XFS now, to see if that has a message to look out for like the "recovering journal" message from fsck.ext4. I concluded that XFS will often not show any message after an unclean unmount. (XFS is the default for Server).

So I'm not sure how useful it will be to look at the fsck messages as part of a "halt" test case, or whether they'll mostly just cause confusion. My original hope was the fsck messages would provide some specific clarity, because the "halt" messages can be difficult to interpret when you are not very familiar with everything. In particular, in some (less common?) cases systemd-shutdown will perform several retries, because of how it works without trying to understand all the dependencies. I think the initial tries would show some "busy" error messages. But given the XFS behaviour, maybe it's better if we just make sure people are looking at "halt", and hope that they will learn to interpret the onscreen messages correctly, i.e. not ignore too many real errors.

Secondly, if we do want to look at fsck logs, the command I suggested was a bit weird. I think it is technically better to use journalctl -b -u "systemd-fsck*". (My reasoning is here).
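A minimal sketch of how a tester might scan that journalctl output for the ext4 message discussed above. The function name `fsck_recovery_seen` is hypothetical, and the only pattern it looks for is the "recovering journal" text mentioned in this thread. As noted for XFS, the absence of the message does not prove a clean unmount.

```shell
# Hypothetical helper (not from the thread): reads log text on stdin and
# reports whether fsck.ext4's "recovering journal" message appears in it.
fsck_recovery_seen() {
  if grep -q 'recovering journal'; then
    echo "journal recovery seen"
  else
    # Not proof of a clean unmount: XFS often logs nothing here.
    echo "no recovery message"
  fi
}

# Example use on the boot that follows the halt test:
#   journalctl -b -u "systemd-fsck*" --no-pager | fsck_recovery_seen
```

Keeping the check as a filter on stdin means it can also be run against a saved log file, which may be easier to automate than reading the journal directly.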

Metadata Update from @lruzicka:
- Issue assigned to lruzicka

5 years ago

@sourcejedi
Hello Alan, I tried running those commands to see what happens on my machine, and I realized that I am not getting any messages during system halt, probably because of the silent boot hidden behind Plymouth. The only message I got was: blah blah blah SYSTEM HALTED, and then the computer waits to be switched off manually.

Do you think this behaviour is OK, because potential error messages would be reported if they happened, or is it necessary to get rid of Plymouth first?

Let me know what you think.

Gah, good catch! I had disabled plymouth on my laptop. Inside my VM, I guess either plymouth splash was not working at all, or I had already disabled it there as well.

You might use the magic of systemd. It should be possible to disable plymouth for the current boot/shutdown only, using systemctl mask --runtime plymouth-halt.
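A rough sketch of what the test steps could look like with that approach. The function name `halt_without_splash` and the `RUN` variable are hypothetical, and using `systemctl halt` for the halt step is an assumption. By default the function only prints the commands, so it is safe to try; whether masking the unit at runtime really suppresses the splash should be confirmed on a real machine.

```shell
# Hypothetical wrapper (name and RUN variable are illustrative).
# RUN defaults to "echo", so nothing is executed unless RUN=sudo is set.
halt_without_splash() {
  run="${RUN:-echo}"
  # Mask the Plymouth halt unit until the next boot only (--runtime).
  $run systemctl mask --runtime plymouth-halt
  # Halt without powering off, so the last screenful stays visible.
  $run systemctl halt
}

# Dry run (prints the two commands):  halt_without_splash
# Real run (halts the machine):       RUN=sudo halt_without_splash
```

Because the mask is created with `--runtime`, it lives under /run and should disappear on the next boot, so the test case would not need a cleanup step.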

Otherwise you need to edit the boot options, i.e. remove rhgb from the kernel command line. But that's harder to do, and makes it harder to write & maintain a test case. (E.g. how do you do it for U-Boot on Raspberry Pi? Or Fedora Cloud, where I heard they may use syslinux instead of GRUB? The man page for grubby does not say it supports either of those.)
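Whichever boot loader is in use, a test case could at least check whether rhgb is active on the running system, since the kernel exposes its command line in /proc/cmdline. A minimal sketch, with the helper name `has_rhgb` being hypothetical:

```shell
# Hypothetical helper: prints "yes" if the kernel command line passed as $1
# contains the standalone option "rhgb", otherwise "no".
has_rhgb() {
  case " $1 " in
    *" rhgb "*) echo "yes" ;;
    *) echo "no" ;;
  esac
}

# Example use on a running system:
#   has_rhgb "$(cat /proc/cmdline)"
```

This only reports the current state. It does not change the boot options, so it sidesteps the boot loader differences mentioned above.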

Thanks
