#168 How to restart failed PCI devices
Closed: scheduled a year ago by rlengland. Opened a year ago by rlengland.

Article Summary:
The article will explain how to restart a failed PCI device (my ath10k Atheros card is used as the example) so that it works again without a full reboot.

Article Description:
For some reason, the ath10k firmware crashes several times on my PC. The bug isn't restricted to Linux, as I also get it on Windows. The best I could do on Windows was to use the Windows Troubleshooter, which would re-initialize and restart the device. On Linux, I only recently found the commands to use (previously I always had to reboot).

So, basically this is the workaround:

echo "1" > /sys/bus/pci/devices/DDDD\:BB\:DD.F/remove
sleep 1
echo "1" > /sys/bus/pci/rescan
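
The three commands above could be wrapped in a small shell function. This is only a sketch: `restart_pci_device` is a hypothetical helper name, the second argument (an alternative sysfs root) is an assumption added so the function can be dry-run against a scratch directory, and `0000:03:00.0` is an example address rather than the real one for any particular card.

```shell
#!/bin/sh
# Sketch: remove a PCI device and rescan the bus through sysfs.
# restart_pci_device is a hypothetical helper, not an existing tool.
#   $1: PCI address as domain:bus:device.function (e.g. 0000:03:00.0, example value)
#   $2: sysfs root, optional, defaults to /sys (override only for dry runs)
restart_pci_device() {
    addr=$1
    sysfs=${2:-/sys}
    dev="$sysfs/bus/pci/devices/$addr"

    # Refuse to continue if the address does not name a known device.
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi

    # Detach the device from its driver and drop it from the kernel's device list.
    echo 1 > "$dev/remove" || return 1
    # Short pause, as in the original workaround.
    sleep 1
    # Ask the kernel to re-enumerate the PCI buses so the device is rediscovered.
    echo 1 > "$sysfs/bus/pci/rescan"
}
```

Writing to these sysfs files requires root. Note that `sudo echo 1 > file` does not work, because the redirection is performed by the unprivileged shell; run the function from a root shell instead.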

Suggestion from @glb
Just what you have here might not be quite enough content for an article. But if you could expand it a bit to include more about the sysfs kernel interface and how to find your device and query info about it, I think this could be built out into a very good article.
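
As one possible starting point for the "find your device and query info about it" part, the sketch below walks `/sys/bus/pci/devices` and prints each device's address, its vendor and device IDs, and the bound driver. `list_pci_devices` is a hypothetical helper name, and the optional sysfs-root argument is an assumption added so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: list PCI devices from sysfs with their IDs and bound driver.
# list_pci_devices is a hypothetical helper, not an existing tool.
#   $1: sysfs root, optional, defaults to /sys (override only for dry runs)
list_pci_devices() {
    sysfs=${1:-/sys}
    for dev in "$sysfs"/bus/pci/devices/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -d "$dev" ] || continue

        # The directory name is the address: domain:bus:device.function
        addr=${dev##*/}
        # vendor and device hold the hexadecimal PCI IDs, e.g. 0x168c
        vendor=$(cat "$dev/vendor")
        device=$(cat "$dev/device")

        # driver is a symlink to the bound driver; it is absent when no driver is bound.
        if [ -L "$dev/driver" ]; then
            driver=$(basename "$(readlink "$dev/driver")")
        else
            driver=-
        fi

        printf '%s %s:%s %s\n' "$addr" "$vendor" "$device" "$driver"
    done
}
```

Filtering the output for the driver name (for instance `list_pci_devices | grep ath10k`) gives the address to use in the remove path. `lspci -D -nn -k` reports similar information if pciutils is installed.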


Metadata Update from @rlengland:
- Issue assigned to mateusrodcosta
- Issue tagged with: article, needs-image

a year ago

I didn't figure out how to move to a different board though.

Unfortunately, the editors have to do that on this newer Pagure-based system. I've moved this card to the review status; I'll look it over later and let you know if I see any problems.

Thanks!

Metadata Update from @rlengland:
- Custom field preview-link adjusted to https://fedoramagazine.org/?p=37873&preview=true&preview_id=37873

a year ago

Metadata Update from @rlengland:
- Custom field image-editor adjusted to rlengland

a year ago

In some cases, a device might crash or stop responding. For an externally connected device, that is simple to solve: unplug it from the machine and plug it back in. For an internally connected device it might be a bit harder, as you ideally shouldn’t touch any of them while the machine is running, so it’s up to the device or the OS to handle it.

@mateusrodcosta: The above paragraph makes me cringe a bit, especially because in the previous paragraph you listed PS/2 devices as a prime example of "external devices". Back in the day, I encountered plenty of computers that had locked up because someone tried to hotplug a PS/2 device. PS/2 was not hotpluggable. On the flip side, you've listed SATA as an example of an internal device that shouldn't be hotplugged. However, many servers do have hotpluggable SATA drives. I would suggest reworking these paragraphs to say that some devices (whether externally or internally connected) can be reset by physically disconnecting and reconnecting them, but others may require a software-based approach.

@mateusrodcosta Are you able to address the concern that @glb expressed in his comments? We can publish this if you can address that.
Thanks.

While on Windows the card could be restarted by running the Network Troubleshooter and selecting the failed wireless adapter, on Linux it needs to be done manually via the CLI.

Also, I don't like that the above paragraph makes it sound as if Windows is better than Linux. Does the toggle switch for Wireless networking under GNOME settings not work? Even if it does though, just because someone is running Linux doesn't necessarily mean that they are running GNOME. So your article is still useful. Can we change the above sentence to something like:

Depending on your particular desktop environment and hardware, it may be possible to switch the PCI card off and back on using a GUI. But if that option doesn't work or isn't available, the following CLI method of restarting the PCI card might prove useful.

Thanks for the feedback. My knowledge of hardware is not that great (I have decided to specialize in software), so I will try to improve that section.

Sure. The initial text was based on personal experience. I get the same issue on both Windows and Linux (that is, the wireless card crashes, probably due to firmware). On Windows I can restart it via the Troubleshooter (it checks every possible solution and attempts a restart as the final one), whereas on Linux I need to know specifically how to achieve it, which is what this article teaches.

Does the toggle switch for Wireless networking under GNOME settings not work?

The focus of the article is the case where the wireless card has basically died; at that point the quick toggle is mostly useless.

Anyway, I will make changes ASAP

I made a few changes, please re-review.

@glb @rlengland

Hi, was this re-reviewed already?

@mateusrodcosta

This is on my TODO list. Sorry I haven't gotten back to it yet. We have a busy schedule this week with the Fedora 38 releases. It will probably be another week before I get to this. I hope that is OK.

Metadata Update from @glb:
- Custom field editor adjusted to glb
- Custom field preview-link adjusted to https://fedoramagazine.org/use-sysfs-to-restart-failed-pci-devices/ (was: https://fedoramagazine.org/?p=37873&preview=true&preview_id=37873)
- Custom field publish adjusted to 2023-03-24

a year ago

@mateusrodcosta: I've edited and scheduled your article about using sysfs to restart PCI devices. It is scheduled to go out tomorrow, Friday 24 at 08:00 UTC. Let me know if you find any problems.

Issue status updated to: Closed (was: Open)
Issue close_status updated to: scheduled

a year ago
