#1785 Mesa/Nouveau maintainer(s) should be required to ship the locking patches from the QtWebEngine Copr
Closed: Fixed 6 years ago Opened 6 years ago by kkofler.

The Mesa and Nouveau maintainers have been aware of https://bugzilla.redhat.com/show_bug.cgi?id=1376107 (and dozens of essentially duplicate reports all having the same root cause: missing locking) for over a year now. This is a showstopper for QtWebEngine and all the applications depending on it. They have still not done anything to fix those.

Patches fixing all the reported locking issues have been available for over a year now too. (They were initially developed by Ilia Mirkin in an upstream git branch. The upstream patch has since been deleted without being merged, so I have forward-ported the patch to newer Mesa releases.) I have been providing patched Mesa builds in my QtWebEngine Copr since then, and all the users who tried them reported that the patches fix QtWebEngine and do not cause any visible issues. Ben Skeggs wrote that he considers the patches incorrect and that he would be working on a better one, but we are still waiting for that, and QtWebEngine is still not working on Nouveau. Therefore, I think that the patches that are available really need to be applied.

The following are the patches for Mesa 17.2.x:
http://copr-dist-git.fedorainfracloud.org/cgit/kkofler/qtwebengine/mesa.git/tree/0001-WIP-nouveau-add-locking.patch?id=c81234957e754d0759d3b0870e7e7d00001d1929
http://copr-dist-git.fedorainfracloud.org/cgit/kkofler/qtwebengine/mesa.git/tree/0003-nouveau-more-locking-make-sure-that-fence-work-is-al.patch?id=c81234957e754d0759d3b0870e7e7d00001d1929
http://copr-dist-git.fedorainfracloud.org/cgit/kkofler/qtwebengine/mesa.git/tree/0004-nv30-locking-fixes.patch?id=c81234957e754d0759d3b0870e7e7d00001d1929
(I also have versions for the older Mesa versions.)


These patches should really be evaluated, and applied if there are no strong reasons against doing so. QtWebEngine is used by several KDE applications, many of them (QupZilla/Falkon, KMail, Kontact, Konqueror, …) shipped with our default installation in the KDE Plasma spin. All these applications tend to crash on systems running with Nouveau.

@spot How is the current situation with Chromium and Nouveau? AFAIR, Chromium was also crashing in the past.

Is there a PR against mesa/nouveau with the proposed changes?
Did the mesa/nouveau maintainer(s)?

Please mention the respective maintainers to make sure that they get notifications for this ticket.

Metadata Update from @jforbes:
- Issue tagged with: meeting

6 years ago

I will try to get some input from Ben on this. I know he said he had a different solution in mind, but perhaps carrying the patch until that solution is ready would be acceptable. If there are other issues with the patch, we should hear them as well.

  • AGREED: Issue 1785 is delayed while we wait for input from the
    maintainer. Jforbes will ask for that input (+5,0,-0) (jforbes,
    16:19:09)

On 10/21/2017 03:12 AM, Justin Forbes wrote:

Mind taking a look at FESCo issue 1785 and giving some feedback?

I'm not aware of the exact reasons, but the author of those patches is
strongly against them being shipped/merged anywhere, which is why he
removed the branches from his repository. While they may band-aid over
the problems with multi-threaded OpenGL usage on Nouveau, I believe
they're also broken in other ways and can cause other issues.

The correct fix is really difficult to implement, especially in a
bisectable way, without breaking anything. I have been trying to do so
for quite a while now but keep getting pre-empted by more urgent
matters, and unfortunately none of the community volunteers that are
able seem willing to handle the task.

I'm working on it, and it is one of my larger priorities, but that
doesn't help for now I guess.

Ben.

Ben Skeggs wrote:

I'm not aware of the exact reasons, but the author of those patches is
strongly against them being shipped/merged anywhere, which is why he
removed the branches from his repository. While they may band-aid over
the problems with multi-threaded OpenGL usage on Nouveau, I believe
they're also broken in other ways and can cause other issues.

Do you have any details on the other breakage and issues?

Based on the response from Ben, I'm inclined to think that we should not require these patches to be used by the package maintainers.

What are the problems/expected issues with the patches? I'd like to get this information to get a better view of the situation. In my experience so far, the patches solve the issues with Nouveau and QtWebEngine, and I don't know of any issues created by them yet. As a member of the KDE SIG I have to evaluate possible solutions (and their glitches) for our current problems with Nouveau-related crashes.

AGREED: FESCo understands that this is a serious issue, but doesn't feel that it should override the decision of the subject matter experts (+6, 0, -0) (sgallagh, 16:54:22)

Metadata Update from @sgallagh:
- Issue close_status updated to: Fixed

6 years ago

For some weird definition of "Fixed". The voted resolution does not actually fix the problem at hand at all.

@lupinix's question was also left unanswered. This is really important to us, because we think that the locking patches are the best solution available for this problem at this time. This is a driver issue and needs to be fixed in the driver. The only thing QtWebEngine could do to work around it would be to stop using OpenGL on Nouveau entirely, which means degraded performance and functionality (e.g., no WebGL) (even for those people using the fixed driver from my Copr, unless I apply that ugly "noouveau" driver identifier hack). So we really need to know why the maintainer is objecting to the patch (and IMHO that is a question that FESCo should have asked, too, before deciding that the maintainer is right).

To be clear, it is the vague statement "I believe they're also broken in other ways and can cause other issues." that I am unhappy with: it does not state what the issues are. I am not aware of any; all the users of my Copr are happy with my patched builds. The distribution KaOS is also shipping those patches, apparently without issues.

While most of my computers are AMD GPU systems, I have one machine (a MacBook Pro that runs Fedora) that has an NVIDIA GPU. I have used @kkofler's COPR on that machine with great success, and I've observed no negative effects.

People in here keep saying that @bskeggs has said something about them being terrible and he has some other solution in mind, but it's been a year, and with no other solution in sight, I would rather have either the "band-aid" applied or have @bskeggs fix it as he says he'd like to.

Note that the situation is now so bad that Linux distributions are increasingly recommending that people not use the nouveau driver. For example, openSUSE now adds a warning whenever the nouveau DRI driver is being installed to tell the user to not use it with Qt, Blink/WebKit, etc. and vaguely suggests the manufacturer (NVIDIA) has a better driver. Note that they don't even ship the proprietary driver either, so the fact is that the default system is broken.

At this point, someone needs to solve this problem, because otherwise we should just give up and stop offering nouveau and ship the proprietary nvidia driver.

Basically, I want a really good reason for why Fedora chooses to leave the default environment for people with NVIDIA GPUs (which they don't necessarily get to choose, because laptops...) absolutely broken.
