#2020 Firefox is switching from gcc to clang/llvm
Closed: Rejected 8 days ago by sgallagh. Opened 2 months ago by stransky.

Mozilla upstream switches from gcc to clang and we're going to follow upstream here due to clang performance, maintenance costs and compilation speed. Tom Stellard (clang maintainer) has asked me to file this ticket to comply with Fedora processes.


That's sad to hear. Are there some benchmarks available? Especially in Fedora context.

You can directly compare fedora firefox builds with binaries provided by mozilla. The speed difference is about 10-20%, mainly due to PGO/LTO. Plain clang build is ~5% faster than the gcc one.

Benchmarks:
https://browserbench.org
http://web.basemark.com/

Note: You need to compare Firefox 64 binaries and later as this is the first release built by clang by Mozilla.

I don't see any restrictions on using clang in the guidelines - what is the concern that would need FESCo approval?

On Thu, 2018-12-06 at 20:03 +0000, Kevin Fenzi wrote:

https://fedoraproject.org/wiki/Packaging:Guidelines#Compiler

It seems strange that the C/C++ guidelines wouldn't mention this.

@stransky, does upstream still plan to support gcc?

On Thu, 2018-12-06 at 20:03 +0000, Kevin Fenzi wrote:

https://fedoraproject.org/wiki/Packaging:Guidelines#Compiler

It seems strange that the C/C++ guidelines wouldn't mention this.
@stransky, does upstream still plan to support gcc?

AFAIK yes, they use it for some memory/sanity testing. At least until Mozilla moves the tests from gcc-plugin they need to support it. We're also going to use gcc on second arches platforms where clang is not well supported yet.

We don't throw gcc away (firefox.spec has a single pref to switch compiler) but rather we're going to use the best tool for the actual task. When gcc proves faster on x86_64 or at least matches clang we may switch back to gcc - but recently the performance difference is significant.

The guidelines don't seem to offer any wiggle room here: "Packages may only build with an alternative compiler to gcc if upstream does not support gcc."

Perhaps this should be raised with the Fedora packaging committee instead of FESCo, since they oversee the packaging guidelines.

CC @tibbs

The guidelines don't seem to offer any wiggle room here: "Packages may only build with an alternative compiler to gcc if upstream does not support gcc."
Perhaps this should be raised with the Fedora packaging committee instead of FESCo, since they oversee the packaging guidelines.
CC @tibbs

No, this one is definitely in our wheelhouse. The guidelines are clear. What @stransky is asking for is an exception, which requires FESCo approval.

Frankly, for a piece of the OS as critical as Firefox (for many users, it's the only application they use regularly), I think an exception for a significant performance increase is warranted.

I'm +1 to allowing it.

Since I was pinged....

"We" (FPC/FESCo/Fedora as a whole) could certainly consider relaxing that requirement for the use of gcc. My recall of the history is a bit murky and I'm a bit short on time this morning so I can't provide an easy summary of why we decided to add that language, but it was certainly in the early days of clang being in the distro. I would wager that part of it was to avoid the "ooo, shiny" effect (which is closely related to the "-O9 because fasterer" effect. I am pretty certain that the decision came from FESCo as FPC wouldn't have undertaken a restriction of that magnitude on its own.

Years later, though, things are obviously a bit different. We don't even have gcc in the buildroot by default these days.

Figuring out how to handle "mass rebuilds" (and whether full distro rebuilds are even needed for a change to one of the many compilers) would be the main fallout from something like this. One would hope that major clang updates are no more painful than major gcc updates but I have no idea if that's actually true.

Metadata Update from @zbyszek:
- Issue tagged with: meeting

a month ago

This was discussed in todays' FESCo meeting:
AGREED: The exception for firefox is granted (+6, +1, 0)

Metadata Update from @zbyszek:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

a month ago

Metadata Update from @zbyszek:
- Issue untagged with: meeting

a month ago

I don't see why Firefox gets yet another Firefox-only exception.

There are only 2 possibilities: either Clang is really so much better than GCC, then it should become the preferred compiler distro-wide (and GCC used only for those packages that don't build with Clang), or it is not, then we should consistently stick to GCC. Allowing every package to pick their favorite compiler risks running into binary (in)compatibility bugs sooner or later. Allowing this only for Firefox is unfair to other web engine implementations (Chromium, QtWebEngine, WebKitGtk, QtWebKit).

@kkofler There's a clear benchmark in this particular case. But it is not enough to generalize to other programs or the whole distro. Firefox is a very important and widely used program, and 10-20% speedup is not something to be sniffed at. If other programs face the same consideration and would benefit from clang, let's consider them case by case.

The last time I looked into using clang to build stuff in Fedora (two years ago), the debuginfo generated by it was pretty broken (debuginfo stuff was missing enough things that debugsource subpackages would not generate properly). Has this been resolved since?

We can always ask @mjw about debuginfo stuff ;)

On Mon, 2018-12-10 at 16:58 +0000, Igor Gnatenko wrote:

ignatenkobrain added a new comment to an issue you are following:
We can always ask @mjw about debuginfo stuff ;)

I haven't looked recently at the debuginfo that llvm generates, but in
the past it was often incomplete, or completely disabled because it was
using too much memory/changed code generation/crashed the compiler. And
not all tools could handle what llvm generated. But I don't know of the
current status.

There is the issue that clang/llvm doesn't generate asynchronous unwind
tables on all architectures (.eh_frame), which impacts generating
backtraces by various tools. The system compiler (gcc) makes sure they
are always generated on all architectures.

In general llvm seems to miss things like variable tracking assignment.

For GCC there is extensive documentation how/if certain optimization
passes impact debuginfo generation:
http://www.fsfla.org/~lxoliva/writeups/gOlogy/gOlogy.html

I am not aware of something similar for llvm.

See https://blog.mozilla.org/nfroyd/2018/05/29/when-implementation-monoculture-right-thing/#comment-2809 for details, GCC actually with LTO/PGO creates smaller and faster Firefox.
In addition to that, there are security issues (llvm doesn't implement e.g. _D_FORTIFY_SOURCE=2 properly) and as mentioned above debuginfo issues. I think it is a serious mistake to grant this exception.

In general, this sounds like a mistake to allow Firefox to transition from GCC to Clang.

See https://blog.mozilla.org/nfroyd/2018/05/29/when-implementation-monoculture-right-thing/#comment-2809 for details, GCC actually with LTO/PGO creates smaller and faster Firefox.
In addition to that, there are security issues (llvm doesn't implement e.g. _D_FORTIFY_SOURCE=2 properly) and as mentioned above debuginfo issues. I think it is a serious mistake to grant this exception.

Please use the Fedora Firefox package and produce faster binaries than the ones which are shipped by Mozilla. Then we can talk about gcc performance, I'll happily revert the change and build it with gcc with your setup. I already tired that but I failed...so I feel free to take the task.

Well, it should be compared to what you can achieve in the distro if you build with clang, not to what Mozilla ships, they might not build with -fstack-protector-strong, -D_FORTIFY_SOURCE=2, -fstack-clash-protection etc.
Do you have a src.rpm where you enable the clang LTO and PGO?
From what I've gathered so far I need something like:
ac_add_options --enable-lto
ac_add_options MOZ_PGO=1
ac_add_options --enable-tests
ac_add_options --enable-optimize="-g -O3"
and tweak probably RPM_OPT_FLAGS in the spec file too.
Is the src.rpm setup to be able to run the tests during build (against Xvfb)?

In Fedora -fstack-clash-protection is a no-op on clang.

On Mon, Dec 10, 2018 at 06:04:54PM +0000, Neal Gompa wrote:

In general, this sounds like a mistake to allow Firefox to transition from GCC to Clang.

We just need a bit more input to know.

It isn't clear to me whether the missing (security) features
in clang were known or not when the decision was made. Or how
much benchmarking was actually done.

And it would be good to see the actual benchmarks used for the
performance claims. It seems that depending on which benchmark
you look at one or the other compiler produces smaller and
faster binaries.

If we could to have (scratch) builds of the package done
with both the system compiler and the alternative compiler
then people can run the benchmarks to see if there is
actually any speed difference.

Then we could see if that speed difference is big enough to
warrant giving up some of the features and compatibility with
the default system toolchain.

On Mon, Dec 10, 2018 at 06:04:54PM +0000, Neal Gompa wrote:

Well, it should be compared to what you can achieve in the distro if you build with clang, not to what Mozilla ships, they might not build with -fstack-protector-strong, -D_FORTIFY_SOURCE=2, -fstack-clash-protection etc.

Sure. But I don't believe those can make so huge performance difference. I did the benchmarks on mozilla nightly built directly from mozilla repository, with mozilla default flags without any Fedora specific tweaks, both with clang 7 and gcc 8.

Do you have a src.rpm where you enable the clang LTO and PGO?

You can take this package:
https://koji.fedoraproject.org/koji/taskinfo?taskID=31403563
and flip clang/pgo switches.

But I recommend to take latest nightly and do the tests there as it uses the same flags for gcc/clang and the build/benchmark is easier that with rpm packages.

From what I've gathered so far I need something like:
ac_add_options --enable-lto
ac_add_options MOZ_PGO=1
ac_add_options --enable-tests
ac_add_options --enable-optimize="-g -O3"
and tweak probably RPM_OPT_FLAGS in the spec file too.

I would not insist on "--enable-lto" as it causes run-time firefox crash
(https://bugzilla.mozilla.org/show_bug.cgi?id=1495742)

MOZ_PGO should do the trick but I failed to produce anything faster with that - there may be some catch here or maybe the profiling data is not correctly used.

Is the src.rpm setup to be able to run the tests during build (against Xvfb)?

The src rpm does not have that yet.

In the spec file I see -fno-delete-null-pointer-checks, has that bug been fixed and can this option be dropped, or has it been just disabled without actually trying to fix the codebase?
If that option works around something, usually it means calling method on a NULL object or something similarly undefined.

RHBZ#1495742 talks about PGO, not LTO, so which one it is? MOZ_PGO should make more difference than LTO, assuming the train run is reasonable. Did you have --enable-tests and X server?

In the spec file I see -fno-delete-null-pointer-checks, has that bug been fixed and can this option be dropped, or has it been just disabled without actually trying to fix the codebase?
If that option works around something, usually it means calling method on a NULL object or something similarly undefined.

That's perhaps something from the past and can be removed.

1495742 talks about PGO, not LTO, so which one it is? MOZ_PGO should make more difference than LTO, assuming the train run is reasonable. Did you have --enable-tests and X server?

Yes, MOZ_PGO enables PGO builds which should bring more significant performance boost. and yes, I did the builds which --enable-tests and with running X server, seeing the test actually running, the PGO build was successfully completed without errors.

On Mon, Dec 10, 2018 at 06:04:54PM +0000, Neal Gompa wrote:
We just need a bit more input to know.
It isn't clear to me whether the missing (security) features
in clang were known or not when the decision was made.

Can you be more specific which security features are missing and ideally file a #BZ for that with details? Thanks.

To be clear here, the next security Firefox update is still going to be built with gcc. We're going to produce testing clang builds after that for testing.

On Wed, 2018-12-12 at 09:45 +0000, Martin Stransky wrote:

To be clear here, the next security Firefox update is still going to be built with gcc. We're going to produce testing clang builds after that for testing.

For packages build with default toolchain rpmbuild makes sure to add
various security hardening flags:

https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/master/f/buildfla=
gs.md

As pointed out by others at least -fstack-clash-protection is a noop
with clang and -D_FORTIFY_SOURCE=3D2 isn't implemented properly. Someone
with knowledge of the alternative toolchain might know about others
like -fcf-protection, -fasynchronous-unwind-tables, -fstack-protector-
strong, -D_GLIBCXX_ASSERTIONS, -fexceptions and -Werror=3Dformat-security=
=20
or the linker flags -z relro, -z defs and -z now.

Also binaries build with the default toolchain get watermarked by
annobin so tools can check all security hardening is done/covers all
code: https://fedoraproject.org/wiki/Toolchain/Watermark

I am not sure this is done when you use an alternative compiler and
linker. So we might not even know which security hardening features are
missing.

@stransky:

To be clear here, the next security Firefox update is still going to be built with gcc. We're going to produce testing clang builds after that for testing.

If you switch to a different compiler (which I think should not be done at all given the negative feedback from the toolchain maintainers), this is a major change that should only be done in Rawhide. Such a change has no business ending up in a security update to a released distribution, ever.

I'm sorry I haven't made much progress so far, but have been in talks with Honza Hubicka, who has posted a blog post about it:
http://hubicka.blogspot.com/2018/12/firefox-64-built-with-gcc-and-clang.html
I'm building in f28/f29/f30 gcc with those fixes - gcc-8.2.1-6.fc{28,29,30} and once those are built, will try building firefox myself in mock.

To me, this looks like clear evidence that this exception was granted based on false premises: The claimed 10% performance gain was the one argument that swayed the decision in favor of granting the exception, but this benchmark shows that GCC-built binaries are actually faster on all benchmarks: one single benchmark (JetStream) is won by the old GCC 6, all the others by GCC 8.

Thus, this exception has no valid basis and hence ought to be rescinded.

In light of the discussion on this ticket over this week, and in particular the very detailed blog post Honza Hubicka wrote to analyze the performance and security claims, I think it might be good for us to rediscuss this ticket. Can we add it to Monday's agenda?

Metadata Update from @bowlofeggs:
- Issue status updated to: Open (was: Closed)

a month ago

Metadata Update from @zbyszek:
- Issue tagged with: meeting

a month ago

I'd like to have free hands to use the best compiler available and not to be restricted by one of them to produce the best binaries for our users. I think the free competition is important here as the recent status is that Fedora has slow Firefox binaries compared to Mozilla.

I feel that the recent interest of gcc folks in Firefox is driven by the clang competitor and if Firefox is vendor-locked again nothing will be improved. I'm going to file actual bugs against gcc (broken/non-working PGO, FF crashes) and if that's fixed I'll happily use gcc to build Firefox at Fedora.

So let's me rephrase a bit the ticket title - "Allow Firefox to use the best compiler available to build Firefox binaries, remove the gcc lock-in to allow use clang if gcc proves to be inefficient."

To me, this looks like clear evidence that this exception was granted based on false premises: The claimed 10% performance gain was the one argument that swayed the decision in favor of granting the exception, but this benchmark shows that GCC-built binaries are actually faster on all benchmarks: one single benchmark (JetStream) is won by the old GCC 6, all the others by GCC 8.
Thus, this exception has no valid basis and hence ought to be rescinded.

KEvin, if that can be used for Fedora I'll happily use it - I don't favor any particular compiler, my goal is to provide the best user experience for Fedora users.

The fact is that last time I tried gcc8 + PGO the final binaries crashed and when I removed the problematic parts the final binaries has inferior speed compared to Mozilla ones. Note that Honza uses OpenSUSE and I was not able to replicate his setup/results on Fedora.

I'll be happy if Jakub has more luck with it and can persuade Fedora gcc8 pro produce fast binaries ;-)

The rule to use GCC is not "vendor lock-in", it is there to ensure that the distribution consistently uses the same compiler-level security features, among other things. (Clang lags significantly behind GCC on security hardening.) And nobody is being "locked in", nobody forces Fedora to choose GCC, this is a deliberate choice. But it has to be made at the level of the whole distribution (Fedora) and not by every single package (and especially not by the maintainers of one package with a special exception for that one package only).

From the linked blog note, it's clear that gcc is improving and is a viable competitor, but it's also clear that patches are required to actually get a working binary. Firefox in Fedora must be built with the distro compiler, so right now clang works better. Once Honza's work filters down to the compiler that is released in Fedora, things might change again.

I'd like to have free hands to use the best compiler available and not to be restricted by one of them to produce the best binaries for our users.

That's my thinking too. The resolution was to "allow" clang to be used, not to mandate it. It's great to hear that @stransky is open to switching back later.

That's my thinking too. The resolution was to "allow" clang to be used, not to mandate it. It's great to hear that @stransky is open to switching back later.

Actually we still use gcc for Firefox 64 and I'm going to switch to clang only if it proves to be better for the Fedora builds.

@zbyszek:

but it's also clear that patches are required to actually get a working binary. Firefox in Fedora must be built with the distro compiler, so right now clang works better.

As I understand it, the Fedora GCC maintainers are willing to apply any required patches. Backported patches are commonplace in Fedora GCC packages.

That's my thinking too. The resolution was to "allow" clang to be used, not to mandate it.

But there is no rationale to allow Firefox to do something that no other package in Fedora is allowed to do without a strong reason. The one such reason that was brought forward turned out to be false. So why should we allow such an unfair privilege to this package?

Please stop repeating that "one such reason that was brought forward turned out to be false" and "this exception was granted based on false premises". Not only you are wrong, but also the words you use are unpleasant.

I think it's pretty clear, but I'll repeat it: the benchmark compared firefox as it was in Fedora then with the upstream binary. It is a valid comparison. Even if somebody else made a different benchmark with a different version of the compiler with different options, doesn't make the first one "false", "wrong", or anything else, except possibly "outdated".

Honza Hubicka is currently testing if -fno-semantics-interposition (the default behavior of clang, but not of gcc) doesn't affect significantly the non-LTO builds.

As for security, -fstack-protector* is another thing not properly implemented in clang, it leaves the stack guards in registers so makes it easier for exploits to store the value back.

In my view, the valid considerations that weren't brought forth before voting were the following:

  • Most (if not all) of the security hardening doesn't work in Clang. Some of it is a no-op, some of it just completely fails.
  • None of our gcc plugin based enhancements work with clang, and we have no equivalent for clang.
  • Debuginfo generation with Clang is broken. I know this because one of the distributions I help develop uses Clang for the system compiler (OpenMandriva), and I was forced to rollback some of the debuginfo enhancements since they don't work with clang-produced debuginfo (I am co-maintainer of the RPM and DNF stack in OpenMandriva). The produced DWARF data is not complete enough for the debuginfo package generation, as configured in Fedora, to work.

@zbyszek:

Please stop repeating that "one such reason that was brought forward turned out to be false" and "this exception was granted based on false premises". Not only you are wrong, but also the words you use are unpleasant.
I think it's pretty clear, but I'll repeat it: the benchmark compared firefox as it was in Fedora then with the upstream binary. It is a valid comparison. Even if somebody else made a different benchmark with a different version of the compiler with different options, doesn't make the first one "false", "wrong", or anything else, except possibly "outdated".

The comparison is correct for what it compares: Firefox upstream binaries vs. Firefox Fedora binaries. The problem is that this is not what we need compared, which is Firefox Fedora binaries built with GCC vs. built with Clang.

Judging from the evidence gathered so far, there are 2 likely possible outcomes: either the GCC binaries will be approximately the same speed if not slightly faster, or the Clang binaries will only be faster for one reason: missing security features. Some of the security features we enable in Fedora builds are a tradeoff vs. performance, but they are absolutely worth it, especially for an application designed to browse random sites (with JavaScript code) from the entire Internet.

I don't see any evidence so far that switching the compiler is a magic wand to gain 10-20% performance without compromising part of the security. The fact that the upstream binary, built with much fewer security hardening features, is faster (and that compared to the existing Fedora binary that does not yet enable LTO and PGO, not to what seems to be possible to achieve with GCC) is undoubtedly true, but not useful information in the context of the decision to make here.

Some updates here:

  • I don't see any issues with clang/debuginfo on my test builds, works normally in gdb. We also use mozilla crash reported and submit debug symbols to mozilla so we don't depend much on Fedora Abrt.

  • Jakub works on updated/fixed gcc in Fedora, from my initial testing the builds roughly match the clang performance. If it goes well we don't need to switch to clang at all.

While I don't agree that the original decision was made based on "false premises", I do believe that it was made without access to all of the necessary information. I agree that it needs to be reconsidered. I for one was not aware at how many of the security protections in GCC were not available in clang. I agree with Kevin here that for such a vital tool that regularly interacts with untrusted content, switching away from a security-hardened compilation environment would be a mistake.

I formally propose: "Firefox, being a critical component that connects to untrusted data sources, must continue to compile using gcc in order to produce code less vulnerable to attack."

Upstream Firefox is more than welcome to make a different choice around security vs. performance, but I for one would rather Fedora be known as "That one OS that didn't get hit by the latest punny security vulnerability."

Honza posted an update to "Firefox 64 built with GCC and Clang" https://hubicka.blogspot.com/2018/12/firefox-64-built-with-gcc-and-clang.html as "Even more fun with building and benchmarking Firefox with GCC and Clang" https://hubicka.blogspot.com/2018/12/even-more-fun-with-building-and.html

Since Honza has access to the upstream benchmarking setup now it should be easier to keep the gcc builds smaller and faster than the clang builds.

He also notes that it is just hard to reproduce the "official builds" whatever compiler you do use. There are some recommendations in his last blog post on "Communication with maintainers of Firefox packages in individual Linux distros".

On Thu, 2019-01-03 at 13:40 +0000, Stephen Gallagher wrote:

Firefox, being a critical component that connects to untrusted data
sources, must continue to compile using gcc in order to produce code
less vulnerable to attack.

I am also inclined to go this direction. +1

I formally propose: "Firefox, being a critical component that connects to untrusted data sources, must continue to compile using gcc in order to produce code less vulnerable to attack."

+1

Upstream Firefox is more than welcome to make a different choice around security vs. performance, but I for one would rather Fedora be known as "That one OS that didn't get hit by the latest punny security vulnerability."

This is IMHO one of the advantages of a distribution compared to upstream. AFAIK a while ago upstream nginx had a problem that was mitigated by hardened compilation flags in downstream distributions but was not on the upstream distributed builds.

I'm fine with rescinding the exception granted in https://pagure.io/fesco/issue/2020#comment-545724.

I don't think we need to swing in the other direction, and assert that gcc must be used, because that's already what the guidelines say, and I don't see a reason to make an empty pronouncement like that.

-0.

It's not a guideline so I'm not bothered by the wording. It's just reverting the exception.

+1 to the proposal.

+1 to reverting the exception in this case

Metadata Update from @bowlofeggs:
- Issue untagged with: meeting

18 days ago

+1 to revert, I think we can close this.

Metadata Update from @sgallagh:
- Issue close_status updated to: Rejected
- Issue status updated to: Closed (was: Open)

8 days ago

Login to comment on this ticket.

Metadata