#2923 Re-vote for Change proposal: Add -fno-omit-frame-pointer to default compilation flags
Closed: Accepted a year ago by churchyard. Opened a year ago by ngompa.

This ticket is a re-vote for #2817, per discussion in #2917 with support from @zbyszek and @decathorpe.

Change proposal: Add -fno-omit-frame-pointer to default compilation flags

Fedora will add -fno-omit-frame-pointer to the default C/C++ compilation flags, which will improve the effectiveness of profiling and debugging tools.

Owners, do not implement this work until the FESCo vote has explicitly ended.
The Fedora Program Manager will create a tracking bug in Bugzilla for this Change, which is your indication to proceed.
See the FESCo ticket policy and the Changes policy for more information.

Vote proposal

This Change is implemented for Fedora Linux 38 and we evaluate whether to retain it by Fedora Linux 40. This Change must be implemented in a manner which packages are able to trivially opt-out of retaining frame pointers during compilation so that packages that take larger performance hits can easily revert.


I asked somewhere, but I don't think I got an answer to this, so I'll ask again: With more and more parts of the "core" system being built with Rust (some GTK apps, librsvg2, the GPG backend in RPM as of Fedora 38, some Virt-related libraries, etc.), are there plans to also change Rust compiler flags to include frame pointers? I'm pretty sure our current flags (i.e. release mode, opt-level=3, debuglevel=2) result in rustc not including frame pointers - but there is a flag to force inclusion of frame pointers regardless of optimization options.

I asked somewhere, but I don't think I got an answer to this, so I'll ask again: With more and more parts of the "core" system being built with Rust (some GTK apps, librsvg2, the GPG backend in RPM as of Fedora 38, some Virt-related libraries, etc.), are there plans to also change Rust compiler flags to include frame pointers? I'm pretty sure our current flags (i.e. release mode, opt-level=3, debuglevel=2) result in rustc not including frame pointers - but there is a flag to force inclusion of frame pointers regardless of optimization options.

I think it'd be reasonable to include a knob for Rust too. IIRC, Rust compiler flags are macroized in a similar fashion, right?

More or less. It should be possible to tweak them accordingly and add an opt-out mechanism, similar to how debuglevel and opt-level can be set.

Either way, with all the discussions around this topic, I'm +1 for the new proposal. If things don't work out, reverting the change after a release cycle or two sounds good to me.

I think it'd be reasonable to include a knob for Rust too. IIRC, Rust compiler flags are macroized in a similar fashion, right?

Agreed, I don't see a reason to have Rust diverge here, and we'd be happy to implement this Change for Rust as well.

Also for the record, it looks like Golang had frame pointers enabled since 2016 (see https://github.com/golang/go/issues/15840 and https://go-review.googlesource.com/c/go/+/61511).

Put up https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/230 and https://pagure.io/fedora-rust/rust2rpm/pull-request/237 for the bulk of the implementation of this Change. Note that these do not change the default yet, they just add a convenient knob (%_include_frame_pointers) for doing so.

There were reasonable justifications on both sides of the original Change, but I ultimately found the arguments and data in favor convincing enough to vote for it. As before:

+1

Has anything changed in the change proposal since? The wiki says "This page was last edited on 29 November 2022, at 19:58"

I don't feel comfortable re-voting this in the middle of the holiday season without even announcing this again on the devel list.

This Change must be implemented in a manner which packages are able to trivially opt-out of retaining frame pointers during compilation so that packages that take larger performance hits can easily revert

This goes without saying, +1.

...we evaluate whether to retain it by Fedora Linux 40...

I have 3 questions:

  • Who is we in this sentence?
  • How do they evaluate it?
  • Why so late?

Has anything changed in the change proposal since? The wiki says "This page was last edited on 29 November 2022, at 19:58"

We weren't sure if it was ok to update the wiki as the banner on it says the page was "preserved for historical record". Happy to do so if consensus is that we can reuse it for this (or make a new one if not). One notable update is the actual implementation I linked to in https://pagure.io/fesco/issue/2923#comment-833412

...we evaluate whether to retain it by Fedora Linux 40...

I have 3 questions:

  • Who is we in this sentence?

The Change authors and FESCo

  • How do they evaluate it?

Benchmarks, indications from reviews, and user+developer feedback

  • Why so late?

It gives appropriate time for the change to soak, for software to improve, and see how we look on the other side.

I'm the author/maintainer of Sysprof since I took it over for Søren Sandmann Pedersen years ago. It is a system-wide profiler that makes it very easy to see what is going on with your system.

It provides:

  • Callgraphs (based on stacktraces obtained via perf, previously via a custom kernel module).
  • A binary capture format which makes it very convenient to work with capture files to write custom tooling, unlike perf.dat or pprof.
  • Accessory collection of counters (cpu, net, mem, cpufreq, energy consumption, battery charge, etc), logs, files, and marks with visualizers for them.
  • A "control fd" that can be sendmsg()'d to peers or inherited by subprocesses w/ SYSPROF_CONTROL_FD=n set allowing them to request a ring buffer (typically per-thread) to send accessory information which is muxed into the capture file.
  • Memory profilers (via LD_PRELOAD) which use the above to collect allocation records and callgraphs to visualize temporaries, leaks, etc.
  • Integration with language runtimes like GJS (GNOME's SpiderMonkey used by GNOME shell and various applications) to use timers and SIGPROF to unwind/collect samples and mux into captures.
  • Integration with platform libraries like GLib, GTK, Mutter, and Pango which annotate the recordings with information about the GPU, input events, background operations, etc.
  • The ability to decode symbols at the tail of a recording and insert those mappings (including kallsyms) into the snapshot. This allows users to give me a .bz2 recording I can open locally without the same binary versions or Linux distribution.

This has been incredibly useful for us in GNOME because we can take someone who is having a problem, run Sysprof, and they can paste/upload the callgraph and we can address it quickly. It's a major contributor to why we've been able to make GNOME so much faster in recent releases.

Where this all breaks down is when we have poor/unreliable stack traces.

For example, most people want to view a callgraph upside down, starting from application's entry points. If you can only unwind a few frames, you're out of luck here because you can't trace deep enough to reach the instruction pointer for main() (or similar).

You can, of course, ask perf to give you some 8Kb of stack data on every sample. Sysprof does thousands of samples per second, so this grows pretty quickly and even more so the time to unwind it takes longer than reading this email. Nobody does this unless someone shows up with a
pile of money.

It's such a problem that I made GLib/GTK avoid using libffi marshalers internally (which has exception unwind data but no frame pointers) so that it wouldn't break Linux's frame-pointer unwinder.

Beyond that, we started building the Flatpak org.freedesktop.Sdk with -fno-omit-frame-pointers so that we could actually profile software when writing it. (what a concept!) GNOME OS also is compiled with frame pointers so when really tricky things come up, many of us just use that instead of Fedora.

Yes there are cases where leaf functions are missed or not 100% correct, but it hasn't been much of an issue compared to truncated stacks or stacks with giant holes at library boundaries.

It's not that frame pointers are great or anything, it's that reliability tends to be the most important characteristic. So many parts of our platform can cause profiling to give inaccurate results.

I'm not advocating for frame pointers, I'm advocating for "Works OOTB". Fixing Python and BoringSSL to emit better instructions in the presence of frame pointers is minuscule compared to the alternatives suggested thus far.

The necessity we have when doing desktop engineering is that a lot of problems occur in the interaction of components rather than one badly behaved component on it's own.

Seeing profiling data across all applications, which may already be running, but also may be spawned by D-Bus or systemd during the profiling session is a must. Catching problematic processes during their startup (or failure to startup) is critical.

That means that pre-computing unwind tables is inherently unreliable for us unless we can stall any processes until unwind tables are generated and uploaded for an eBPF unwinder. This may be possible, but in and of itself will skew some sorts of profiling results due to the additional latency.

I suspect that doing something like QEMU's live migration strategy may be an option, but again with all the caveats that it is going to perturb some sorts of results.

  1. Load eBPF program to cause all remapping of pages that are X^W to SIGSTOP. Notify an agent to setup unwind tables.
  2. Load unwind or DWARF data for all X^W pages mapped, generate system-wide tables
  3. Handle incoming requests for SIGSTOP'd processes
  4. Upload new unwind table data
  5. Repeat from #3

Even if this were a solution today, it has a number of situations that it flat out doesn't handle well.

  • Startup of new processes incur latency, which for some workloads relying on fork()/exec() may perturb results.

  • Static binaries are a thing now. Even beyond C/C++ both Rust and golang are essentially statically linked and increasing in use across all our tooling (podman, toolbx, etc) as well as desktop applications (gtk-rs).

This poses a huge issue. The amount of unwind data we need to load increases significantly because we can't rely on MAP_SHARED from shared libraries to reduce the total footprint.

Again, we're looking for whole system profiling here.

The tables become so large that they push out resident memory from the things you're trying to profile to the point that you're really not profiling what you think you are.

Containers, if allowed to version skew or if we're unable to resolve mappings to the same inode, present a similar challenge (more below in Podman and how that is a disaster on Fedora today).

Thankfully from the Flatpak perspective, it's very good at sharing inodes across the hard-link farm. However application binaries are increasingly Rust.

ORC overhead in the kernel comes in about 5Mb of extra memory at a savings of a few hundred Kb of frame pointer instructions. The value, of course, is that your instruction cache is tighter. Imagine fleets that are all static binaries and how intractable this becomes quickly. Machines at capacity will struggle to profile when you need it most.

  • Unwinding with eBPF appears to currently require exfiltrating the data via side-channels (BPF maps) rather than from the perf event stream. This can certainly be fixed with the right unwinder hooks in the kernel, but currently requires agents to not only setup unwind tables but to provide access to the unwind stacks too. My personal viewpoint is that these stacks should be part of the perf data stream, not a secondary data stream to be merged.

  • If we can't do this thousands of times per second, it's not fast enough.

  • If an RPM is upgraded, you lose access to the library mapped into memory externally as the inode and CRC has changed for a particular path. You can't built unwind tables, so accounting in that process breaks.

  • Podman does this thing (at least for user-namespace containers on Fedora) where all the image content is served via FUSE. That means when you try to resolve the page table mappings in user-space to find the binary to locate symbols from, you are basically out of luck.

Sysprof goes through great pains by parsing layers of Podman's JSON image information to discover what the mapping should have been.

We have to do the same for flatpak, but thankfully /.flatpak-info contains everything we need to do that symbol resolution.

Subvolumes and OSTree deployments further complicate this matter because of indirection of mounts/mountinfo. We have to resolve through subvol=* for example to locate the correct file for the path provided in /proc/pid/maps.

Again, since we need to build unwind tables up-front, this needs to be resolved when the profiler starts up. We can't rely on /proc/$pid/mem because there is no guarantee the section will be mapped or that we'll be able discover which map it was without the ELF header.

In a world without frame-pointers, you have very little luck at making profiles work here unless you can resolve all of these issues.

There has been a lot of talk about how we can do new unwinders, and that is great. But the issues above are real ones today and will become even bigger issues in the upcoming years.

Again, I don't think any of us like frame pointers, just that they are generally lower effort to make these Rube Goldberg machines work.

I guess this re-vote is because folks looked at a similar change ( https://fedoraproject.org/wiki/Changes/Add_FORTIFY_SOURCE%3D3_to_distribution_build_flags ) and thought this needed revisiting on the basis that that change was approved while this one was not? I agree there are similarities between the two, but there's also a lot of differences. The reason I voted -1 for this change in the last vote was mostly down to the fact that our tools team was not in favor. They were in favor of / are actually driving the FORTIFY_SOURCE-3 change. So I am still -1 here, but it's a weak -1 and I could be convinced if the tools folks were convinced to support the change or some other new information or compromise came along.

Metadata Update from @ngompa:
- Issue tagged with: meeting

a year ago

Since there was a -1, I've tagged it for the next FESCo meeting.

I also don't understand the "This Change is implemented for Fedora Linux 38 and we evaluate whether to retain it by Fedora Linux 40" part. If you're not confident that this change is a good idea, will solve more problems than it'll cause, and directly benefit Fedora users, then it should be incubated elsewhere, IMO. This is a Change Proposal, not a Try Something For Two Releases And Hope It Works Proposal.

This is a Change Proposal, not a Try Something For Two Releases And Hope It Works Proposal.

I'm pretty sure we've done stuff like this before as Changes/Features in the past. Sometimes on purpose, sometimes by accident. System Python is an example of a Change that went in and wound up getting reverted later.

If you're not confident that this change is a good idea, will solve more problems than it'll cause, and directly benefit Fedora users, than it should be incubated elsewhere, IMO.

The Change authors are confident it's a good idea and will solve more problems than it causes. But the compromise proposal was so that FESCo and the Change authors would be tracking the longer term effects explicitly and it would get reverted for Fedora Linux 40 in the unlikely scenario it didn't work out.

I also don't understand the "This Change is implemented for Fedora Linux 38 and we evaluate whether to retain it by Fedora Linux 40" part. If you're not confident that this change is a good idea, will solve more problems than it'll cause, and directly benefit Fedora users, then it should be incubated elsewhere, IMO. This is a Change Proposal, not a Try Something For Two Releases And Hope It Works Proposal.

I get what you're saying. However, the usual mechanism we have for Fedora Changes (i.e. evaluation deadline and contingency mechanism) don't really work here, since the effects of the change will only become apparent over time (i.e. after GA), and reverting it would require a mass rebuild - which isn't possible for stable branches. So, evaluating the change in Fedora 38 and 39 and possibly reverting for Fedora 40 is "just" pushing the evaluation time frame and contingency mechanism way back, since the Change would be near impossible to honestly evaluate otherwise.

And also, the change "not working" would in this case mean that there's a more noticeable slowdown, e.g. 3%, but things still work. I think that that most of us expect a much smaller slowdown, or maybe even no statistically significant slowdown, but we can't really check this before the mass rebuild. So we want to allow a full release after the mass rebuild to evaluate the effects. The difference in approach is that that changes which actually break things need to be reverted immediately, and with this one we can take things slow.

After learning more about the opt-out process in the meeting, I can jump on and add +1 here.

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/DC5D7TOTAYLQYJVY6XNFCKL57HZ3XS57/

https://meetbot.fedoraproject.org/fedora-meeting/2023-01-03/fesco.2023-01-03-17.01.html

  • AGREED: APPROVED (+6,1,-1) This Change is implemented for Fedora
    Linux 38 and we evaluate whether to retain it by Fedora Linux 40.
    This Change must be implemented in a manner which packages are able
    to trivially opt-out of retaining frame pointers during compilation
    so that packages that take larger performance hits can easily
    revert. (mhroncok, 17:20:38)
  • Change owners please coordinate the change with the Python
    Maintainers before changing the defaults (mhroncok, 17:21:20)

Metadata Update from @churchyard:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

a year ago

This Change must be implemented in a manner which packages are able to trivially opt-out of retaining frame pointers during compilation so that packages that take larger performance hits can easily revert.

Can FESCO clarify, do package maintainers need a valid reason to opt out (e.g. proof of performance regressions) or can they just arbitrarily opt out?

Let me quote the packaging guidelines:

Adding to and overriding or filtering parts of these flags is permitted if there’s a good reason to do so; the rationale for doing so must be documented in the specfile.

https://docs.fedoraproject.org/en-US/packaging-guidelines/#_compiler_flags

@churchyard I guess what I'm really asking is: Soes "I think it will cause performance regressions" qualify as a good reason or is proof required?

The Change owners would also appreciate a heads up (though it's obviously not required) so that we can investigate and ideally address the reason for the opt out.

Soes "I think it will cause performance regressions" qualify as a good reason or is proof required?

I don't think a bare "I think" is a particularly good justification for any technical decision. If there really is a case where this causes a significant difference in performance, I'm sure the Change Owners and many other people would be very interested in an actual measurement so that they can dig into the details and maybe solve the issue.

Said. Big corporations won and all Fedora users will hurt. This is a VERY bad precedent.

@xvitaly Your comment is not helpful.

Your comment is not helpful.

Already rejected proposal was submitted because big corporations weren't happy with the results. This is a VERY BAD precedent for Fedora.

For the record, @xvitaly, it was not. It was because I wasn't happy with the results. And I pushed for this to benefit desktop Linux developers, who are generally operating on unfunded mandates to make the best experiences they can. I want to equip them with the best tools in Fedora, and this is a feature that both GNOME and KDE would benefit from in the course of development, profiling, and performance tuning.

Soes "I think it will cause performance regressions" qualify as a good reason or is proof required?

I don't think a bare "I think" is a particularly good justification for any technical decision. If there really is a case where this causes a significant difference in performance, I'm sure the Change Owners and many other people would be very interested in an actual measurement so that they can dig into the details and maybe solve the issue.

To this end, I just filed https://bugzilla.redhat.com/show_bug.cgi?id=IncludeFramePointers which we'll use to track packages opting out so things don't get lost (I'll put this in the docs as well).

Can the change proposal be updated to reflect this new decision?

Can the change proposal be updated to reflect this new decision?

I'm planning to, just waiting for bcotton to process the approval and update the banner on the wiki (right now the page is still marked as archived).

@tstellar Change proposal on the wiki has been updated

I guess this re-vote is because folks looked at a similar change ( https://fedoraproject.org/wiki/Changes/Add_FORTIFY_SOURCE%3D3_to_distribution_build_flags ) and thought this needed revisiting on the basis that that change was approved while this one was not? I agree there are similarities between the two, but there's also a lot of differences.

I don't think there are any similarities between the two proposals beyond the involvement of the tools folks due to it being their area of expertise. The benefit they provide are very different. The nature of slowdown is completely different.

it was my conjecture (as author of the feature) that there could be slowdowns in some corner cases due to _FORTIFY_SOURCE=3 and later I even proved that there's no general slowdown, through SPEC results that I shared on the devel list as well as in the proposal. The slowdown because of the -fno-omit-frame-pointer proposal is broad and shows up in every benchmark one would care to run.

The reason I voted -1 for this change in the last vote was mostly down to the fact that our tools team was not in favor. They were in favor of / are actually driving the FORTIFY_SOURCE-3 change. So I am still -1 here, but it's a weak -1 and I could be convinced if the tools folks were convinced to support the change or some other new information or compromise came along.

AFAIK, there has been no change in opinion among the tools folks (which includes me) on this. Using the _FORTIFY_SOURCE=3 change as the reason to reverse the decision on this proposal is a mistake and a misrepresentation of facts.

For context when we measured impact of -fstack-clash-protection -fstack-protector-strong -Wp,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS, we saw ~1.3% slowdown in SPEC2000 (SPECfp 2017 had one outlier test cam4 that ran 2.5x slower in our test due to what appears to be a hardware issue and excluding that one test, the results match with SPEC2000) and those mitigations provide significant security hardening in exchange for that 1.3%.

Given that -fno-omit-frame-pointer shows a 2+% slowdown all over on x86, the question for FESCo was if the tradeoff is worthwhile for being able to trace applications in production given the magnitude of the slowdown. FESCo is well within their rights to vote either way based on those facts; the question has nothing to do with _FORTIFY_SOURCE=3.

The way this was done is so wrong:

  • There was a vote. You were not happy with the outcome.
  • So you first tried to complain in the original ticket (#2817) about this. It was clear that the consensus in that ticket was to not reconsider at this time.
  • So instead, you filed up a new ticket (this one), with no public discussion (e.g., on the mailing list), and also with no link in the original ticket, thereby excluding participants of the existing discussion that you lost (who did not get any notification that the decision was being reconsidered before it was too late and hence no chance to chime in).
  • And then you expedited the issue to the next meeting, only 4 days after it was opened, again precluding any kind of discussion or feedback.
  • You apparently also did not even invite the toolchain team, the experts on this issue, since judging from @fweimer's reply on the mailing list and from @siddhesh's reply above, they were caught by surprise by this sudden complete reversal just as completely as I was.
  • I also do not see anything that has changed since this was last discussed.
  • And as in the last discussion, I still believe that 2 releases, i.e., a whole year, is way too long an evaluation period. If anything, this needs to be evaluated in Rawhide only with the option to revert it with a mass rebuild before feature freeze. It does not make sense to ship pessimized code to end users for a whole year.

When the Change proposal came in for FORTIFY_SOURCE=3 (which introduces
similar pressure), the resulting discussion prompted the re-vote.

I do not see how this is a fair comparison because FORTIFY_SOURCE is a security feature whereas this one is not. Of course, stricter FORTIFY_SOURCE comes at a performance penalty, but it improves security for end users. Frame pointers, on the other hand, bring no value whatsoever to end users, only to developers.

In addition, if you believe the penalty is too high, then the outcome should have been to reconsider the FORTIFY_SOURCE=3 change instead.

Coming back from vacation seeing this new issue to reverse the decision made on ticket #2817 only being announced on the mailinglist on the day of the vote without including various people involved with the original issue seems really odd. Especially given that it is apparently justified by accepting a completely different issue, which is really not comparable at all, as already explained by @siddhesh and others, who apparently also weren't included on this ticket.

There was an agreement on moving forward without frame pointers. And as far as I could see there was even an healthy discussion about alternative ways to get faster and more accurate unwinding/backtracing between the profiling and compiler/tools hackers.

I don't mind if you would re-try to get this change in for f39 or f40 if it turns out those discussions about alternative unwinders didn't result in faster/better profilers.

But trying to do it while multiple stackholders were away and unaware of this because it wasn't really announced doesn't feel good.

And as far as I could see there was even an healthy discussion about alternative ways to get faster and more accurate unwinding/backtracing between the profiling and compiler/tools hackers.

There was no useful discussion of alternatives. In fact, the toolchain/compiler people insulted the performance people and the Change authors on the mailing list and in redhat-rpm-config PRs.

I don't mind if you would re-try to get this change in for f39 or f40 if it turns out those discussions about alternative unwinders didn't result in faster/better profilers.

Nothing was going to come from it. It didn't in the past decade, it wasn't going to in the next year.

But trying to do it while multiple stackholders were away and unaware of this because it wasn't really announced doesn't feel good.

I'm sorry about not putting a link in the original ticket. It did not occur to me to do that. But I will not apologize for filing this when I did. Our schedule makes it so the deadlines for these things land during the holidays.

The only people we hadn't heard from in FESCo meetings past and on the mailing list were people who actually use frame pointers. We got those people commenting in #2817, this ticket, and one such individual showed up for last Tuesday's FESCo meeting.

In addition, if you believe the penalty is too high, then the outcome should have been to reconsider the FORTIFY_SOURCE=3 change instead.

The thing is, I don't believe the penalty is too high. I actually believe the opposite: we value computer performance too much against human performance. Human energy is considerably more expensive and we should attempt to accommodate any means of reducing the effort to improve the quality of software as long as the computer performance hit isn't too high. The only place where it was too high was Python, and we've remediated that.

For context when we measured impact of -fstack-clash-protection -fstack-protector-strong -Wp,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS, we saw ~1.3% slowdown in SPEC2000 (SPECfp 2017 had one outlier test cam4 that ran 2.5x slower in our test due to what appears to be a hardware issue and excluding that one test, the results match with SPEC2000) and those mitigations provide significant security hardening in exchange for that 1.3%.

Given that -fno-omit-frame-pointer shows a 2+% slowdown all over on x86, the question for FESCo was if the tradeoff is worthwhile for being able to trace applications in production given the magnitude of the slowdown. FESCo is well within their rights to vote either way based on those facts; the question has nothing to do with _FORTIFY_SOURCE=3.

Actually, this demonstrates my point quite well about why I was upset about this. Taking a ~1.3% hit for security is "reasonable", but taking a ~2% hit for tracing/inspection/observability was considered "unreasonable". In my view, we were basically saying human effort is free or not worth consideration, which is definitely not true.

The case that changed my mind about this in the original discussion was @hergertme and @catanzaro laying out why it's so hard for GNOME's performance to improve. And I found corroborating information on the KDE side. That is what swayed my opinion on this. As a community, we already underinvest in desktop Linux because there is little commercial interest in it. The fact that desktop Linux developers are hamstrung from reaching performance and responsiveness levels similar to Windows and Mac desktops by the lack of this stuff is galling. That we've known about the problem for over a decade and done nothing about it is even worse.

I also did my own background research as part of looking into this for game development and found out how much this is used for improving performance of video games on other platforms. The lack of it on Linux means that this kind of analysis is impossible to do and leads to Linux ports not being able to be as performant as their Windows counterparts. Game development is already pretty hard, making it nearly impossible to make native games performant because the observability is not there is just rubbing salt on the wound.

Maybe none of the compiler people care about these things. I guess to them, their role ends as soon as the code is released. But anyone who cares about user experience should care about this stuff, IMO.

There was no useful discussion of alternatives.

There was. There was discussion on what could be done to unwind in the kernel, etc. There was no drop-in alternative that works right now, but there was discussion on what could be done to resolve this.

In fact, the toolchain/compiler people insulted the performance people and the Change authors on the mailing list and in redhat-rpm-config PRs.

Where have the toolchain/compiler people insulted anybody?

If you have issues with my and/or @xvitaly's comments: neither of us is a Fedora toolchain person. (I do have experience in toolchain development from my volunteer work on TIGCC, but I am not part of the Fedora toolchain team.)

I don't mind if you would re-try to get this change in for f39 or f40 if it turns out those discussions about alternative unwinders didn't result in faster/better profilers.

Nothing was going to come from it. It didn't in the past decade, it wasn't going to in the next year.

The people at stake apparently were not previously aware that this was an issue, or how bad the issue was. (In their view, backtraces without frame pointers were a solved problem.) The discussion has changed that. You should have given the toolchain people some time to come up with a solution.

Enabling frame pointers, on the other hand, is the best way to ensure that people will not be motivated on working on something better, because frame pointers "work". At the expense of everyone else.

I'm sorry about not putting a link in the original ticket. It did not occur to me to do that. But I will not apologize for filing this when I did. Our schedule makes it so the deadlines for these things land during the holidays.

I still do not see why that deadline mattered at all, i.e., why this could not have waited for Fedora 39. The goal of deadlines is to ensure that disruptive changes go in early in the schedule, not to rush them in just before the deadline ignoring any kind of feedback period guidelines. Feedback periods, such as the one-week minimum discussion on the mailing list for every change proposal, also exist for a reason. So, why was it not possible to rediscuss this with everyone at stake and then make a decision for Fedora 39, not 38?

The only people we hadn't heard from in FESCo meetings past and on the mailing list were people who actually use frame pointers. We got those people commenting in #2817, this ticket, and one such individual showed up for last Tuesday's FESCo meeting.

But the people opposed to frame pointers had no chance to respond to the changed situation, and in particular, to point out that the comparison to the FORTIFY_SOURCE change that you used as justification is not a fair comparison.

Actually, this demonstrates my point quite well about why I was upset about this. Taking a ~1.3% hit for security is "reasonable", but taking a ~2% hit for tracing/inspection/observability was considered "unreasonable". In my view, we were basically saying human effort is free or not worth consideration, which is definitely not true.

But that is not what we are saying. What we are saying is that it makes sense to take a performance hit for all end users for something that actually matters to end users (security), but not for something that only matters to developers (profiling), which can be addressed by shipping rebuilt packages to developers specifically. We do not and should not ship debug binaries to end users, it just does not make sense. They can easily be opted in. (As I stated: it can be as easy as sudo dnf --enablerepo=framepointers distro-sync with no changes to client-side tooling and only minimal scripting (mass rebuild script that appends a repotag to Release) on the server side.)

Maybe none of the compiler people care about these things. I guess to them, their role ends as soon as the code is released. But anyone who cares about user experience should care about this stuff, IMO.

The job of the compiler people is to inform about the broad slowdown and give their recommendation. Thanks to me being upset about folks trying to derail my _FORTIFY_SOURCE=3 change proposal as the false pretext to reopen this discussion, we even got numbers for slowdown due to all security flags (and also showed that _FORTIFY_SOURCE=3 has no broad slowdown) so that there's more objective data to make what is unfortunately a subjective decision.

At a 2-10% performance degradation for profiling support you're essentially proposing to choose one set of users over another, you can't pretend that it's going to make all users happy. With security on the other hand, all users benefit so there's no real comparison.

In such cases, instead of rushing to a decision I personally prefer to maintain status quo and continue discussing alternatives.

@ngompa: Oh, and this:

Actually, this demonstrates my point quite well about why I was upset about this. Taking a ~1.3% hit for security is "reasonable", but taking a ~2% hit for tracing/inspection/observability was considered "unreasonable".

is also untrue because Vladimir Makarov's measurements, quoted by @siddhesh, actually show that:

Please note the performance update in the wiki page. Vladimir Makarov from my team ran SPEC2000 and SPEC2017 and found no difference in performance between _FORTIFY_SOURCE=2 and _FORTIFY_SOURCE=3. In fact, some tests in SPEC2000 ran slightly faster with _FORTIFY_SOURCE=3, which I'd love to study in future since it goes against what I had hypothesized.

The 1.3% you are talking about are compared to no _FORTIFY_SOURCE at all, which is a configuration Fedora has not been shippping for decades.

So the _FORTIFY_SOURCE=3 change will have no performance impact, whereas this one will have at least a 2% performance hit on Blender and GCC (unless they opt out), with very little data available on other real-world applications.

At a 2-10% performance degradation for profiling support you're essentially proposing to choose one set of users over another, you can't pretend that it's going to make all users happy. With security on the other hand, all users benefit so there's no real comparison.

That's not true either. Every security change has a cost, because it changes how code is compiled and creates breakage for code compilation. We've just broadly accepted that trade-off doesn't matter.

And if all compilations or all 3D renderings take 2% longer, that also wastes real people's time.

There was no useful discussion of alternatives.

There was. There was discussion on what could be done to unwind in the kernel, etc. There was no drop-in alternative that works right now, but there was discussion on what could be done to resolve this.

The problem is that while possible alternatives for improving profiling were discussed, none of them exist or are usable yet, while using frame pointers already works. This was the fact that ultimately made me change my mind.

Once (or rather, if) any of the alternatives that don't rely on frame pointers become viable at some point in the future, we can discuss switching to that alternative, but until then, I'd rather have better support for doing profiling on Fedora than doing nothing to improve the situation.

Human energy is considerably more expensive and we should attempt to accommodate any means of reducing the effort to improve the quality of software as long as the computer performance hit isn't too high.

But the most human-resource-effective way to improve overall system performance is never to profile and manually optimize individual applications, but to work on compiler optimization, as the toolchain team has been doing all this time. And this change actually explicitly undoes one such optimization.

At a 2-10% performance degradation for profiling support you're essentially proposing to choose one set of users over another, you can't pretend that it's going to make all users happy. With security on the other hand, all users benefit so there's no real comparison.

Even if we don’t all agree on what the correct choice was, let’s at least agree on the facts. The typical impact in benchmarks was 0.5%–2.5%, not 2%-10%. (Here I’m ignoring crypto benchmarks, which showed ~0% impact, likely dominated by hand-written assembly inner loops.) Python was indeed at 10%, but someone dug into this, explained why the main interpreter loop was disproportionately impacted, and made a reasonable case that we can expect this to be a real outlier rather than the tip of an unseen iceberg of larger performance regressions. I apologize that I don’t have a link to that comment handy at the moment.

I don't mind if you would re-try to get this change in for f39 or f40 if it turns out those discussions about alternative unwinders didn't result in faster/better profilers.

Nothing was going to come from it. It didn't in the past decade, it wasn't going to in the next year.

The people at stake apparently were not previously aware that this was an issue, or how bad the issue was. (In their view, backtraces without frame pointers were a solved problem.) The discussion has changed that. You should have given the toolchain people some time to come up with a solution.

Enabling frame pointers, on the other hand, is the best way to ensure that people will not be motivated on working on something better, because frame pointers "work". At the expense of everyone else.

Sorry no, I know that this is not true. I'm aware that there have been discussions with them with years. Additionally, the eBPF/bpftrace people also complain about this because they rely on frame pointers too. It's been a well-documented hole in real-time observability. That's why all Go code has had frame pointers since 2016.

Did you really think this came from nowhere? The only reason the pressure is so high now is because we're being out-competed on this stuff.

IMHO, the best solution would be to stuff a DWARF unwinder into the kernel whether Linus wants it or not. It is not like we have never shipped downstream kernel patches where they proved useful. Then perf can just do the DWARF unwinding directly in the kernel with little to no performance hit and nobody will need frame pointers anymore.

(And I'm not saying it should be coded in BPF, I'm saying it should be coded in C and plugged straight into the kernel backtracing code. I know Linus has vetoed that upstream, so ship it as a downstream patch.)

That will never happen. Nobody wants to write kernel code that they know will never have a chance landing upstream unless they absolutely have to. And we don't.

Well, Linus has already had to backpedal on some issues due to public pressure. And there is a whole bunch of downstream kernel patches out there. Even more if you include all the out-of-tree modules (e.g., v4l2loopback).

And well, they would absolutely have to if Fedora were not so willing to accept a performance hit for all users just because some people do not like the most efficient solution for political reasons.

Even more if you include all the out-of-tree modules (e.g., v4l2loopback).

This is a bad example for your point, since it resulted in PipeWire developing a completely different mechanism for cameras that things are shifting over to so they can get what that module did.

And well, they would absolutely have to if Fedora were not so willing to accept a performance hit for all users just because some people do not like the most efficient solution for political reasons.

Do you really think so? I'm quite skeptical. I only know of one example of where Linus eventually gave in, and it was pretty much because the Linux world was at gunpoint over it.

I mean, they would absolutely have to implement the patch downstream if Fedora were not so happy to accept the performance-killing alternative. Whether it would eventually end up upstream or not is another question. Though, if all distros ship it because it is the only way to get reasonable profiling, I think upstream will not have much choice, and even if Linus keeps strict, why would it matter if all the major distros ship the patch anyway?

And PipeWire doing the camera emulation in userspace will never be a complete replacement for the v4l2loopback kernel module because the latter can be used to simulate a camera to all applications, even proprietary ones, even if they deliberately do not use abstraction layers because they want to talk to a physical camera. It may work for Intel's proprietary camera drivers (sadly, since those are exactly the reason the kernel module was not accepted to begin with), but not for the more interesting use cases.

Even if we don’t all agree on what the correct choice was, let’s at least agree on the facts. The typical impact in benchmarks was 0.5%–2.5%, not 2%-10%. (Here I’m ignoring crypto benchmarks, which showed ~0% impact, likely dominated by hand-written assembly inner loops.) Python was indeed at 10%, but someone dug into this, explained why the main interpreter loop was disproportionately impacted, and made a reasonable case that we can expect this to be a real outlier rather than the tip of an unseen iceberg of larger performance regressions. I apologize that I don’t have a link to that comment handy at the moment.

Please update the proposal with the thread where there's agreement that the python related slowdown is an outlier; the proposal currently quotes the performance impact on CPython being 1-10%.

Please update the proposal with the thread where there's agreement that the python related slowdown is an outlier; the proposal currently quotes the performance impact on CPython being 1-10%.

@anakryiko analysis on Python is linked at the end of https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer#Benchmarking_of_the_performance_impact

@kkofler said:

Feedback periods, such as the one-week minimum discussion on the mailing list for every change proposal, also exist for a reason.

I agree on the importance of feedback periods. I think the mandatory feedback period should exist for both new changes and reconsidered (e.g. this change) or updated (e.g. DNF5) changes.

Please update the proposal with the thread where there's agreement that the python related slowdown is an outlier; the proposal currently quotes the performance impact on CPython being 1-10%.

@anakryiko analysis on Python is linked at the end of https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer#Benchmarking_of_the_performance_impact

Thanks, that's an accurate analysis of the slowdown, but I'm not sure if the conclusion that _PyEval_EvalFrameDefault is an outlier directly follows from it. The conclusion sounds like "that's not how code should be written so it's an outlier", which is very different from "that's not how code is typically written" because the latter is definitely not true. I've seen plenty of humongous functions with ridiculous nesting in my lifetime; heck I have a t-shirt with a single if condition that spans a dozen lines :)

The other argument of pure python matrix multiplication (vs offloading to c/c++ libraries) being bad programming practice also depends on how code should be written but one could defer to the python community's consensus on that. What is the python community's position on punishing the performance of this code pattern for the purpose of improved profiling?

There was no useful discussion of alternatives. In fact, the toolchain/compiler people insulted the performance people and the Change authors on the mailing list and in redhat-rpm-config PRs.

Do you have some specific examples of when this happened?

To be clear, Python is opting out of this Change for 3.11 (see https://src.fedoraproject.org/rpms/python3.11/pull-request/95). @daandemeyer has also posted a writeup with the wider Python community to gather additional feedback on potential improvements down the road.

Hi Neal,

On Mon, 2023-01-09 at 13:42 +0000, Neal Gompa wrote:

And as far as I could see there was even an healthy discussion
about alternative ways to get faster and more accurate
unwinding/backtracing between the profiling and compiler/tools
hackers.
=20
There was no useful discussion of alternatives. In fact, the
toolchain/compiler people insulted the performance people and the
Change authors on the mailing list and in redhat-rpm-config PRs.
=20
=20

That is not my experience. But I might have missed someone insulting
someone else. Lets not do that. My INBOX is full of emails where people
are discussing technical solutions to doing fast backtraces using
eh_frame unwinders. Now admittedly those discussions are not all on
fedora mailinglists. But I think this isn't really a fedora specific
issue. This is something that needs to be solved upstream.

I don't mind if you would re-try to get this change in for f39 or
f40 if it turns out those discussions about alternative unwinders
didn't result in faster/better profilers.
=20
Nothing was going to come from it. It didn't in the past decade, it
wasn't going to in the next year.=20
=20
=20

I don't know why you are saying that. It certainly is on several
people's lists for 2023. It is on mine. I do have some exeprience with
eh_frame unwinders, in kernel using systemtap, and fast ones in
valgrind. I am actually happy we are taking a new fresh look at them
now. There are also improvements in both the upcoming glibc and gcc
versions that will improve performance of taking backtraces. Maybe we
aren't there yet, but people are certainly working on it.

But trying to do it while multiple stackholders were away and
unaware of this because it wasn't really announced doesn't feel
good.
=20
I'm sorry about not putting a link in the original ticket. It did not
occur to me to do that. But I will not apologize for filing this when
I did.

I get that you don't want to apologize when people are saying you did
things deliberately. Lets assume good intent. But you did invent a
whole new process. Which in hindsight took a lot of people by surprise.
Making them feel like they were cut out of the discussion and decission
making process.

Lets just acknowledge that the process was flawed and we'll need to
rethink how to handle when FESCO has decided on a Change Proposal and
someone wants to resubmit that same Change Proposal for the the same
Fedora release. IMHO it cannot just be that a new ticket is created
without at least updating the Change Proposal, involving the people
subscribed to the original Change Proposal FESCO ticket and resetting
the discussion periods.

Now we have a flawed proposal that is still being discussed and needs
to be corrected and updated in the middle of people preparing and
testing a mass rebuild. Lets just back it out for now. So that we can
first discuss how this kind of "revote" in the same fedora release
cycle works.

Cheers,

Mark

We have two FESCo meetings before the mass rebuild starts, so I think there is still sufficient time for further discussion. Nothing is truly locked in until the mass rebuild begins.

https://meetbot.fedoraproject.org/fedora-meeting/2023-01-10/fesco.2023-01-10-17.00.html

Let's exclude i686 and s390x completely from the fnoomit frame pointer thing for F38 (+7,0,-0) (mhroncok, 17:57:16)

https://meetbot.fedoraproject.org/fedora-meeting/2023-01-10/fesco.2023-01-10-17.00.html

Let's exclude i686 and s390x completely from the fnoomit frame pointer thing for F38 (+7,0,-0) (mhroncok, 17:57:16)

Put up https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/235 for this.

Frame pointers seem to break the unwinder on powerpc64le, but looking into details
makes me wonder why anybody would want -fno-omit-frame-pointer on ppc64le when on that arch there is a mandatory stack pointer based backchain in the ABI, which is the only thing one can reliably walk. See https://pagure.io/releng/issue/11218#836140 for some details.
The ABI even doesn't have a standard frame pointer register (r31 is just one of many registers that gcc chooses to use as frame pointer) and certainly doesn't have a standardized save slot in the frame. I don't see how any profilers could use anything but the stack pointer and its saved 0(1) backchain and 16(1) saved link register for the non-DWARF unwinding.

FTR when we approved to revert this on s390x and i686 we've done so under the impression that the flag is no-op on aarch64 and ppc64le.

Considering it is not true, I'd revert on ppc64le as well.

I agree. It looks like the ppc64le situation was an oversight, and it should probably be opted out as well.

As for aarch64, there indeed -fno-omit-frame-pointer is on by default unless explicitly overridden with -fomit-frame-pointer (but the question is why to add it explicitly when it is the default already). Unlike the ppc64le one can't unwind without DWARF or other unwind info just from the stack pointer though.

I agree with @churchyard and @decathorpe. Thank you for bringing this up.

Thanks for surfacing this. I put up https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/241 to exclude ppc64le for the time being, and we can revisit things for f39.

Login to comment on this ticket.

Metadata